Last time, I discussed the two types of breakpoints that you’ll see in a debugger (hardware and software) at a high level. I didn’t really explain when it is best to use one instead of the other, though, beyond a couple of hints that hardware breakpoints are limited in number and are sometimes good for tracking down memory corruption issues.
By taking a closer look at how each of the two breakpoint types works, we can get some idea as to when we’ll prefer one over the other. Both types of breakpoints alter the target in some way, but to differing degrees.
The primary concern with software breakpoints is that they actually involve patching memory in the target to set the breakpoint. This is usually fine; the debugger uses it as its default breakpoint strategy when you give an end address to g, for instance. However, it begins to break down if the target both executes a region that you are setting a breakpoint on, and also reads that same region.
This particular concern is a real problem when you are dealing with self-modifying code, or certain protection schemes (such as code that attempts to checksum itself in memory). In these cases, you might accidentally break self-modifying code, or trip a protection scheme, simply by virtue of setting a breakpoint (since the act of setting a software breakpoint actually modifies the address space of the target).
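To make the self-checksum problem concrete, here is a minimal sketch in Python. It is a toy model, not debugger code: the four-byte "code region", the use of CRC32 as the checksum, and the helper names are all assumptions made for illustration. It only shows that patching a single 0xCC byte into a region is enough to change what a checksum over that region sees.

```python
import zlib

INT3 = 0xCC  # x86 opcode that a software breakpoint writes over the instruction


def set_software_breakpoint(memory: bytearray, offset: int) -> int:
    """Patch INT3 over the byte at offset and return the byte it replaced."""
    original = memory[offset]
    memory[offset] = INT3
    return original


def clear_software_breakpoint(memory: bytearray, offset: int, original: int) -> None:
    """Put the original byte back, as a debugger does when removing the breakpoint."""
    memory[offset] = original


# Hypothetical code region: xor ebp,ebp / push eax / ret
code = bytearray(b"\x33\xed\x50\xc3")
expected = zlib.crc32(code)  # what a self-checking target would have recorded

saved_byte = set_software_breakpoint(code, 0)
tampered = zlib.crc32(code) != expected  # the target's check now sees a mismatch

clear_software_breakpoint(code, 0, saved_byte)
restored = zlib.crc32(code) == expected  # the region matches again once unpatched
```

A real protection scheme may use a different checksum or only cover part of the image, but any scheme that reads the patched byte is affected in the same way.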
In cases like this, hardware breakpoints can come to the rescue. Since setting an execute hardware breakpoint does not actually modify the underlying instruction, anything that reads the memory backing that instruction gets back the real first byte of the instruction rather than a 0xCC opcode. Granted, you can only have four hardware breakpoints enabled at a time, but usually you can get by with that many, or at least with a “sliding window” of hardware breakpoints if your breakpoints lie along a well-defined execution sequence. In that case, you could have each breakpoint disable itself and enable the next one when it is hit, thus keeping the number of active breakpoints down.
There is also the other obvious advantage to hardware breakpoints which I touched on earlier: the ability to set a breakpoint on a memory access (a read or a write) to a particular address. This obviously has a great many uses, whether you’re reverse engineering something or tracking down a corruption problem. Memory-access breakpoints are an excellent way to very quickly figure out which piece of code is modifying a variable, without having to trace through an arbitrarily large set of code to find the access that you were looking for. One thing to consider about memory-access breakpoints on x86 and x64, though, is that they can only be set on regions whose length is 1) a power of 2, and 2) less than or equal to the native pointer size (8 for x64, or 4 for x86). (If you are lucky enough [or perhaps unlucky enough, as Itanium isn’t exactly the most friendly thing to view from an assembler perspective] to be debugging on an Itanium platform, this restriction does not exist; you can set a length of any power of 2 between 1 byte and two gigabytes). As a result, you’ll have to plan where to set your breakpoints carefully, as on x86, four breakpoints of four bytes each can cover at most 16 bytes with this kind of “memory guard” access. You might or might not be able to use the same kind of “sliding breakpoint window” idea I mentioned above, depending on whether the memory locations you are setting breakpoints on (or at least, the accesses that you are interested in) occur in a particular sequence.
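The length and count limits above fall out of how the x86 debug control register (Dr7) is laid out: four slots, each with an enable bit, a two-bit access type, and a two-bit length. The following Python sketch computes the Dr7 bits for one slot and rejects requests the hardware cannot express. The function and constant names are my own, and the alignment check reflects the hardware's expectation that the address be aligned to the breakpoint length, which is not discussed above; treat it as an illustration rather than as what the debugger does internally.

```python
RW_EXECUTE, RW_WRITE, RW_READWRITE = 0b00, 0b01, 0b11

# Dr7 LEN field encodings; the 8-byte encoding is only usable in 64-bit mode.
LEN_BITS = {1: 0b00, 2: 0b01, 4: 0b11, 8: 0b10}


def dr7_bits(slot: int, address: int, length: int, rw: int, pointer_size: int = 4) -> int:
    """Return the Dr7 bits that enable one hardware breakpoint slot (0..3)."""
    if slot not in range(4):
        raise ValueError("only four hardware breakpoint slots exist")
    if length not in LEN_BITS or length > pointer_size:
        raise ValueError("length must be a power of 2 no larger than the pointer size")
    if address % length != 0:
        raise ValueError("address must be aligned to the breakpoint length")
    if rw == RW_EXECUTE and length != 1:
        raise ValueError("execute breakpoints use a length of 1")
    enable = 1 << (slot * 2)  # local enable bit L0..L3
    control = (rw | (LEN_BITS[length] << 2)) << (16 + slot * 4)  # R/Wn and LENn
    return enable | control


def max_guarded_bytes(pointer_size: int) -> int:
    """Most memory that four data breakpoints can cover at once."""
    return 4 * pointer_size


# A 4-byte write breakpoint in slot 0 at a placeholder address.
write_bp = dr7_bits(0, 0x0013FD30, 4, RW_WRITE)
```

For example, `max_guarded_bytes(4)` gives the 16-byte ceiling on x86, and asking for a 3-byte or 8-byte region with `pointer_size=4` raises `ValueError`.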
Hardware breakpoints are typically less invasive than software breakpoints, but there are still ways that they can be interfered with. The most common case for this happening is if you try to set a hardware breakpoint while DLL initializers are being called during process startup (such as at the initial create process breakpoint). If you try to do this, you’ll get a warning from the debugger advising you that your breakpoints won’t stick:
0:000> ba e1 kernel32!CreateFileA
^ Unable to set breakpoint error
The system resets thread contexts after the process
breakpoint so hardware breakpoints cannot be set.
Go to the executable's entry point and set it then.
'ba e1 kernel32!CreateFileA'
The reason for this is that a context set operation occurs between the initial process breakpoint being hit and the requested thread start address / process start address being executed. I’ll go into just how this works at process startup in a future posting, but to keep it simple, the basic idea is that an APC is queued to the new usermode thread that runs the loader component in NTDLL. One of the arguments to the APC is a context record describing the register context that was requested for the new thread by CreateProcess, CreateThread, and so forth. The loader component runs process (or thread) DLL initializers, and then calls NtContinue to continue execution at the specified context record, which kicks off execution at the user requested thread start address. We can see this in action easily by looking at the arguments that the APC dispatcher supplies to the loader initializer APC:
0:000> kv
ChildEBP RetAddr Args to Child
0013fb1c 7c93edc0 7ffdf000 ntdll!DbgBreakPoint
0013fc94 7c921639 0013fd30 ntdll!LdrpInitializeProcess+0xffa
0013fd1c 7c90eac7 0013fd30 ntdll!_LdrpInitialize+0x183
00000000 00000000 00000000 ntdll!KiUserApcDispatcher+0x7
0:000> .cxr 0013fd30
eax=4ad05056 ebx=7ffdd000 ecx=00f2faa8 edx=00090000
esi=7c9118f1 edi=00011970
eip=7c810665 esp=0013fffc ebp=7c910570 iopl=0
nv up ei pl nz na po nc
cs=001b ss=0023 ds=0023 es=0023 fs=0038 gs=0000
efl=00000200
kernel32!BaseProcessStartThunk:
7c810665 33ed xor ebp,ebp
If you have been paying attention so far, it should be clear why hardware breakpoints set at the initial process breakpoint do not work like you might expect: When the APC that runs loader initializers returns, it restores a previously saved register context image via NtContinue. Since hardware breakpoints are part of the register context, they are wiped away when the context is restored, and so your breakpoints would appear to simply disappear after DLL initializers were finished.
This limitation also implies that calling SetThreadContext on a thread can interfere with hardware breakpoints if care is not taken to preserve the value of the Dr series of registers. Indeed, some protection schemes utilize such a trick in an attempt to defeat hardware breakpoints.
Fortunately, it is easy to work around such limitations using the debugger. There is a little-used command called “.apply_dbp” that allows you to instruct the debugger that it should re-apply hardware breakpoints, either to the current register context, or to a saved register context image in memory (supplied by the /m Context argument). With the use of this command, you can quickly restore your hardware breakpoints even after something attempts to trash them. Combined with a conventional breakpoint on, say, kernel32!SetThreadContext, this can be used to quickly re-enable hardware breakpoints in such cases. You can also use this trick to persist hardware breakpoints in the process startup case, by using .apply_dbp /m <address-of-context-record-argument-from-APC-dispatcher> to apply any hardware breakpoints you have set to the register context image that will eventually be restored by NtContinue. For instance, in the case of the example that I gave above, you might use the following to apply hardware breakpoints to the context that NtContinue will restore:
0:000> .apply_dbp /m 0013fd30
Applied data breakpoint state
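If it helps to see why patching the saved context record works, here is a toy Python model of the sequence. Nothing in it talks to a real debugger or thread: the `Context` and `ToyThread` classes, the method names, and the placeholder breakpoint address are all invented for illustration, and the live registers double as the debugger's record of which breakpoints are wanted. It only shows that a full context restore discards debug register values that exist solely in the live registers, while values written into the saved record survive.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Context:
    """Toy register context: instruction pointer plus two debug registers."""
    eip: int = 0
    dr0: int = 0  # address watched by hardware breakpoint slot 0
    dr7: int = 0  # debug control register (the enable bits live here)


class ToyThread:
    def __init__(self, saved: Context):
        self.live = Context()  # registers the thread is using right now
        self.saved = saved     # context record passed to the loader APC

    def set_hardware_breakpoint(self, address: int) -> None:
        # Touches only the live registers, not the saved record.
        self.live = replace(self.live, dr0=address, dr7=0x1)

    def apply_dbp_to_saved(self) -> None:
        # Stand-in for ".apply_dbp /m": copy breakpoint state into the record.
        self.saved = replace(self.saved, dr0=self.live.dr0, dr7=self.live.dr7)

    def nt_continue(self) -> None:
        # Stand-in for NtContinue: the saved record replaces every live register.
        self.live = self.saved


START = 0x7C810665   # kernel32!BaseProcessStartThunk, from the .cxr output above
BP_ADDRESS = 0x1000  # placeholder address, not a real symbol

lost = ToyThread(Context(eip=START))
lost.set_hardware_breakpoint(BP_ADDRESS)
lost.nt_continue()  # debug registers are back to zero

kept = ToyThread(Context(eip=START))
kept.set_hardware_breakpoint(BP_ADDRESS)
kept.apply_dbp_to_saved()
kept.nt_continue()  # debug registers come back from the patched record
```

The same reasoning applies to a target that calls SetThreadContext with a context that zeroes the Dr registers: whatever ends up in the record being applied wins.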
Next up, some more tricks that you can do to get the most out of controlling the target in the debugger.