Archive for the ‘Windows’ Category

Win32 calling conventions review

Friday, November 10th, 2006

Recently, I’ve been posting about the Win32 calling conventions. Here’s a table of contents of the various posts in the series.

  1. Win32 calling conventions: Concepts
  2. Win32 calling conventions: Usage cases
  3. Win32 calling conventions: __cdecl in assembler
  4. Win32 calling conventions: __stdcall in assembler
  5. Win32 calling conventions: __fastcall in assembler
  6. Win32 calling conventions: __thiscall in assembler

Remember that when picking a calling convention to use, there are a number of factors to consider. There is no one calling convention that fits all cases (however, __stdcall is a good default if you are not sure).

Hopefully, you’ll have found this series to be enlightening, useful, and practically applicable.

Debugger flow control: Using conditional breakpoints (part 3)

Thursday, November 9th, 2006

Previously, I touched a bit more on the whens and whys of where hardware breakpoints can be useful.

If you have been following along so far, you should already know the ups and downs of each flavor of breakpoint, and have at least a fair idea as to when you should prefer one to another. There is one other aspect of breakpoint management that I have yet to cover, though, and it is perhaps the most useful feature of breakpoints in WinDbg: conditional breakpoints.

Conditional breakpoints allow you to, as you might imagine, set conditions for breakpoints. That is, the debugger will only actually stop for you to investigate something when both the breakpoint is triggered, and its associated condition is met. These kinds of breakpoints are very useful if you need to stop on a certain function, but only if a certain argument has a certain value (for instance).

However, in WinDbg (and the other DTW/DbgEng debuggers), the support for conditional breakpoints allows you to do much more than that. Specifically, you are permitted to define arbitrary commands that are automatically executed any time a breakpoint is hit. Aside from allowing you to create conditional breakpoints, this also allows you to perform a number of other highly useful tasks in a quick and automated fashion with the debugger. For example, you might want to alter arguments passed to a particular function, or you might want to log arguments (or a stack trace) to a particular function for future analysis.

For example, let’s say that you wanted to see everyone who called CreateFileW, which filename they provided, what the call stack and return value were for each call, and then continue execution. Now, you could do this manually with a “plain” breakpoint by repeating a certain set of commands every time the breakpoint hits, but it would be far superior to automate the whole process.

With DbgEng’s conditional breakpoint support, this is easy. All you need to do in order to have a set of commands executed whenever a breakpoint is hit is to follow the breakpoint statement with those commands enclosed in double quotes. (If the commands themselves require the use of double quotes, then you’ll have to escape those with \".)

From looking at CreateFile on MSDN, we can see that the first argument to CreateFileW is a Unicode string giving the name of the file that we are to create or open.

Armed with this information, we can construct a breakpoint that will perform the logging that we are looking for.

Here’s what you might come up with:

0:001> bp kernel32!CreateFileW "du poi(@esp+4);kv;gu;? @eax; g"

Let’s break down this breakpoint a little bit. As usual, the semicolon character (;) is used to separate multiple debugger commands appearing on the same line. The first command is fairly straightforward: at the moment the breakpoint hits, esp points at the return address, so the first argument lives at @esp+4; poi(@esp+4) dereferences that slot, and the du command displays the zero-terminated Unicode string (the filename) at the resulting address.

Next, we take a full backtrace (kv). After that, we allow execution to continue until we reach the return address of CreateFileW with the gu command (“Go Up one call level”). Finally, we display the eax register (which happens to be the return value register for x86), and continue execution with g.

In action, you might see something like so. In this instance, I am attached to cmd.exe and have executed the command “type c:\config.sys”…

0013fa4c  "c:\CONFIG.SYS"
ChildEBP RetAddr  
0013e6e4 4ad02f2a kernel32!CreateFileW
0013e730 4ad02e91 cmd!Copen_Work+0x157
0013e744 4ad0dbff cmd!Copen+0x12
0013f7a8 4ad0db62 cmd!TyWork+0x48
0013fc58 4ad0daac cmd!LoopThroughArgs+0x1dd
0013fc6c 4ad05aa2 cmd!eType+0x17
0013fe9c 4ad013eb cmd!FindFixAndRun+0x1f5
0013fee0 4ad0bbba cmd!Dispatch+0x137
0013ff44 4ad05164 cmd!main+0x216
0013ffc0 7c816fd7 cmd!mainCRTStartup+0x125
0013fff0 00000000 kernel32!BaseProcessStart+0x23
Evaluate expression: 132 = 00000084

Essentially, we have turned the debugger into a tool that inspects function calls and provides us with detailed information about what is happening, in a completely automated fashion.
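
The same mechanism also handles the truly conditional case mentioned at the start of this post, where you only want to stop if an argument has a particular value. Here is a minimal sketch, assuming an x86 target where the second stack argument to kernel32!CreateFileW (dwDesiredAccess) is at @esp+8 on function entry; it uses the j (execute if/else) and gc (go from conditional breakpoint) commands:

0:001> bp kernel32!CreateFileW "j (poi(@esp+8) & 0x40000000) ''; 'gc'"

The empty command string ('') means “stop in the debugger” whenever the caller requested GENERIC_WRITE (0x40000000) access; for every other call, gc silently resumes execution in whatever mode the target was running.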

You can also use this technique for more active intervention in the target – for instance, skipping over a function call entirely, modifying what a function does when it is called, or any number of other things. Previous articles have, for instance, used conditional breakpoints to alter function behavior (such as making all new virtual memory allocations come from the high end of the address space instead of the low end).
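
As a sketch of the “skip a function call entirely” idea, you might make every call to CreateFileW fail without ever executing it. This assumes the x86 __stdcall kernel32!CreateFileW with its seven stack arguments, so 0x20 bytes (the return address plus the arguments) have to be popped manually, since the callee’s own stack cleanup will never run:

0:001> bp kernel32!CreateFileW "r @eip = poi(@esp); r @esp = @esp + 0x20; r @eax = 0xffffffff; g"

This points eip at the return address, removes the return address and arguments from the stack, and hands INVALID_HANDLE_VALUE back to the caller in eax. (A more thorough version would probably also want to arrange for a sensible last error code; treat this as a sketch rather than a drop-in recipe.)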

Conditional breakpoints are an invaluable tool in your debugging (and reverse engineering) arsenal; you should absolutely consider them any time that you need any sort of automation to record or alter the behavior of the target, in situations where it is not practical to manually perform the work on every single breakpoint hit.

Next up in this series: Other flow control mechanisms with the debugger, such as stepping and tracing.

The Windows Vista SDK is RTM

Wednesday, November 8th, 2006

The final (RTM) build of the Windows Vista SDK has been released to the world.

You can download it for free from Microsoft in the full installer image or web installer formats.

The full installer image is almost 1.2GB, so be prepared to have a large chunk of hard drive space to burn on the new SDK.

Debugger flow control: More on breakpoints (part 2)

Wednesday, November 8th, 2006

Last time, I discussed the two types of breakpoints that you’ll see in a debugger (hardware and software) at a high level. I didn’t really explain when it is best to use one instead of the other, though, beyond a couple of hints relating to hardware breakpoints being limited in number and at times being good for tracking down memory corruption issues.

By taking a closer look at how each of the two breakpoints work, we can get some idea as to when we’ll prefer one to another. Both types of breakpoints alter the target in some way, but to differing degrees.

The primary concern with software breakpoints is that they actually involve patching memory in the target to set the breakpoint. This is usually fine; the debugger uses it as its default breakpoint strategy when you give an end address to g, for instance. However, it begins to break down if the target both executes a region that you are setting a breakpoint on, and also reads that same region.

This particular concern is a real problem when you are dealing with self-modifying code, or certain protection schemes (such as code that attempts to checksum itself in memory). In these cases, you might accidentally break self-modifying code, or trip a protection scheme, simply by virtue of setting a breakpoint (since the act of setting a software breakpoint actually modifies the address space of the target).

In cases like this, hardware breakpoints can come to the rescue. Since setting an execute hardware breakpoint does not actually modify the underlying instruction, anything that reads the memory backing that instruction will get back the real first byte of the instruction opcode rather than an 0xCC opcode. Granted, you can only have four enabled hardware breakpoints at a time, but usually you can get by with that many (or at least with a “sliding window” of hardware breakpoints, assuming your breakpoints cover a well-defined execution sequence; in that case, you can have each breakpoint disable itself and enable the next one, conserving the number of breakpoints active at any one time).
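
A rough sketch of that “sliding window” idea might look like the following. The module and function names are made up for illustration, and I am assuming that these are the only breakpoints defined, so that they receive IDs 0 through 2. Each breakpoint disables itself and arms the next one when it is hit, so only one debug register is in use at any given time:

0:000> ba e1 myapp!Stage1 "bd 0; be 1; g"
0:000> ba e1 myapp!Stage2 "bd 1; be 2; g"
0:000> ba e1 myapp!Stage3
0:000> bd 1 2

The final bd leaves only the first breakpoint armed to begin with; execution then stops for good once the last stage in the sequence is reached.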

There is also the other obvious advantage to hardware breakpoints which I touched on earlier: the ability to set a breakpoint on a memory fetch for a particular address. This has a great many uses, whether you’re reverse engineering something or tracking down a corruption problem. Memory-access breakpoints are an excellent way to very quickly figure out which piece of code is modifying a variable, without having to trace through an arbitrarily large set of code to find the access that you were looking for.

One thing to consider about memory-access breakpoints on x86 and x64, though, is that they may only be set on regions that are 1) a power of 2 in length, and 2) no longer than the native pointer size (8 for x64, or 4 for x86). (If you are lucky enough [or perhaps unlucky enough, as Itanium isn’t exactly the most friendly thing to view from an assembler perspective] to be debugging on an Itanium platform, this restriction does not exist; you can set a length of any power of 2 between 1 byte and two gigabytes.) As a result, you’ll have to plan where to set your breakpoints carefully, as on x86, you can only cover at most 16 bytes with this kind of “memory guard” access. You might or might not be able to use the same kind of “sliding breakpoint window” idea I mentioned above, if the memory locations you are setting breakpoints on are accessed in a particular sequence (or at least, the accesses that you are interested in are).
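
To make the size restriction concrete, covering a hypothetical 16-byte structure on x86 takes all four debug registers, with each breakpoint watching one 4-byte chunk. (ba also requires each address to be aligned to the breakpoint length, so this sketch assumes the structure is at least 4-byte aligned; myapp!g_SomeStruct is an invented symbol.)

0:000> ba w4 myapp!g_SomeStruct
0:000> ba w4 myapp!g_SomeStruct+0x4
0:000> ba w4 myapp!g_SomeStruct+0x8
0:000> ba w4 myapp!g_SomeStruct+0xc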

Hardware breakpoints are typically less invasive than software breakpoints, but there are still ways that they can be interfered with. The most common case for this happening is if you try to set a hardware breakpoint while DLL initializers are being called during process startup (such as at the initial create process breakpoint). If you try to do this, you’ll get a warning from the debugger advising you that your breakpoints won’t stick:

0:000> ba e1 kernel32!CreateFileA
        ^ Unable to set breakpoint error
The system resets thread contexts after the process
breakpoint so hardware breakpoints cannot be set.
Go to the executable's entry point and set it then.
 'ba e1 kernel32!CreateFileA'

The reason this is the case is that a thread context is set between the initial process breakpoint being hit and the requested thread start address / process start address being executed. I’ll go into just how this works at process startup in a future posting, but to keep it simple, the basic idea is that an APC is queued to the new usermode thread that runs the loader component in NTDLL. One of the arguments to the APC is a context record describing the register context that was requested for the new thread by CreateProcess, CreateThread, and so forth. The loader component runs process (or thread) DLL initializers, and then calls NtContinue to continue execution at the specified context record, which kicks off execution at the user-requested thread start address. We can see this in action easily by looking at the arguments that the APC dispatcher supplies to the loader initializer APC:

0:000> kv
ChildEBP RetAddr  Args to Child              
0013fb1c 7c93edc0 7ffdf000 ntdll!DbgBreakPoint
0013fc94 7c921639 0013fd30 ntdll!LdrpInitializeProcess+0xffa
0013fd1c 7c90eac7 0013fd30 ntdll!_LdrpInitialize+0x183
00000000 00000000 00000000 ntdll!KiUserApcDispatcher+0x7
0:000> .cxr 0013fd30 
eax=4ad05056 ebx=7ffdd000 ecx=00f2faa8 edx=00090000 esi=7c9118f1 edi=00011970
eip=7c810665 esp=0013fffc ebp=7c910570 iopl=0         nv up ei pl nz na po nc
cs=001b  ss=0023  ds=0023  es=0023  fs=0038  gs=0000             efl=00000200
kernel32!BaseProcessStartThunk:
7c810665 33ed            xor     ebp,ebp

If you have been paying attention so far, it should be clear why hardware breakpoints set at the initial process breakin do not appear to work like you might expect: when the APC that runs loader initializers returns, it restores a previously saved register context image via NtContinue. Since hardware breakpoints are part of the register context, they are wiped away when that context is restored, and so your breakpoints appear to simply disappear once DLL initializers have finished.

This limitation also implies that calling SetThreadContext on a thread can interfere with hardware breakpoints if care is not taken to preserve the value of the Dr series of registers. Indeed, some protection schemes utilize such a trick in an attempt to defeat hardware breakpoints.

Fortunately, it is easy to work around such limitations using the debugger. There is a little-used command called “.apply_dbp” that allows you to instruct the debugger that it should re-apply hardware breakpoints, either to the current register context, or a saved register context image in-memory (supplied by the /m Context argument). With the use of this command, you can quickly restore your hardware breakpoints even after something attempts to trash them. Combined with a conventional breakpoint on, say, kernel32!SetThreadContext, this can be used to quickly re-enable the use of hardware breakpoints in such cases. You can also use this trick to persist hardware breakpoints in the process startup case, by using .apply_dbp /m <address-of-context-record-argument-from-APC-dispatcher> to enforce any hardware breakpoints you set in the register context image that will eventually be restored by NtContinue. For instance, in the case of the example that I gave above, you might use the following to apply hardware breakpoints to the context that NtContinue will restore:

0:000> .apply_dbp /m 0013fd30 
Applied data breakpoint state
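
For the SetThreadContext case, you might combine the same command with a conventional breakpoint. This is just a sketch, assuming x86, where the CONTEXT pointer is the second stack argument (at @esp+8) on entry to kernel32!SetThreadContext; whether the debug registers in that image are ultimately honored still depends on the context flags that the caller passed in:

0:000> bp kernel32!SetThreadContext ".apply_dbp /m poi(@esp+8); g"

This re-applies your hardware breakpoints into the register context image that is about to be set, before letting the call proceed.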

Next up, some more tricks that you can do to get the most out of controlling the target in the debugger.

Debugger flow control: Hardware breakpoints vs software breakpoints

Tuesday, November 7th, 2006

In debugging parlance, there are two kinds of breakpoints that you may run across – “hardware” breakpoints, and “software breakpoints”. While the two overlap to a certain degree, it is important to know the differences between the two, and when it is better to use a “hardware” or “software” breakpoint.

For the purposes of this discussion, I’ll stick to using WinDbg on an x86 target. The same general concepts apply to other architectures (especially x64, which works nearly identically), and the commands to set breakpoints are the same, but details such as where and how many hardware or software breakpoints you may set vary slightly from platform to platform.

In most debugging scenarios, you have probably just used software breakpoints exclusively. Software breakpoints are set with the bp or bu commands (breakpoint and deferred breakpoint, respectively). These breakpoints are fairly simple and straightforward; they cause execution to halt in the debugger whenever a thread attempts to execute a piece of code that you set a breakpoint on. Typically, you may set any number of software breakpoints that you want at the same time. Software breakpoints may only be targeted at code; there is no support for setting a “memory breakpoint” via a software breakpoint. Many features, such as stepping over a call or going to the return address of a function, also implicitly use a temporary software breakpoint that is removed once execution hits it the first time.

Hardware breakpoints, on the other hand, are much more powerful and flexible than software breakpoints. Unlike software breakpoints, you may use hardware breakpoints to set “memory breakpoints”, or breakpoints that fire when any instruction attempts to read, write, or execute (depending on how you configure the breakpoint) a specific address. (There is also support for setting breakpoints on I/O port access, but I’ll not cover that feature here, as it is typically of very limited applicability for every-day debugging tasks.) Hardware breakpoints have some limitations, however; the main one being that the number of hardware breakpoints you may have active is extremely limited (on x86, you may only have four hardware breakpoints active at the same time).

Now that we have a basic overview of what the two breakpoint types are, let’s dig a bit deeper and see how they work under the hood, and when you might use them.

The way software breakpoints work is fairly simple. Speaking about x86 specifically, to set a software breakpoint, the debugger simply writes an int 3 instruction (opcode 0xCC) over the first byte of the target instruction. This causes an interrupt 3 to be raised whenever execution is transferred to the address you set a breakpoint on. When this happens, the debugger “breaks in” and swaps the 0xCC opcode byte back out for the original first byte of the instruction that it saved away when you set the breakpoint, so that you can continue execution without hitting the same breakpoint immediately. There is actually a bit more magic involved that allows you to continue execution from a breakpoint and not hit it immediately, yet keep the breakpoint active for future use; I’ll discuss this in a future posting.

Now, if you have ever tried to disassemble or dump the raw opcode bytes of anything that you have set a breakpoint on, you might be tempted to say that this isn’t really how software breakpoints work, because you won’t see an int 3 anywhere that you set a breakpoint. This is because the debugger lies to you about the contents of memory where software breakpoints are involved; any access to that memory (through the debugger) behaves as if the original opcode byte that the debugger saved away were still there.

Now that we know how software breakpoints work at a high level, it’s time to talk about the other side of the story, hardware breakpoints.

Hardware breakpoints are, as you might imagine given the name, set with special hardware support. In particular, for x86, this involves a special set of perhaps little-known registers known as the “Dr” registers (for debug register). These registers allow you to set up to four (for x86; this is highly platform specific) addresses that, when read, read/written, or executed, will cause the processor to raise a special exception that causes execution to stop and control to be transferred to the debugger.

Given that on x86, you can only have four hardware breakpoints active at once, why would anyone possibly want to use them?

Well, the main strength of hardware breakpoints is that you can use them to halt on non-execution accesses to memory locations. This is an extremely useful capability; for example, if you were debugging a memory corruption problem where an initial instance of corruption eventually causes a crash, your initial reaction would probably be something along the lines of “gee, if I knew who caused the corruption in the first place, this would be much, much easier to debug” – and this is exactly what hardware breakpoints let you do. In essence, you can use a hardware breakpoint to tell the processor to stop when a specific variable (address) is read or read/written. You can also use hardware breakpoints to break in on code execution as well, although in the typical case it is more common to use software breakpoints for that purpose, due to the relaxed restrictions on how many breakpoints you may have active at once.
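
To make the corruption scenario concrete: if a (hypothetical) global named myapp!g_RefCount keeps getting stomped, something like the following logs the call stack and the newly written value every time the variable is modified, then keeps going. (Data breakpoints on x86 fire after the access completes, which is why the dd shows the new value.)

0:000> ba w4 myapp!g_RefCount "kv; dd myapp!g_RefCount l1; g"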

That’s the high level overview of the two main types of breakpoints you’ll encounter in a debugger. In some upcoming postings, I’ll go into some specifics as to how certain edge cases (such as stepping over a call) are implemented, and describe other situations where you’ll find it very useful to use one kind of breakpoint instead of another. I am also planning on discussing how some of the other debugger flow control features are really implemented (such as tracing / single step), and what the consequences of using each flow control method are on the debuggee.

Don’t always trust the disassembler…

Friday, November 3rd, 2006

A coworker of mine just ran into a particularly strange bug in the WinDbg disassembler that turns out to be kind of nasty if you don’t watch out for it.

There is a bug in how WinDbg disassembles e9/imm32 jump instructions on x86. (The e9/imm32 instruction takes a 4-byte displacement that is relative to the end of the instruction (eip+5); adding the two together produces the new eip value after the jump. It is thus encoded as e9XXXXXXXX, where XXXXXXXX is the relative displacement.)

Specifically, if the relative displacement immediate of the e9 opcode, taken as a raw absolute virtual address, matches the virtual address of a named symbol in the target, WinDbg incorrectly displays that symbol as the target of the jump. This does not occur if the value does not resolve to a named symbol, i.e. something that would just show up as a raw address and not as a name or name+offset in the debugger.

For example, here’s a quick repro that allocates some space in the target and builds an e9/imm32 instruction that is disassembled incorrectly:

0:001> .dvalloc 1000
Allocated 1000 bytes starting at 00980000
0:001> eb 00980000 e9; ed 00980001 kernel32!CreateFileA; u 00980000 l1
00980000 e9241a807c      jmp kernel32!CreateFileA (7c801a24)

Here, CreateFileA is not the target of the jump. Instead, 7d181a29 is the target.
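
When in doubt, you can always compute the real branch target yourself with the expression evaluator; it is simply the address of the end of the instruction (the jump’s address plus 5) plus the displacement:

0:001> ? 00980000 + 5 + 7c801a24
Evaluate expression: 2098731561 = 7d181a29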

You are most likely to see this problem when looking at e9-style jumps that are used with function hooks.

It is interesting to note that the uf command (unassemble function, with code analysis) is not fooled by the jump, and does properly recognize the real target of the jump. (Here, the target is bogus, so uf refuses to unassemble it. However, in testing, it has followed the jump correctly when the target is valid.)

This bug is appearing for me in WinDbg 6.6.7.5. I would assume that it is probably present in earlier versions.

Here’s why you do not normally see this bug:

  1. This bug only occurs if the relative displacement, when incorrectly taken as a raw absolute virtual address, matches the absolute virtual address of a named symbol: that is, an address that, if you tried to resolve it as a symbol, would give you back a name (or name+offset) and not just a number.
  2. Most relative displacements are very small, except in atypical cases like function hooks. This means that as a general rule of thumb, it is very rare to see a relative displacement for an e9/imm32 jump that is more than a couple of megabytes.
  3. There are rarely any named symbols in the first few megabytes of the address space. The first 64K is reserved entirely (in non-ntvdm processes), and after that, things like the RTL_USER_PROCESS_PARAMETERS block, the environment variable block for the process, and so forth are typically present. These are allocated at the low end of the address space because, by default, the memory manager allocates from low addresses upward. Therefore, low address ranges are very quickly used up by allocations.
  4. Most images do not have a preferred image base in the extreme low end of the address space.
  5. By the time most DLLs with conflicting base addresses are being loaded, the low end of the address space has been sufficiently consumed and/or fragmented such that there is not room to map the image section view in the extreme low end of the address space, continuing to prevent there being named symbols in this region.

Given this, it’s easy to see how this bug has gone unseen and unfixed for so long. Hopefully, it’ll be fixed in the next DTW drop.

Resolution to CreateIpForwardEntry failing on Vista

Friday, November 3rd, 2006

Previously, I had posted about a compatibility problem with Windows Vista if you used CreateIpForwardEntry to manage the IP routing table. In particular, if you call this routine on Vista with the intent of creating a new route in the IP routing table, you may inexplicably get an ERROR_BAD_ARGUMENTS error code returned.

There is an officially supported workaround, though it is not very well documented, and was in fact only recently made available in the Platform SDK documentation to my knowledge.

The official line is that you must call the GetIpInterfaceEntry function on Vista, if you wish to continue to be able to add routes.

(Yes, this does suck. It is a total breaking change for anyone who did route manipulation on OSes prior to Vista, at least until you can patch your programs out in the field. If this is unacceptable to you, I would encourage you to provide feedback to Microsoft about how this issue impacts customer experiences and your ability to deploy and use your product on Vista.)

I would encourage you to use this documented API instead of the undocumented solution that I posted about earlier, simply because it will (ostensibly) continue to work on future OS versions. (Although, given that the reason you have to call this is because of a breaking future-compatibility change in Vista, I am not sure that it is really justified to use that line here…)

Win32 calling conventions: __thiscall in assembler

Thursday, November 2nd, 2006

The final calling convention that I haven’t gone over in depth is __thiscall.

Unlike the other calling conventions that I have previously discussed, __thiscall is not typically explicitly decorated on functions (and in most cases, you cannot decorate it explicitly).

As the name might imply, __thiscall is used exclusively for functions that have a this pointer – that is, non-static C++ class member functions. In a non-static C++ class member function, the this pointer is passed as a hidden argument. In Microsoft C++, the hidden argument is the first actual argument to the routine.

When a function is __thiscall, you will typically see the this pointer passed in the ecx register. In this respect, __thiscall is rather similar to __fastcall, as the first argument (this) is passed via register in ecx. Unlike __fastcall, however, all remaining arguments are passed via the stack; edx is not supported as an additional argument register. Like __fastcall and __stdcall, when using __thiscall, the callee cleans the stack (and not the caller).

In some circumstances, with non-exported, internal-only-to-a-module functions, CL may use ebx instead of ecx for this. For any exported __thiscall function (or a function whose address escapes a module), the compiler must use ecx for this, however.

Continuing with the previous examples, consider a function implementation like so:

class C
{
public:
	int c;

	__declspec(noinline)
	int ThiscallFunction1(int a, int b)
	{
		return (a + b) * c;
	}
};

This function operates the same as the other example functions that we have used, with the exception that ‘c’ is a member variable and not a parameter.

The implementation of this function looks like so in assembler:

C::ThiscallFunction1 proc near

a= dword ptr  4
b= dword ptr  8

mov     eax, [esp+8]       ; eax=b
mov     edx, [esp+4]       ; edx=a
add     eax, edx           ; eax=a+b
imul    eax, [ecx]         ; eax=eax*this->c
retn    8                  ; return eax;
C::ThiscallFunction1 endp

Note that the member variable ‘c’ is at offset 0 from this, so [ecx] is this->c. This function is similar to the __stdcall version, except that instead of being passed as an explicit argument, ‘c’ is a member of the class object and is referenced off of the this pointer.

Consider a call to this function like this in C++:

C* c = new C;
c->c = 3;
c->ThiscallFunction1(1, 2);

This is actually a bit more complicated than the other examples, because we also have a call to operator new to allocate memory for the class object. In this instance, operator new is a __cdecl function that takes a single argument, which is the count in bytes to allocate. Here, sizeof(class C) is 4 bytes.

In assembler, we can thus expect to see something like this:

push    4                    ; sizeof(class C)   
call    operator new         ; allocate a class C object
add     esp, 4               ; clean stack from new call
push    2                    ; 'b'
push    1                    ; 'a'
mov     ecx, eax             ; (class C* c)'this'
mov     dword ptr [eax], 3   ; c->c = 3
call    C::ThiscallFunction1 ; Make the call

Ignoring the call to operator new for the most part, this is essentially what we would expect. ecx is used to pass this, and this->c is set to 3 before the call to ThiscallFunction1, as we would expect given the C++ code.

With all of this information, you should have all you need to recognize and identify __thiscall functions. The main takeaways are:

  • ecx is used as an argument register, along with the stack, but not edx. This allows you to differentiate between a __fastcall and __thiscall function.
  • Arguments passed on the stack are cleaned by the callee and not the caller, like __stdcall.
  • For virtual function calls, look for a vtable pointer as the first class member (at offset 0) from this. (For multiple inheritance, things are a bit more complex; I am ignoring that case right now.) Vtable accesses to retrieve a function pointer to call through, made just after ecx is loaded and just before a call, are a tell-tale sign of a __thiscall virtual function call.
  • For functions whose visibility scope is confined to one module, the compiler sometimes substitutes ebx for ecx as the register used to pass this.

Note that if you explicitly specify a calling convention on a class member function, the function ceases to be __thiscall and takes on the characteristics of the specified calling convention, passing this as the first argument according to the conventions of the requested calling convention.

That’s all for __thiscall. Next up in this series is a brief review and table of contents of what we have covered so far with common Win32 x86 calling conventions.

Removing kernel patching on the fly with the kernel debugger

Wednesday, November 1st, 2006

Occasionally, you may find yourself in a situation where you need to “un-patch” the kernel in order to make forward progress with investigating a problem. This has often been the case with me and certain unnamed anti-virus programs that have the misguided intention to prevent the computer administrator from administering their computer, by denying everyone access to certain protected processes.

Normally, this is done by the use of a kernel driver that hooks various kernel system calls and prevents usermode from being able to access a protected process. While this may be done in the name of preventing malware from interfering with anti-virus software, it also has the unfortunate side effect of preventing legitimate troubleshooting of software issues.

Fortunately, with WinDbg installed and a little knowledge of the debugger, it is easy to reverse these abusive kernel patches that undermine the ability of a system administrator to do his or her job.

Now, normally, you might think that one would be stuck reverse engineering large sections of code in order to disable such kinds of protection mechanisms. However, in the vast majority of cases like these, you can simply have the kernel debugger perform a comparison of the kernel memory image with the image retrieved from the symbol server, and fix up any differences (accounting for relocations). This may be done with the !chkimg -f nt command. Using !chkimg in this fashion allows you to quickly remove unwanted kernel patches without having to dig through third party code that has injected itself into the system.

If you are feeling particularly adventurous, you can even do this in local kernel debugger mode on Windows XP or later, without having to boot the system with /DEBUG. Be warned that this carries an inherent race condition (though a very unlikely one in most cases of system service patching): you might crash the system if something calls into one of the regions of the kernel that you are unpatching while it is being restored to its pristine state.
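
For reference, a rough sketch of the local kernel debugging variant looks something like this, run from an administrator command prompt in the Debugging Tools for Windows directory; the -kl switch starts a local kernel debugging session, and running !chkimg without -f first shows what, if anything, has been patched:

C:\Program Files\Debugging Tools for Windows>kd -kl
lkd> !chkimg nt
lkd> !chkimg -f nt
lkd> q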

You should also be aware that depending on how the third party software that has patched the kernel is written, removing the patches out from under it may have varying negative side effects; be careful. As a result, if you are working on a critical or production system, you may want to pick a different approach. If you are just working on a throw-away repro environment in a VM, though, this can be a good quick-n-dirty way to get the job done.

Despite these potential problems, I’ve successfully used this trick in a pinch several times. If you are running into a brick wall with debugging malfunctioning anti-virus software interactions with your product because of anti-debug protection mechanisms, you might give this technique a try.

Debugging programs that block symbol server access

Tuesday, October 31st, 2006

One of the rather frustrating things to debug in Windows is a program or service that sits in the path of symbol server requests. This is most often the case with a service running in a svchost group alongside a number of other services that are responsible for things like DNS, or for some internal support that WinInet relies on, and so forth. This deadlock condition is also prone to happening if you are debugging something else that makes lots of calls out to WinInet, as WinInet has some cross-process state that allows the symbol server engine to get deadlocked waiting on the target to release a global mutex (or similar global synchronization / state).

If you naively try to simply attach a debugger and load symbols in such a situation, you’ll end up with a nasty surprise; the debugger will hang, and you’ll have to kill it (and whatever you were debugging) to recover. In the case of svchost groups, many of those services will not properly restart after just being abruptly killed, so you may even have to reboot.

Now, one obvious solution to this problem is to just turn off symbol server access entirely and work without symbols. This is obviously a major pain, though – nobody wants to debug without symbols if you actually have access to them, right?

There are a couple of other things that you can do to debug things in this scenario, however, that are a bit less painful than forgoing symbols:

First, you can use the kernel debugger to debug user mode processes. I have often seen Microsoft employees recommend this on the newsgroups. While this works (the kernel debugger is not affected by the state of any services that you are poking on the target computer), it is most certainly a royal pain to do. Using the kernel debugger means that breakpoints will often affect all processes (unless used via ba), you’ll have to deal with parts of the program being paged out, and not to mention the fact that kernel debugger connections are typically much less responsive than a local user mode debugger.
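
For completeness, here is roughly what that looks like in the kernel debugger (the service name and EPROCESS address are made up for illustration): you locate the process with !process, switch into its context with an invasive .process, and reload user mode symbols before setting any breakpoints:

kd> !process 0 0 someservice.exe
kd> .process /i 8a7b2da0
kd> g
kd> .reload /user
kd> bp someservice!SomeFunction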

Because of the inordinate amount of pain involved in using kd to debug a user mode process, I do not recommend ever going this route unless you absolutely positively have no other recourse for accurate debugging.

Therefore, I recommend a different procedure in this case (a condensed command sketch follows the list below):

  1. Disable symbol server entirely (remove all SRV* symbol server references from your symbol path) or make sure that you have loaded symbols for ntdll.dll.
  2. Attach to the process in question normally, but do not load symbols or issue any commands.
  3. Write a full user minidump out to disk somewhere. For example, .dump /ma c:\tmp.dmp.
  4. Detach from the process that blocks symbol server from working.
  5. Open the minidump you saved earlier, and issue a .reload /f command. This causes all symbols in the process to be downloaded from the symbol server if you do not already have them.
  6. Re-attach to the process that you wanted to debug, and set your symbol path to refer to the downstream store you used with symbol server, but without invoking symbol server. That is, if you had previously used SRV*C:\symbols*http://msdl.microsoft.com/download/symbols for your symbol path, set it to C:\symbols. This ensures that you will never try to hit the symbol server.
  7. Debug as normal.
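
In terms of actual debugger commands, the sequence looks roughly like this. The dump path, downstream store, and symbol server URL are just examples, and the first session is assumed to have been started with no SRV* entries in its symbol path (per step 1); the $$ lines are comments:

$$ Attached to the problem process, with symbol server access disabled:
0:000> .dump /ma c:\tmp.dmp
0:000> .detach

$$ In a separate debugger session opened against the dump (windbg -z c:\tmp.dmp):
0:000> .sympath SRV*C:\symbols*http://msdl.microsoft.com/download/symbols
0:000> .reload /f

$$ Finally, re-attach to the live process using only the downstream store:
0:000> .sympath C:\symbols
0:000> .reload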

After that, you’ll have everything in the process that could have symbols on the symbol server already downloaded into your downstream store. By turning off symbol server access and just using the downstream store when you actually start your real debugging efforts, you’ll make sure that the debugger will never deadlock itself against the target. Best of all, you don’t need to download a very large symbol pack from Microsoft that might be missing files that have been hotfixed, since you are still pre-loading all of the symbols from the symbol server. This trick can, of course, also be used with your own internal symbol servers.