Archive for the ‘Windows’ Category

Remote debugging with KD and NTSD

Thursday, July 20th, 2006

Besides remote.exe, the next remote debugging option available is controlling NTSD through the kernel debugger.  Like remote.exe, this technique has largely fallen by the wayside these days, but there are still some circumstances under which it is useful.

This remote debugging technique is based upon controlling NTSD through the kernel debugger connection.  As a crude form of remote debugging, this option allows you to control NTSD on the target computer from a different computer over a serial/1394/USB debugger cable.  This is useful in situations where you are debugging a user mode process in conjunction with the kernel debugger, and want to hide much of the kernel-debugger-specific overhead that is typically associated with doing user mode debugging from the kernel debugger, for instance, having to deal with whether a particular piece of code or data is paged out or not.

Under the hood, this mechanism works by having NTSD use DbgPrint and DbgPrompt instead of console output and console input, respectively.  Additionally, all other dependencies upon CSRSS by NTSD for debugging are disabled, so that NTSD can be used for debugging CSRSS (the Win32 subsystem).  The kernel debugger displays DbgPrint outputs and allows input through DbgPrompt, which allows you to control the user mode debugger through the kernel debugger.  One consequence of this is that the entire system is going to be frozen while you are inputting commands to NTSD, which can have implications for debugging RPC or network related programs, as connections may time out or go away while you are providing input to NTSD, since the networking stack will be frozen and be unable to acknowledge packets.

To activate this mechanism, you can start ntsd with the “-d” parameter (in addition to the usual parameters to specify what you are debugging), which “sends all debugger output to kernel debugger via DbgPrint” according to the documentation.  You must have the kernel debugger active in order for this option to be effective.  After you initially continue execution from NTSD (if applicable, depending on whether you use “-g” or not), you will need to cause a breakpoint exception (or other exception) in the process being debugged by NTSD, through the kernel debugger (or another program), in order to return control back to NTSD.  This is necessary because there is no way to directly transfer control back to NTSD after you are done doing something in the kernel debugger.  The “.sleep” command is also mildly useful here, as it is effectively a “delayed breakin” command that allows you to instruct the NTSD instance to continue execution and break back in to the kernel debugger with a prompt after a certain period of time.

Like remote.exe, this mechanism simply redirects input/output, although this time through the kernel debugger connection and not a pipe.  The main reasons to use this mechanism over remote.exe (or the other options) are for debugging certain situations that make it difficult or impossible to use a conventional user mode debugger, for instance, debugging CSRSS.exe.  Since only the user interface I/O is redirected, things like symbol processing are performed by the NTSD instance and not by the local kernel debugger user interface.

Although the kernel debugger connection currently only supports serial cables, 1394, and USB for debugging, you can extend the NTSD-over-KD remoting mechanism to be useful for computers in remote locations by remoting the KD instance controlling NTSD through a network-aware mechanism such as remote.exe, -server, or kdsrv.exe.

For most purposes, though, I would recommend not using this mechanism.  The other remoting mechanisms provide greater flexibility and are easier to use with remote computers.

This mechanism is, however, a valuable technique for certain special situations where you are either debugging the lowest level parts of the user mode NT infrastructure, or where you need to coordinate between a kernel debugger and user mode debugger on the same machine.  In the latter case, the NTSD-over-KD technique is superior to a network-aware remote debugger connection to an NTSD/CDB/WinDbg instance running on the computer being kernel debugged, because the NTSD-over-KD connection will not time out while you are broken into the kernel debugger like a network connection would.

That’s all for this installment of the remote debugging series.  I’ll talk about some of the more modern remoting mechanisms that you are likely to use in every-day debugging next time.

Debugging Tools for Windows 6.6.7.5 released

Wednesday, July 19th, 2006

Debugging Tools for Windows 6.6.7.5 was released yesterday.

(Note that the changelog includes things from 6.6.3.5, which makes it kind of a pain to see what really changed…).


Note that for many remote debugging scenarios, the debugger packages on both computers must be the same version or the remote debugging process will silently fail, so take care in planning when to upgrade (and to upgrade everything at once) if you use remote debugging.

Remote debugging with remote.exe

Wednesday, July 19th, 2006

One of the oldest mechanisms for remote Windows debugging is the venerable remote.exe, which ships with the DTW package.  Remote.exe essentially just pipes console input/output over the network, which means that you can only use it for the console debuggers (cdb, kd).  This mechanism has the server end of the remote debugging session do all of the “hard work”, with the client acting as just a dumb user interface.

To start a remote server, use the following:

remote.exe /s "command-line" instance-name

where “command-line” is the debugger command line (for instance, “cdb notepad.exe”), and instance-name is a unique identifier (you can pick anything) that allows the remote client to select what program to connect to.

To connect to a remote server, you can use this command:

remote.exe /c computer-name instance-name

where computer-name is the name of the computer running the remote server, and instance-name matches the value you passed to remote.exe /s.  To quit the remote client, you can enter the special string “@K” on its own line (and hit enter), which will terminate the remote session (but will leave the remote server running).  Likewise, you can enter the “@K” command on the server end of the console to quit the server and terminate the redirected process.

That’s all there is to it.  You can actually use remote.exe to control console programs other than cdb/kd remotely, as it doesn’t really have any special intelligence about interacting with the program you are running under remote.exe.

The remote.exe method is the most lightweight of all of the remote debugging options available to you, but it’s also the most limited (and it doesn’t have any concept of security, either).  You might find it useful in scenarios where you are very limited on bandwidth, but for most cases you are much better off using one of the other remote debugging mechanisms.

Update: Pavel Lebedinsky points out that you can set a security descriptor on the remote.exe pipe via “/u” and “/ud” (though this will require that you have either local accounts with the same password as a remote user, or a trust relationship [e.g. domain] with the remote computer).  This does allow for some form of access control, though it is generally convenient only if both computers are on the same domain from what I can tell.

Overview of WinDbg remote debugging

Tuesday, July 18th, 2006

One of the most powerful features of the DTW debuggers (including WinDbg) is the ability to do debugging remotely.  Besides the obvious capability to debug someone else’s computer from your computer, the remote debugging support built in to the DTW debuggers turns out to be useful for much more.

For instance, you can use some of the remote debugging options to show someone else what you are working on with the debugger if you get stuck, or you can use them to debug programs that you would otherwise be unable to debug under Terminal Server on Windows 2000.

The range of remote debugging options available spans from “dumb terminal” type solutions, where text input/output is simply redirected over the network, to having the local debugger do the real work using an RPC-like protocol over the network.  Remote debugging is available for both user mode and kernel mode debugging.

Over the next couple of days I’m going to do a quick run through of each of the major remote debugging facilities included with the DTW debuggers, what their major benefits are, and how to pick which remote debugging option to use for a debugging session (as the various methods are useful under different conditions).  Although there is some (rather sparse) documentation in the WinDbg help file covering some of the remote debugging topics, there are some limitations (and usage considerations) that the help file does not articulate and which are very important for deciding what you are going to do.  I’ll try to address these in this post series.

Introduction to x64 debugging, part 5

Monday, July 17th, 2006

If you are porting a program to x64, one of the first things that you might have to debug are 64-bit portability problems. The most common types of these problems are pointer truncation problems, where assumptions are made by your (previously 32-bit) program that a LONG/DWORD/other 32-bit integral type can completely contain a pointer. On x64, this is no longer the case, and if you happen to be given a pointer with more than 32 significant bits, you’ll probably crash.
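To make the failure mode concrete, here is a minimal sketch of what happens to a pointer-sized value that takes a trip through a 32-bit integer. It uses plain 64-bit integers to stand in for pointers so that it behaves the same on any platform; the function names are made up for illustration.

```c
#include <stdint.h>

/* Simulates storing a pointer-sized value in a 32-bit DWORD and then
 * widening it back, which is what pointer-truncating code effectively does. */
uint64_t round_trip_through_dword(uint64_t pointer_value)
{
    uint32_t truncated = (uint32_t)pointer_value; /* upper 32 bits are lost here */
    return (uint64_t)truncated;
}

/* Returns 1 if the value survives the round trip, 0 if it was corrupted. */
int survives_truncation(uint64_t pointer_value)
{
    return round_trip_through_dword(pointer_value) == pointer_value;
}
```

A value below 4GB comes back unchanged, which is why this kind of bug can hide for a long time; a value above 4GB comes back with its high bits silently zeroed.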

Although the compiler has very good support for helping detect some of these problems (the /Wp64 command line option, or “Detect 64-bit portability issues” in the VC++ GUI), sometimes it won’t catch all of them. Fortunately, using your debugging knowledge, you can help catch many of these problems very quickly.

There is some built-in support to do this already, in the way of a feature that forces the operating system to load DLLs top-down on 64-bit Windows. This means that instead of starting at the low end of the user mode address space and going upwards when looking for free address space, the memory manager will start at the high end and move downwards (when loading DLLs). In practical terms, this means that instead of usually getting base addresses that are entirely contained within 32 significant bits of address space, you will often get load addresses that are above the 4GB boundary, thus quickly exposing pointer truncation problems with global variable pointers or function pointers. You can enable this support with the gflags utility in the Debugging Tools for Windows package.

Unfortunately, as far as I could tell, there isn’t any corresponding functionality to randomize other memory allocations. This means that things like heap allocations or VirtualAlloc-style allocations will still often get back pointers that are below 4GB, which can result in pointer truncation bugs being masked when you are testing your program and only showing up in high load conditions, maybe even on a customer site. Not good!

However, we can work around this with a conditional breakpoint in the DTW debuggers. Conditional breakpoints are extremely useful, and here we’ll use one to set a flag that causes allocations to be made in a top-down fashion. The breakpoint goes on the lowest level memory allocation routine that is accessible to user mode, NtAllocateVirtualMemory, which the Win32 heap manager (and things built on top of it, such as new or malloc) ultimately calls. This function is the system call interface for asking the memory manager to allocate a block of address space (and possibly commit it). Both VirtualAlloc and the heap manager are implemented on top of it, so by passing the appropriate flag here, we can ensure that almost all user mode allocations will be top down.

How do we do this? Well, it’s actually pretty simple. Create a process under the debugger and then enter the following command:

bp ntdll!NtAllocateVirtualMemory "eq @rsp+28 qwo(@rsp+28)|100000;g"

This command sets a breakpoint on NtAllocateVirtualMemory that sets the 0x100000 flag in the fifth parameter (recall my previous discussion on x64 calling conventions). After altering that parameter, execution is resumed and the program continues to run normally.
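The arithmetic behind “@rsp+28” can be sketched in a few lines of C. This is only an illustration of the stack layout at the first instruction of an x64 function, where [rsp] holds the return address and each parameter has an 8-byte slot above it; the helper names are hypothetical and not part of any Windows header.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_TOP_DOWN_FLAG 0x100000u /* the flag value OR'd in by the breakpoint */

/* Offset from RSP, at function entry, of the stack slot belonging to the
 * 1-based parameter index. Slots 1 through 4 are only home space for the
 * register parameters; parameters 5 and up actually live on the stack. */
uint64_t entry_stack_offset_of_param(unsigned index)
{
    assert(index >= 1);
    return 8u * (uint64_t)index;
}

/* What "eq addr qwo(addr)|100000" does to the contents of the slot. */
uint64_t apply_top_down(uint64_t slot_value)
{
    return slot_value | MEM_TOP_DOWN_FLAG;
}
```

For the fifth parameter this yields 0x28, which is the offset used in the breakpoint command. An allocation type of 0x3000 (MEM_COMMIT | MEM_RESERVE) would become 0x103000 after the breakpoint runs.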

If we look at the prototype for NtAllocateVirtualMemory:

// NtAllocateVirtualMemory allocates
// virtual memory in the user mode
// address range.
NTSYSAPI
NTSTATUS
NTAPI
NtAllocateVirtualMemory(
    IN HANDLE ProcessHandle,
    IN OUT PVOID *BaseAddress,
    IN ULONG_PTR ZeroBits,
    IN OUT PSIZE_T AllocationSize,
    IN ULONG AllocationType,   // fifth parameter: modified by the breakpoint
    IN ULONG Protect
);
 

 

… we can see that we are modifying the “AllocationType” parameter. Compare this to the documentation of the VirtualAlloc function, and you’ll see what is going on here (the flAllocationType parameter is passed as AllocationType). The flag we passed is MEM_TOP_DOWN, which, according to MSDN, “allocates memory at the highest possible address”.

After performing this modification, most allocations will have more than 32 significant bits, which will help catch pointer truncation bugs that deal with dynamic memory allocations very quickly.

Earlier, I said that this will only affect most allocations. There are a couple of caveats for this technique:

  • It does not modify data section view mappings (file mappings). I leave it as an exercise for the reader to make a similar conditional breakpoint for ntdll!NtMapViewOfSection.
  • It does not catch the first heap segment in the first heap (the process heap) normally, unless you go out of your way to apply the breakpoint before the process heap is created. One workaround is to just add some dummy allocations at the start of the program to consume the first heap segment, such that subsequent allocations are forced to go through a new heap segment which will be allocated in the high end of the address space.

Despite these limitations, however, I think you’ll find this to be an effective tool to help catch pointer truncation bugs quickly.

For my next few posts, I’m going to take a break from x64 debugging topics and focus on a different topic for a bit. Stay tuned!

Update: Pavel Lebedinsky commented that you can set HKLM\System\CurrentControlSet\Control\Session Manager\Memory Management\AllocationPreference (REG_DWORD) to 0x100000 to achieve an effect similar to the steps I posted, without some of the caveats of the conditional breakpoint I described above (in particular, the initial heap segments will reside in the high end of the address space).  This is a more elegant solution than the one I proposed, so I would recommend using it instead.  Note that this alters the allocation behavior on a system-wide basis instead of a per-process basis.

Introduction to x64 debugging, part 4

Friday, July 14th, 2006

Last time, I talked about how exception handling and unwinding works in x64, what it means to you when debugging, and how you can access exception handlers from the debugger. In this installment, I’ll be covering some more of the common pitfalls that can sneak up and bite you when doing Wow64 debugging with the native x64 debugger.

As I had alluded to in the first installment of this series, debugging Wow64 programs with the x64 debugger introduces a lot of extra complexity. I had already illustrated one of the major annoyances – that you need to manually toggle between the x86 and x64 contexts in many places.

The problems don’t end there, though. Many extensions, especially legacy extensions that were written long before x64 was introduced, do not handle the Wow64 case gracefully. This results from extensions not properly checking the current effective processor (IDebugControl::GetEffectiveProcessorType). This is something to watch out for if you are writing a debugger extension of your own, as it is no longer enough to just see if the target uses 64-bit pointers or not, since with Wow64 debugging, the target processor type can change rapidly within the debugging session as the user switches modes with the “.effmach” command.

One example of a very useful extension that breaks like this is “!locks”, which analyzes the list of critical sections in a process (maintained by NTDLL) in order to help provide information about deadlocks. The !locks extension currently always operates on the 64-bit critical section list, which makes it difficult to debug deadlocks in Wow64 programs with the native debugger.

Another common cause for confusion with Wow64 debugging is that references to NTDLL may not actually do what you expect. Under Wow64, there are actually two copies of NTDLL in every 32-bit process: the native 64-bit NTDLL, used by the Wow64 layer itself, and a modified version of the original 32-bit NTDLL (which thunks to Wow64 instead of making system calls itself). The problem here is that if you reference the name “ntdll”, you will tend to get the 64-bit version of ntdll back, even if you are in x86 mode. For instance, consider the following:

0:026:x86> u ntdll!NtClose
ntdll!ZwClose:
00000000`78ef1350 4c      dec     esp
00000000`78ef1351 8bd1    mov     edx,ecx
00000000`78ef1353 b80c000000 mov  eax,0xc
00000000`78ef1358 0f05    syscall
00000000`78ef135a c3      ret
00000000`78ef135b 666690  nop
00000000`78ef135e 6690    nop
ntdll!NtQueryObject:
00000000`78ef1360 4c      dec     esp
0:026:x86> .effmach .
Effective machine: x64 (AMD64)
0:026> u ntdll!NtClose
ntdll!ZwClose:
00000000`78ef1350 4c8bd1           mov     r10,rcx
00000000`78ef1353 b80c000000       mov     eax,0xc
00000000`78ef1358 0f05             syscall
00000000`78ef135a c3               ret
00000000`78ef135b 666690           nop
00000000`78ef135e 6690             nop
ntdll!NtQueryObject:
00000000`78ef1360 4c8bd1           mov     r10,rcx
00000000`78ef1363 b80d000000       mov     eax,0xd

Here, we got the same address back even if we switched to x86 mode, and as a result the code we tried to disassemble wasn’t valid (because of the new instruction prefixes added by x64). This can get particularly insidious if you are trying to set a breakpoint in the middle of an ntdll function, since if you are not careful, you might set a breakpoint in the wrong copy of ntdll (and probably in the middle of an instruction, which would likely lead to a crash later on instead of the expected stop at a breakpoint exception). If you want to reference the 32-bit ntdll, then you have to use a special name that is a concatenation of the string “ntdll_” and the base address at which the 32-bit ntdll was loaded. For instance:

0:026:x86> u ntdll_7d600000!NtClose
ntdll_7d600000!NtClose:
00000000`7d61c917 b80c000000 mov  eax,0xc
00000000`7d61c91c 33c9    xor     ecx,ecx
00000000`7d61c91e 8d542404 lea    edx,[esp+0x4]
00000000`7d61c922 64ff15c0000000 call dword ptr fs:[000000c0]
00000000`7d61c929 c20400  ret     0x4
00000000`7d61c92c 8d4900  lea     ecx,[ecx]
ntdll_7d600000!NtQueryObject:
00000000`7d61c92f b80d000000 mov  eax,0xd
00000000`7d61c934 33c9    xor     ecx,ecx
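If you script the debugger, you may want to build that special module name from the load address. The following is a small sketch of the concatenation described above; the function name is my own invention, and the lowercase, unprefixed hexadecimal formatting is an assumption based on the single example shown here.

```c
#include <stdint.h>
#include <stdio.h>

/* Writes "ntdll_<base>" into buffer, where <base> is the load address of
 * the 32-bit ntdll in lowercase hexadecimal (format assumed from the
 * example above). Returns the number of characters written, not counting
 * the terminating NUL, or -1 if the buffer is too small. */
int format_wow64_ntdll_name(char *buffer, size_t size, uint32_t base)
{
    int written = snprintf(buffer, size, "ntdll_%x", (unsigned)base);
    if (written < 0 || (size_t)written >= size)
        return -1;
    return written;
}
```

With a base address of 0x7d600000 this produces the name used in the listing above.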

Another common gotcha is forgetting that you are in the wrong processor mode for the code you are disassembling. The disassembler operates in the current effective processor as set by “.effmach”, regardless of whether you are disassembling code in a 32-bit or 64-bit module. This can be confusing if you forget to change the processor type, as you can end up looking at something that is almost valid code, but not quite (due to some subtle differences in the 32-bit and 64-bit instruction sets).
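The “dec esp” that showed up in the bad disassembly earlier is a good illustration of one of those subtle differences. In 64-bit mode, the byte values 0x40 through 0x4F are REX prefixes, while in 32-bit mode the same bytes are one-byte INC and DEC instructions. The sketch below decodes a byte both ways; the function names are hypothetical, and this is nowhere near a real disassembler.

```c
#include <assert.h>
#include <stdint.h>

/* In 64-bit mode, bytes 0x40-0x4F are REX prefixes. */
int is_rex_prefix(uint8_t byte)
{
    return (byte & 0xF0) == 0x40;
}

/* REX.W (bit 3) selects a 64-bit operand size for the following opcode. */
int rex_w(uint8_t byte)
{
    assert(is_rex_prefix(byte));
    return (byte >> 3) & 1;
}

/* In 32-bit mode, 0x40-0x47 are INC r32 and 0x48-0x4F are DEC r32. */
int x86_is_dec(uint8_t byte)
{
    assert(is_rex_prefix(byte)); /* same byte range, different meaning */
    return (byte & 0x08) != 0;
}

/* The low three bits select the register operand of the INC/DEC. */
const char *x86_inc_dec_register(uint8_t byte)
{
    static const char *const registers[8] = {
        "eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"
    };
    assert(is_rex_prefix(byte));
    return registers[byte & 0x07];
}
```

The byte 0x4c decodes as “dec esp” in x86 mode, and as a REX prefix with the W bit set in x64 mode, which is why the same three bytes 4c 8b d1 appear as “dec esp; mov edx,ecx” in one listing and as “mov r10,rcx” in the other.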

Finally, one other source of confusion can be filenames. Remember that under Wow64, programs have an altered view of certain filesystem locations, such as %SystemRoot%\System32. Some filenames (especially for loaded modules) may refer to %SystemRoot%\system32, and some may refer to %SystemRoot%\syswow64. Despite the difference in apparent filenames, if you are debugging a Wow64 process, these two directories are the same (and both refer to %SystemRoot%\SysWOW64 on the actual filesystem as viewed from 64-bit programs).

Next time: Tricks for catching 64-bit portability problems with the debugger.

Introduction to x64 debugging, part 3

Thursday, July 13th, 2006

The last installment of this series described some of the basics of the new calling convention in use on x64 Windows, and how it will impact the debugging experience. This post describes how the unwinding and exception handling aspects matter to you when you debug programs.

I touched on some of the benefits of the new unwind mechanism in the last post (specifically, how you can expect to see full stack traces even without symbols), but I didn’t really go into much detail as to how it is implemented. Microsoft has the full set of details available on MSDN. Rather than restate them all here, I’m going to try to put them into perspective with respect to debugging and how they matter to you.

Perhaps the easiest way to do this is to compare them with x86 exception handling (EH)/unwind support. In the x86 Win32 world, EH/unwind are implemented as a linked list of EXCEPTION_REGISTRATION structures stored at fs:[0] (the start of the current thread’s TEB). When an exception occurs, the exception dispatcher code (either in NTDLL for a user mode exception or NTOSKRNL for a kernel mode exception) searches through this linked list and calls each handler with information about the exception. The exception handler can indicate that control should be resumed immediately to the faulting context, or that the next handler should be called, or that the exception handler has handled the exception and that the stack should be unwound to it. The first two paths are fairly straightforward; either a context record is continued via NtContinue (if you aren’t familiar with the native API layer, this is effectively a longjmp), or the next handler in the chain is called. If the last handler in the list is reached and does not handle the exception, then the thread is terminated (for Win32 programs, this should never happen, as Kernel32 installs an exception handler that will catch all exceptions before it calls process / thread entrypoint functions defined by an application).

The unwind path is a bit more interesting; here, all of the exception handlers between the one that requested an unwind and the top of the list are called with a flag indicating that they should unwind the stack. Each exception handler routine “knows” how to unwind the procedure(s) that it is responsible for. Through this mechanism, the stack gets unwound properly back to the point where the exception was handled. While this works well enough for the actual exception handling process itself, there is a flaw in this design: it precludes unwinding call frames without actually calling the unwind handlers in question. In addition, functions in the middle of an unwind path which did not register an exception handler are invisible to the unwind code itself (this does not pose a problem for normal unwinds, as any function that has special unwind requirements, such as a function with C++ objects on the stack that have destructors, will implicitly register an exception handler).
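To make the search and unwind passes a little more tangible, here is a toy model of the x86 scheme in portable C. It is a deliberately simplified sketch and not the real NTDLL dispatcher: the type names, the three disposition values, and the handler signature are all invented for illustration, and real handlers receive far more context than this.

```c
#include <stddef.h>

typedef enum {
    CONTINUE_EXECUTION, /* resume the faulting context */
    CONTINUE_SEARCH,    /* ask the next handler in the chain */
    EXECUTE_HANDLER     /* handled here; unwind the stack to this frame */
} disposition_t;

typedef struct registration registration_t;
typedef disposition_t (*handler_fn)(registration_t *self, unsigned code,
                                    int unwinding);

/* One node in the per-thread list that fs:[0] points at on x86. */
struct registration {
    registration_t *next;  /* next (outer) registration, NULL at the end */
    handler_fn handler;
    int unwound;           /* how many times this frame was asked to unwind */
};

/* Walks the chain from the innermost frame. Returns the registration that
 * handled the exception after every inner frame has been called again with
 * the unwinding flag set, or NULL if execution resumes or nobody handles it. */
registration_t *dispatch_exception(registration_t *head, unsigned code)
{
    for (registration_t *r = head; r != NULL; r = r->next) {
        disposition_t d = r->handler(r, code, 0);
        if (d == CONTINUE_EXECUTION)
            return NULL;
        if (d == EXECUTE_HANDLER) {
            for (registration_t *u = head; u != r; u = u->next)
                (void)u->handler(u, code, 1); /* unwind pass */
            return r;
        }
    }
    return NULL;
}

/* Sample handler: never handles anything, but cleans up when unwound. */
disposition_t pass_along(registration_t *self, unsigned code, int unwinding)
{
    (void)code;
    if (unwinding)
        self->unwound++;
    return CONTINUE_SEARCH;
}

/* Sample handler: claims every exception during the search pass. */
disposition_t handle_everything(registration_t *self, unsigned code,
                                int unwinding)
{
    (void)self;
    (void)code;
    return unwinding ? CONTINUE_SEARCH : EXECUTE_HANDLER;
}
```

Notice that the only way the frames get unwound in this model is by calling their handlers. A frame that never registered a handler simply does not appear in the list, which is exactly the limitation described above.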

What this means for you as it relates to debugging is that on x86, it isn’t generally possible to cleanly unwind *without calling the unwind/exception handler functions*. This means that the debugger cannot automatically unwind the stack and produce a valid stack trace with reliable results, without special help, typically in the form of symbols that specify how a function uses the stack. If a function in the middle of the call stack doesn’t have symbols, then there is a good chance that any debugger-initiated stack traces will stop at that function (a common and frustrating occurrence if you are debugging code without symbols on x86).

As I alluded to in the previous posting, this problem has gone away on x64, thanks to the new unwind semantics. The way this works under the hood is that every function that is a non-leaf function (that is, every function which calls another function) is required to have a set of metadata associated with it that describes how the function is to be unwound. This is similar in principle to the symbol unwind information used in x86 if you have symbols, except that it is built into the binary itself (or dynamically registered at runtime, for dynamically generated code, like .NET). This unwind metadata has everything necessary to unwind a function without actually having to call exception handling code (and, indeed, exception handlers no longer perform “manual” unwinds as is the case on x86; the NTDLL or NTOSKRNL exception dispatcher can take care of this for you thanks to the new unwind metadata).
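As a rough sketch of how an unwinder can find that metadata, consider a table of per-function entries sorted by start address, which is searched for the entry covering a given instruction address. The structure below only models the three image-relative fields of such an entry; the names are my own, and I am assuming the end address is exclusive.

```c
#include <stddef.h>
#include <stdint.h>

/* One per-function metadata entry; every field is relative to the image base. */
typedef struct {
    uint32_t begin_address;        /* first byte of the function */
    uint32_t end_address;          /* one past the last byte (assumed) */
    uint32_t unwind_info_address;  /* where the unwind description lives */
} function_entry_t;

/* Binary search over a table sorted by begin_address. Returns the entry
 * covering rva, or NULL when there is none, in which case the unwinder
 * treats the code as a leaf function with no unwind metadata. */
const function_entry_t *lookup_function_entry(const function_entry_t *table,
                                              size_t count, uint32_t rva)
{
    size_t low = 0, high = count;
    while (low < high) {
        size_t mid = low + (high - low) / 2;
        if (rva < table[mid].begin_address)
            high = mid;
        else if (rva >= table[mid].end_address)
            low = mid + 1;
        else
            return &table[mid];
    }
    return NULL;
}
```

Because the lookup needs nothing but the instruction address and data that ships inside the binary, no symbols and no handler calls are required to walk from one frame to the next.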

For most purposes, you can be oblivious to this fact while debugging something; the debugger will automagically use the unwind metadata to construct accurate stack traces, even with no symbols available. An example of this is:

 

0:000> k
Child-SP          RetAddr           Call Site
00000000`0012fa28 00000000`78ef6301 ntdll!ZwRequestWaitReplyPort+0xa
00000000`0012fa30 00000000`78ddc6ed ntdll!CsrClientCallServer+0x61
00000000`0012fa60 00000000`78ddc92a kernel32!GetConsoleInputWaitHandle+0x39d
00000000`0012fbd0 00000000`4ad1df2c kernel32!ReadConsoleW+0x7a
00000000`0012fca0 00000000`4ad15fa7 cmd+0x1df2c
00000000`0012fd60 00000000`4ad02530 cmd+0x15fa7
00000000`0012fdc0 00000000`4ad035ca cmd+0x2530
00000000`0012fe30 00000000`4ad17027 cmd+0x35ca
00000000`0012fe80 00000000`4ad04eef cmd+0x17027
00000000`0012ff20 00000000`78d5965c cmd+0x4eef
00000000`0012ff80 00000000`00000000 kernel32!BaseProcessStart+0x2c

 

With symbols loaded, we can see that the stack trace is exactly the same:

 

0:000> k
Child-SP          RetAddr           Call Site
00000000`0012fa28 00000000`78ef6301 ntdll!ZwRequestWaitReplyPort+0xa
00000000`0012fa30 00000000`78ddc6ed ntdll!CsrClientCallServer+0x9f
00000000`0012fa60 00000000`78ddc92a kernel32!ReadConsoleInternal+0x23d
00000000`0012fbd0 00000000`4ad1df2c kernel32!ReadConsoleW+0x7a
00000000`0012fca0 00000000`4ad15fa7 cmd!ReadBufFromConsole+0x11c
00000000`0012fd60 00000000`4ad02530 cmd!FillBuf+0x3d6
00000000`0012fdc0 00000000`4ad035ca cmd!Lex+0xd2
00000000`0012fe30 00000000`4ad17027 cmd!Parser+0x132
00000000`0012fe80 00000000`4ad04eef cmd!main+0x458
00000000`0012ff20 00000000`78d5965c cmd!mainCRTStartup+0x171
00000000`0012ff80 00000000`00000000 kernel32!BaseProcessStart+0x29

 

As you can see, even with no symbols, we still get a stack trace that includes all of the functions active in the selected thread context.

Sometimes you will need to manually examine the unwind data, however. One of the major reasons for this is if you need to do some work with an exception handler. On x86, the familiar set of instructions “push fs:[0]; mov fs:[0], esp” (or equivalent) signify an exception handler registration. In x64 debugging, you won’t see anything like this, because there is no runtime registration of exception handlers (except via calls to RtlAddFunctionTable). To determine if a function has an exception handler (and what the address is), you’ll need to use a command that you have probably never touched before – .fnent. The .fnent (function entry) command displays the active EH/unwind metadata associated with a function, among other misc. information about the function in question (such as its extents). For instance:

 

0:000> .fnent kernel32!LocalAlloc
Debugger function entry 00000000`01dc2ab0 for:
(00000000`78d6e690)   kernel32!LocalAlloc   |
(00000000`78d6e730)   kernel32!GetCurrentProcessId
Exact matches:
kernel32!LocalAlloc = 

BeginAddress      = 00000000`0002e690
EndAddress        = 00000000`0002e6c3
UnwindInfoAddress = 00000000`000d9174

 

Unfortunately, this command does not directly translate the exception handler information that we are interested in, so we have to do some manual work. The offsets provided are relative to the base of the module in which the function resides, so working with our existing example, we’ll need to add the base address of kernel32 (which is what the symbol “kernel32” evaluates to in a debugger expression) to each of the offsets to form a complete address.

The format of the unwind information itself is described on MSDN; the important parts are as follows:

 

typedef struct _UNWIND_INFO {
UBYTE Version       : 3;
UBYTE Flags         : 5;
UBYTE SizeOfProlog;
UBYTE CountOfCodes;
UBYTE FrameRegister : 4;
UBYTE FrameOffset   : 4;
UNWIND_CODE UnwindCode[1];
/*  UNWIND_CODE MoreUnwindCode[((CountOfCodes + 1) & ~1) - 1];
*   union {
*       OPTIONAL ULONG ExceptionHandler;
*       OPTIONAL ULONG FunctionEntry;
*   };
*   OPTIONAL ULONG ExceptionData[]; */
} UNWIND_INFO, *PUNWIND_INFO; 

typedef union _UNWIND_CODE {
struct {
UBYTE CodeOffset;
UBYTE UnwindOp : 4;
UBYTE OpInfo   : 4;
};
USHORT FrameOffset;
} UNWIND_CODE, *PUNWIND_CODE;

 

Given the structure definition above, we can write a simplified debugger expression to parse the unwind information structure and tell us the interesting bits. This expression does not handle all cases – in particular, it doesn’t handle chained unwind information properly, for which you would need to write a more complicated expression or do the work manually.

 

0:000> u kernel32+dwo(kernel32+00000000`000d9174+ @@c++((1+ @@masm(by(2+kernel32+00000000`000d9174))) & ~1) * 2 + 4)
kernel32!_C_specific_handler:
00000000`78d92180 ff25eafafaff jmp qword ptr [kernel32!_imp___C_specific_handler (0000000078d41c70)]

 

The expression reads the count of unwind codes from the UNWIND_INFO structure, performs the necessary alignment calculations, multiplies the resulting value by the size of the UNWIND_CODE union, and adds the result to the offset within the UNWIND_INFO structure at which the unwind codes are stored. This value is then added to the pointer to the UNWIND_INFO structure itself, which gives us a pointer to UNWIND_INFO.ExceptionHandler. The value stored there is an offset into the module with which the exception handler routine is associated, so by adding the base address of the module, we (finally!) get the address of the exception handler function itself. In this case, it’s __C_specific_handler, which is the equivalent of _except_handler3 in x86 (the standard VC++ generated exception handler for C/C++ code). __C_specific_handler has its own metadata stored in the “ExceptionData” member that describes where the actual C/C++ exception handlers are (i.e. the exception filter/exception handler defined with __except in CL). The format of these structures is as so:

 

typedef struct _CL_SCOPE {
ULONG BeginOffset;   // imagebase relative
ULONG EndOffset;     // imagebase relative
ULONG HandlerOffset; // imagebase relative
ULONG TargetOffset;  // imagebase relative
} CL_SCOPE, * PCL_SCOPE; 

typedef struct _CL_EXCEPTION_DATA {
ULONG NumEntries;
CL_SCOPE ScopeEntries[1]; // variable length: NumEntries entries follow
} CL_EXCEPTION_DATA, * PCL_EXCEPTION_DATA;
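The offset arithmetic that the debugger expression performs may be easier to follow written out as C. This is a sketch that works on a raw byte buffer holding a copy of an UNWIND_INFO structure; the function names are hypothetical, and like the debugger expression it ignores chained unwind information.

```c
#include <stddef.h>
#include <stdint.h>

#define UNWIND_CODES_OFFSET 4u /* bytes in UNWIND_INFO before UnwindCode[] */
#define UNWIND_CODE_SIZE    2u /* sizeof(UNWIND_CODE) */

/* Flags occupy the upper five bits of the first byte; values 1 and 2
 * (UNW_FLAG_EHANDLER, UNW_FLAG_UHANDLER) mean a handler is present. */
int has_exception_handler(const uint8_t *unwind_info)
{
    uint8_t flags = (uint8_t)(unwind_info[0] >> 3);
    return (flags & 0x3) != 0;
}

/* Byte offset of the ExceptionHandler field from the start of UNWIND_INFO.
 * The unwind code array is padded to an even number of entries. */
size_t exception_handler_field_offset(const uint8_t *unwind_info)
{
    size_t count_of_codes = unwind_info[2];
    size_t padded_count = (count_of_codes + 1u) & ~(size_t)1u;
    return UNWIND_CODES_OFFSET + padded_count * UNWIND_CODE_SIZE;
}

/* Reads a little-endian 32-bit value regardless of host byte order. */
uint32_t read_u32_le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Image-relative offset of the handler; add the module base to get an address. */
uint32_t exception_handler_rva(const uint8_t *unwind_info)
{
    return read_u32_le(unwind_info + exception_handler_field_offset(unwind_info));
}
```

The CL_EXCEPTION_DATA structure follows immediately after the ExceptionHandler field, so NumEntries is at the returned field offset plus 4 and the first CL_SCOPE is at the field offset plus 8. Those are the extra “+ 4” terms that appear when dumping the scope table.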

 

If the exception handler is a CL one using __C_specific_handler (as is the case here), we can find the code corresponding to the __except filter/handler by dumping the CL scope table entries as so:
 

0:000> dd (kernel32+00000000`000d9174+ @@c++((1+ @@masm(by(2+kernel32+00000000`000d9174))) & ~1) * 2 + 4 + 4 + 4) L dwo(kernel32+00000000`000d9174+ @@c++((1+ @@masm(by(2+kernel32+00000000`000d9174))) & ~1) * 2 + 4 + 4) * 4
00000000`78e19198  000164fb 00016524 00000001 000709ef
00000000`78e191a8  00016524 00016565 00000001 000709ef
00000000`78e191b8  00016565 00016583 00000001 000709ef
00000000`78e191c8  00016583 00016585 00000001 000709ef
00000000`78e191d8  00070968 0007098d 00000001 000709ef
00000000`78e191e8  0007098d 000709cc 00000001 000709ef
00000000`78e191f8  000709cc 000709ef 00000001 000709ef

 

This command gave us a list of address ranges within kernel32!LocalAlloc that are covered by a C/C++ exception handler. For each range, it also tells us whether there is a filter expression (HandlerOffset; a value of 1 signifies that the exception is simply handled by executing the “TargetOffset” routine) and the offset of the handler itself (TargetOffset). All of the offsets are relative to the base address of kernel32. We can unassemble the handler specified by each of them to see that it is simply setting the last Win32 error based on an exception code:

 

0:000> u kernel32+000709ef
kernel32!LocalAlloc+0x1cb:
00000000`78db09ef 33ff             xor     edi,edi
00000000`78db09f1 48897c2420       mov     [rsp+0x20],rdi
00000000`78db09f6 8bc8             mov     ecx,eax
00000000`78db09f8 e863dcfbff       call    kernel32!BaseSetLastNTError (0000000078d6e660)
00000000`78db09fd 8d7701           lea     esi,[rdi+0x1]
00000000`78db0a00 448b642460       mov     r12d,[rsp+0x60]
00000000`78db0a05 488b5c2428       mov     rbx,[rsp+0x28]
00000000`78db0a0a e9765bfaff       jmp     kernel32!LocalAlloc+0x1e6 (0000000078d56585)

 

That’s all for this post. Next time, I’ll talk about some of the common “gotchas” when dealing with Wow64 debugging.

Additional credits for this article: C++ exception handling information from “Improved Automated Analysis of Windows x64 Binaries” by skape.

VMware Server 1.0 released

Wednesday, July 12th, 2006

It’s here – VMware Server 1.0 has been released!  You can get it here.

I’ve been using the VMware Server betas for some time, and it is well worth a look if you need to set up some dedicated/always-on VMs.  It is not quite a replacement for Workstation for certain testing scenarios (in particular, because of its lack of multiple snapshot support), but if you need to run a set of VMs that are always on, it does the job well.

Be sure to read my earlier posting for an interoperability problem with RDP if you try to connect to a console session, as this problem may limit its usefulness if you do not use full Terminal Server (a reason to consider installing it on Windows Server 2003 instead of Windows XP).

Introduction to x64 debugging, part 2

Wednesday, July 12th, 2006

Last time, I talked about some of the basic differences you’ll see when switching to an x64 system if you are doing debugging using the Debugging Tools for Windows package.  In this installment, I’ll run through some of the other differences with debugging that you’ll likely run into – in particular, how changes to the x64 calling convention will make your life much easier when debugging.

Although the x64 architecture is in many respects very similar to x86, many of the conventions of x86-Win32 that you might be familiar with have changed.  Microsoft took the opportunity to “clean house” with many aspects of Win64, since for native x64 programs, there is no concern of backwards binary compatibility.

One of the major changes that you will quickly discover is that the calling conventions that x86 used (__fastcall, __cdecl, __stdcall) are not applicable to x64.  Instead of many different calling conventions, x64 unifies everything into a single calling convention that all functions use.  You can read the full details of the new calling convention on MSDN, but I’ll give you the executive summary as it applies to debugging programs here.

  • The first four arguments of a function are passed in registers: rcx, rdx, r8, and r9 respectively.  Subsequent arguments are passed on the stack.
  • The caller allocates the space on the stack for parameter passing, like for __stdcall on x86.  However, the caller must allocate at least 32 bytes of stack space for the callee to use as a “register home space” for the first four parameters (or as scratch space).  This must be done even if the callee has no arguments or fewer than four arguments.
  • The caller always cleans the stack of arguments passed (like __cdecl on x86) if necessary.
  • Stack unwinding and exception handling are significantly different on x64; more details on that later.  The new stack unwinding model is data-driven rather than code-driven (like on x86).
  • Except for dynamic stack adjustments (like _alloca), all stack space must be allocated in the prologue.  Effectively, for most functions, the stack pointer will remain constant throughout the execution process.
  • The rax register is used for return values.  For return values larger than 64 bits, a hidden pointer argument is used.  There is no more spillover into a second register for large return values (like edx:eax, on x86).
  • The rax, rcx, rdx, r8, r9, r10, and r11 registers are volatile; all other registers must be preserved.  For floating point usage, the xmm0, xmm1, xmm2, xmm3, xmm4, and xmm5 registers are volatile, and the other registers must be preserved.
  • For floating point arguments, the xmm0 through xmm3 registers are used for the first four arguments, after which stack spillover is performed.
  • The instructions permitted in function prologues and epilogues are highly restricted to a very small subset of the instruction set to facilitate unwinding operations.
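
As a rough illustration of the argument-passing rules in the list above, here is a C sketch that maps an argument position to where it is passed and computes the minimum outgoing argument area. The names are hypothetical. The sketch assumes every argument fits in one 8-byte slot, assumes registers are assigned by position (so a floating point value in the second position uses xmm1), and ignores any extra stack alignment padding.

```c
#include <stddef.h>

typedef enum { ARG_INTEGER, ARG_FLOAT } arg_class;

/* Where the argument at a zero-based position is passed. */
const char *arg_location(size_t position, arg_class cls)
{
    static const char *const int_regs[4] = { "rcx", "rdx", "r8", "r9" };
    static const char *const fp_regs[4] = { "xmm0", "xmm1", "xmm2", "xmm3" };

    if (position >= 4)
        return "stack";
    return cls == ARG_FLOAT ? fp_regs[position] : int_regs[position];
}

/* Bytes the caller must reserve for outgoing arguments: one 8-byte
   slot per argument, but never less than the 32-byte home space. */
size_t outgoing_arg_area(size_t arg_count)
{
    size_t slots = arg_count < 4 ? 4 : arg_count;
    return slots * 8;
}
```

A call with no arguments still reserves 32 bytes, and a call with six arguments reserves 48 bytes, with the fifth and sixth arguments written to the stack.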

The main takeaways here from a debugging perspective are thus:

  • Even though a register calling convention like __fastcall is used, the register arguments are often spilled to the “home area” and so are typically visible in call stacks, especially in debug builds.
  • Due to the nature of parameter passing on x64, the “push” instruction is seldom used for setting up arguments.  Instead, the compiler allocates all space up front (like for local variables on x86) and uses the “mov” instruction to write stack parameters onto the stack for function calls.  This also means that you typically will not see an “add rsp” (or equivalent) after each function call, despite the fact that the caller cleans the stack space.
  • The first stack arguments (argument 5, etc) will appear at [rsp+28h] instead of [rsp+08h], because of the mandatory register home area.  This is a departure from how __fastcall worked on x86, where the first stack argument would be at [esp+04h].
  • Because of the data-driven unwind semantics, you will see perfect stack unwinding even without symbols.  This means that even if you don’t have any symbols at all for a third party binary, you should always get a complete stack trace all the way back to the thread start routine.  As a side effect, this means that the stack traces captured by PageHeap or handle traces will be much more reliable than on x86, where they tended to break at the first function that did not use ebp (because those stack traces never used symbols).
  • Because of the restrictions on the prologue and epilogue instruction usage, it is very easy to recognize where the actual important function code begins and the boilerplate prologue/epilogue code ends.
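
The stack offsets mentioned in the list above follow a simple pattern, sketched below in C with hypothetical helper names. It assumes 8-byte slots and that the only thing between rsp and the home area at function entry is the return address.

```c
#include <stddef.h>

/* Offset from rsp, at the first instruction of the callee, of the slot
   belonging to the n-th argument (1-based). [rsp+0] holds the return
   address. For n = 1..4 this is the register home slot; for n >= 5 it
   is where the caller stored the argument. */
size_t arg_slot_offset_at_entry(size_t n)
{
    return 8 * n;
}

/* The same slot as seen by the caller just before the call instruction,
   when the return address has not been pushed yet. */
size_t arg_slot_offset_before_call(size_t n)
{
    return 8 * (n - 1);
}
```

So the fifth argument is found at [rsp+28h] on entry to the callee, and the caller writes it with a mov to [rsp+20h] before making the call.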

If you’ve been debugging on x86 for a long time, then you are probably pretty excited about the features of the new calling convention.  Because of the perfect unwind semantics and the stack pointer staying constant throughout function execution, debugging code that you don’t have symbols for (and using the built-in heap and handle verification utilities) is much more reliable than on x86.  Additionally, compiler generated code is usually easier to understand, because you don’t have to manually track the value of the stack pointer changing throughout the function like you often did with x86 functions compiled with frame pointer omission (FPO) optimizations.

There are some exceptions to the rules I laid out above for the x64 calling convention.  For functions that do not call any other functions (called “leaf” functions), it is permissible to utilize custom calling conventions so long as the stack pointer (rsp) is not modified.  If the stack pointer is modified then regular calling convention semantics are required.

Next time, I’ll go into more detail on how exception handling and unwinding is different on x64 from the perspective of what the changes mean to you if you are debugging programs, and how you can access some of the metadata associated with unwinding/exception handling and use it to your advantage within the debugger.

Introduction to x64 debugging, part 1

Tuesday, July 11th, 2006

There are some subtle differences between using the Debugging Tools for Windows (DTW) toolset on x86 and x64 that are worth mentioning, especially if you are new to doing x64 debugging. Most of this post applies to all of the debuggers shipped in the DTW package, which is why I avoid talking about WinDbg or ntsd or cdb specifically, and often just refer to the “DTW debuggers”. This is the first post in a multipart series, and it provides a general overview of the options you have for doing 32-bit and 64-bit debugging on an x64 machine, and how to set up the debugger properly to support both, using either the 32-bit or 64-bit packages.

There are many ways to do x64 debugging, which can get confusing, simply because there are so many different choices. You can use both the 32-bit and 64-bit DTW packages, with some restrictions. Here’s a summary of the most common cases (including “cross-debugging” scenarios, where you are using the 32-bit debugger to debug 64-bit processes). For now, I’ll just limit this to user mode, although you can use many of these options for kernel debugging too.

  • Natively debugging 64-bit processes on the same computer using the 64-bit DTW package
  • Natively debugging 32-bit (Wow64) processes on the same computer using the 64-bit DTW package
  • Debugging 32-bit (Wow64) processes on the same computer using the 32-bit DTW package (running the debugger itself under Wow64)
  • Debugging 64-bit processes or 32-bit (Wow64) processes on the same or a different computer using either the 64-bit or 32-bit DTW package, with the remote debugging support (e.g. dbgsrv.exe, or -remote/-server). This requires a 64-bit remote debugger server.
  • Debugging 32-bit (Wow64) processes on the same or a different computer using either the 64-bit or 32-bit DTW package, with the remote debugger support (e.g. dbgsrv.exe, or -remote/-server). This works with a 32-bit remote debugging server.
  • Debugging a 64-bit or 32-bit dump file using the 32-bit or 64-bit DTW package. Both DTW packages are capable of doing this task natively.

There are actually even more combinations, but to keep it simple, I just listed the major ones. Now, as for which setup you want to use, there are a couple of considerations to keep in mind. Most of the important differences for the actual debugging experience stem from whether the process that is making the actual Win32 debugger API calls is a 64-bit or 32-bit process. For the purposes of this discussion, I’ll call the process that makes the actual debugger API calls (e.g. DebugActiveProcess) the actual debugger process.

If the actual debugger process is a 32-bit process under Wow64, then it will be unable to interact meaningfully with 64-bit processes (if you are using WinDbg, 64-bit processes will all show as “System” in the process list). For 32-bit processes, it will see them exactly as you would under an x86 Win32 system; there is no direct indication that they are running under Wow64, and the extra Wow64 functionality is completely isolated from the debugger (and the person driving the debugger). This can be handy, as the extra Wow64 infrastructure can in many cases just get in the way if you are debugging a pure 32-bit program running under Wow64 (unless you suspect a bug in Wow64 itself, which is fairly unlikely to be the case).

If the actual debugger process is a native 64-bit process, then the whole debugging environment changes. The native 64-bit debugging environment allows you to debug both 32-bit (Wow64) and 64-bit targets. However, when you are debugging 32-bit targets, the experience is not the same as if you were just debugging a 32-bit program on a 32-bit Windows installation. The 64-bit debugger will see all of the complexities of Wow64, which often gets confusing and can get in your way. I’ll go into specifics of what exactly is different and how the 64-bit debugger can sometimes be annoying when working with Wow64 processes in a moment; for now, stick with me.

So, if you need to do development on 64-bit computers, which debugging package is the best for you to use? Well, that really depends on what you are doing, but I would recommend installing both the 32-bit and 64-bit DTW packages. The main reason to do this is that it will allow you to debug 32-bit processes without having to deal with the Wow64 layer all the time, but at the same time it will allow you to debug native 64-bit processes.

After you have installed the DTW packages, one of the familiar first steps with setting up the debugger tools on a new system is to register WinDbg as your default postmortem debugger. This turns out to be a bit more complicated on 64-bit systems than on 32-bit systems, however, in large part due to a new concept added to Windows to support Wow64: registry reflection. Registry reflection allows for 32-bit and 64-bit applications to have their own virtualized view of several key sections of the registry, such as HKEY_LOCAL_MACHINE\Software. What this means in practice is that if you write to the registry from a 32-bit process, you might not see the changes from 64-bit processes (and vice versa), depending on which registry keys are changed. Alternatively, you might see different changes than you made, such as if you are registering a COM interface in HKEY_CLASSES_ROOT.

So, what does all of this mean to you, as it relates to doing debugging on 64-bit systems? Well, the main difference that impacts you is that there are different JIT handlers for 32-bit and 64-bit processes. This means that if you register a 32-bit DTW debugger as a default postmortem debugger, it won’t be activated for 64-bit processes. Conversely, if you register a 64-bit DTW debugger as a default postmortem debugger, it won’t be activated for 32-bit processes.

This leaves you with a couple of options: Register both the 32-bit and 64-bit DTW packages as default postmortem debuggers (if you only want to use the 64-bit DTW package on 64-bit processes and not 32-bit (Wow64) processes as a JIT debugger), or register the 64-bit DTW debugger as a default postmortem debugger for both 32-bit and 64-bit processes.

If you want to do the former, then what you need to do is as simple as logging in as an administrator and running both the 32-bit and 64-bit DTW debuggers with the -I command line option (install as default postmortem debugger), and then you’re set.

However, if you want to use the 64-bit debugger for both 64-bit and 32-bit processes as a JIT debugger, then things are a bit more complicated. The best way to set this up is to install the 64-bit DTW debugger as a default postmortem debugger (run it with -I), and then open the 64-bit version of regedit.exe, navigate to HKEY_LOCAL_MACHINE\Software\Microsoft\Windows NT\CurrentVersion\AeDebug, and copy the value of the “Debugger” entry into the clipboard. Then, navigate to the 32-bit view of this key, located at HKEY_LOCAL_MACHINE\Software\Wow6432Node\Microsoft\Windows NT\CurrentVersion\AeDebug, create (or modify, if it already exists) the “Auto” string value and set it to “1”, then create (or modify, if it already exists) the “Debugger” string value and set it to the value you copied from the 64-bit view of the AeDebug key. For my system, the “Debugger” value is set to something like “C:\Program Files\Debugging Tools for Windows 64-bit\WinDbg.exe” -p %ld -e %ld -g. If you don’t see a Wow6432Node registry key under HKEY_LOCAL_MACHINE\Software, then you are probably accidentally running the 32-bit version of regedit.exe and not the 64-bit version of regedit.exe.
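
The two %ld placeholders in the “Debugger” value are filled in when the JIT debugger is launched; as I understand the AeDebug mechanism, the first receives the ID of the crashing process and the second an event handle. The C sketch below only mimics that substitution to show what the resulting command line looks like. The function name is hypothetical and any values passed to it are made up for illustration.

```c
#include <stddef.h>
#include <stdio.h>

/* The "Debugger" value quoted in the text, written as a C format string. */
#define AEDEBUG_DEBUGGER \
    "\"C:\\Program Files\\Debugging Tools for Windows 64-bit\\WinDbg.exe\"" \
    " -p %ld -e %ld -g"

/* Build the command line that would result from substituting a process
   ID and an event handle into the AeDebug "Debugger" value. Returns the
   number of characters that the full command line requires. */
int format_aedebug_command(char *out, size_t out_size,
                           long process_id, long event_handle)
{
    return snprintf(out, out_size, AEDEBUG_DEBUGGER,
                    process_id, event_handle);
}
```

With a process ID of 1234 and an event handle of 56, the command line ends in “-p 1234 -e 56 -g”, so WinDbg attaches to that process, signals the event, and continues past the initial breakpoint.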

Now, there are a couple of other considerations to take into account when picking whether to use the 32-bit or 64-bit DTW tools on 32-bit processes. Besides the ease of use consideration (which I’ll come back to in more detail shortly), many third party extension DLLs (including my own SDbgExt, for the moment) are only available as 32-bit binaries. While these extension DLLs might support 64-bit targets, they will only run under a 32-bit debugger host.

I said I’d describe some of the reasons why debugging Wow64 processes under the native 64-bit debugger can be cumbersome. The main problem with doing this is that you need to be careful with whether the debugger is active as a 32-bit or 64-bit debugger. This is controlled by something that the DTW package calls the effective machine, which is a way to tell the debugger that it should be treating the program as a 32-bit or 64-bit program. If you are using the native 64-bit debugger on a Wow64 process, you will often find yourself having to manually switch between the native (x64) machine mode and the Wow64 (x86) mode.

To give you an idea of what I mean, let’s take a simple example of breaking into the 32-bit version of CMD.EXE, and getting a call stack of the first thread (thread 0). If you are experienced with the DTW tools, then you probably already know how to do this on x86-based systems: the “~0k” command, which means “show me a stack trace for thread 0”. If you run this on the 32-bit CMD.exe process, though, you won’t quite get what you were expecting:

0:000> ~0k
Child-SP          RetAddr           Call Site
00000000`0013e318 00000000`78ef6301 ntdll!ZwRequestWaitReplyPort+0xa
00000000`0013e320 00000000`78bc0876 ntdll!CsrClientCallServer+0x9f
00000000`0013e350 00000000`78ba1394 wow64win!ReadConsoleInternal+0x236
00000000`0013e4c0 00000000`78be6866 wow64win!whReadConsoleInternal+0x54
00000000`0013e510 00000000`78b83c7d wow64!Wow64SystemServiceEx+0xd6
00000000`0013edd0 00000000`78be6a5a wow64cpu!ServiceNoTurbo+0x28
00000000`0013ee60 00000000`78be5e0d wow64!RunCpuSimulation+0xa
00000000`0013ee90 00000000`78ed8501 wow64!Wow64LdrpInitialize+0x2ed
00000000`0013f6c0 00000000`78ed6416 ntdll!LdrpInitializeProcess+0x17d9
00000000`0013f9d0 00000000`78ef3925 ntdll!LdrpInitialize+0x18f
00000000`0013fab0 00000000`78d59630 ntdll!KiUserApcDispatch+0x15
00000000`0013ffa8 00000000`00000000 0x78d59630
00000000`0013ffb0 00000000`00000000 0x0
00000000`0013ffb8 00000000`00000000 0x0
00000000`0013ffc0 00000000`00000000 0x0
00000000`0013ffc8 00000000`00000000 0x0
00000000`0013ffd0 00000000`00000000 0x0
00000000`0013ffd8 00000000`00000000 0x0
00000000`0013ffe0 00000000`00000000 0x0
00000000`0013ffe8 00000000`00000000 0x0

Hey, that doesn’t look like the 32-bit CMD at all! Well, the reason for the strange call stack is that the 32-bit CMD’s first thread is sleeping in a system call to the 64-bit kernel, and the last active processor state for that thread was native 64-bit mode, and NOT 32-bit mode. You will find that this is the common case for threads that are not spinning or doing actual work when you break in with the debugger.

In order to get the more useful 32-bit stack trace, we’ll have to use a debugger command that is probably unfamiliar to you if you haven’t done Wow64 debugging before: .effmach. This command controls the “effective machine” of the debugger, which I previously described. We’ll want to tell the debugger to show us the 32-bit state of the target, which we can do with the “.effmach x86” command. Then, we can get a 32-bit stack trace for the first thread with the “~0k” command:

0:002> .effmach x86
Effective machine: x86 compatible (x86)
0:002:x86> ~0k
ChildEBP          RetAddr
002dfd68 7d542f32 KERNEL32!ReadConsoleInternal+0x15
002dfdf4 4ad0fe14 KERNEL32!ReadConsoleW+0x42
002dfe5c 4ad15803 cmd!ReadBufFromConsole+0xb5
002dfe88 4ad02378 cmd!FillBuf+0x174
002dfe8c 4ad02279 cmd!GetByte+0x11
002dfea8 4ad026c5 cmd!Lex+0x6b
002dfeb8 4ad02783 cmd!GeToken+0x20
002dfec8 4ad02883 cmd!ParseStatement+0x36
002dfedc 4ad164c0 cmd!Parser+0x46
002dff44 4ad04cdd cmd!main+0x1d6
002dffc0 7d4e6e1a cmd!mainCRTStartup+0x12f
002dfff0 00000000 KERNEL32!BaseProcessStart+0x28

Much better! That’s more in line with what we’d be expecting an idle CMD.EXE to be doing. We can now treat the target as a 32-bit process, including things like displaying and altering register contexts, disassembling, and so forth. For instance:

 

0:002:x86> ~0s
KERNEL32!ReadConsoleInternal+0x15:
00000000`7d54e9c3 c22000  ret     0x20
0:000:x86> r
eax=00000001 ebx=002dfe84 ecx=00000000 edx=00000000 esi=00000003 edi=4ad2faa0
eip=7d54e9c3 esp=002dfd6c ebp=002dfdf4 iopl=0         nv up ei pl nz na pe nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000202
KERNEL32!ReadConsoleInternal+0x15:
00000000`7d54e9c3 c22000  ret     0x20
0:000:x86> u poi(esp)
KERNEL32!ReadConsoleW+0x42:
00000000`7d542f32 8b4dfc  mov     ecx,[ebp-0x4]
00000000`7d542f35 5f      pop     edi
00000000`7d542f36 5e      pop     esi
00000000`7d542f37 5b      pop     ebx
00000000`7d542f38 e8545df9ff call    KERNEL32!__security_check_cookie (7d4d8c91)
00000000`7d542f3d c9      leave
00000000`7d542f3e c21400  ret     0x14
00000000`7d542f41 90      nop

If we want to switch the debugger back to the 64-bit view of the process, we can use “.effmach .” to change to the native processor type:

0:000:x86> .effmach .
Effective machine: x64 (AMD64)

Now, we’re back to 64-bit mode, and all of the debugger commands will reflect this:

0:000> r
rax=000000000000000c rbx=000000000013e3a0 rcx=0000000000000000
rdx=00000000002df1f4 rsi=0000000000000000 rdi=00000000003e0cd0
rip=0000000078ef148a rsp=000000000013e318 rbp=00000000002dfdf4
r8=000000007d61c929  r9=000000007d61caf1 r10=0000000000000000
r11=00000000002df1f4 r12=00000000002dfe34 r13=0000000000000001
r14=00000000002dfe84 r15=000000004ad2faa0
iopl=0         nv up ei pl zr na po nc
cs=0033  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000244
ntdll!ZwRequestWaitReplyPort+0xa:
00000000`78ef148a c3               ret
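
One visible difference between the two register dumps is the code segment selector: cs=0023 in the x86 view and cs=0033 in the x64 view. The C sketch below classifies a context by that value. The names are hypothetical, and the two selector values are simply the ones observed in these dumps for user mode code, so treat them as an assumption rather than a guarantee.

```c
#include <stdint.h>

/* Selector values as they appear in the register dumps above. */
#define CS_WOW64_32BIT  0x23u
#define CS_NATIVE_64BIT 0x33u

typedef enum { MODE_X86, MODE_X64, MODE_UNKNOWN } cpu_mode;

/* Guess which view of a Wow64 thread a register context represents
   from its cs selector. */
cpu_mode mode_from_cs(uint16_t cs)
{
    switch (cs) {
    case CS_WOW64_32BIT:
        return MODE_X86;
    case CS_NATIVE_64BIT:
        return MODE_X64;
    default:
        return MODE_UNKNOWN;
    }
}
```

This is a quick sanity check when you are not sure which effective machine a pasted register dump came from.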

That should give you a basic idea as to what you will be needing to do most of the time when you are doing Wow64 debugging. If you are running the 32-bit debugger packages, then all of this extra complexity is hidden and the process will appear to be a regular 32-bit process, with all of the transitions to Wow64 looking like 32-bit system calls (these typically happen in places like ntdll or user32.dll/gdi32.dll).

That’s the end of this post. The next in this series will go into more detail as to what has changed when you take the plunge and start debugging things on a 64-bit system.