Archive for the ‘Debugging’ Category

WinDbg symbol proxies

Friday, July 21st, 2006

One of the cool things included with WinDbg is an ISAPI DLL that allows you to host what is called a “symbol proxy” on any Windows Server 2003 computer running IIS.

What is a symbol proxy, you might ask, and why would I want one?

A symbol proxy allows you to serve up multiple symbol paths using a single HTTP symbol server path.  Additionally, the symbol proxy will cache everything that it downloads locally, so that the next time you request the same symbol, it won’t need to hit the Internet (or a slow WAN link) to download the symbols again.  You can configure one symbol proxy to cover your internal symbols as well as the public Microsoft symbols available at the Microsoft symbol server.  This can be really handy if you are debugging in many different VMs and don’t want to wait for every VM to download a bunch of MS symbols over the Internet each time you use one – instead, they’ll be locally cached at your site and will come up much faster.  Additionally, you won’t have to set a multitude of symbol paths to talk to the MS symbol server and your internal one, as you can have the symbol proxy talk to both “behind the scenes”.
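
For example, with a symbol proxy reachable at a hypothetical URL like http://symproxy.example.com/symbols, every debugging machine’s symbol path collapses to a single entry (optionally with a small local cache in front):

_NT_SYMBOL_PATH=srv*C:\SymCache*http://symproxy.example.com/symbols

The proxy itself is then configured (server-side) with the real search path covering your internal symbol stores and the Microsoft symbol server.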

There is some documentation relating to setting up the symbol proxy included with the DTW distribution.  I would highly recommend taking a look and considering setting up a symbol proxy yourself if you do Windows debugging, as it can significantly reduce the hassle of finding the right symbols for both MS binaries and your binaries.

N.B. There is a known problem with the previously current version (6.6.3.5) of the symbol proxy DLL included with the DTW distribution that causes it to be unable to talk to HTTP-based symbol repositories if you do not configure a WinHttp HTTP proxy with proxycfg.  You might send a mail to the WinDbg feedback alias (windbgfb at microsoft.com) and ask for a version of symproxy.dll that has this bug fixed if you cannot use a WinHttp-compatible HTTP proxy.  [Note: I have not tested whether this bug is fixed in the release that came out two days ago.  From what I know, however, it should be.]

Remote debugging with KD and NTSD

Thursday, July 20th, 2006

Besides remote.exe, the next remote debugging option available is controlling NTSD through the kernel debugger.  Like remote.exe, this technique has largely fallen by the wayside, but there are still some circumstances under which it is useful.

This remote debugging technique is based upon controlling NTSD through the kernel debugger connection.  As a crude form of remote debugging, this option allows you to control NTSD on the target computer from a different computer over a serial/1394/USB debugger cable.  This is useful in situations where you are debugging a user mode process in conjunction with the kernel debugger and want to avoid much of the overhead typically associated with doing user mode debugging from the kernel debugger – for instance, having to deal with whether a particular piece of code or data is paged out or not.

Under the hood, this mechanism works by having NTSD use DbgPrint and DbgPrompt instead of console output and console input, respectively.  Additionally, all other dependencies upon CSRSS by NTSD for debugging are disabled, so that NTSD can be used for debugging CSRSS (the Win32 subsystem).  The kernel debugger displays DbgPrint outputs and allows input through DbgPrompt, which allows you to control the user mode debugger through the kernel debugger.  One consequence of this is that the entire system is going to be frozen while you are inputting commands to NTSD, which can have implications for debugging RPC or network related programs, as connections may time out or go away while you are providing input to NTSD, since the networking stack will be frozen and be unable to acknowledge packets.

To activate this mechanism, you can start ntsd with the “-d” parameter (in addition to the usual parameters to specify what you are debugging) which “sends all debugger output to kernel debugger via DbgPrint” according to the documentation.  You must have the kernel debugger active in order for this option to be effective.  After you initially continue execution from NTSD (if applicable, depending on if you use “-g” or not), then you will need to cause a breakpoint exception (or other exception) in the process being debugged by NTSD through the kernel debugger (or another program) in order to return control back to NTSD.  The “.sleep” command is also mildly useful here, as effectively a “delayed breakin” type command that allows you to instruct the NTSD instance to continue execution and break back in to the kernel debugger with a prompt after a certain period of time.  This is necessary because there is no way to directly transfer control back to NTSD after you are done doing something in the kernel debugger.
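
As a rough sketch of the workflow (notepad here is just a stand-in for whatever you are debugging): on the target machine, start the debuggee with

ntsd -d notepad.exe

and then interact with NTSD through the kernel debugger prompt.  From that prompt, a command like

.sleep 5000

lets the target system run normally for five seconds before control returns to NTSD – the closest thing available to a “delayed breakin”.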

Like remote.exe, this mechanism simply redirects input/output, although this time through the kernel debugger connection and not a pipe.  The main reasons to use this mechanism over remote.exe (or the other options) are for debugging certain situations that make it difficult or impossible to use a conventional user mode debugger – for instance, debugging CSRSS.exe.  Since only the user interface I/O is redirected, things like symbol processing are performed by the NTSD instance and not by the local kernel debugger user interface.

Although the kernel debugger connection currently only supports serial cables, 1394, and USB for debugging, you can extend the NTSD-over-KD remoting mechanism to be useful for computers in remote locations by remoting the KD instance controlling NTSD through a network-aware mechanism such as remote.exe, -server, or kdsrv.exe.

For most purposes, though, I would recommend not using this mechanism.  The other remoting mechanisms provide greater flexibility and are easier to use with remote computers.

This mechanism is, however, a valuable technique for certain special situations where you are either debugging the lowest level parts of the user mode NT infrastructure, or where you need to coordinate between a kernel debugger and user mode debugger on the same machine.  In the latter case, the NTSD-over-KD technique is superior to a network aware remote debugger connection to a NTSD/CDB/WinDbg instance running on the computer being kernel debugged because the NTSD-over-KD connection will not time out while you are broken into the kernel debugger like a network connection would.

That’s all for this installment of the remote debugging series.  I’ll talk about some of the more modern remoting mechanisms that you are likely to use in every-day debugging next time.

Debugging Tools for Windows 6.6.7.5 released

Wednesday, July 19th, 2006

Debugging Tools for Windows 6.6.7.5 was released yesterday.

(Note that the changelog includes things from 6.6.3.5, which makes it kind of a pain to see what really changed…).


Note that for many remote debugging scenarios, the debugger packages on both computers must be the same version or the remote debugging process will silently fail, so take care in planning when to upgrade (and to upgrade everything at once) if you use remote debugging.

Remote debugging with remote.exe

Wednesday, July 19th, 2006

One of the oldest mechanisms for remote Windows debugging is the venerable remote.exe, which ships with the DTW package.  Remote.exe essentially just pipes console input/output over the network, which means that you can only use it for the console debuggers (cdb, kd).  This mechanism has the server end of the remote debugging session do all of the “hard work”, with the client acting as just a dumb user interface.

To start a remote server, use the following:

remote.exe /s "command-line" instance-name

where “command-line” is the debugger command line (for instance, “cdb notepad.exe”), and instance-name is a unique identifier (you can pick anything) that allows the remote client to select what program to connect to.

To connect to a remote server, you can use this command:

remote.exe /c computer-name instance-name

where computer-name is the name of the computer running the remote server, and instance-name matches the value you passed to remote.exe /s.  To quit the remote client, you can send the special string “@K” (and hit enter) on its own line, which will terminate the remote session (but will leave the remote server running).  Likewise, you can enter the “@K” command on the server end of the console to quit the server and terminate the redirected process.

That’s all there is to it.  You can actually use remote.exe to control console programs other than cdb/kd remotely, as it doesn’t really have any special intelligence about interacting with the program you are running under remote.exe.

The remote.exe method is the most lightweight of all of the remote debugging options available to you, but it’s also the most limited (and it doesn’t have any concept of security, either).  You might find it useful in scenarios where you are very limited on bandwidth, but for most cases you are much better off using one of the other remote debugging mechanisms.

Update: Pavel Lebedinsky points out that you can set a security descriptor on the remote.exe pipe via “/u” and “/ud” (though this will require that you have either local accounts with the same password as a remote user, or a trust relationship [e.g. domain] with the remote computer).  This does allow for some form of access control, though it is generally convenient only if both computers are on the same domain from what I can tell.

Overview of WinDbg remote debugging

Tuesday, July 18th, 2006

One of the most powerful features of the DTW debuggers (including WinDbg) is the ability to do debugging remotely.  Besides the obvious capability to debug someone else’s computer from your computer, the remote debugging support built in to the DTW debuggers turns out to be useful for much more.

For instance, you can use some of the remote debugging options to show someone else what you are working on with the debugger if you get stuck, or you can use them to debug programs that would otherwise be unavailable to you, such as those running under Terminal Server on Windows 2000.

The range of remote debugging options available spans from “dumb terminal” type solutions, where text input/output is simply redirected over the network, to having the local debugger do the real work using an RPC-like protocol over the network.  Remote debugging is available for both user mode and kernel mode debugging.

Over the next couple of days I’m going to do a quick run through of each of the major remote debugging facilities included with the DTW debuggers, what their major benefits are, and how to pick the right remote debugging option for a given debugging session (as the various methods are useful under different conditions).  Although there is some (rather sparse) documentation in the WinDbg help file covering some of the remote debugging topics, there are some limitations (and usage considerations) that the help file does not articulate which are very important for deciding what you are going to do.  I’ll try to address these in this post series.

Introduction to x64 debugging, part 5

Monday, July 17th, 2006

If you are porting a program to x64, one of the first things that you might have to debug are 64-bit portability problems. The most common types of these problems are pointer truncation problems, where assumptions are made by your (previously 32-bit) program that a LONG/DWORD/other 32-bit integral type can completely contain a pointer. On x64, this is no longer the case, and if you happen to be given a pointer with more than 32 significant bits, you’ll probably crash.

Although the compiler has very good support for helping detect some of these problems (the /Wp64 command line option, or “Detect 64-bit portability issues” in the VC++ GUI), sometimes it won’t catch all of them. Fortunately, using your debugging knowledge, you can help catch many of these problems very quickly.

There is some built-in support to do this already, in the way of a feature that forces the operating system to load DLLs top-down on 64-bit Windows. This means that instead of starting at the low end of the user mode address space and going upwards when looking for free address space, the memory manager will start at the high end and move downwards (when loading DLLs). In practical terms, this means that instead of usually getting base addresses that are entirely contained within 32 significant bits of address space, you will often get load addresses that are above the 4GB boundary, thus quickly exposing pointer truncation problems with global variable pointers or function pointers. You can enable this support with the gflags utility in the Debugging Tools for Windows package.

Unfortunately, as far as I could tell, there isn’t any corresponding functionality to randomize other memory allocations. This means that things like heap allocations or VirtualAlloc-style allocations will still often get back pointers that are below 4GB, which can result in pointer truncation bugs being masked when you are testing your program and only showing up in high load conditions, maybe even on a customer site. Not good!

However, we can work around this with a conditional breakpoint in the DTW debuggers. Conditional breakpoints are extremely useful; what we’ll use one for here is to set a flag that causes allocations to be done in a top-down fashion, on every call to the lowest level memory allocation routine accessible to user mode: NtAllocateVirtualMemory, which the Win32 heap manager and things built on top of it (such as new or malloc) ultimately call. This function is the system call interface for asking the memory manager to allocate a block of address space (and possibly commit it). It is what VirtualAlloc is implemented against, and what the heap manager is implemented against, so by passing the appropriate flag here, we can guarantee that almost all user mode allocations will be top down.

How do we do this? Well, it’s actually pretty simple. Create a process under the debugger and then enter the following command:

bp ntdll!NtAllocateVirtualMemory "eq @rsp+28 qwo(@rsp+28)|100000;g"

This command sets a breakpoint on NtAllocateVirtualMemory that sets the 0x100000 flag in the fifth parameter (recall my previous discussion on x64 calling conventions). After altering that parameter, execution is resumed and the program continues to run normally.

If we look at the prototype for NtAllocateVirtualMemory:

// NtAllocateVirtualMemory allocates
// virtual memory in the user mode
// address range.
NTSYSAPI
NTSTATUS
NTAPI
NtAllocateVirtualMemory(
    IN HANDLE ProcessHandle,
    IN OUT PVOID *BaseAddress,
    IN ULONG ZeroBits,
    IN OUT PULONG AllocationSize,
    IN ULONG AllocationType,
    IN ULONG Protect
    );

… we can see that we are modifying the “AllocationType” parameter. Compare this to the documentation of the VirtualAlloc function, and you’ll see what is going on here (the flAllocationType parameter is passed as AllocationType). The flag we passed is MEM_TOP_DOWN, which, according to MSDN, “allocates memory at the highest possible address”.

After performing this modification, most allocations will have more than 32 significant bits, which will help catch pointer truncation bugs that deal with dynamic memory allocations very quickly.

Earlier, I said that this will only affect most allocations. There are a couple of caveats to this technique:

  • It does not modify data section view mappings (file mappings). I leave it as an exercise for the reader to make a similar conditional breakpoint for ntdll!NtMapViewOfSection.
  • Normally, it does not catch the first heap segment in the first heap (the process heap), unless you go out of your way to apply the breakpoint before the process heap is created. One workaround is to add some dummy allocations at the start of the program to consume the first heap segment, so that subsequent allocations are forced into a new heap segment, which will be allocated in the high end of the address space.

Despite these limitations, however, I think you’ll find this to be an effective tool to help catch pointer truncation bugs quickly.

For my next few posts, I’m going to take a break from x64 debugging topics and focus on a different topic for a bit. Stay tuned!

Update: Pavel Lebedinsky commented that you can set HKLM\System\CurrentControlSet\Control\Session Manager\Memory Management\AllocationPreference (REG_DWORD) to 0x100000 to achieve a similar effect as the steps I posted, without some of the caveats of the conditional breakpoint described above (in particular, the initial heap segments will reside in the high end of the address space).  This is a more elegant solution than the one I proposed, so I would recommend using it instead.  Note that this alters the allocation preference on a system-wide basis instead of a process-wide basis.
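
An untested sketch of setting that value from an elevated command prompt (a reboot is likely needed for the change to take effect):

reg add "HKLM\System\CurrentControlSet\Control\Session Manager\Memory Management" /v AllocationPreference /t REG_DWORD /d 0x100000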

Nothing a little cardboard won’t fix…

Sunday, July 16th, 2006

The box that I have been hosting this blog on has been having hardware problems lately, or so it seems. Well, that’s never a fun thing, certainly. It’s been hard-locking periodically for a while now, enough that I can’t get anything out of it via the “Special Administration Console” (SAC) – the Windows version of a serial console. I can’t even break in with the kernel debugger when it locks up, so I’m fairly sure it is hardware related.

Now, a bit of background; this isn’t exactly a new box, and in fact I got it used (for free), so I can’t really complain too much about it. But, suffice to say it has seen better days. There is only one screw holding down the motherboard (!) and both of the NICs that the box came with, well, suck – a Netgear FA310 (the predecessor to the infamously terrible FA311) that doesn’t even have drivers past NT4, and a VIA NIC that occasionally stopped receiving traffic until reboots. Well, I’ve since replaced the NICs with good ol’ reliable 3COM 3C905C-TX-based cards and that at least stopped the NIC-related problems the box has had. (The box needs two NICs as it is my gateway system that sits in between the cable modem and the rest of my LAN here.)

Anyways, there still remained the problem of the periodic complete system lockups. I tried repositioning the computer a few times (on its side, sitting up straight, etc), but nothing really helped. This was getting fairly annoying, as it would die every couple of days, naturally in the middle of some extremely inconvenient situation in which to be disconnected in WoW. Since the box only had one motherboard screw, I suspected that it might be shorting out against something (as it is not really very well secured to the case). Unfortunately, I didn’t have any screws compatible with the motherboard/case on hand (my other computers here are laptops). So, I set to looking for other solutions; what I came up with is none other than cardboard. Specifically, I ended up just slipping a chunk of cardboard in between the motherboard and the part of the case that it seemed most likely to be shorting out against.

Ever since, I’ve been lockup-free for at least a couple days now. Hopefully this will tide me over for now until I manage to grab some screws…

Introduction to x64 debugging, part 4

Friday, July 14th, 2006

Last time, I talked about how exception handling and unwinding works in x64, what it means to you when debugging, and how you can access exception handlers from the debugger. In this installment, I’ll be covering some more of the common pitfalls that can sneak up and bite you when doing Wow64 debugging with the native x64 debugger.

As I had alluded to in the first installment of this series, debugging Wow64 programs with the x64 debugger introduces a lot of extra complexity. I had already illustrated one of the major annoyances – that you need to manually toggle between the x86 and x64 contexts in many places.

The problems don’t end there, though. Many extensions, especially legacy extensions that were written long before x64 was introduced, do not handle the Wow64 case gracefully. This results from extensions not properly checking the current effective processor (IDebugControl::GetEffectiveProcessorType). This is something to watch out for if you are writing a debugger extension of your own, as it is no longer enough to just see if the target uses 64-bit pointers or not, since with Wow64 debugging, the target processor type can change rapidly within the debugging session as the user switches modes with the “.effmach” command.

One example of a very useful extension that breaks like this is “!locks”, which analyzes the list of critical sections in a process (maintained by NTDLL) in order to help provide information about deadlocks.  The !locks extension currently always operates on the 64-bit critical section list, which makes it difficult to debug deadlocks in Wow64 programs with the native debugger.

Another common cause for confusion with Wow64 debugging is that references to NTDLL may not actually do what you expect. Under Wow64, there are actually two copies of NTDLL in every 32-bit process; the native 64-bit NTDLL, used by the Wow64 layer itself, and a modified version of the original 32-bit NTDLL (which thunks to Wow64 instead of making system calls itself). The problem here is that if you reference the name “ntdll”, you will tend to get the 64-bit version of ntdll back, even if you are in x86 mode. For instance, consider the following:

0:026:x86> u ntdll!NtClose
ntdll!ZwClose:
00000000`78ef1350 4c      dec     esp
00000000`78ef1351 8bd1    mov     edx,ecx
00000000`78ef1353 b80c000000 mov  eax,0xc
00000000`78ef1358 0f05    syscall
00000000`78ef135a c3      ret
00000000`78ef135b 666690  nop
00000000`78ef135e 6690    nop
ntdll!NtQueryObject:
00000000`78ef1360 4c      dec     esp
0:026:x86> .effmach .
Effective machine: x64 (AMD64)
0:026> u ntdll!NtClose
ntdll!ZwClose:
00000000`78ef1350 4c8bd1           mov     r10,rcx
00000000`78ef1353 b80c000000       mov     eax,0xc
00000000`78ef1358 0f05             syscall
00000000`78ef135a c3               ret
00000000`78ef135b 666690           nop
00000000`78ef135e 6690             nop
ntdll!NtQueryObject:
00000000`78ef1360 4c8bd1           mov     r10,rcx
00000000`78ef1363 b80d000000       mov     eax,0xd

Here, we got the same address back even while we were in x86 mode, and as a result the code we tried to disassemble wasn’t valid (because of the new instruction prefixes added by x64). This can get particularly insidious if you are trying to set a breakpoint in the middle of an ntdll function, since if you are not careful, you might set a breakpoint in the wrong copy of ntdll (and probably in the middle of an instruction, which would likely lead to a crash later on instead of the expected stop at a breakpoint exception). If you want to reference the 32-bit ntdll, then you have to use a special name that is a concatenation of the string “ntdll_” and the base address at which the 32-bit ntdll was loaded. For instance:

0:026:x86> u ntdll_7d600000!NtClose
ntdll_7d600000!NtClose:
00000000`7d61c917 b80c000000 mov  eax,0xc
00000000`7d61c91c 33c9    xor     ecx,ecx
00000000`7d61c91e 8d542404 lea    edx,[esp+0x4]
00000000`7d61c922 64ff15c0000000 call dword ptr fs:[000000c0]
00000000`7d61c929 c20400  ret     0x4
00000000`7d61c92c 8d4900  lea     ecx,[ecx]
ntdll_7d600000!NtQueryObject:
00000000`7d61c92f b80d000000 mov  eax,0xd
00000000`7d61c934 33c9    xor     ecx,ecx

Another common gotcha is forgetting that you are in the wrong processor mode for the code you are disassembling. The disassembler operates according to the current effective processor type as set by “.effmach”, regardless of whether you are disassembling code in a 32-bit or 64-bit module. This can be confusing if you forget to change the processor type, as you can end up looking at something that is almost valid code, but not quite (due to some subtle differences between the 32-bit and 64-bit instruction sets).

Finally, one other source of confusion can be filenames. Remember that under Wow64, programs have an altered view of certain filesystem locations, such as %SystemRoot%\System32. Some filenames (especially for loaded modules) may refer to %SystemRoot%\system32, and some may refer to %SystemRoot%\syswow64. Despite the difference in apparent filenames, if you are debugging a Wow64 process, these two directories are the same (and both refer to %SystemRoot%\SysWOW64 on the actual filesystem as viewed from 64-bit programs).

Next time: Tricks for catching 64-bit portability problems with the debugger.

Introduction to x64 debugging, part 3

Thursday, July 13th, 2006

The last installment of this series described some of the basics of the new calling convention in use on x64 Windows, and how it will impact the debugging experience. This post describes how the unwinding and exception handling aspects matter to you when you debug programs.

I touched on some of the benefits of the new unwind mechanism in the last post – specifically, how you can expect to see full stack traces even without symbols – but, I didn’t really go into a whole lot of detail as to how they are implemented. Microsoft has the full set of details available on MSDN. Rather than restate them all here, I’m going to try to put them into perspective with respect to debugging and how they matter to you.

Perhaps the easiest way to do this is to compare them with x86 exception handling (EH)/unwind support. In the x86 Win32 world, EH/unwind are implemented as a linked list of EXCEPTION_REGISTRATION structures stored at fs:[0] (the start of the current thread’s TEB). When an exception occurs, the exception dispatcher code (either in NTDLL for a user mode exception or NTOSKRNL for a kernel mode exception) searches through this linked list and calls each handler with information about the exception. The exception handler can indicate that control should be resumed immediately at the faulting context, or that the next handler should be called, or that the exception handler has handled the exception and that the stack should be unwound to it. The first two paths are fairly straightforward; either a context record is continued via NtContinue (if you aren’t familiar with the native API layer, this is effectively a longjmp), or the next handler in the chain is called. If the last handler in the list is reached and does not handle the exception, then the thread is terminated (for Win32 programs, this should never happen, as Kernel32 installs an exception handler that will catch all exceptions before it calls the process / thread entrypoint functions defined by an application).

The unwind path is a bit more interesting; here, all of the exception handlers between the one that requested an unwind and the top of the list are called with a flag indicating that they should unwind the stack. Each exception handler routine “knows” how to unwind the procedure(s) that it is responsible for. In this mechanism, the stack gets unwound properly back to the point where the exception was handled. While this works well enough for the actual exception handling process itself, there is a flaw in this design; it precludes unwinding call frames without actually calling the unwind handlers in question. In addition, functions in the middle of an unwind path which did not register an exception handler are invisible to the unwind code itself (this does not pose a problem for normal unwinds, as any function with special unwind requirements, such as a function with C++ objects on the stack that have destructors, will implicitly register an exception handler).

What this means for you as it relates to debugging is that on x86, it isn’t generally possible to cleanly unwind *without calling the unwind/exception handler functions*. This means that the debugger cannot automatically unwind the stack and produce a valid stack trace with reliable results, without special help, typically in the form of symbols that specify how a function uses the stack. If a function in the middle of the call stack doesn’t have symbols, then there is a good chance that any debugger-initiated stack traces will stop at that function (a common and frustrating occurrence if you are debugging code without symbols on x86).

As I alluded to in the previous posting, this problem has gone away on x64, thanks to the new unwind semantics. The way this works under the hood is that every non-leaf function (that is, every function which calls another function) is required to have a set of metadata associated with it that describes how the function is to be unwound. This is similar in principle to the symbol-based unwind information used on x86 if you have symbols, except that it is built into the binary itself (or dynamically registered at runtime, for dynamically generated code, like .NET). This unwind metadata has everything necessary to unwind a function without actually having to call exception handling code (and, indeed, exception handlers no longer perform “manual” unwinds as is the case on x86 – the NTDLL or NTOSKRNL exception dispatcher can take care of this for you thanks to the new unwind metadata).

For most purposes, you can be oblivious to this fact while debugging something; the debugger will automagically use the unwind metadata to construct accurate stack traces, even with no symbols available. An example of this is:


0:000> k
Child-SP          RetAddr           Call Site
00000000`0012fa28 00000000`78ef6301 ntdll!ZwRequestWaitReplyPort+0xa
00000000`0012fa30 00000000`78ddc6ed ntdll!CsrClientCallServer+0x61
00000000`0012fa60 00000000`78ddc92a kernel32!GetConsoleInputWaitHandle+0x39d
00000000`0012fbd0 00000000`4ad1df2c kernel32!ReadConsoleW+0x7a
00000000`0012fca0 00000000`4ad15fa7 cmd+0x1df2c
00000000`0012fd60 00000000`4ad02530 cmd+0x15fa7
00000000`0012fdc0 00000000`4ad035ca cmd+0x2530
00000000`0012fe30 00000000`4ad17027 cmd+0x35ca
00000000`0012fe80 00000000`4ad04eef cmd+0x17027
00000000`0012ff20 00000000`78d5965c cmd+0x4eef
00000000`0012ff80 00000000`00000000 kernel32!BaseProcessStart+0x2c


With symbols loaded, we can see that the stack trace is exactly the same:


0:000> k
Child-SP          RetAddr           Call Site
00000000`0012fa28 00000000`78ef6301 ntdll!ZwRequestWaitReplyPort+0xa
00000000`0012fa30 00000000`78ddc6ed ntdll!CsrClientCallServer+0x9f
00000000`0012fa60 00000000`78ddc92a kernel32!ReadConsoleInternal+0x23d
00000000`0012fbd0 00000000`4ad1df2c kernel32!ReadConsoleW+0x7a
00000000`0012fca0 00000000`4ad15fa7 cmd!ReadBufFromConsole+0x11c
00000000`0012fd60 00000000`4ad02530 cmd!FillBuf+0x3d6
00000000`0012fdc0 00000000`4ad035ca cmd!Lex+0xd2
00000000`0012fe30 00000000`4ad17027 cmd!Parser+0x132
00000000`0012fe80 00000000`4ad04eef cmd!main+0x458
00000000`0012ff20 00000000`78d5965c cmd!mainCRTStartup+0x171
00000000`0012ff80 00000000`00000000 kernel32!BaseProcessStart+0x29


As you can see, even with no symbols, we still get a stack trace that includes all of the functions active in the selected thread context.

Sometimes you will need to manually examine the unwind data, however. One of the major reasons for this is if you need to do some work with an exception handler. On x86, the familiar set of instructions “push fs:[0]; mov fs:[0], esp” (or equivalent) signify an exception handler registration. In x64 debugging, you won’t see anything like this, because there is no runtime registration of exception handlers (except via calls to RtlAddFunctionTable). To determine if a function has an exception handler (and what the address is), you’ll need to use a command that you have probably never touched before – .fnent. The .fnent (function entry) command displays the active EH/unwind metadata associated with a function, among other misc. information about the function in question (such as its extents). For instance:


0:000> .fnent kernel32!LocalAlloc
Debugger function entry 00000000`01dc2ab0 for:
(00000000`78d6e690)   kernel32!LocalAlloc   |
(00000000`78d6e730)   kernel32!GetCurrentProcessId
Exact matches:
kernel32!LocalAlloc = 

BeginAddress      = 00000000`0002e690
EndAddress        = 00000000`0002e6c3
UnwindInfoAddress = 00000000`000d9174
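For reference, the three offsets that .fnent prints correspond to the fields of the x64 function table entry for the routine, which ships in winnt.h as RUNTIME_FUNCTION (a.k.a. IMAGE_RUNTIME_FUNCTION_ENTRY). A minimal C rendition might look like this:

```c
#include <assert.h>
#include <stdint.h>

/* x64 function table entry.  All three fields are RVAs, i.e. relative
   to the base address of the containing module - exactly the
   BeginAddress/EndAddress/UnwindInfoAddress values .fnent printed. */
typedef struct _RUNTIME_FUNCTION {
    uint32_t BeginAddress;      /* first byte of the function */
    uint32_t EndAddress;        /* one byte past the end of the function */
    uint32_t UnwindInfoAddress; /* RVA of the UNWIND_INFO data */
} RUNTIME_FUNCTION;
```

These are the entries that RtlAddFunctionTable registers for dynamically generated code; for ordinary binaries, they live in the .pdata section of the image.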


Unfortunately, this command does not directly translate the exception handler information that we are interested in, so we have to do some manual work. The offsets provided are relative to the base of the module in which the function resides, so, working with our existing example, we’ll need to add the base address of kernel32 (which the debugger lets us reference simply as “kernel32”) to each of the offsets to form a complete address.
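As a quick sanity check of that arithmetic, using the values from this session: LocalAlloc resolved to 00000000`78d6e690 with a BeginAddress RVA of 0002e690, which pins down the kernel32 load address, and from there the virtual address of the unwind data:

```c
#include <assert.h>
#include <stdint.h>

/* Recover a module's load address from one absolute address and its
   corresponding RVA, then relocate any other RVA within the module. */
static uint64_t module_base(uint64_t absolute_va, uint32_t rva)
{
    return absolute_va - rva;
}

static uint64_t rva_to_va(uint64_t base, uint32_t rva)
{
    return base + rva;
}
```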

The format of the unwind information itself is described on MSDN; the important parts are as follows:


typedef struct _UNWIND_INFO {
    UBYTE Version       : 3;
    UBYTE Flags         : 5;
    UBYTE SizeOfProlog;
    UBYTE CountOfCodes;
    UBYTE FrameRegister : 4;
    UBYTE FrameOffset   : 4;
    UNWIND_CODE UnwindCode[1];
/*  UNWIND_CODE MoreUnwindCode[((CountOfCodes + 1) & ~1) - 1];
 *  union {
 *      OPTIONAL ULONG ExceptionHandler;
 *      OPTIONAL ULONG FunctionEntry;
 *  };
 *  OPTIONAL ULONG ExceptionData[]; */
} UNWIND_INFO, *PUNWIND_INFO;

typedef union _UNWIND_CODE {
    struct {
        UBYTE CodeOffset;
        UBYTE UnwindOp : 4;
        UBYTE OpInfo   : 4;
    };
    USHORT FrameOffset;
} UNWIND_CODE, *PUNWIND_CODE;


Given the structure definitions above, we can write a simplified debugger expression to parse the unwind information structure and tell us the interesting bits. This expression does not handle all cases – in particular, it doesn’t handle chained unwind information properly, for which you would need to write a more complicated expression or do the work manually.
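Before looking at the debugger syntax, the offset arithmetic involved can be modeled in a few lines of C (a simplified sketch that, like the expression itself, ignores chained unwind info):

```c
#include <assert.h>
#include <stdint.h>

/* An UNWIND_INFO starts with a 4-byte fixed header, followed by
   CountOfCodes UNWIND_CODE slots of 2 bytes each, padded to an even
   number of slots.  The ExceptionHandler RVA (when the Flags call for
   one) immediately follows the padded array. */
static uint32_t exception_handler_offset(uint8_t count_of_codes)
{
    uint32_t padded_codes = (uint32_t)(count_of_codes + 1) & ~1u;
    return 4 + padded_codes * 2;
}
```

For example, a single unwind code pads out to two slots, placing the handler RVA at offset 8 from the start of the structure.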


0:000> u kernel32+dwo(kernel32+00000000`000d9174+
@@c++((1+ @@masm(by(2+kernel32+00000000`000d9174))) & ~1) * 2 + 4)
kernel32!_C_specific_handler:
00000000`78d92180 ff25eafafaff jmp qword ptr
[kernel32!_imp___C_specific_handler (0000000078d41c70)]


The expression finds the count of unwind codes from the UNWIND_INFO structure, performs the necessary alignment calculations, multiplies the resulting count by the size of the UNWIND_CODE union, and adds the result to the offset within the UNWIND_INFO structure at which the unwind codes begin. Adding that to the address of the UNWIND_INFO structure itself gives us a pointer to UNWIND_INFO.ExceptionHandler. That value is, in turn, a module-relative offset of the exception handler routine, so by adding the base address of the module, we (finally!) get the address of the exception handler function itself. In this case, it’s __C_specific_handler, the x64 equivalent of _except_handler3 on x86 (the standard VC++-generated exception handler for C/C++ code). __C_specific_handler has its own metadata, stored in the “ExceptionData” member, that describes where the actual C/C++ exception handlers are (i.e. the exception filters/handlers defined with __except in CL-compiled code). The format of these structures is as follows:


typedef struct _CL_SCOPE {
    ULONG BeginOffset;   // imagebase relative
    ULONG EndOffset;     // imagebase relative
    ULONG HandlerOffset; // imagebase relative
    ULONG TargetOffset;  // imagebase relative
} CL_SCOPE, *PCL_SCOPE;

typedef struct _CL_EXCEPTION_DATA {
    ULONG NumEntries;
    CL_SCOPE ScopeEntries[1]; // NumEntries CL_SCOPE entries follow
} CL_EXCEPTION_DATA, *PCL_EXCEPTION_DATA;


If the exception handler is a CL-generated one using __C_specific_handler (as is the case here), we can find the code corresponding to the __except filters/handlers by dumping the CL scope table entries, like so:

0:000> dd kernel32+00000000`000d9174+
@@c++((1+ @@masm(by(2+kernel32+00000000`000d9174))) & ~1)
* 2 + 4 + 4 + 4 L dwo(kernel32+00000000`000d9174+
@@c++((1+ @@masm(by(2+kernel32+00000000`000d9174))) & ~1) * 2 + 4 + 4) * 4
00000000`78e19198  000164fb 00016524 00000001 000709ef
00000000`78e191a8  00016524 00016565 00000001 000709ef
00000000`78e191b8  00016565 00016583 00000001 000709ef
00000000`78e191c8  00016583 00016585 00000001 000709ef
00000000`78e191d8  00070968 0007098d 00000001 000709ef
00000000`78e191e8  0007098d 000709cc 00000001 000709ef
00000000`78e191f8  000709cc 000709ef 00000001 000709ef


This command gives us a list of address ranges within kernel32!LocalAlloc that are covered by a C/C++ exception handler, whether or not there is a filter expression (a HandlerOffset of 1 signifies that there is no filter and the exception is simply handled by executing the “TargetOffset” routine), and the offset of each handler (TargetOffset). All of the offsets are relative to the base address of kernel32. We can unassemble the handler specified by each entry to see that it simply sets the last Win32 error based on an exception code:


0:000> u kernel32+000709ef
kernel32!LocalAlloc+0x1cb:
00000000`78db09ef 33ff             xor     edi,edi
00000000`78db09f1 48897c2420       mov     [rsp+0x20],rdi
00000000`78db09f6 8bc8             mov     ecx,eax
00000000`78db09f8 e863dcfbff call kernel32!BaseSetLastNTError (0000000078d6e660)
00000000`78db09fd 8d7701           lea     esi,[rdi+0x1]
00000000`78db0a00 448b642460       mov     r12d,[rsp+0x60]
00000000`78db0a05 488b5c2428       mov     rbx,[rsp+0x28]
00000000`78db0a0a e9765bfaff jmp kernel32!LocalAlloc+0x1e6 (0000000078d56585)
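Tying the pieces together, the lookup a dispatcher performs over this scope table can be sketched in C. This is an illustrative model, not the actual __C_specific_handler implementation; the sample entry values come from the dd output above:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct _CL_SCOPE {
    uint32_t BeginOffset;   /* imagebase-relative start of __try body */
    uint32_t EndOffset;     /* imagebase-relative end of __try body */
    uint32_t HandlerOffset; /* filter RVA, or 1 for "always handle" */
    uint32_t TargetOffset;  /* imagebase-relative __except block */
} CL_SCOPE;

/* Return the first scope entry whose [BeginOffset, EndOffset) range
   covers the given code RVA, or NULL if the RVA is not protected by
   any __try in this table. */
static const CL_SCOPE *find_scope(const CL_SCOPE *entries,
                                  uint32_t num_entries, uint32_t rva)
{
    for (uint32_t i = 0; i < num_entries; i++) {
        if (rva >= entries[i].BeginOffset && rva < entries[i].EndOffset)
            return &entries[i];
    }
    return NULL;
}
```

For instance, an RVA inside the range 00070968–0007098d from the table above resolves to the handler at 000709ef, which is exactly the code we just unassembled.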


That’s all for this post. Next time, I’ll talk about some of the common “gotchas” when dealing with Wow64 debugging.

Additional credits for this article: C++ exception handling information from “Improved Automated Analysis of Windows x64 Binaries” by skape.

Introduction to x64 debugging, part 2

Wednesday, July 12th, 2006

Last time, I talked about some of the basic differences you’ll see when switching to an x64 system if you are doing debugging using the Debugging Tools for Windows package.  In this installment, I’ll run through some of the other differences with debugging that you’ll likely run into – in particular, how changes to the x64 calling convention will make your life much easier when debugging.

Although the x64 architecture is in many respects very similar to x86, many of the conventions of x86-Win32 that you might be familiar with have changed.  Microsoft took the opportunity to “clean house” with many aspects of Win64, since for native x64 programs, there is no concern of backwards binary compatibility.

One of the major changes that you will quickly discover is that the calling conventions that x86 used (__fastcall, __cdecl, __stdcall) are not applicable to x64.  Instead of many different calling conventions, x64 unifies everything into a single calling convention that all functions use.  You can read the full details of the new calling convention on MSDN, but I’ll give you the executive summary as it applies to debugging programs here.

  • The first four integer arguments of a function are passed in registers: rcx, rdx, r8, and r9, respectively.  Subsequent arguments are passed on the stack.
  • The caller allocates the space on the stack for parameter passing, like for __stdcall on x86.  However, the caller must always allocate at least 32 bytes of stack space for the callee to use as “register home space” for the first four parameters (or as scratch space).  This must be done even if the callee takes fewer than four arguments, or none at all.
  • The caller always cleans the stack of any arguments passed (like __cdecl on x86), if necessary.
  • Stack unwinding and exception handling are significantly different on x64; more details on that later.  The new stack unwinding model is data-driven rather than code-driven (as on x86).
  • Except for dynamic stack adjustments (like _alloca), all stack space must be allocated in the prologue.  Effectively, for most functions, the stack pointer will remain constant throughout the function’s execution.
  • The rax register is used for return values.  For return values larger than 64 bits, a hidden pointer argument is used.  There is no more spillover into a second register for large return values (like edx:eax on x86).
  • The rax, rcx, rdx, r8, r9, r10, and r11 registers are volatile; all other registers must be preserved.  For floating point, the xmm0, xmm1, xmm2, xmm3, xmm4, and xmm5 registers are volatile, and the remaining xmm registers must be preserved.
  • For floating point arguments, the xmm0 through xmm3 registers are used for the first four arguments, after which stack spillover is performed.
  • The instructions permitted in function prologues and epilogues are restricted to a very small subset of the instruction set, to facilitate unwinding operations.

The main takeaways here, from a debugging perspective, are these:

  • Even though a register calling convention like __fastcall is used, the register arguments are often spilled to the “home area” and so are typically visible in call stacks, especially in debug builds.
  • Due to the nature of parameter passing on x64, the “push” instruction is seldom used for setting up arguments.  Instead, the compiler allocates all space up front (like for local variables on x86) and uses the “mov” instruction to write stack parameters onto the stack for function calls.  This also means that you typically will not see an “add rsp” (or equivalent) after each function call, despite the fact that the caller cleans the stack space.
  • The first stack arguments (argument 5, etc) will appear at [rsp+28h] instead of [rsp+08h], because of the mandatory register home area.  This is a departure from how __fastcall worked on x86, where the first stack argument would be at [esp+04h].
  • Because of the data-driven unwind semantics, you will see perfect stack unwinding even without symbols.  This means that even if you don’t have any symbols at all for a third-party binary, you should still get a complete stack trace all the way back to the thread start routine.  As a side effect, the stack traces captured by PageHeap or handle tracing will be much more reliable than on x86, where they tended to break at the first function that did not use ebp (because those stack traces never used symbols).
  • Because of the restrictions on the prologue and epilogue instruction usage, it is very easy to recognize where the actual important function code begins and the boilerplate prologue/epilogue code ends.
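To make the [rsp+28h] figure concrete: on entry to the callee, [rsp] holds the 8-byte return address, and [rsp+8] through [rsp+27h] is the mandatory 32-byte register home area for arguments one through four, so stack-passed argument n (for n ≥ 5) lives at [rsp + 8 + 32 + (n - 5) * 8].  A small sanity check of that arithmetic:

```c
#include <assert.h>
#include <stdint.h>

/* Offset from the callee's rsp (immediately after the call instruction)
   of integer argument n, for n >= 5: the return address (8 bytes), then
   the mandatory 32-byte home space for arguments 1-4, then one 8-byte
   slot per additional stack argument. */
static uint32_t x64_stack_arg_offset(uint32_t n)
{
    return 8 + 32 + (n - 5) * 8;
}
```

Argument 5 thus lands at [rsp+28h] and argument 6 at [rsp+30h], matching what you will see in disassembly.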

If you’ve been debugging on x86 for a long time, then you are probably pretty excited about the features of the new calling convention.  Because of the perfect unwind semantics and the constant-stack-pointer execution model, debugging code that you don’t have symbols for (and using the built-in heap and handle verification utilities) is much more reliable than on x86.  Additionally, compiler-generated code is usually easier to understand, because you don’t have to manually track the value of the stack pointer changing throughout the function call, as you often did in x86 functions compiled with frame pointer omission (FPO) optimizations.

There are some exceptions to the rules I laid out above for the x64 calling convention.  For functions that do not call any other functions (so-called “leaf” functions), it is permissible to use custom calling conventions, so long as the stack pointer (rsp) is not modified.  If the stack pointer is modified, then the regular calling convention semantics are required.

Next time, I’ll go into more detail on how exception handling and unwinding is different on x64 from the perspective of what the changes mean to you if you are debugging programs, and how you can access some of the metadata associated with unwinding/exception handling and use it to your advantage within the debugger.