You can open a PE image as a dump file with WinDbg

November 17th, 2006

There is a little-known feature of WinDbg, ntsd, cdb, kd, and anything else that uses DbgEng to open dump files.

It turns out that with anything powered by DbgEng, anywhere you could open a dump file (user dump, kernel dump, and so on), you can instead open a PE image (.exe/.dll/.sys/etc.) and have the debugger treat it as a dump containing just the contents of the selected PE image.

This is actually a relatively useful feature. When you open a PE image as a dump file, the debugger maps it as an image as if it were loaded in-memory as executable code (though it doesn’t actually run any code, just maps it as if it were an executable and not a data file). This gets you an in-memory representation of your exe/dll/sys/other PE file as if you were debugging a live process (or a dump) that had the image in question loaded.

Like a dump debugging session, this is essentially a read-only session; you can’t really modify anything, as there is no target to control. Additionally, there is no real register context either (or stack or heap), although things like initialized and zero-filled global variables and the module's executable code will be in memory. (The module's preferred image base determines where it is placed in the virtual address space constructed for the debugging session.)
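To make the “preferred image base” concrete: it is stored in the PE optional header, so anything building an address space from the file (as the debugger does here) can read it directly. The following is a minimal C sketch, not DbgEng's actual code; the function name pe_preferred_image_base is hypothetical, and the offsets are the PE32/PE32+ header layouts (e_lfanew at 0x3C, a 4-byte "PE\0\0" signature, a 20-byte file header, then ImageBase at offset 28 of a PE32 optional header or offset 24 of a PE32+ one).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Little-endian readers; PE header fields are always little-endian. */
static uint32_t read_u16(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8);
}

static uint32_t read_u32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: returns 1 and stores the preferred ImageBase if the
   buffer looks like a PE32 or PE32+ image, otherwise returns 0. */
int pe_preferred_image_base(const uint8_t *buf, size_t len, uint64_t *image_base)
{
    if (len < 0x40 || buf[0] != 'M' || buf[1] != 'Z')
        return 0;

    size_t nt = read_u32(buf + 0x3C);           /* e_lfanew */
    /* Need the signature, the file header, and the first 32 bytes of the
       optional header (enough to cover ImageBase in both layouts). */
    if (nt > len || len - nt < 4 + 20 + 32)
        return 0;
    if (memcmp(buf + nt, "PE\0\0", 4) != 0)
        return 0;

    size_t opt = nt + 4 + 20;                    /* start of optional header */
    uint32_t magic = read_u16(buf + opt);
    if (magic == 0x10B) {                        /* PE32: 4-byte ImageBase */
        *image_base = read_u32(buf + opt + 28);
    } else if (magic == 0x20B) {                 /* PE32+: 8-byte ImageBase */
        *image_base = (uint64_t)read_u32(buf + opt + 24) |
                      ((uint64_t)read_u32(buf + opt + 28) << 32);
    } else {
        return 0;
    }
    return 1;
}
```

For a 32-bit executable linked with Microsoft's default settings this would typically report 0x400000; the sketch validates only enough of the headers to locate the field.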

After you have loaded the target, you can do anything that you would normally do with a dump for the most part, as far as examining symbols and disassembling the target go. If you need a disassembler with symbol support and can’t start a process or whatnot to contain a PE image, this particular trick is a great quick-n-dirty replacement for a more full-featured disassembler program.

Note that a side effect of opening a PE image in dump mode is that the symbol server is used to retrieve the binary. This might seem a bit strange, until you consider that for dump files, the normal case is that the dump does not contain the entire binary, just enough header information to retrieve the binary from the symbol server. Therefore, make sure that your symbol path is set up correctly before trying this particular trick.
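As for what “just enough header information” means: symbol stores conventionally index a binary by its file name plus its TimeDateStamp (from the file header) and SizeOfImage (from the optional header). The sketch below builds such a lookup path. The function name is hypothetical, and the exact key formatting (eight upper-case hex digits of the timestamp followed by the image size in hex) is an assumption of this sketch, not a specification.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: builds "<store>\<file>\<key>\<file>", where <key> is
   the image's TimeDateStamp followed by its SizeOfImage. The "%08X%X" key
   format is an assumption. Returns 1 on success, 0 if the file name
   contains a path separator or the output buffer is too small. */
int symstore_image_path(const char *store, const char *file_name,
                        uint32_t time_date_stamp, uint32_t size_of_image,
                        char *out, size_t out_len)
{
    if (strpbrk(file_name, "\\/") != NULL)
        return 0;

    int n = snprintf(out, out_len, "%s\\%s\\%08X%X\\%s",
                     store, file_name,
                     (unsigned)time_date_stamp, (unsigned)size_of_image,
                     file_name);
    return n > 0 && (size_t)n < out_len;
}
```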

The troolean strikes back

November 16th, 2006

There is a particularly amusing curiosity in Microsoft Win32 code that I have seen crop up a couple of times over the years: the “troolean”, a “boolean” value with true, false, and other as possible values.

You have probably seen this before in the winuser messaging APIs, such as the classic example of GetMessage:

WINUSERAPI
BOOL
WINAPI
GetMessageW(
    __out LPMSG lpMsg,
    __in_opt HWND hWnd,
    __in UINT wMsgFilterMin,
    __in UINT wMsgFilterMax);

According to the GetMessage documentation on MSDN, the return value has the following meanings:

If the function retrieves a message other than WM_QUIT, the return value is nonzero.

If the function retrieves the WM_QUIT message, the return value is zero.

If there is an error, the return value is -1. For example, the function fails if hWnd is an invalid window handle or lpMsg is an invalid pointer. To get extended error information, call GetLastError.

Yep, it’s zero, non-zero, or negative one.

Okay, I’ll give Microsoft a bit of slack there, since GetMessage has been around since the 16-bit Windows days, and the alternate return value for GetMessage was probably not there in the original design.
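Because -1 is also “nonzero”, the tempting while (GetMessage(...)) loop treats an error as an ordinary message and can keep looping indefinitely on, for example, an invalid window handle. A small C sketch of checking the three cases in the right order (the enum and function names are made up for illustration):

```c
/* Hypothetical names for the three outcomes GetMessage can report. */
typedef enum { GETMSG_MESSAGE, GETMSG_QUIT, GETMSG_ERROR } getmsg_result;

/* Classifies a GetMessage-style BOOL return value. The -1 test must come
   first: -1 is nonzero, so a plain truth test would call it a message. */
getmsg_result classify_getmessage(int ret)
{
    if (ret == -1)
        return GETMSG_ERROR;
    if (ret == 0)
        return GETMSG_QUIT;
    return GETMSG_MESSAGE;
}
```

In a real message loop you would store the return value of GetMessageW, stop on the quit case, and handle the error case (for example, by calling GetLastError) instead of translating and dispatching the message.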

The thing that really got me started writing this article, though, was a little gem that I found while poking around in the CRT source code included with Visual Studio 2005. In “mstartup.cpp”, I found the following bit of code which is apparently used for Managed C++ support (which is not exactly legacy/old code):

class TriBool
{
public:
    enum State { False = 0, True = -1, Unknown = 2 };
private:
    TriBool();
    ~TriBool();
};
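The trouble with a type like this is that an ordinary truth test quietly folds the third state into one of the other two: with True defined as -1 and Unknown as 2, a plain if (state) treats Unknown exactly like True. A short C sketch mirroring the enum values above (the TB_ names and helper functions are mine, not the CRT's):

```c
#include <stdbool.h>

/* C mirror of the TriBool::State values quoted above. */
enum tribool_state { TB_FALSE = 0, TB_TRUE = -1, TB_UNKNOWN = 2 };

/* Behaves like "if (state)": any nonzero value, including TB_UNKNOWN,
   counts as true. */
bool naive_is_true(enum tribool_state s)
{
    return s ? true : false;
}

/* Only the explicit TB_TRUE value counts as true. */
bool strict_is_true(enum tribool_state s)
{
    return s == TB_TRUE;
}
```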

I guess trooleans are still all the rage these days (for some of us, anyway). Perhaps somebody needs to introduce the CRT programmer(s) to The Daily WTF.

The kernel object namespace and Win32, part 2

November 15th, 2006

Last time, I talked about how the kernel object namespace intersects with the Win32 world as things stood in NT4 (and in Windows 2000 when Terminal Server is disabled).

Although the object namespace model used in these operating systems worked well, some cracks began to appear in it with the introduction of Terminal Server as a mainstream product (and later, RunAs).

There are basically two problems that are imposed by Terminal Server:

  1. Many programs are designed to run as “singleton” programs, but in multi-session environments like Terminal Server, it is desirable to allow each session to run an instance of such programs. The classic way to enforce singleton behavior like this is to create a named kernel object at startup and check whether the object already existed before the program tried to create it. Without alterations to how the object namespace is presented to Win32 functions, this would allow only one instance of such a program across all sessions under Terminal Server.
  2. Drive letters and other “DOS Devices” symbolic links need to become “sessionized”, because the “glass terminal” (physical computer console) typically has things like COM1 or LPT1 pointing to physical serial ports or printer ports, whereas Terminal Server sessions might be using device redirection to point those device names to serial ports or printer ports on the Terminal Server client machine.

The solution to these problems was to partition the view of the kernel object namespace provided to Win32 based on Terminal Server session ID. This is done with a set of symbolic links and object directories.

The basic idea is that session zero continues to use the \BaseNamedObjects object directory, as on downlevel systems. In this object directory, there are a few new symbolic links:

  • \Local, which points to the session local namespace. For session zero, this symbolic link typically points to \BaseNamedObjects. For other sessions, it points to a “sessionized” namespace, such as \Sessions\<Terminal-Server-Session-ID>\BaseNamedObjects.
  • \Global, which always points to the session zero namespace (the “global” namespace). This always points to \BaseNamedObjects for all sessions.
  • \Session, which points to \Sessions\BNOLINKS. This latter symbolic link is not documented (except to say that it is “reserved for system use”), but its function is to allow one session (with the appropriate access granted to it) to create objects in an arbitrary session local namespace, by using a path in the form \Session\<Terminal-Server-Session-ID>\ObjectName. \Sessions\BNOLINKS is an object directory which contains a set of symbolic link objects, each named after a Terminal Server session ID. These links point to the appropriate “sessionized” BaseNamedObjects directory for each session (such as \BaseNamedObjects for session zero, \Sessions\1\BaseNamedObjects for session 1, and so forth).

For sessions other than session zero, a BaseNamedObjects directory named in the form \Sessions\<Terminal-Server-Session-ID>\BaseNamedObjects is created. This is the session local namespace for that session, and it is what the “\Local” symbolic link points to. Additionally, a corresponding symbolic link in \Sessions\BNOLINKS is created so that the (undocumented) “\Session” link works for the new session.

This scheme allows for maximum compatibility with pre-Terminal Server applications which are not aware of the “session isolation” concept, and need some help in order to have their named objects placed in a “sessionized” location where they will not conflict with other user sessions. A means for programs that truly need globally-accessible object names is also provided (the magical \Global prefix symbolic link), which is typically used by services and user-level UI applications that need to communicate with their privileged service counterpart.
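To summarize the mapping, here is a simplified C model of where a Win32 object name ends up for a caller in a given session. It is only a sketch of the scheme described above: the function names are hypothetical, the symbolic links are resolved up front rather than walked by an object manager, and the access checks that apply to the Session prefix are not modeled. The prefixes are written Global\, Local\, and Session\<id>\ as they appear in object names passed to functions such as CreateMutex.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Session-local BaseNamedObjects directory (session zero uses the root one). */
static int session_bno(unsigned session, char *out, size_t out_len)
{
    int n = session == 0
        ? snprintf(out, out_len, "\\BaseNamedObjects")
        : snprintf(out, out_len, "\\Sessions\\%u\\BaseNamedObjects", session);
    return n > 0 && (size_t)n < out_len;
}

/* Hypothetical model: writes the resolved NT path for `name` as seen by a
   caller in `caller_session`. Returns 1 on success, 0 on a malformed name. */
int resolve_object_name(unsigned caller_session, const char *name,
                        char *out, size_t out_len)
{
    char dir[64];
    const char *rest = name;
    unsigned target = caller_session;   /* unprefixed names stay session-local */

    if (strncmp(name, "Global\\", 7) == 0) {
        target = 0;                      /* Global always means session zero */
        rest = name + 7;
    } else if (strncmp(name, "Local\\", 6) == 0) {
        rest = name + 6;                 /* Local is the caller's own session */
    } else if (strncmp(name, "Session\\", 8) == 0) {
        char *end;                       /* Session\<id>\ObjectName */
        unsigned long id = strtoul(name + 8, &end, 10);
        if (end == name + 8 || *end != '\\')
            return 0;
        target = (unsigned)id;
        rest = end + 1;
    }

    if (!session_bno(target, dir, sizeof dir))
        return 0;
    int n = snprintf(out, out_len, "%s\\%s", dir, rest);
    return n > 0 && (size_t)n < out_len;
}
```

With this model, an unprefixed singleton name such as MyAppMutex lands in a different directory for each session, so every session can run its own instance, while Global\MyAppMutex resolves to the same object for everyone.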

In addition to the session isolation of kernel object names, “DOS device” names also became isolated based on Terminal Server session ID. The way this works differs between Windows 2000 and later OS versions, though; I’ll cover it in the next installment of this series. The basic idea for Windows 2000 is that each session got its own directory for DOS device names, but Windows XP and beyond go one step further to better accommodate the “runas” case.

Blog DNS issues fixed

November 15th, 2006

There was a small DNS glitch affecting access to the site which was resolved yesterday – sorry about that.

Win32 calling conventions review

November 10th, 2006

Recently, I’ve posted about the Win32 calling conventions. Here’s a table of contents of the various different posts I’ve made.

  1. Win32 calling conventions: Concepts
  2. Win32 calling conventions: Usage cases
  3. Win32 calling conventions: __cdecl in assembler
  4. Win32 calling conventions: __stdcall in assembler
  5. Win32 calling conventions: __fastcall in assembler
  6. Win32 calling conventions: __thiscall in assembler

Remember that when picking a calling convention to use, there are a number of factors to consider. There is no one calling convention that fits all cases (however, __stdcall is a good default if you are not sure).

Hopefully, you’ll have found this series to be enlightening, useful, and practically applicable.

Alex Ionescu’s blog is up

November 9th, 2006

Alex Ionescu of TinyKRNL and ReactOS has put up his blog. I’d encourage you to check it out; I’m certain that he’ll have plenty of interesting and worthwhile content up there given his background.

Debugger flow control: Using conditional breakpoints (part 3)

November 9th, 2006

Previously, I touched some more on when and why hardware breakpoints can be useful.

If you have been following along so far, you should already know the ups and downs of each flavor of breakpoint, and have at least a fair idea as to when you should prefer one to another. There is one other aspect of breakpoint management that I have yet to cover, though, and it is perhaps the most useful feature of breakpoints in WinDbg: conditional breakpoints.

Conditional breakpoints allow you to, as you might imagine, set conditions for breakpoints. That is, the debugger will only actually stop for you to investigate something when both the breakpoint is triggered, and its associated condition is met. These kinds of breakpoints are very useful if you need to stop on a certain function, but only if a certain argument has a certain value (for instance).

However, in WinDbg (and the other DTW/DbgEng debuggers), the support for conditional breakpoints allows you to do much more than that. Specifically, you are permitted to define arbitrary commands that are automatically executed any time a breakpoint is hit. Aside from allowing you to create conditional breakpoints, this also allows you to perform a number of other highly useful tasks in a quick and automated fashion with the debugger. For example, you might want to alter arguments passed to a particular function, or you might want to log arguments (or a stack trace) to a particular function for future analysis.

For example, let’s say that you wanted to see every caller of CreateFileW: which filename each caller provided, what each caller's call stack looked like, and what the return value was, and then continue execution. Now, you could do this manually with a “plain” breakpoint, repeating a certain set of commands every time the breakpoint hits, but it would be far better to automate the whole process.

With DbgEng’s conditional breakpoint support, this is easy. All you need to do to have a set of commands executed each time a breakpoint is hit is to follow the breakpoint statement with those commands enclosed in double quotes. (If the commands themselves require double quotes, you’ll have to escape them using \".)

From looking at CreateFile on MSDN, we can see that the first argument is a pointer to a Unicode string naming the file to create or open.

Armed with this information, we can construct a breakpoint that will perform the logging that we are looking for.

Here’s what you might come up with:

0:001> bp kernel32!CreateFileW "du poi(@esp+4);kv;gu;? @eax; g"

Let’s break down this breakpoint a little bit. As usual, the semicolon character (;) is used to separate multiple debugger commands appearing on the same line. The first command is fairly straightforward; it makes use of the du command to display the first argument. The du command simply displays a zero-terminated Unicode string at a given address.

Next, we take a backtrace (kv). After that, we let execution continue until CreateFileW returns to its caller, using the gu command (“Go Up” one call level). Finally, we display the value of the eax register with ? (eax holds the return value on x86) and continue execution with g.
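The poi(@esp+4) part follows from the 32-bit x86 stack layout at a function's very first instruction, before its prologue runs: for a __stdcall (or __cdecl) function, [esp] holds the return address pushed by the call, and each 4-byte argument follows it. A small C model of that layout (the function names are hypothetical; it decodes little-endian bytes the way poi does on a 32-bit target):

```c
#include <stdint.h>

/* Offset from esp of 4-byte stack argument `index` (0-based) at the first
   instruction of a 32-bit x86 function: [esp] is the return address, so
   the arguments start at esp+4. Only valid before the prologue moves esp. */
uint32_t stack_arg_offset(unsigned index)
{
    return 4u + 4u * index;
}

/* Equivalent of poi(@esp + stack_arg_offset(index)) on a snapshot of stack
   memory that starts at esp; x86 stores values little-endian. */
uint32_t read_stack_arg(const uint8_t *stack_at_esp, unsigned index)
{
    const uint8_t *p = stack_at_esp + stack_arg_offset(index);
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```

Under this model, read_stack_arg(snapshot, 0) yields the pointer that du then dereferences as the filename; the return address at offset 0 is what gu runs to.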

In action, you might see something like so. In this instance, I am attached to cmd.exe and have executed the command “type c:\config.sys”…

0013fa4c  "c:\\CONFIG.SYS"
ChildEBP RetAddr  
0013e6e4 4ad02f2a kernel32!CreateFileW
0013e730 4ad02e91 cmd!Copen_Work+0x157
0013e744 4ad0dbff cmd!Copen+0x12
0013f7a8 4ad0db62 cmd!TyWork+0x48
0013fc58 4ad0daac cmd!LoopThroughArgs+0x1dd
0013fc6c 4ad05aa2 cmd!eType+0x17
0013fe9c 4ad013eb cmd!FindFixAndRun+0x1f5
0013fee0 4ad0bbba cmd!Dispatch+0x137
0013ff44 4ad05164 cmd!main+0x216
0013ffc0 7c816fd7 cmd!mainCRTStartup+0x125
0013fff0 00000000 kernel32!BaseProcessStart+0x23
Evaluate expression: 132 = 00000084

Essentially, we have turned the debugger into something to inspect function calls and provide us with detailed information about what is happening, in a completely automated fashion.

You can also use this technique for more active intervention in the target as well – for instance, skipping over a function call entirely, modifying what a function does when it is called, or any number of things. Previous articles have, for instance, used conditional breakpoints to alter function behavior (such as making all new virtual memory allocations come from the high end of the address space instead of the low end).

Conditional breakpoints are an invaluable tool in your debugging (and reverse engineering) arsenal; you should absolutely consider them any time that you need any sort of automation to record or alter the behavior of the target, in situations where it is not practical to manually perform the work on every single breakpoint hit.

Next up in this series: Other flow control mechanisms with the debugger, such as stepping and tracing.

The Windows Vista SDK is RTM

November 8th, 2006

The final (RTM) version of the Windows Vista SDK has been released to the world.

You can download it for free from Microsoft in the full installer image or web installer formats.

The full installer image is almost 1.2GB, so be prepared to have a large chunk of hard drive space to burn on the new SDK.

Debugger flow control: More on breakpoints (part 2)

November 8th, 2006

Last time, I discussed the two types of breakpoints that you’ll see in a debugger (hardware and software) at a high level. I didn’t really explain when it was best to use one instead of the other, though, beyond a couple of hints that hardware breakpoints are limited in number and are sometimes good for tracking down memory corruption issues.

By taking a closer look at how each of the two breakpoint types works, we can get some idea as to when we’ll prefer one to the other. Both types of breakpoints alter the target in some way, but to differing degrees.

The primary concern with software breakpoints is that they actually involve patching memory in the target to set the breakpoint. This is usually fine; the debugger uses it as its default breakpoint strategy when you give an end address to g, for instance. However, it begins to break down if the target both executes a region that you are setting a breakpoint on, and also reads that same region.

This particular concern is a real problem when you are dealing with self-modifying code, or certain protection schemes (such as code that attempts to checksum itself in memory). In these cases, you might accidentally break self-modifying code, or trip a protection scheme, simply by virtue of setting a breakpoint (since the act of setting a software breakpoint actually modifies the address space of the target).
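To see why a self-checksum notices a software breakpoint, here is a toy C model (both functions are hypothetical, and real protection schemes use their own checksums): the 0xCC byte changes what the target reads back, so the checksum no longer matches until the original byte is restored.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical self-check: a simple rotate-and-xor checksum over a code
   region, similar in spirit to what a protection scheme might compute. */
uint32_t code_checksum(const uint8_t *code, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum = ((sum << 1) | (sum >> 31)) ^ code[i];
    return sum;
}

/* What a software breakpoint does to the bytes the target can read: the
   first byte of the instruction becomes int 3 (0xCC). The original byte is
   returned because the debugger has to remember it. */
uint8_t plant_int3(uint8_t *code, size_t offset)
{
    uint8_t original = code[offset];
    code[offset] = 0xCC;
    return original;
}
```

For example, planting a breakpoint on the first byte of the common mov edi,edi / push ebp / mov ebp,esp prologue (8B FF 55 8B EC) changes the checksum, and putting the saved 0x8B back restores it.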

In cases like this, hardware breakpoints can come to the rescue. Since setting an execute hardware breakpoint does not modify the underlying instruction, anything that reads the memory backing that instruction gets back the real first byte of the instruction rather than an 0xCC opcode. Granted, you can only have four enabled hardware breakpoints at a time, but usually you can get by with that many. If your breakpoints cover a well-defined execution sequence, you can also use a “sliding window” of hardware breakpoints: have each breakpoint disable itself and enable the next one, conserving the number of active breakpoints.

There is also the other obvious advantage to hardware breakpoints which I touched on earlier: the ability to set a breakpoint on a memory access to a particular address. This has a great many uses, whether you’re reverse engineering something or tracking down a corruption problem. Memory-access breakpoints are an excellent way to very quickly figure out which piece of code is modifying a variable, without having to trace through an arbitrarily large set of code to find the access that you were looking for. One thing to consider about memory-access breakpoints on x86 and x64, though, is that the region they cover must have a length that is 1) a power of 2 and 2) less than or equal to the native pointer size (8 for x64, or 4 for x86), and the address must be aligned to that length. (If you are lucky enough [or perhaps unlucky enough, as Itanium isn’t exactly the most friendly thing to view from an assembler perspective] to be debugging on an Itanium platform, this length restriction does not exist; you can set a length of any power of 2 between 1 byte and two gigabytes.) As a result, you’ll have to plan where to set your breakpoints carefully: on x86, four breakpoints of at most 4 bytes each mean you can cover at most 16 bytes with this kind of “memory guard” access. You might or might not be able to use the same kind of “sliding breakpoint window” idea I mentioned above, depending on whether the memory locations you are setting breakpoints on (or at least the accesses you are interested in) are accessed in a particular sequence.
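The length and alignment rules above lend themselves to a little arithmetic. This C sketch (hypothetical helper names; it models only the x86/x64 rules, not Itanium) checks whether a single data breakpoint is legal, and greedily counts how many naturally aligned breakpoints it would take to cover a given range:

```c
#include <stdint.h>

/* Legal x86/x64 data breakpoint: length is a power of 2 no larger than the
   pointer size (4 on x86, 8 on x64), and the address is aligned to it. */
int hw_bp_valid(uint64_t addr, unsigned len, unsigned ptr_size)
{
    if (len == 0 || len > ptr_size || (len & (len - 1)) != 0)
        return 0;
    return (addr & (len - 1)) == 0;
}

/* Greedily splits [addr, addr + size) into naturally aligned power-of-2
   chunks of at most ptr_size bytes and returns how many debug registers
   that would take (x86 and x64 provide four). */
unsigned hw_bp_slots_needed(uint64_t addr, uint64_t size, unsigned ptr_size)
{
    unsigned slots = 0;
    if (ptr_size == 0)
        return 0;
    while (size > 0) {
        unsigned len = ptr_size;
        /* Shrink until the chunk is aligned and fits in what is left. */
        while (len > 1 && ((addr & (len - 1)) != 0 || len > size))
            len >>= 1;
        addr += len;
        size -= len;
        slots++;
    }
    return slots;
}
```

For example, a 16-byte, 4-byte-aligned structure on x86 takes all four debug registers, while the same 16 bytes starting at an odd address would need six and so cannot be covered all at once.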

Hardware breakpoints are typically less invasive than software breakpoints, but there are still ways that they can be interfered with. The most common case for this happening is if you try to set a hardware breakpoint while DLL initializers are being called during process startup (such as at the initial create process breakpoint). If you try to do this, you’ll get a warning from the debugger advising you that your breakpoints won’t stick:

0:000> ba e1 kernel32!CreateFileA
        ^ Unable to set breakpoint error
The system resets thread contexts after the process
breakpoint so hardware breakpoints cannot be set.
Go to the executable's entry point and set it then.
 'ba e1 kernel32!CreateFileA'

This happens because the thread's register context is set between the initial process breakpoint being hit and the requested thread start address / process start address being executed. I’ll go into just how this works at process startup in a future posting, but to keep it simple, the basic idea is that an APC is queued to the new user-mode thread that runs the loader component in NTDLL. One of the arguments to the APC is a context record describing the register context that was requested for the new thread by CreateProcess, CreateThread, and so forth. The loader component runs process (or thread) DLL initializers, and then calls NtContinue to continue execution at the specified context record, which kicks off execution at the user-requested thread start address. We can see this in action easily by looking at the arguments that the APC dispatcher supplies to the loader initializer APC:

0:000> kv
ChildEBP RetAddr  Args to Child              
0013fb1c 7c93edc0 7ffdf000 ntdll!DbgBreakPoint
0013fc94 7c921639 0013fd30 ntdll!LdrpInitializeProcess+0xffa
0013fd1c 7c90eac7 0013fd30 ntdll!_LdrpInitialize+0x183
00000000 00000000 00000000 ntdll!KiUserApcDispatcher+0x7
0:000> .cxr 0013fd30 
eax=4ad05056 ebx=7ffdd000 ecx=00f2faa8 edx=00090000
esi=7c9118f1 edi=00011970
eip=7c810665 esp=0013fffc ebp=7c910570 iopl=0
         nv up ei pl nz na po nc
cs=001b  ss=0023  ds=0023  es=0023  fs=0038  gs=0000
             efl=00000200
kernel32!BaseProcessStartThunk:
7c810665 33ed            xor     ebp,ebp

If you have been paying attention so far, it should be clear why hardware breakpoints set at the initial process breakpoint do not work as you might expect: when the APC that runs loader initializers returns, it restores a previously saved register context image via NtContinue. Since hardware breakpoints are part of the register context, they are wiped away when that context is restored, so your breakpoints would appear to simply disappear after DLL initializers finished.

This limitation also implies that calling SetThreadContext on a thread can interfere with hardware breakpoints if care is not taken to preserve the value of the Dr series of registers. Indeed, some protection schemes utilize such a trick in an attempt to defeat hardware breakpoints.
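A toy C model of the problem (the struct is a cut-down, hypothetical stand-in for CONTEXT, not the real layout): restoring a saved register image wholesale replaces the debug registers along with everything else, while a careful caller carries the live Dr values over. With the real SetThreadContext, the usual way to get the second behavior is to leave CONTEXT_DEBUG_REGISTERS out of ContextFlags, since only the parts named there are written.

```c
#include <stdint.h>

/* Hypothetical, cut-down register context: just enough to show what
   happens to hardware breakpoints. */
struct thread_ctx {
    uint32_t eip;
    uint32_t dr[4];   /* breakpoint addresses (Dr0-Dr3) */
    uint32_t dr7;     /* enable bits and conditions */
};

/* Models restoring a saved image wholesale (as NtContinue does at the end
   of loader initialization): the saved debug registers replace the live
   ones, wiping any breakpoints set in the meantime. */
void restore_context(struct thread_ctx *live, const struct thread_ctx *saved)
{
    *live = *saved;
}

/* Models a careful caller: everything else comes from the saved image,
   but the live debug registers are preserved. */
void restore_context_preserving_dr(struct thread_ctx *live,
                                   const struct thread_ctx *saved)
{
    struct thread_ctx next = *saved;
    for (int i = 0; i < 4; i++)
        next.dr[i] = live->dr[i];
    next.dr7 = live->dr7;
    *live = next;
}
```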

Fortunately, it is easy to work around such limitations using the debugger. There is a little-used command called “.apply_dbp” that instructs the debugger to re-apply hardware breakpoints, either to the current register context or to a saved register context image in memory (supplied by the /m Context argument). With this command, you can quickly restore your hardware breakpoints even after something attempts to trash them. Combined with a conventional breakpoint on, say, kernel32!SetThreadContext, this can be used to quickly re-enable hardware breakpoints in such cases. You can also use this trick to persist hardware breakpoints in the process startup case, by using .apply_dbp /m <address-of-context-record-argument-from-APC-dispatcher> to apply any hardware breakpoints you set to the register context image that will eventually be restored by NtContinue. For instance, in the case of the example that I gave above, you might use the following to apply hardware breakpoints to the context that NtContinue will restore:

0:000> .apply_dbp /m 0013fd30 
Applied data breakpoint state
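The idea behind .apply_dbp /m can be modeled the same way: instead of fighting the restore, write the breakpoint state into the saved image that is about to be restored. This is a hypothetical sketch of the concept, not how the debugger implements the command; the struct is a cut-down stand-in for CONTEXT, and ContextFlags is ignored.

```c
#include <stdint.h>

/* Hypothetical, cut-down register context image. */
struct thread_ctx {
    uint32_t eip;
    uint32_t dr[4];   /* Dr0-Dr3 */
    uint32_t dr7;
};

/* Copies the debugger's current hardware breakpoint state into a saved
   context image, so that when that image is restored (for example by the
   loader's NtContinue) the breakpoints come back with it. Everything else
   in the image, such as eip, is left untouched. */
void apply_debug_state(struct thread_ctx *saved_image,
                       const uint32_t bp_addr[4], uint32_t dr7)
{
    for (int i = 0; i < 4; i++)
        saved_image->dr[i] = bp_addr[i];
    saved_image->dr7 = dr7;
}
```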

Next up, some more tricks that you can do to get the most out of controlling the target in the debugger.

Debugger flow control: Hardware breakpoints vs software breakpoints

November 7th, 2006

In debugging parlance, there are two kinds of breakpoints that you may run across: “hardware” breakpoints and “software” breakpoints. While the two overlap to a certain degree, it is important to know the differences between them, and when it is better to use a “hardware” or “software” breakpoint.

For the purposes of this discussion, I’ll stick to using WinDbg on an x86 target. The same general concepts apply to other architectures (especially x64, which works nearly identically), and the commands to set breakpoints are the same, but details such as where and how many hardware or software breakpoints you may set vary slightly from platform to platform.

In most debugging scenarios, you have probably used software breakpoints exclusively. Software breakpoints are set with the bp or bu commands (breakpoint and deferred breakpoint, respectively). These breakpoints are fairly simple and straightforward; they cause the processor to halt in the debugger whenever a thread attempts to execute a piece of code that you set a breakpoint on. Typically, you may set as many software breakpoints as you want at the same time. Software breakpoints may only be targeted at code; there is no support for setting a “memory breakpoint” via a software breakpoint. Many features, such as stepping over a call or going to the return address of a function, also implicitly use a temporary software breakpoint that is removed once execution hits it the first time.

Hardware breakpoints, on the other hand, are much more powerful and flexible than software breakpoints. Unlike software breakpoints, you may use hardware breakpoints to set “memory breakpoints”: breakpoints that fire when any instruction attempts to read, write, or execute (depending on how you configure the breakpoint) a specific address. (There is also support for setting breakpoints on I/O port access, but I’ll not cover that feature here, as it is typically of very limited applicability for everyday debugging tasks.) Hardware breakpoints have some limitations, however, the main one being that the number of hardware breakpoints that you may have active is extremely limited (on x86, you may only have four hardware breakpoints active at the same time).

Now that we have a basic overview of what the two breakpoint types are, let’s dig a bit deeper and see how they work under the hood, and when you might use them.

The way software breakpoints work is fairly simple. Speaking about x86 specifically, to set a software breakpoint, the debugger simply writes an int 3 instruction (opcode 0xCC) over the first byte of the target instruction. This causes an interrupt 3 to be raised whenever execution is transferred to the address you set a breakpoint on. When this happens, the debugger “breaks in” and replaces the 0xCC opcode byte with the original first byte of the instruction, which it saved when you set the breakpoint, so that you can continue execution without immediately hitting the same breakpoint again. There is actually a bit more magic involved that allows you to continue execution from a breakpoint without hitting it immediately while keeping the breakpoint active for future use; I’ll discuss this in a future posting.
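The set/hit bookkeeping can be sketched in a few lines of C. This is a simplified model operating on a plain byte array that stands in for target memory (the struct and function names are hypothetical), and it deliberately leaves out the continue-and-rearm logic mentioned above.

```c
#include <stddef.h>
#include <stdint.h>

#define INT3_OPCODE 0xCC

/* Hypothetical bookkeeping for one software breakpoint. */
struct sw_bp {
    size_t  offset;      /* location in (simulated) target memory */
    uint8_t saved_byte;  /* original first byte of the instruction */
    int     active;
};

/* Setting: remember the original byte, then overwrite it with int 3. */
void sw_bp_set(uint8_t *mem, struct sw_bp *bp, size_t offset)
{
    bp->offset = offset;
    bp->saved_byte = mem[offset];
    mem[offset] = INT3_OPCODE;
    bp->active = 1;
}

/* On a hit (or on removal): put the original byte back so the real
   instruction can execute. */
void sw_bp_clear(uint8_t *mem, struct sw_bp *bp)
{
    if (bp->active) {
        mem[bp->offset] = bp->saved_byte;
        bp->active = 0;
    }
}
```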

Now, you might be tempted to say that this isn’t really how software breakpoints work, if you have ever tried to disassemble or dump the raw opcode bytes of anything that you have set a breakpoint on, because if you do that, you’ll not see an int 3 anywhere where you set a breakpoint. This is actually because the debugger tells a lie to you about the contents of memory where software breakpoints are involved; any access to that memory (through the debugger) behaves as if the original opcode byte that the debugger saved away was still there.
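That “lie” can be modeled as a read filter: the debugger reads the real bytes, then patches each saved original byte back into its own copy (not into target memory) for every breakpoint inside the requested range. A hypothetical C sketch:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record of a planted breakpoint. */
struct sw_bp {
    size_t  offset;      /* location of the 0xCC in target memory */
    uint8_t saved_byte;  /* byte the 0xCC replaced */
};

/* Reads len bytes starting at start into out, then hides every planted
   breakpoint in that range by substituting the saved original byte. The
   target's memory still holds 0xCC; only the debugger's view changes. */
void debugger_read(const uint8_t *mem, size_t start, size_t len,
                   const struct sw_bp *bps, size_t nbps, uint8_t *out)
{
    memcpy(out, mem + start, len);
    for (size_t i = 0; i < nbps; i++)
        if (bps[i].offset >= start && bps[i].offset < start + len)
            out[bps[i].offset - start] = bps[i].saved_byte;
}
```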

Now that we know how software breakpoints work at a high level, it’s time to talk about the other side of the story, hardware breakpoints.

Hardware breakpoints are, as you might imagine given the name, set with special hardware support. In particular, for x86, this involves a special set of perhaps little-known registers known as the “Dr” registers (for debug register). These registers allow you to set up to four addresses (four on x86; this is highly platform specific) that, when written, read or written, or executed, will cause the processor to raise a special exception that stops execution and transfers control to the debugger.
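For the curious, here is a C sketch of how one of those four slots is configured in Dr7, following the Intel-documented layout: local-enable bits L0-L3 at bits 0, 2, 4, and 6, and for each slot a 2-bit R/W condition plus a 2-bit LEN field starting at bit 16. The helper names are hypothetical, and the sketch only computes the Dr7 value; loading it (and the address into Dr0-Dr3) is the debugger's job. Note that the condition field has no read-only setting: the choices are execute, write, I/O (when enabled via CR4.DE), and read-or-write.

```c
#include <stdint.h>

/* 2-bit R/W condition codes. */
enum dr_cond { DR_EXEC = 0, DR_WRITE = 1, DR_IO = 2, DR_READWRITE = 3 };

/* Maps a length of 1, 2, 4 or 8 bytes to the 2-bit LEN encoding
   (8 is only valid on x64). Returns -1 for anything else. */
static int dr_len_bits(unsigned len)
{
    switch (len) {
    case 1: return 0;
    case 2: return 1;
    case 8: return 2;
    case 4: return 3;
    default: return -1;
    }
}

/* Produces a Dr7 value with slot `slot` (0-3) locally enabled with the
   given condition and length. Execute breakpoints must use length 1.
   Returns 1 on success, 0 on invalid input. */
int dr7_enable(uint64_t dr7, unsigned slot, enum dr_cond cond, unsigned len,
               uint64_t *out)
{
    int lenbits = dr_len_bits(len);
    if (slot > 3 || lenbits < 0 || (cond == DR_EXEC && len != 1))
        return 0;

    unsigned shift = 16 + 4 * slot;
    dr7 &= ~((uint64_t)0xF << shift);          /* clear old R/W and LEN */
    dr7 |= (uint64_t)cond << shift;            /* R/W field */
    dr7 |= (uint64_t)lenbits << (shift + 2);   /* LEN field */
    dr7 |= (uint64_t)1 << (2 * slot);          /* Ln local enable bit */
    *out = dr7;
    return 1;
}
```

For example, a 4-byte write breakpoint in slot 0 (roughly what ba w4 asks for) corresponds to a Dr7 value of 0xD0001 in this model.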

Given that on x86, you can only have four hardware breakpoints active at once, why would anyone possibly want to use them?

Well, the main strength of hardware breakpoints is that you can use them to halt on non-execution accesses to memory locations. This is actually an extremely useful capability. For example, if you were debugging a memory corruption problem where an initial instance of corruption eventually causes a crash, your initial reaction would probably be something along the lines of “gee, if I knew who caused the corruption in the first place, this would be much, much easier to debug”, and this is exactly what hardware breakpoints let you do. In essence, you can use a hardware breakpoint to tell the processor to stop when a specific variable (address) is written, or is either read or written. You can also use hardware breakpoints to break in on code execution, although in the typical case it is more common to use software breakpoints for that purpose, due to the relaxed restrictions on how many breakpoints you may have active at once.

That’s the high level overview of the two main types of breakpoints you’ll encounter in a debugger. In some upcoming postings, I’ll go into some specifics as to how certain edge cases (such as stepping over a call) are implemented, and describe other situations where you’ll find it very useful to use one kind of breakpoint instead of another. I am also planning on discussing how some of the other debugger flow control features are really implemented (such as tracing / single step), and what the consequences of using each flow control method are on the debuggee.