Debugger tricks: Break on a specific Win32 last error value in Windows Vista

July 24th, 2007

Oftentimes, one type of problem that you might want to track down in a debugger (aside from a crash) is a particular function failing in a certain way. In the case of most Win32 functions, you’ll get some sort of (hopefully meaningful) last error code. Sometimes you might need to know why that error is being returned, or where it originated (in the case of a last error value that is propagated up through several functions).

One way you might approach this is with a conditional breakpoint, but the SetLastError path is typically hit frequently, so this is often problematic in terms of performance, even in user mode debugging on the local computer.

On Windows Vista, there is an undocumented hook inside of NTDLL (which is now responsible for the bulk of the logic behind SetLastError) that allows you to configure a program to break into the debugger when a particular error code is being set as the last error. This is new to Vista, and as it is not documented (at least not anywhere that I can see), it might not be around indefinitely.

For the moment, however, you can set ntdll!g_dwLastErrorToBreakOn to a non-zero value (via the ed command in the debugger) to ask NTDLL to execute a breakpoint when it sees that last error value being set. Obviously, this won’t catch things that modify the field in the TEB directly, but anything using SetLastError or RtlSetLastWin32Error will be checked against this value (in the context of the debuggee).

For example, you might see something like this if you ask NTDLL to break on error 5 (ERROR_ACCESS_DENIED) and then try to open a file or directory that you don’t have access to:

0:002> ed ntdll!g_dwLastErrorToBreakOn 5
0:002> g

[...] Perform an operation to cause ERROR_ACCESS_DENIED

(1864.2774): Break instruction exception
  - code 80000003 (first chance)
ntdll!DbgBreakPoint:
00000000`76d6fdf0 cc              int     3
0:004> k
Call Site
ntdll!DbgBreakPoint
ntdll! ?? ::FNODOBFM::`string'+0x377b
kernel32!BaseSetLastNTError+0x16
kernel32!CreateFileW+0x325
SHELL32!CEnumFiles::InitAndFindFirst+0x7a
SHELL32!CEnumFiles::InitAndFindFirstRetry+0x3e
SHELL32!CFileSysEnum::_InitFindDataEnum+0x5e
SHELL32!CFileSysEnum::Init+0x135
SHELL32!CFSFolder::EnumObjects+0xd3
SHELL32!_GetEnumerator+0x189
SHELL32!CEnumThread::_RunEnum+0x6d
SHELL32!CEnumThread::s_EnumThreadProc+0x13
SHLWAPI!WrapperThreadProc+0xfc
kernel32!BaseThreadInitThunk+0xd
ntdll!RtlUserThreadStart+0x1d

(The debugger is slightly confused about symbol names in NTDLL due to the binary being reorganized into function chunks, but “ntdll! ?? ::FNODOBFM::`string’+0x377b” is part of ntdll!RtlSetLastWin32Error.)

Sometimes, it can be useful to add “debugger knobs” like this to your own program – globals that can be set from the debugger to enable special diagnostics behavior while you’re debugging something. Several other components provide options like this; for example, there’s a global variable in NTDLL named ntdll!ShowSnaps that you can set to 1 in order to enable a large volume of debug print spew from the loader as it resolves imported modules and symbols.

(Incidentally, debugger-settable global variables like ntdll!ShowSnaps are a good example of a correct way of using debug prints in release builds, though there are certainly many other good ways to do so.)
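As a minimal sketch of what such a knob might look like in C (the variable name and debug print here are mine, purely for illustration):

//
// Debugger knob: set to a non-zero value from the debugger
// ("ed mydrv!MydrvVerbose 1") to enable verbose diagnostics
// at runtime, even in a release build.
//
volatile ULONG MydrvVerbose = 0;

[...]

if (MydrvVerbose)
    DbgPrint("MYDRV: resolving import %s\n", ImportName);

(The volatile qualifier is what keeps the compiler from caching the flag in a register or optimizing the test away entirely, so that a value poked in by the debugger actually takes effect.)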

Update: Andrew Rogers points out that g_dwLastErrorToBreakOn existed on Srv03 as well, though it was resident in kernel32 (kernel32!g_dwLastErrorToBreakOn) and not NTDLL in that timeframe. When the last error logic finally moved entirely into NTDLL in the Vista timeframe, the last error breakpoint hook moved with it.

Update: Pavel Lebedinsky points out that I neglected to mention that because the internal BaseSetLastNTError routine in kernel32 on Srv03 does not go through kernel32!SetLastError, the functionality available in Srv03 is generally much less useful than in Vista (it only catches things external to kernel32). That omission, to be clear, is my fault for not making the point known, and not Andrew getting it wrong.

An introduction to DbgPrintEx (and why it isn’t an excuse to leave DbgPrints on by default in release builds)

July 21st, 2007

One of the things that changed in the driver development world around the Windows XP era was the introduction of the DbgPrintEx routine. This routine was introduced to combat the problem of debug spew from all sorts of different drivers running together, by allowing debug prints to be filtered by a “component id”, which is supposed to be unique per class of driver. By allowing a user to filter which debug prints are displayed on a per-driver basis, host-side instead of debugger-side, a better debugging experience can be provided when the system is fairly “crowded” as far as debug prints go. This is especially common on checked builds of the operating system.

Additionally, DbgPrintEx provides an additional mechanism to filter output host-side – a severity level. The way the filtering system works is that each component has an associated set of allowed severity levels, such that only a message with a severity level that is “allowed” will actually be transmitted to the debugger when the DbgPrintEx call is made. This allows debug prints to be filtered on a (component, severity) basis, such that especially verbose debug prints can be turned off at runtime without requiring a rebuild or patching the binary on the fly (which might be a problem if the binary in question is the kernel itself and you’re running a 64-bit build). In order to edit the allowed severity of a component at runtime, you typically modify one of the nt!Kd_<component>_Mask global variables in the kernel with the debugger, setting the global corresponding to the desired component to the desired severity mask.
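For example, a third-party driver might emit a filterable trace like so (a sketch; the message text and the Request/Status variables are mine, but the component and level constants are the standard ones from dpfilter.h):

//
// Transmitted to the debugger only if the allowed severity mask for
// the IHVDRIVER component permits DPFLTR_INFO_LEVEL.  For levels 0-31,
// the mask bit checked is (1 << level); DPFLTR_INFO_LEVEL is 3, so:
//
//   kd> ed nt!Kd_IHVDRIVER_Mask 0x8
//
DbgPrintEx(DPFLTR_IHVDRIVER_ID, DPFLTR_INFO_LEVEL,
           "MYDRV: request %p completed, status %08lx\n",
           Request, Status);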

With respect to older drivers, the old DbgPrint call still works, but it is essentially repackaged into a DbgPrintEx call, with hardcoded (default) component and severity values. This means that you’ll still be able to get output from DbgPrint, but (obviously) you can’t take advantage of the extra host-side filtering. Host-side filtering is much preferable to .ofilter-style filtering (which occurs debugger-side), as debug prints that are transferred “over the wire” to the debugger incur a significant performance penalty – the system is essentially suspended while the transfer occurs, and if you have a large amount of debug print spew, this can quickly make the system unusably slow.

Windows Vista takes this whole mechanism one step further and turns off plain DbgPrint calls by default, by setting the default severity level for the DbgPrint-assigned component value such that DbgPrint calls are not transmitted to the debugger. This can be overridden at runtime by modifying Kd_DEFAULT_Mask, but it’s an extra step that must be taken (and one that may be confusing if you don’t know about the default behavior change in Vista, as your debug prints will seemingly just never work).
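For example, to see plain DbgPrint output on a Vista target again, set the DEFAULT component mask from the kernel debugger (0x8 is the bit for the informational level that DbgPrint output is assigned to):

kd> ed nt!Kd_DEFAULT_Mask 0x8

The change can also be made persistent across reboots by creating a DWORD value named DEFAULT (set to 8) under HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Debug Print Filter.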

However, just because DbgPrintEx now provides a way to filter debug prints host-side, that doesn’t mean that you can just go and turn on all your debug prints for release builds by default. Among other things, it’s possible that someone else is using the same component id as your driver (remember, component ids are for classes of devices, such as “network driver”, or “video driver”). Furthermore, DbgPrintEx calls still do incur some overhead, even if they aren’t transmitted to the debugger on the other end (however as long as the debug print is masked off by the allowed severities mask for your component, the overhead is fairly minimal).

Still, the problem of limited component ids remains significant enough that you don’t want to leave debug prints on unconditionally; otherwise, if someone else with the same component id wants to debug their driver, they’ll have all of your debug print spew mixed in with their own.

Also, there is an option to turn on all debug prints at once, which can sometimes come in handy – and which results in a lot of badness if every driver in the system has debug prints on by default. This is accomplished by specifying a global severity mask via the nt!Kd_WIN2000_Mask global, which is checked before any component-specific masks. (If Kd_WIN2000_Mask allows the debug print, it is short-circuited to being allowed without considering component-specific masks. This makes things easier if you want to grab certain severities of debug print messages from many components at the same time, without having to go and manually poke around with severity levels on every component you’re interested in.)
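For instance, to allow every severity of debug print from every component in one shot:

kd> ed nt!Kd_WIN2000_Mask 0xffffffff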

Unfortunately, this is already a problem on Vista RTM (even free builds) – there are a couple of in-box Microsoft drivers that are guilty of this sort of thing, making Kd_WIN2000_Mask less than useful on Vista. Specifically, cdrom.sys likes to print debug messages like this every second:

Will not retry; Sense/ASC/ASCQ of 02/3a/00

That’s hardly the worst of it, though. Try starting a manifested program on Vista with all debug print severities turned on in Kd_WIN2000_Mask and you’ll get pages of debug spew (no, I’m not kidding – try it with iexplore.exe and you’ll see what I mean). In that respect, shame on the SxS team for setting a bad example by polluting the world with debug prints that are useless to most of us (and to a lesser extent, cdrom.sys also gets a “black lump of coal”, so to speak). Maybe these two components will be fixed for Srv08 RTM or Vista SP1 RTM, if we’re lucky.

So, take this opportunity to check and make sure that none of your drivers ship with DbgPrints turned on by default – even if you do happen to use DbgPrintEx. Somebody who has to debug a problem on a computer with your driver installed will be all the happier as a result.

Debugger tricks: API call logging, the quick’n’dirty way (part 3)

July 20th, 2007

Previously, I introduced several ways to use the debugger to log API calls. Beyond what was described in that article, there are some other, more complicated examples that are worth reviewing. Additionally, there are certain limitations that should be considered when using the debugger instead of a dedicated API logging program.

Although logging breakpoints like I’ve previously described (i.e. displaying function input parameters and return values) are certainly handy, you’ve probably already come up with a couple of scenarios where breakpoints in the style I’ve provided won’t give you what you need to track down a problem.

The most notable example of this is when you need to examine an out parameter that is filled by a function call, after the function call is made. This poses a problem, as it’s generally not reliable to access the function parameters on the stack after the function call has returned (in both the stack- and register-based calling conventions in use on Windows, the called function is free to modify the parameter locations as it sees fit, and this is actually fairly common with optimizations enabled). As a result, what we really need is the ability to save some state across the function call, so that we can access some of the function’s arguments after the function returns.

Fortunately, this is doable within the debugger, albeit in a rather roundabout way. The key here is the usage of so-called user-defined pseudo-registers, which are conceptually extra platform-independent storage locations (accessed like regular registers in terms of the expression evaluator, hence the term pseudo-register). These pseudo-registers are essentially just variables in the conventional programming sense, although there are a limited number of them available (20 in the current release). As a result, there are some limitations on what can be accomplished using them, but for most circumstances, 20 is enough. If you find yourself needing to track more state than that, you should strongly consider writing a debugger extension in C instead of using the debugger script language.
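For example, pseudo-registers can be assigned and read back just like real registers in the expression evaluator:

0:000> r @$t0 = 1234
0:000> ? @$t0
Evaluate expression: 4660 = 00001234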

(As an aside, at Driver DevCon a couple of years ago, I remember sitting in on a WinDbg-oriented session in which the presenter was at one point going over a large program written in the (then-relatively-new) expanded debugger scripting language, with additional support for conditionals and error handling. I still can’t help but think of debugger-script programs as combining the ugliest parts of Perl with cmd.exe-style batch scripts (although to be fair, the debugger expression evaluator is a bit more powerful than batch scripts, and it was also never originally intended to be used for more than simple expressions). To be honest, I would still strongly recommend against writing highly complex debugger-script programs where possible; they are something of a maintenance nightmare, among other things. For such circumstances, writing a debugger extension (or a program to drive the debugger entirely) is a better choice. I digress, however; back to the subject of call logging.)

The debugger’s user-defined pseudo-register facility provides an effective (if perhaps slightly awkward) means of storing state, and this can be used to save parameter values across a function call. For example, we might want to log all calls to ReadFile, such that we get a dump of the file data being read in. To accomplish this task, we’ll need to dump the contents of the output buffer after the call returns (using the bytes transferred count, another out parameter). This could be accomplished like so (in this case, for brevity, I am assuming that the program is using ReadFile in synchronous I/O mode):

0:000> bp kernel32!ReadFile "r @$t0 = poi(@esp+8) ; r @$t1 = poi(@esp+10) ; g @$ra ; .if (@eax != 0) { .printf \"Read %lu bytes: \\n\", dwo(@$t1) ; db @$t0 ldwo(@$t1) } .else { .echo Read failed! ; !gle } ; g "

The output of this command might be like so:

Read 22 bytes: 
0016ec3c  54 68 69 73 20 69 73 20-61 20 74 65 78 74 20 66
              This is a text f
0016ec4c  69 6c 65 2e 0d 0a
              ile...

(Awkward wrapping done by me to avoid breaking the blog layout.)

This command is essentially a logical extension of yesterday’s example, with the addition of some state that is shared across the call. Specifically, the @$t0 and @$t1 user-defined pseudo-registers are used to save the lpBuffer ([esp+08h]) and lpNumberOfBytesRead ([esp+10h]) arguments to the ReadFile call across the function’s execution. When execution is stopped at the return address, the contents of the file data that were just read are dumped by dereferencing the values referred to by @$t0 and @$t1.
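For reference, the same breakpoint for an x64 target might look like so (a sketch, under the same synchronous I/O assumption; on x64, lpBuffer and lpNumberOfBytesRead arrive in rdx and r9, and the BOOL return value is still found in eax):

0:000> bp kernel32!ReadFile "r @$t0 = @rdx ; r @$t1 = @r9 ; g @$ra ; .if (@eax != 0) { .printf \"Read %lu bytes: \\n\", dwo(@$t1) ; db @$t0 ldwo(@$t1) } .else { .echo Read failed! ; !gle } ; g "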

Although this sort of state-saving across execution can be useful, there are downsides. Firstly, this sort of breakpoint is fundamentally incompatible with multiple threads (at least insofar as multiple threads may hit the breakpoint in question simultaneously). This is because the debugger provides no provision for “expression-local” or “thread-local” state – multiple threads hitting the breakpoint at the same time can step on each other’s toes, so to speak. (This problem can also occur with any sort of breakpoint that involves resuming execution until an implicit breakpoint created by a “g <address>” command, although it is arguably more severe with “stateful” breakpoints.)

This limitation in the debugger can be worked around in a limited fashion by making a breakpoint thread-specific via a thread specifier in the g command, although this is typically hardly convenient to do. Many call logging programs will account for multithreading natively and will not require any special work to accommodate multithreaded function calls. (Note that this problem is often not as severe as it might sound – in many cases, even in multithreaded programs, there is often only one caller of the function you’re interested in, or the likelihood of a thread collision is sufficiently small that the breakpoint works anyway the vast majority of the time. However, in some circumstances, this style of breakpoint just does not work well, namely when the function in question is called frequently from many threads and requires inspection of data after the function returns.)

Another significant limitation of using the debugger to do call logging (as opposed to a dedicated program) is speed. The reason is that for every breakpoint event, essentially all threads in the program are frozen, various state information is copied from the program to the debugger, and then the breakpoint expression is evaluated debugger-side. Additionally, unlike with a dedicated program, the results of the logging breakpoint are displayed in real time, instead of (say) being stored in a binary log buffer somewhere for later formatting and display. This means that even more overhead is incurred, as the debugger UI needs to be updated on every breakpoint. As a result, if you set a conditional breakpoint on a frequently hit function, you may notice the program slow down significantly, perhaps even to the point of being unusable. Dedicated logging programs can employ a variety of techniques to circumvent these limitations, which are primarily artifacts of the fact that the debugger is designed to be a debugger and not a high-speed API monitor.

This is even more noticeable in the kernel debugger case, as transitions to the debugger in KD mode are very slow, such that even several transitions per second are enough to make a system all but unusable in practical terms. As a result, one needs to be extra careful in picking locations to set conditional logging breakpoints at in the kernel debugger (perhaps placing them in the middle of a function, in a specific interesting code path, rather than at the start of the function where every call would trip them).

Given these limitations, it is worth doing a bit of analysis on the problem to determine if the debugger or a dedicated logging program is the best choice. Both approaches have strengths and weaknesses, and although the debugger is extremely flexible (and often very convenient), it isn’t necessarily the best choice in every conceivable scenario. In other words, use the best tool for the job. However, there are some circumstances where the only option is to use the debugger, such as kernel-mode call logging, so I would recommend at least having some basic knowledge of how to accomplish logging tasks with the debugger, even if you would normally always use a dedicated logging program. (Although, in the case of kernel mode debugging, again, the slowness of debugger transitions makes it important to pick “low-traffic” locations to breakpoint on.)

Still, an important part of being effective at debugging and solving problems is knowing your options and when (and when not) to use them. Using the debugger to perform call logging should just be one of many such options in your “debugging toolkit”.

Debugger tricks: API call logging, the quick’n’dirty way (part 2)

July 19th, 2007

Last time, I put forth the notion that WinDbg and the other DTW debuggers make a fairly decent choice for API call logging. This article expands on just how to do this sort of logging via the debugger, by starting out with a simple logging breakpoint and expanding on it to be more intelligent.

As previously mentioned, it is really not all that difficult to use the debugger to perform call logging. The basic idea involved is to just set a “conditional” breakpoint (e.g. via the bp command) at the start of a function you’re interested in. From there, the breakpoint can have commands to display input parameters. However, you can also get a bit more clever in some scenarios (e.g. displaying return values, values in output parameters, and the like), although there are some limitations to this that may or may not be a problem based on the characteristics of the program that you’re debugging.

To give a simple example of what I mean, there’s the classic “show all files opened via Win32 CreateFile as they are opened”. In order to do this, the way to go is to set a breakpoint on kernel32!CreateFileW. (Remember that most of the “A” Win32 APIs thunk to the “W” APIs, so you can often set a breakpoint on just the “W” version to get both. Of course, this is not always true (and some bizarre APIs like WinInet actually thunk “W” to “A”), but as a general rule of thumb, it’s more often the case than not.) The kernel32 breakpoint needs to be imbued with the knowledge of how to display the first argument based on the calling convention of the routine in question. Since CreateFile is __stdcall, that would be [esp+4] (for x86) and rcx (for x64).

At its most basic, the breakpoint command might look like so:

0:001> bp kernel32!CreateFileW "du poi(@esp+4) ; gc"

(Note that the gc command is similar to g, except that it is designed especially for use in conditional breakpoints. If you trace into a breakpoint that controls execution with gc, it will resume executing in the same way that you were controlling the program before the breakpoint, instead of unconditionally resuming normal execution. The difference between a breakpoint using g and one using gc is that if you trace into a gc breakpoint, you’ll trace to the next instruction, whereas if you trace into a g breakpoint, control will resume at full speed and you’ll lose your place.)

The debugger output for this breakpoint (when hit) lists the names passed to kernel32!CreateFileW, like so (if I were setting this breakpoint in cmd.exe, and then did “type C:\readme.txt”, this might come up in the debugger output):

00657ff0  "C:\\readme.txt"

(Note that as the breakpoint displays the string passed to the function, it will be a relative path if the program uses the relative path.)

Of course, we can do slightly more complicated things as well. For instance, it might be a good idea to display the returned handle and the last error code. This could be done by having the breakpoint go to the return point of the function after it dumps the first parameter, and then display the additional information. To do this, we might use the following breakpoint:

0:001> bp kernel32!CreateFileW "du poi(@esp+4) ; g @$ra ; !handle @eax f ; !gle ; g"

The gist of this breakpoint is to display the returned handle (and last error status) after the function returns. This is accomplished by directing the debugger to resume execution until the return address is hit, and then operate on the return value (!handle @eax f) and last error status (!gle). (The @$ra symbol is a pseudo-register that refers to the current function’s return address in a platform-independent fashion. Essentially, the g @$ra command runs the program until the return address is hit.)

The output from this breakpoint might be like so:

0016f0f0  "coffbase.txt"
Handle 60
  Type         	File
  Attributes   	0
  GrantedAccess	0x120089:
         ReadControl,Synch
         Read/List,ReadEA,ReadAttr
  HandleCount  	2
  PointerCount 	3
  No Object Specific Information available
LastErrorValue: (Win32) 0 (0) - The operation
  completed successfully.
LastStatusValue: (NTSTATUS) 0 - STATUS_WAIT_0

However, if we fail to open the file, the results are less than ideal:

00657ff0  "c:\\readme.txt"
Handle 4
  Type         	Directory
[...] enumeration of all handles follows [...]
21 Handles
Type           	Count
Event          	3
File           	4
Directory      	3
Mutant         	1
WindowStation  	1
Semaphore      	2
Key            	6
Thread         	1
LastErrorValue: (Win32) 0x2 (2) - The system
  cannot find the file specified.
LastStatusValue: (NTSTATUS) 0xc0000034 -
  Object Name not found.

What went wrong? Well, the !handle command expanded into essentially “!handle -1 f”, since CreateFile returned INVALID_HANDLE_VALUE (-1). This mode of the !handle extension enumerates all handles in the process, which isn’t what we want. However, with a bit of cleverness, we can improve upon this. A second take at the breakpoint might look like so:

0:001> bp kernel32!CreateFileW "du poi(@esp+4) ; g @$ra ; .if (@eax != -1) { .printf \"Opened handle %p\\n\", @eax ; !handle @eax f } .else { .echo Failed to open file, error: ; !gle } ; g"

Although that command might appear a bit intimidating at first, it’s actually fairly straightforward. Like the previous version of this breakpoint, the essence of what it accomplishes is to display the filename passed to kernel32!CreateFileW and then resume execution until CreateFile returns. Then, depending on whether the function returned INVALID_HANDLE_VALUE (-1), either the handle or the last error status is displayed. The output from the improved breakpoint might be something like this (with an example of successfully opening a file and then failing to open a file):

Success:

0016f0f0  "coffbase.txt"
Opened handle 00000060
Handle 60
  Type         	File
  Attributes   	0
  GrantedAccess	0x120089:
         ReadControl,Synch
         Read/List,ReadEA,ReadAttr
  HandleCount  	2
  PointerCount 	3
  No Object Specific Information available

Failure:

00657ff0  "C:\\readme.txt"
Failed to open file, error:
LastErrorValue: (Win32) 0x2 (2) - The system
  cannot find the file specified.
LastStatusValue: (NTSTATUS) 0xc0000034 -
  Object Name not found.

Much better. A bit of intelligence in the breakpoint allowed us to skip the undesirable behavior of dumping the entire process handle table in the failure case, and we could even skip displaying the last error code in the success case.

As one can probably imagine, there’s a whole other range of possibilities here once one considers the flexibility offered by conditional breakpoints. However, there are some downsides with this approach that must be considered as well. More on a couple of other, more advanced conditional breakpoints for logging purposes in a future posting (as well as a careful look at some of the limitations and disadvantages of using the debugger instead of a specialized program, and some “gotchas” you might run into with this sort of approach).

Debugger tricks: API call logging, the quick’n’dirty way (part 1)

July 18th, 2007

One common task that you may be faced with while debugging a problem is logging information about calls to a function or functions. If you want to know about a function in your own program that you have source code to, you could often just add some sort of debug print and rebuild the program, but sometimes this isn’t practical. For example, you might not always be able to reproduce a problem, and so it might not be viable to restart with a debug-ified build, because you might blow away your repro. Or, more importantly, you might need to log calls to functions that you don’t have source code to (or aren’t building as part of your program, or otherwise don’t want to modify).

For example, you might want to log calls to various Windows APIs in order to gain information about a problem that you are troubleshooting. Now, depending on what you’re doing, you might be able to do this by adding debug prints before and after every single call to the particular API. However, this is often less than convenient, and if you aren’t the immediate caller of the function you want to log, then you’re not going to be able to take that route anyway.

There are a number of API spy/API logging packages out there (and the Debugging Tools for Windows distribution even ships with one, called Logger, though it tends to be fairly fragile – personally, I’ve had it crash out on me more often than I’ve had it actually work). Although you might be able to use one of those, a big limitation of “shrink-wrapped” logging tools is that they won’t know how to properly log calls to custom functions, or functions that are otherwise not known to the logging tool. The better logging tools out there are user-extensible to a certain extent, in that they typically provide some sort of scripting or programming language that allows the user (i.e. you) to describe function parameters and calling conventions, so that they can be logged.

However, it can often be difficult (or even impossible) to describe many types of functions to these tools – such as functions that take pointers to structures that contain pointers to other structures, or other such non-trivial constructs. As a result, in many circumstances, I tend to recommend against using so-called “shrink-wrapped” API logging tools for logging calls to functions.

In the event that implementing debug prints in source code isn’t feasible, though, it would appear on the surface that one is left without a usable approach for logging calls. Not so, in fact – it turns out that with some careful use of so-called “conditional breakpoints”, you can often use the debugger (e.g. WinDbg/ntsd/cdb/kd, which is what I shall be referring to for the rest of this article) to provide this sort of call logging. Using the debugger has many advantages; for instance, you can do this sort of API logging “on the fly”, and in situations where you can attach the debugger after the process has started, you don’t even need to start the program specially in order to log it. Even better, however, is that the debugger has extensive support for displaying data in meaningful forms to the user.

If you think about it, displaying data to the user is one of the principal functions of the debugger, in fact. It’s also one of the major reasons why the debugger is highly extensible via extensions, such that complicated data structures can be displayed and interpreted in a meaningful fashion. By using the debugger to perform your API logging, you can take advantage of the rich functionality for displaying data that is already baked into the debugger (and its extensions, and even any custom extensions of your own that you have written) to double as a call logging facility.

Even better, because the debugger can read and display many data types in a meaningful fashion based off of symbol files (if you have private symbols, such as for programs you compile or provide), for data types that don’t have specific debugger extensions for displaying them (like !handle, !error (for error codes), !devobj, and so forth), you can often utilize the debugger’s ability to format data based off of type information in symbols. This is typically done via the dt command, and often provides a workable display for most custom data types without having to do any sort of complicated “training” like you might have to do with a logging program. (Some data structures, such as trees and lists, may need more intelligence than what is provided in dt for displaying all parts of the data structure. This is typically true for “container” data types, although even for those types, you can still often use dt to display the actual members within the container in a meaningful fashion.) Utilizing the information contained within symbol files (via the debugger) for API logging also frees you from having to ensure that your logging program’s definitions for all of your structures and other types are in sync with the program you are debugging, as the debugger automagically receives the correct definitions based on symbols (and if you are using a symbol server that includes indexed versions of your own internal symbols, the debugger will even be able to find the symbols on its own).
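For instance, given private symbols, a logging breakpoint can dump a custom structure argument with dt, without the debugger needing to be taught anything about the type. The program, type, and field names below are hypothetical, purely to illustrate the shape of the output:

0:000> dt myprog!_REQUEST_PACKET poi(@esp+4)
   +0x000 Size             : 0x28
   +0x004 Flags            : 1
   +0x008 FileName         : 0x00657ff0  "C:\readme.txt"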

Another plus to this approach is that, provided you are reasonably familiar with the debugger, you probably won’t have to learn a new description language like you might if you were using an API logging program. This is because you’re probably already familiar with many of the commands the debugger makes available for displaying data, from every-day debugger usage. (Even if you aren’t all that familiar with the debugger, there is extensive documentation that ships with the debugger by default which describes how to format and display data via various debugger commands. Additionally, there are many examples describing how to use most of the important or useful debugger commands out there on the Internet.)

Okay, enough about why you might want to consider using the debugger to perform call logging. Next time, a quick look and walkthrough describing how you can do this (it’s really quite simple, as alluded to previously), along with some caveats and gotchas that you might want to watch out for along the way.

Silly debugger tricks: Using KD to reset a forgotten administrator password

July 11th, 2007

One particularly annoying occurrence that’s happened to me on a couple of occasions is losing the password to a long-forgotten test VM that I need to thaw for some reason or another, months after the last time I used it. (If you follow good password practices and use differing passwords for accounts, you might find yourself in this position.)

Normally, you’re kind of sunk if you’re in this position, which is the whole idea – no administrator password means no administrative access to the box, right?

The officially supported solution in this case, assuming you don’t have a password reset disk (does anyone actually use those?) is to reformat. Oh, what fun that is, especially if you just need to grab something off of a test system and be done with it in a few minutes.

Well, with physical access (or the equivalent if the box is a VM), you can do a bit better with the kernel debugger. It’s a bit embarrassing having to “hack” (and I use that term very loosely) into your own VM because you don’t remember which throwaway password you used 6 months ago, but it beats waiting around for a reformat (and in the case of a throwaway test VM, it’s probably not worth the effort anyway compared to cloning a new one, unless there was something important on the drive).

(Note that as far as security models go, I don’t really think that this is much of a security risk. After all, to use the kernel debugger, you need physical access to the system, and if you have that much, you could always just use a boot CD, swap out hard drives, or a thousand other different things. This is just more convenient if you’ve got a serial cable and a second box with a serial port, say a laptop, and you just want to reset the password for an account on an existing install.)

This is, however, perhaps an instructive reminder in how much access the kernel debugger gives you over a system – namely, the ability to do whatever you want, like bypass password authentication.

The basic idea behind this trick is to use the debugger to disable the password check used at interactive logon inside LSA.

The first step is to locate the LSA process. The typical way to do this is to use the !process 0 0 command and look for a process name of LSASS.exe. The next step requires that we know the EPROCESS value for LSA, hence the enumeration. For instance:

kd> !process 0 0
**** NT ACTIVE PROCESS DUMP ****
PROCESS fffffa80006540d0
    SessionId: none  Cid: 0004    Peb: 00000000
      ParentCid: 0000
    DirBase: 00124000  ObjectTable: fffff88000000080
      HandleCount: 545.
    Image: System
[...]
PROCESS fffffa8001a893a0
    SessionId: 0  Cid: 025c    Peb: 7fffffda000
     ParentCid: 01ec
    DirBase: 0cf3e000  ObjectTable: fffff88001b99d90
     HandleCount: 822.
    Image: lsass.exe

Now that we’ve got the LSASS EPROCESS value, the next step is to switch to it as the active process. This is necessary as we’re going to need to set a conditional breakpoint in the context of LSA’s address space. For this task, we’ll use the .process /p /r eprocess-pointer command, which changes the debugger’s process context and reloads user mode symbols.

kd> .process /p /r fffffa8001a893a0
Implicit process is now fffffa80`01a893a0
.cache forcedecodeuser done
Loading User Symbols
.....

Next, we set up a breakpoint on a particular internal LSA function that is used to determine whether a given password is accepted for a local account logon. The breakpoint changes the function to always return TRUE, such that all local account logons will succeed if they get to the point of a password check. After that, execution is resumed.

kd> ba e1 msv1_0!MsvpPasswordValidate
   "g @$ra ; r @al = 1 ; g"
kd> g

We can dissect this breakpoint to understand better just what it is doing:

  • Set a break on execute hardware breakpoint on msv1_0!MsvpPasswordValidate. Why did I use a hardware breakpoint? Well, they’re generally more reliable when doing user mode breakpoints from the kernel debugger, especially if what you’re setting a breakpoint on might be paged out. (Normal breakpoints require overwriting an instruction with an “int 3”, whereas a hardware breakpoint simply programs an address into the processor such that it’ll trap if that address is accessed for read/write/execute, depending on the breakpoint type.)
  • The breakpoint has a condition (or command) attached to it. Specifically, this command runs the target until it returns from the current function (“g @$ra” continues the target until the return address is hit; @$ra is a special platform-independent pseudo-register that refers to the return address of the current function). Once the function has returned, the al register is set to 1 and execution is resumed. This function returns a BOOLEAN value (in other words, an 8-bit value), which is stored in al (the low 8 bits of the eax or rax register, depending on whether you’re on x86 or x64). IA64 targets don’t store return values in this fashion, so the breakpoint is x86/x64-specific.

Now, log on to the console. Make sure to use a local account and not a domain account, so the authentication is processed by the Msv1_0 package. Also, non-console logons might not run through the Msv1_0 package, and may not be affected. (For example, Network Level Authentication (NLA) for RDP in Vista/Srv08 doesn’t seem to use Msv1_0, even for local accounts. The console will still allow you to log in, however.)

From there, you can simply reset the password for your account via the Computer Management console. Be warned that this will wipe out EFS keys and the like, however. To restore password checking to normal, either reboot the box without the kernel debugger, or use the bc* command to disable the breakpoint you set.
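For example, assuming the hardware breakpoint you set was assigned breakpoint id 0:

kd> bc 0

(“bc *” will clear all breakpoints at once, if you’d rather not check ids with “bl” first.)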

(For the record, I can’t really take credit for coming up with this trick, but it’s certainly one I’ve found handy in a number of scenarios.)

Now, one thing that you might take away from this article, from a security standpoint, is that it is important to provide physical security for critical computers. To be honest, if someone really wants to access a box they have physical access to, this is probably not even the easiest way; it would be simpler to just pop in a bootable CD or floppy and load a different operating system. As a result, as previously mentioned, I wouldn’t exactly consider this a security hole, as it already requires physical access in order to be effective. It is, however, a handy way to reset passwords for your own computers or VMs in a pinch, if you happen to know a little bit about the debugger. Conversely, it’s not really a supported “solution” (more of a giant hack at best), so use it with care (and don’t expect PSS to bail you out if you break something by poking around in the kernel debugger). It may break without warning on future OS versions (and there are many cases that won’t be caught by this trick, such as domain accounts that use the Kerberos provider to process authentication).

Update: I forgot to mention the very important fact that you can turn on the kernel debugger from the “F8” boot menu when booting the system, even if you don’t have kernel debugging enabled in the boot configuration or boot.ini. This will enable kernel debugging on the highest numbered COM port, at 19200bps. (On Windows Vista, this also seems capable of auto-selecting 1394 if your machine had a 1394 port, if memory serves. I don’t know offhand whether that translates to downlevel platforms, though.)

The beginning of the end of the single-processor era

July 10th, 2007

I came across a quote on CNet that stuck with me yesterday:

It’s hard to see how there’s room for single-core processors when prices for nearly half of AMD’s dual-core Athlon 64 X2 chips have crept well below the $100 mark.

I think that this sentiment is especially true nowadays (at least for conventional PC-style computers – not counting embedded things). Multiprocessing (at least pseudo-multiprocessing, in the form of Intel’s HyperThreading) has been available on end-user computers for some time now. Furthermore, full multiprocessing, in the form of multi-core chips, is now mainstream. What I mean by that is that by now, most of the computers you’ll get from Dell, Best Buy, and the likes will be MP, whether via HyperThreading or multi-core.

To give you an idea, I recently got a 4-way server (a single quad core chip) for ~$2300 (though it was also reasonably equipped other than in the CPU department). At work, we got an 8-way box (2x dual core chips) for under $3000 as well, for running VMs for our quality assurance department. Just a few years ago, getting an 8-way box “just like that” would have been unheard of (and ridiculously expensive), and yet here we are, with medium-level servers that Dell ships coming with that kind of multiprocessing “out of the box”.

Even laptops are coming with multicore chips in today’s day and age, and laptops have historically not been exactly performance leaders due to size, weight, and battery life constraints. All but the most entry-level laptops Dell ships nowadays are dual core, for instance (and this is hardly limited to Dell either; Apple is shipping dual-core Intel Macs as well for their laptop systems, and has been for some time in fact.)

Microsoft seems to have recognized this as well; for instance, there is no single processor kernel shipping with Windows Vista, Windows Server 2008, or future Windows versions. That doesn’t mean that Windows doesn’t support single processor systems, but just that there is no longer an optimized single processor kernel (e.g. one replacing spinlock acquisition with a simple KeRaiseIrql(DISPATCH_LEVEL) call) anymore. The reason is that for new systems, which are expected to be the vast majority of Vista/Server 2008 installs, multiprocessing capability is just so common that it’s not worth maintaining a separate kernel and HAL just for the odd single processor system’s benefit anymore.

What all this means is that if, as a developer, you haven’t really been paying attention to the multiprocessor scene, now’s the time to start – it’s a good bet that within a few years, even on very low end systems, single processor boxes are going to become very rare. For intensive applications, the capability to take advantage of MP is going to start being a defining point now, especially as chip makers have realized that they can’t just indefinitely increase clock rates and have accordingly begun to transition to multiprocessing as an alternative way to increase performance.

Microsoft isn’t the only company that’s taking notice of MP becoming mainstream, either. For instance, VMware now fully supports multiprocessor virtual machines (even on its free VMware Server product), as a way to boost performance on machines with true multiprocessing capability. (And to their credit, it’s actually not half-bad as long as you aren’t loading the processors down completely, at which point it seems to turn into a slowdown – perhaps due to VMs competing with each other for scheduling while waiting on spinlocks, though I didn’t dig deeper.)

(Sorry if I sound a bit like Steve when talking about MP, but it really is here, now, and now’s the time to start modifying your programs to take advantage of it. That’s not to say that we’re about to see 100+ core computers becoming mainstream tomorrow, but small-scale multiprocessing is very rapidly becoming the standard in all but the most low cost systems.)

Reversing the V740, part 4: Implementing a solution

July 6th, 2007

In the previous post in this series, I described some of the functionality in place in the V740’s abstraction module for the Verizon connection manager app, and the fact that as it was linked to a debug build of the Novatel SDK, reversing relevant portions of it would be (relatively) easy (especially due to the numerous debug prints hinting at function names throughout the module).

As mentioned last time, while examining the WmcV740 module, I came across some functions that appeared as if they might be of use (and one in particular, Diag_Call_End, that assuming my theory panned out, would instruct the device to enter dormant mode – and from there, potentially reacquiring an EVDO link if available).

However, several obstacles remained in the way: First, there were no callers for this function in particular, potentially complicating the process of determining valid input arguments if the purpose of any arguments were not immediately obvious from the function implementation. Second, the function in question wasn’t in the export table of the DLL, so there existed no (clean) way to resolve its address.

The first problem turned out to be a fairly trivial one, as very basic analysis of the function determined that it didn’t even take any arguments. It does use some global state, however that global state is already initialized by exported functions to initialize the abstraction layer module, meaning that the function itself should be fairly straightforward to use.

From an implementation standpoint, the function looked much like many of the other diagnostics routines shipped with the Novatel SDK. The function is essentially a very thin wrapper around the communications protocol used to talk to the device firmware, and doesn’t really add a lot of “value” on top of that, other than managing the transmission of a request to the firmware and the reception of a response. In pseudocode, the function is roughly laid out as follows:

Diag_Call_End()
{
  DebugPrint(severity, "Diag_Call_End: Begin\\n");

  acquire-lock;
  pre-send-serial-port-setup;

  initialize-packet;

  //
  // Set the packet opcode.  All other
  // packet parameters are defaults.
  //
  TxPacket.Cmd = NVTL_CMD::DIAG_CALL_END;

  //
  // Transmit the request to the firmware
  //
  Error = Diag_Send_Tx_Packet(&TxPacket, PACKET_SIZE);

  if (Error)
   handle-error;

  //
  // Receive the response.
  //
  Error = GetResponse(NVTL_CMD::DIAG_CALL_END);

  if (Error)
   handle-error;

  if (Bad response format)
   handle-error;

  //
  // Clean up and return
  //
  post-send-serial-post-cleanup;
  free-memory(response-buf);

  release-lock;

  DebugPrint(severity, "Diag_Call_End: "
   "End: RetVal: %d\\n", return-status);
  return success;
}

It’s a good thing the debug prints were there, as there isn’t really anything to go on besides them. All this function does, from the perspective of the code in the DLL, is simply set up a (very simple) request packet, send it to the firmware, receive the response, and return to the caller. This same structure is shared by most of the other Diag_* functions in the module which communicate to the firmware; in general, all those functions do is translate C arguments into the over-the-wire protocol, call the functions to send the packet and wait for a response, and then unpackage the response data back into return data for the C caller (if applicable). The firmware is responsible for doing all the real work behind the requests. Putting it another way, think of the SDK functions embedded in the WMC module as RPC stubs, the driver that creates the virtual serial port as the RPC runtime library, and the firmware on the card as the RPC server (although the whole protocol and data repackaging process is far simpler than RPC).

Now, because most (really, all) of the logic for implementing particular requests resides in the device firmware on the card, the actual implementation is for the most part a “black box” – we can see the interface (and sometimes have examples for how it is called, if a certain SDK function is actually used), but we can’t really see what a particular request will do, other than observe side effects that the client code calling that function (if any) appears to depend upon.

Normally, that’s a pretty unpleasant situation to be in from a reversing standpoint, but the debug prints at least give us a fighting chance here. Thanks to them, as opposed to an array of un-named functions that send different unknown bitpatterns to an opaque firmware interface, at least we know what a particular firmware call wrapper function is ostensibly supposed to do (assuming the name isn’t too cryptic – we’ll probably never know what some things like “nw_nw_dtc_sms_so_get” actually refer to exactly).

Back to the problem at hand, however. After analyzing Diag_Call_End, it’s pretty clear that the function doesn’t take any arguments and simply returns an error code (or success) indicator to the caller. All of the global state depended upon by the function is the “standard stuff” that is shared by anything using the firmware comms interface, including functions that we can observe being called indirectly by the connection manager app, so it’s a good bet that we should be able to just call the function and see what happens.

However, there’s still the minor snag relating to the fact that Diag_Call_End isn’t exported from WmcV740.dll. There are a couple of different approaches that we could take to try and solve this problem, with varying degrees of complexity, depending on our requirements. For example, in an attempt to provide some level of automatic compatibility with future (or previous) releases, we might implement some kind of code fingerprinting that could be used to scan the DLL for the start of this particular function. In this instance, however, I decided it wasn’t really worth the trouble; for one, WmcV740.dll is fairly well self-contained and doesn’t depend on anything other than the driver to set up the virtual serial port (and the device, of course), and from examining debug prints in the DLL, it became clear that it was designed to support multiple firmware revisions (and even multiple devices). Given this, it seemed an acceptable limitation to tie a program to this particular version of WmcV740.dll and trust that it will remain backwards/forwards compatible enough with any device firmware updates I apply (if any). Because the DLL is self-contained, the connection manager software could even conceivably be updated after placing a copy of the DLL in a different location, since it isn’t tied into the rest of the connection manager software in any meaningful way.

As a result of these factors, I settled on just hardcoding offsets from the module base address to the start of the function in question that I wanted to call. Ugly, yes, but in this particular instance, it seemed like the most reasonable compromise. Recall that in Win32, the HMODULE value returned by LoadLibrary is really the base address of a given module, making it trivially easy to locate a loaded module base address in-memory. From there, it was just a matter of adding the offsets to the module base to form complete pointer values, casting these to function pointers, and making the call.

After all of that, all that’s left is to try the function out. This involves loading the WMC module, calling a standard export (WMC_Startup) to initialize it, and then just making the call to the non-exported Diag_Call_End.
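In rough C terms, the sequence looks something like the following sketch. Only the WMC_Startup export name, the Diag_Call_End name, and the module-base arithmetic come from the preceding discussion; the RVA value, function signatures, calling conventions, and error handling here are all assumptions on my part:

#include <windows.h>

typedef ULONG (__cdecl * WMC_STARTUP)(VOID);    // signature assumed
typedef ULONG (__cdecl * DIAG_CALL_END)(VOID);  // takes no arguments

//
// Hypothetical offset of Diag_Call_End from the WmcV740.dll base;
// the real value comes from disassembly of version 1.0.6.6.
//
#define DIAG_CALL_END_RVA 0x12345

ULONG
ForceDormantMode(
    VOID
    )
{
    HMODULE       Wmc;
    WMC_STARTUP   WmcStartup;
    DIAG_CALL_END DiagCallEnd;

    Wmc = LoadLibraryW(L"WmcV740.dll");

    if (!Wmc)
        return GetLastError();

    //
    // WMC_Startup is a real export; Diag_Call_End is not, so its
    // address must be formed from the module base plus the offset.
    //
    WmcStartup  = (WMC_STARTUP)GetProcAddress(Wmc, "WMC_Startup");
    DiagCallEnd = (DIAG_CALL_END)((ULONG_PTR)Wmc + DIAG_CALL_END_RVA);

    if (!WmcStartup)
        return ERROR_PROC_NOT_FOUND;

    WmcStartup();          // Initialize the abstraction layer.

    return DiagCallEnd();  // Ask the firmware to end the (data) call.
}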

As luck would have it, the function call did exactly what I had hoped – it caused the device to enter dormant mode if there was an active data session. The next time link activity occurred, the call was picked back up (and if the call had failed over to 1xRTT and an EVDO link could be re-acquired, the call would be automatically upgraded back to EVDO). Not quite as direct as simply commanding the card to re-scan for EVDO, but it did get the job done, if in a slightly round-about fashion.

From there, all that remained was to add an automated component to this – periodically ask the card whether it was in 1xRTT or EVDO mode, and if it was stuck on 1xRTT, push the metaphorical “end call” button every so often to try and coax the card into switching over to EVDO. This information is readily available via the standard WMC abstraction layer (which was fairly well understood at this point), albeit with a caveat: the card appears to not even try to scan for an EVDO link after it has failed over to 1xRTT (or if it does, it doesn’t make this fact known to anything on the other end of the firmware comms interface, as far as I could tell). This means it’s not easy to distinguish between the device being in 1xRTT mode because there really is no EVDO coverage locally, period, and the device being in 1xRTT mode because you went under a bridge/into an elevator/whatever for a moment, temporarily lost signal, and the device picked the wrong network up when it re-acquired signal.

Still, all things considered, the solution is workable (if a major hack in terms of architecture). For those in a similar predicament, I’ve posted the program that I wrote to periodically try to re-acquire an EVDO link based on the information I arrived at while working on this series. It’s a console app that will display basic signal strength statistics over time, and will (as previously mentioned) automatically place the device into dormant mode every so often while you’re on 1xRTT, in an attempt to re-acquire EVDO access after a signal loss event. To use it, you’ll need the VC2005 SP1 CRT installed, and you’ll also need WmcV740.dll version 1.0.6.6 (exact match required for the dormant mode functionality to operate), which comes with the current version of VZAccess Manager for the V740 for Windows Vista (at the time of this writing, that’s 6.1.8). Other versions may work if they include the exact same version of WmcV740.dll. You’ll need to place WmcV740.dll in the same directory as wwanutil.exe for it to function, or it’ll bail out when it can’t load the module. Also, only one program can talk to the V740’s firmware communication port at a time, which means that while you are running wwanutil, you can’t run VZAccess (or any other program that tries to talk to the V740’s firmware communication port – if you try to start wwanutil while VZAccess is using the card, you’ll get error 65, and likewise, if you try to start VZAccess while wwanutil is running, VZAccess will complain that something else is using the device). You can still dial the connection manually via Windows DUN, however – the “AT” modem port is unaffected.

Of course, software considerations aside, you’ll also need a V740 (otherwise known as a Merlin X720) ExpressCard as well, with a corresponding service provider plan. (As far as I can tell, the Sprint and Verizon Novatel Rev.A ExpressCards are all rebranded Novatel Merlin X720’s and should be functionally identical, but as I am not a Sprint customer, I can’t test that.) Theoretically, the WmcV740 module supports other Novatel devices, but I haven’t tested that either (I suspect that the protocol used to talk to the firmware is actually a generic Qualcomm chipset diagnostics protocol that may function across other manufacturers – it sure seems to be very similar to the protocol that BitPim uses to talk to many Qualcomm phones, for instance – but the Wmc module will only detect Novatel devices). Also, given that the program is calling undocumented functions in the device’s firmware control interface, I’d recommend against trying it out on every single device you can get your hands on, just to be on the safe side. Although the module is theoretically smart enough to detect whether it’s really talking to a Novatel device of a sufficiently high firmware/protocol revision, or something else, I can’t help you if you somehow manage to brick your card with it (though I don’t see how you’d possibly do that with the program, just covering all the bases…). The usual disclaimers apply: no warranty provided (this program is provided “as-is”), and I can’t provide support for your device or add support for (insert X random other device here).

If you hit Alt-2 while the wwanutil console window is up, you’ll get some statistics akin to the field test mode available in VZAccess manager, although I can’t guarantee that the FTM option in VZAccess was actually accurate (or tell you how to interpret many of the fields). Since the verbose display is based on the same information as the connection manager GUI, it is probably just as accurate (or inaccurate) as the normal FTM display, though perhaps in a more readable format. Alt-3 will also display a log of recent connection events (Alt-1 to return to the main screen), and you can use the Ctrl-D keystroke combination at any of the screens to manually force the device into dormant mode (though it may immediately pick back up into active mode if there is link activity, just as if you hit “end call” on a tethered handset and the link was still active).

With a workable solution for my original predicament found, this wraps up the V740 series (at least for now…). Hopefully, support for things like periodically auto-reacquiring EVDO will at some point find its way into the stock connection manager software, but for now, this will have to do.

Reversing the V740, part 3: The V740 abstraction layer module

July 4th, 2007

Last time, I described (at a high level) the interface used by the Verizon connection manager to talk to the V740, and how it related to my goal of coercing the V740 to re-upgrade to EVDO after failing over to 1xRTT, without idling the connection long enough for dormant mode to take over.

As we saw previously, it turned out that the V740’s abstraction layer module (WmcV740.dll) was statically linked to a debug version of a Novatel SDK that encapsulated the mechanism to talk to the V740’s firmware and instruct the device to perform various tasks.

Now, the WmcV740.dll module contains the code specific to the V740 (or at least to Novatel devices) that knows how to talk to a V740 connected to the computer. Internally, the way this appears to work is that the Novatel driver creates a virtual serial port (conveniently named Novatel Wireless Merlin CDMA EV-DO Status Port (COMxx)), which the DLL then uses to send data to and receive data from the firmware interface. In other words, the virtual serial port is essentially a “backdoor” control channel to the firmware, separate from the dial-up modem aspect of the device, which the driver also presents in the form of a modem device controllable via standard “AT” commands. (The advantage of taking this approach is that a serial port is typically only accessible by one program at a time; if one is using the standard RAS/dial-up modem support in Windows to dial the connection, this precludes using the modem serial port to perform control functions on the device, as it’s already in use for the data session.)
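
Incidentally, the exclusivity of that serial port is also why only one program can use the firmware interface at a time. Here’s a minimal sketch of what opening such a status port directly might look like; the COM port number below is a hypothetical placeholder, as the real number is assigned by the driver (and is visible in Device Manager):

#include <windows.h>
#include <stdio.h>

int main(void)
{
    //
    // Open the device's status port. The port name here is hypothetical;
    // the actual COMxx number is assigned by the driver. Ports above COM9
    // require the \\.\ prefix.
    //

    HANDLE Port = CreateFileW(L"\\\\.\\COM42",
                              GENERIC_READ | GENERIC_WRITE,
                              0,             // No sharing; one client at a time.
                              NULL,
                              OPEN_EXISTING,
                              0,
                              NULL);

    if (Port == INVALID_HANDLE_VALUE)
    {
        //
        // ERROR_ACCESS_DENIED here typically means that another program
        // (such as the connection manager) already has the port open.
        //

        printf("CreateFile failed, error %lu\n", GetLastError());
        return 1;
    }

    //
    // Firmware protocol traffic would go here.
    //

    CloseHandle(Port);

    return 0;
}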

By simply looking at the debug print strings in the binary, it’s possible to learn a fair amount about what sort of capabilities the SDK functions baked into the binary might hold. Most functions contained, at the very least, a debug print at the start and the end, like so:

Diag_Get_Time   proc near

push    ebp
mov     ebp, esp

[...]

 ; "Diag_Get_Time: Begin\\n"
push    offset aDiag_get_timeB
push    2               ; int
call    DebugPrint
add     esp, 8
cmp     IsNewFirmware, 0        ; new-style firmware is always recent enough
jnz     short FirmwareOkay1

cmp     FirmwareRevision, 70h   ; otherwise, require firmware revision >= 70h
jnb     short FirmwareOkay1

 ; "Diag_Get_Time: End: Old Firmware\\n"
push    offset aDiag_get_timeE
push    2               ; int
call    DebugPrint
add     esp, 8
mov     ax, 14h
jmp     retpoint

[...]
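
In C terms, the pattern above corresponds to something roughly like the following reconstruction. To be clear, this is an illustration inferred from the disassembly, not actual SDK source; the DebugPrint signature, the globals, and the meaning of the 14h status are all assumptions:

#include <stdio.h>

//
// Hypothetical globals mirroring the flags checked in the disassembly.
//

static int IsNewFirmware = 0;
static unsigned char FirmwareRevision = 0x60;

static void DebugPrint(int Level, const char* Message)
{
    //
    // The real SDK routine presumably filters on the level/mask argument;
    // for illustration, we just print the message.
    //

    (void)Level;
    fputs(Message, stderr);
}

unsigned short Diag_Get_Time(void)
{
    DebugPrint(2, "Diag_Get_Time: Begin\n");

    //
    // Refuse to issue the command if the firmware is too old to support it.
    //

    if (!IsNewFirmware && FirmwareRevision < 0x70)
    {
        DebugPrint(2, "Diag_Get_Time: End: Old Firmware\n");
        return 0x14; // Failure status returned by the old-firmware path.
    }

    // ... send the command to the device and wait for a response ...

    DebugPrint(2, "Diag_Get_Time: End\n");
    return 0;
}

int main(void)
{
    return Diag_Get_Time() == 0 ? 0 : 1;
}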

Although most of the routines named in debug prints didn’t appear all that relevant to what I was planning on doing, at least one stood out as worth further investigation:

strings WmcV740.dll | find /i "Begin"
[...]
Diag_ERI_Clear: Begin
Diag_Call_Origination: Begin
Diag_Call_End: Begin
Diag_Read_DMU_Entry: Begin
[...]

(Strings is a small SysInternals utility to locate printable strings within a file.)

In particular, the Diag_Call_End routine looked rather interesting (recall that at least for my handset, pressing the “end call” button on the device while a data connection is active places it into dormant mode). If this routine performed the same function as my handset’s “end call” button, and if “ending” the data call also put the V740 into dormant mode (such that it would re-select which network to use), then it just might be what I was looking for. There were several other functions that looked promising as well, though experimentation with them proved unhelpful in furthering my objective in this particular instance.

At this point, there were essentially two choices: either I could try to re-implement the code necessary to speak the custom binary protocol used by WmcV740.dll to communicate with the device, or I could try to reuse as much of the already-existing code in the DLL as possible to achieve the task. There are trade-offs to both approaches; reimplementing the protocol would allow me to bypass any potential shortcomings in the DLL. (It actually turns out that the DLL is slightly buggy: the commands to reset power on the device result in heap corruption on the process heap – ick! Fortunately, those functions were not required for what I wanted to do.) Additionally, assuming that my theory was correct that the driver’s virtual serial port is just a pass-through to the firmware, a “from-scratch” implementation would possibly be more portable to other platforms for which Novatel doesn’t supply drivers (and might be extendable to do things that the parts of the SDK linked into WmcV740.dll don’t support).

However, reimplementing a client for the firmware control interface would require a much greater investment of time than simply reusing the already-existing protocol communications code in the DLL; while not extremely complicated, extra work would be needed to reverse enough of the protocol to get to the point of sending the commands issued by the “Diag_Call_End” function that we’re interested in. Furthermore, reimplementing the protocol client from scratch carries a bit more risk here than reverse engineering an ordinary network protocol, because instead of talking to another program running on a conventional system, I would be talking to the firmware of a (non-serviceable-by-me) device. In other words, if I botched my protocol client badly enough to cause the firmware to do something sufficiently bad as to break the device entirely, I’d have a nice, expensive paperweight on my hands. (Without knowing a whole lot more about the firmware and the device itself, it’s hard to predict what it will do when given bad input; although it might well be quite robust against such things, it could just as easily fall over and die in some horribly bad way on malformed input, which makes this a relatively risky decision.) Not fun, by any means, that. To add to that, at this point I didn’t even know whether the Diag_Call_End function would pan out at all, so if I went the whole nine yards just to try it out, I might be blowing a lot of effort on another dead end.

Given that, I decided to go the more conservative route of trying to use the existing code in WmcV740.dll, at least initially. (Although I did research the actual protocol a fair bit after the fact, I never ended up writing my own client for it, merely reusing what I could from the WmcV740.dll module.) However, there’s a minor sticking point here: the DLL doesn’t provide any externally visible way for code outside the module to actually reach Diag_Call_End. In other words, there are no exported functions that lead to Diag_Call_End. Actually, the situation was even grimmer than that; after a bit of analysis in IDA, it became immediately clear that there weren’t any callers of Diag_Call_End present in the module, period! That meant that I wouldn’t have a “working model” to debug at runtime, as I did with the interface that WmcV740.dll exports for use by the Verizon connection manager GUI.

Nonetheless, the problem is not exactly insurmountable, at least not if one is willing to get one’s hands dirty and use some rather “under-handed tricks”, assuming that we can afford to tie ourselves to one particular version of WmcV740.dll.
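
To give a rough idea of the sort of trick involved: if you know where a non-exported routine lives relative to the module base in one exact binary version, you can manufacture a function pointer to it at runtime. The RVA, calling convention, and signature in this sketch are hypothetical placeholders, not the real values for WmcV740.dll:

#include <windows.h>
#include <stdio.h>

//
// Hypothetical RVA of a non-exported routine, valid only for one exact
// build of the DLL; offsets like this change from version to version,
// which is why this approach ties you to a specific WmcV740.dll.
//

#define DIAG_CALL_END_RVA 0x12345

typedef unsigned short (__cdecl * DIAG_CALL_END_PROC)(void);

int main(void)
{
    HMODULE Module = LoadLibraryW(L"WmcV740.dll");

    if (!Module)
    {
        printf("Couldn't load WmcV740.dll, error %lu\n", GetLastError());
        return 1;
    }

    //
    // A non-exported routine can still be reached by adding its known RVA
    // to the module's load address. The calling convention and signature
    // here are assumptions for illustration.
    //

    DIAG_CALL_END_PROC DiagCallEnd =
        (DIAG_CALL_END_PROC)((ULONG_PTR)Module + DIAG_CALL_END_RVA);

    printf("Diag_Call_End returned %hu\n", DiagCallEnd());

    return 0;
}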

Next time: Zeroing in on Diag_Call_End as a potential solution.

Reversing the V740, part 2: Digging deeper: The connection manager software

July 3rd, 2007

Continuing from the previous article, the first step in determining a possible way to kick the V740 back into EVDO mode is to understand how the Verizon connection manager software interfaces with it. In order to do this, I used WinDbg and IDA for debugging and code analysis, respectively.

The software that Verizon ships presents a uniform user interface regardless of what type of device you might be using. This means that it likely has some sort of generic abstraction layer to communicate with any connected device (as the connection manager software supports a wide variety of cards and tethered phones from many different manufacturers).

Indeed, it turns out that there is an abstraction layer, in the form of a set of DLLs present in the connection manager installation directory, each named after the device it supports (e.g. WMCLGE_VX8700.dll, WMC_NOK_6215i.dll, WmcV740.dll). These DLLs expose a high-level interface as a (fairly simple) exported C API, which is then used by the connection manager software to control the connected device.

The abstraction layer is fairly generic across devices, and as such does not support many advanced device-specific features. However, it does provide support for operations like querying information about the network and the over-the-air protocols in use, link quality (e.g. signal levels), sending and receiving SMS messages, manipulating the phone book (for some devices, though not the V740, as it happens), powering the device on and off, performing over-the-air activation, and a handful of other operations. In turn, each device-specific module translates requests from the standard exported C API into calls to its associated device, using whatever device-specific mechanism that device offers for communicating with the computer. This is typically done by building a sort of adapter from the connection manager’s expected C API interface to a vendor-specific library or SDK for communicating with each device.
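
For illustration, the general shape of such an adapter API might look something like this. WMC_SetPower and WMC_SetRadioPower are real export names that come up below, but the signatures, the WMC_STATUS type, and WMC_GetSignalStrength are assumptions invented for the sketch, not the actual definitions:

//
// Illustrative sketch only: the real exported C API is undocumented, and
// these declarations are guesses at its general shape.
//

typedef unsigned long WMC_STATUS;

#ifdef __cplusplus
extern "C" {
#endif

WMC_STATUS __stdcall WMC_SetPower(int PowerState);
WMC_STATUS __stdcall WMC_SetRadioPower(int RadioOn);

// Hypothetical name for the link quality query mentioned above.
WMC_STATUS __stdcall WMC_GetSignalStrength(int* SignalLevel);

#ifdef __cplusplus
}
#endif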

As most of the APIs in question have only one or two parameters (and could be easily inspected on the fly by debugging the connection manager), it was fairly trivial to gain a basic working understanding of the device abstraction layer API. The approach that I took was to simply set a breakpoint on all of the exported functions (in WinDbg, bm wmcv740!*), and from there, record which functions were invoked by which parts of the connection manager GUI, inspecting in/out parameter values in conjunction with analysis in IDA. Although there are some fairly tantalizingly named functions (e.g. WMC_SetPower, WMC_SetRadioPower), none of the APIs in question turned out to do just what I wanted. (The calls to set low power mode / normal mode / on / off for the device cause any active PPP dial-up connection over the device to be reset, thereby ruling them out as useful for my purposes.)
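
(Incidentally, if you’d rather not inspect each hit by hand, WinDbg’s breakpoint command strings can log a compact call stack at each export and resume automatically; something along these lines, adjusted to taste:)

0:000> bm wmcv740!* "kc 5; gc"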

The fact that none of the APIs did what I set out to look for (though I did determine how to do some interesting things, like reading signal strengths and checking what sort of coverage is present) is not all that surprising, as there doesn’t appear to be any sort of support in the connection manager app for forcing a device to change protocols on the fly (or for forcing a dormant mode transition). Because the abstraction layer essentially implements only the minimum functionality needed for the connection manager software to function, extra device-specific functionality that isn’t used by the connection manager isn’t directly exposed by the standard exported C API. All of this translated into no dice on using the exported APIs to force the card to upgrade back to EVDO on the fly.

While this might initially seem like a pretty fatal roadblock (at least as far as using the connection manager and its libraries to perform the task at hand), it turned out that there was actually something of value in the V740 module after all, even if it didn’t directly expose any useful functionality of this sort through the exported interface. In a stroke of luck, it just so happened that the V740 module was linked to what appeared to be a debug version of the Novatel SDK. In particular, two important points arose from this being the case:

First, the debug version of the Novatel SDK has handy debug prints all over the place, many naming the function they are called from, providing a fair amount of insight into how the internal functions in the module are named. In effect, the debug prints made it almost as if public symbols had been made available for the Novatel SDK portion of the module, as most non-trivial functions in the SDK appear to contain at least one debug print, and virtually all of the debug prints name the function they were called from.

Second, the SDK itself and the abstraction layer module appear to have been built without a certain linker optimization (/OPT:REF) that removes unreferenced code and data from the final binary. (Disabling this optimization is the default for debug builds.) This meant that a large part of the Novatel SDK (debug prints included) happened to be present in the V740 abstraction layer module, whether or not the abstraction layer actually called it. As a result, there existed the possibility that there might be something of use still inside the V740 module, even if it wasn’t directly exposed.
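
(For reference, this is the Microsoft linker’s /OPT switch; roughly, the difference between the two kinds of builds looks like this, with placeholder object file names:)

rem Typical release link: unreferenced functions and data are discarded.
link /OPT:REF /OPT:ICF main.obj novatel_sdk.lib

rem Typical debug-style link: unreferenced code stays in the binary, which
rem appears to be what happened with the SDK code in WmcV740.dll.
link /OPT:NOREF main.obj novatel_sdk.lib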

At this point, I decided to continue down the road of examining the abstraction layer module, in the hopes that some portion of the baked-in Novatel SDK might be of use.

Next time: Taking a closer look at the V740 abstraction layer module and the Novatel SDK embedded into it.