The kernel object namespace and Win32, part 1

October 26th, 2006

The kernel object namespace is partially exposed by various Win32 APIs. Everything that allows you to create a named object that returns a kernel handle is interacting with the kernel object namespace in some form or another, and many Win32 APIs internally use the object namespace under the hood.

The kernel object namespace is fairly similar to a filesystem; there are object directories, which contain named objects. Objects can be of various types, such as a Device object (created by a kernel driver), an Event object, a Semaphore object, and so forth. Additionally, there are symbolic link objects, which (like filesystem links on a UNIX-based system) allow you to create one name that simply refers to another named object in the system.

Until the introduction of Windows 2000, the part of the kernel object namespace that Win32 exposed was a fairly limited and simple subset of the full object namespace available to drivers and programs using the native system call interfaces.

First, file-related APIs interact with the \DosDevices object directory (otherwise known as \??). This is the object directory that holds anything that you might open with CreateFile() and related calls, such as drive letter links (say, C:), serial ports (COM1), other standard DOS devices, and custom devices created by kernel drivers. This is why, if you are a driver, you need to explicitly specify \DosDevices\DeviceName instead of that being automatically assumed (as it is in Win32, if you call CreateFile). Otherwise, the created object name will not be easily accessible to Win32.

Secondly, there is the \BaseNamedObjects object directory. This object directory is where named Event, Mutex, Semaphore, and Section (file mapping) objects are placed when created with the Win32 API.
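
To make the two mappings above concrete, here is a minimal sketch in portable C of how a Win32-visible name lines up with a full object namespace path. The helper names (build_nt_name, win32_file_to_nt, win32_named_object_to_nt) are hypothetical and exist only for illustration; the real Win32 layer does considerably more work (path normalization, current directory handling, and so on), which this sketch deliberately ignores.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: join an object directory and a Win32 name with a
 * backslash. Returns 0 on success, -1 if the name is empty or the result
 * does not fit in the output buffer. */
static int build_nt_name(const char *directory, const char *win32_name,
                         char *out, size_t out_size)
{
    assert(directory != NULL && win32_name != NULL && out != NULL);
    if (strlen(win32_name) == 0)
        return -1;
    int n = snprintf(out, out_size, "%s\\%s", directory, win32_name);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}

/* Names opened with CreateFile-style calls live under \?? (\DosDevices). */
static int win32_file_to_nt(const char *name, char *out, size_t out_size)
{
    return build_nt_name("\\??", name, out, out_size);
}

/* Named Event, Mutex, Semaphore, and Section objects live under
 * \BaseNamedObjects (the pre-Windows 2000 layout described above). */
static int win32_named_object_to_nt(const char *name, char *out, size_t out_size)
{
    return build_nt_name("\\BaseNamedObjects", name, out, out_size);
}
```

Under these assumptions, "COM1" becomes \??\COM1 and an event named "MyEvent" becomes \BaseNamedObjects\MyEvent, which is the name a driver would have to spell out in full.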

\BaseNamedObjects is managed and created by the Base API server dll (basesrv.dll) running in the context of CSRSS at boot time. This means that, in particular, boot start drivers cannot rely on \BaseNamedObjects as being present early in the boot process (which can be a problem if you want to share a named event object with a user mode program, from a boot start driver). \DosDevices, however, is created by the kernel itself at boot time and is generally always accessible.

In general, that is the limit to how much of the kernel namespace is directly exposed to (and used to support) Win32 prior to Windows 2000. (This is technically not quite true. There is a little used pair of kernel32 APIs called DefineDosDevice and QueryDosDevice that allow limited manipulation of symbolic links based within the \DosDevices object directory. Using these APIs, you can discover the native target names of many of the internal symbolic links (for example, C: -> \Device\HarddiskVolume2). You can also create symbolic links based in \DosDevices that point to other parts of the NT object namespace with the DDD_RAW_TARGET_PATH flag using DefineDosDevice.)

Next time I’ll go into a bit more detail as to how some of the changes to the object manager namespace work with Windows 2000, and then Windows XP, which both introduce some significant changes to how Win32 interacts with object names (first with improved multi-session support for Terminal Server and Fast User Switching, and then with how mapped drive letters work with LSA logon sessions).

Beware of stack usage with the new network stack in Windows Vista

October 24th, 2006

In Windows Vista, much of the network stack that ships with the OS uses much more stack than in previous versions of the operating system.

From my experience, just indicating a UDP datagram up to NDIS can require you to have over 4K of kernel stack available on x86, or you risk taking a double fault and causing the system to bugcheck.

For example, here’s a portion of the stack that I ran into while debugging an unrelated problem at the Vista compatibility lab:

0: kd> k100
ChildEBP RetAddr  
818e6bdc 818ad19b RtlpBreakWithStatusInstruction
818e6c2c 818adc08 KiBugCheckDebugBreak+0x1c
818e6fdc 8184845e KeBugCheck2+0x5f4
818e6fdc 81871d35 KiTrap08+0x75
9c9cb084 8186dd14 SepAccessCheck+0x1e0
9c9cb0e0 81887907 SeAccessCheck+0x1a4
9c9cb51c 8715474c SeAccessCheckFromState+0xe4
9c9cb55c 871546d6 CompareSecurityContexts+0x47
9c9cb57c 87153b1a MatchValues+0xd4
9c9cb59c 87153aa7 CheckEqualConditionEnumMatch+0x3f
9c9cb63c 87153a1b MatchConditionOverlap+0x72
9c9cb660 87153774 FilterMatchEnum+0x6c
9c9cb674 8715948b FilterMatchEnumVisible+0x28
9c9cb6ac 87159520 IndexHashFastEnum+0x4d
9c9cb6f8 87158624 IndexHashEnum+0x139
9c9cb724 87159362 FeEnumLayer+0x7a
9c9cb7ac 87159b16 KfdGetLayerActionFromEnumTemplate+0x50
9c9cb7cc 8d6af9e4 KfdCheckAndCacheAcceptBypass+0x27
9c9cb8c4 8d6afc87 CheckAcceptBypass+0x146
9c9cb9a0 8d6b185d WfpAleAuthorizeReceive+0x82
9c9cba08 8d6ad542 WfpAleConnectAcceptIndicate+0x98
9c9cba74 8d6ad432 ProcessALEForTransportPacket+0xc5
9c9cbaf0 8d6ae6b3 ProcessAleForNonTcpIn+0x6f
9c9cbd28 8d6b0df0 WfpProcessInTransportStackIndication+0x2ab
9c9cbd78 8d6b0ae0 InetInspectReceiveDatagram+0x9a
9c9cbdfc 8d6b091c UdpBeginMessageIndication+0x33
9c9cbe44 8d6aecf3 UdpDeliverDatagrams+0xce
9c9cbe90 8d6aec40 UdpReceiveDatagrams+0xab
9c9cbea0 8d6acdd4 UdpNlClientReceiveDatagrams+0x12
9c9cbecc 8d6acba4 IppDeliverListToProtocol+0x49
9c9cbeec 8d6acad3 IppProcessDeliverList+0x2a
9c9cbf40 8d6ab443 IppReceiveHeaderBatch+0x1da
9c9cbfd0 8d6ac61d IpFlcReceivePackets+0xc06
9c9cc04c 8d6abf36 FlpReceiveNonPreValidatedNetBufferListChain+0x6db
9c9cc074 8727b0b0 FlReceiveNetBufferListChain+0x104
9c9cc0a8 8726d737 ndisMIndicateNetBufferListsToOpen+0xab
9c9cc0d0 8726d6ae ndisIndicateSortedNetBufferLists+0x4a
9c9cc24c 871b53c3 ndisMDispatchReceiveNetBufferLists+0x129
9c9cc268 872802c4 ndisMTopReceiveNetBufferLists+0x2c
9c9cc2b4 b0a3fb4d ndisMIndicatePacketsToNetBufferLists+0xe9

From ndisMIndicatePacketsToNetBufferLists to where the system double faulted (in my case) inside of SeAccessCheck, a whopping 4656 bytes of kernel stack were consumed.
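
The 4656 figure falls straight out of the ChildEBP column in the trace above. Here is a small sketch of the arithmetic (the function name is mine, not anything from the debugger): since the x86 stack grows toward lower addresses, the older frame has the higher ChildEBP, and the difference between the two frame pointers is the stack consumed in between.

```c
#include <assert.h>
#include <stdint.h>

/* Stack consumed between an outer (older) frame and an inner (newer)
 * frame, given their ChildEBP values from a 'k' listing on x86. */
static uint32_t stack_bytes_between(uint32_t outer_child_ebp,
                                    uint32_t inner_child_ebp)
{
    /* The stack grows down, so the outer frame sits at the higher address. */
    assert(outer_child_ebp >= inner_child_ebp);
    return outer_child_ebp - inner_child_ebp;
}
```

Plugging in 0x9c9cc2b4 (ndisMIndicatePacketsToNetBufferLists) and 0x9c9cb084 (SepAccessCheck, the innermost frame before KiTrap08) gives 0x1230, or 4656 bytes.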

So, now is the time to slim down your stack usage in your NDIS-related drivers, or you might be in for some unpleasant surprises when your drivers are used in conjunction with multiple third party IM drivers or the like (even better, you might investigate switching away from IM drivers and to the new filtering architecture). You should also be especially wary of any code that loops a packet that might potentially go back into tcpip.sys in a receive calling context (or any other context where you might have limited stack space available), as this can prove an unexpectedly expensive operation on Vista (and potentially beyond).

Oh, and a tip for finding stack hog functions with stack overflow problems: Use the ‘f’ flag with the ‘k’ command in WinDbg. For example:

0: kd> knf
 #   Memory  ChildEBP RetAddr  
00           818e6bdc 818ad19b RtlpBreakWithStatusInstruction
01        50 818e6c2c 818adc08 KiBugCheckDebugBreak+0x1c
02       3b0 818e6fdc 8184845e KeBugCheck2+0x5f4
03         0 818e6fdc 81871d35 KiTrap08+0x75
[...]

This has the debugger compute the stack (arguments + locals) usage at each call frame point for you, saving you a bit of work with the calculator.
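
If you only have a plain 'k' listing to work from, the Memory column is easy enough to reproduce by hand. The following is a rough sketch of what the column appears to contain, judging from the output above: the distance between each frame's ChildEBP and that of the frame listed before it. The function name and the "return the index of the biggest frame" convenience are my own additions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Given ChildEBP values in the order the debugger lists them (innermost
 * frame first), fill memory[i] with the distance from frame i-1 to frame i.
 * Frame 0 has no predecessor, so its entry is 0. Returns the index of the
 * frame with the largest entry, i.e. the most likely stack hog. */
static size_t frame_memory_deltas(const uint32_t *child_ebp, size_t count,
                                  uint32_t *memory)
{
    assert(child_ebp != NULL && memory != NULL && count > 0);
    size_t hog = 0;
    memory[0] = 0;
    for (size_t i = 1; i < count; i++) {
        /* Older frames live at higher addresses on x86. */
        assert(child_ebp[i] >= child_ebp[i - 1]);
        memory[i] = child_ebp[i] - child_ebp[i - 1];
        if (memory[i] > memory[hog])
            hog = i;
    }
    return hog;
}
```

Feeding it the four ChildEBP values from the knf output above yields 0x50, 0x3b0, and 0 for frames 01 through 03, matching what the debugger printed.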

Debugging (or reverse engineering…) a real life Windows Vista compatibility problem: CreateIpForwardEntry in iphlpapi

October 24th, 2006

Since I’m at the Microsoft Vista compatibility lab, it only makes sense that I’ve fixed a few Vista compatibility bugs in our product today.

Some of these are real bugs, but one in particular is especially infuriating: a completely undocumented, seemingly completely arbitrary restriction placed on a publicly documented API that has been around since Windows 98.

In this particular case, I was running into a problem where one of our products was unable to add routes on Vista. This worked fine on prior platforms we supported, and so I started looking into it as a compatibility problem. First things first, I narrowed the problem down to a particular API that was failing.

We have a function that wraps the various details of creating routes. The function in question went approximately like so:

//
// Add a route through the desired gateway.
//

DWORD
AddRoute(
	__in unsigned long Network,
	__in unsigned long Mask,
	__in unsigned long Gateway
	)
{
	MIB_IPFORWARDROW Row;
	DWORD            Status, ForwardType;
	unsigned long    InterfaceIp, InterfaceIndex;

[...]	// (Code to determine the local
	// interface to add the route on)

	//
	// Setup the IP forward row.
	//

	ZeroMemory(&Row,
		sizeof(Row));

	Row.dwForwardDest    = Network;
	Row.dwForwardMask    = Mask;
	Row.dwForwardPolicy  = 0;
	Row.dwForwardNextHop = Gateway;
	Row.dwForwardIfIndex = InterfaceIndex;
	Row.dwForwardType    = ForwardType;
	Row.dwForwardProto   = PROTO_IP_NETMGMT;
	Row.dwForwardAge     = INFINITE;
	Row.dwForwardMetric1 = 0;

	//
	// Create the route.
	//

	if ((Status = CreateIpForwardEntry(&Row))
		!= NO_ERROR)
	{
		wprintf(L"Creation failed, %lu.\n",
			Status);
		return Status;
	}

[...]	// (More unrelated boilerplate code)

	return Status;
}

Essentially, the problem here was that CreateIpForwardEntry was failing. Checking logs, the error code logged was 0xA0.

Using the handy Microsoft error code lookup utility (err.exe), it was easy to determine what this error code means:

C:\>err a0
# for hex 0xa0 / decimal 160 :
  INTERNAL_POWER_ERROR                            bugcodes.h
  LLC_STATUS_BIND_ERROR                           dlcapi.h
  SQL_160_severity_15                             sql_err
# Rule does not contain a variable.
  ERROR_BAD_ARGUMENTS                             winerror.h
# One or more arguments are not correct.
  SCW_E_TOOMUCHDATAIN                             wpscoserr.mc
# Too much incoming data%0
# 5 matches found for "a0"

The only error that makes sense in this context is ERROR_BAD_ARGUMENTS. Unfortunately, that is not really all that helpful. Checking the latest MSDN documentation for CreateIpForwardEntry, there is, of course, no mention of this error code whatsoever.

Additionally, looking at the Microsoft documentation, nothing immediately jumped out as to what the problem might be.

Although the Microsoft people here for the Vista lab did offer to see about getting me in touch with someone in the product team who might have an explanation for this behavior, I eventually decided that I would just take a crack at digging into the internals of CreateIpForwardEntry and understand the problem myself in the meantime, to see if I might be able to come up with a fix sooner. After searching around a bit on Google and not coming up with any good explanation for what was going wrong, I eventually decided to step into iphlpapi!CreateIpForwardEntry in the debugger and see just what was going wrong first-hand.

0:000> bu iphlpapi!CreateIpForwardEntry
breakpoint 0 redefined
0:000> g
Breakpoint 0 hit
eax=0012fd6c ebx=00000004 ecx=00000000 edx=00000000
esi=01040a0a edi=00000003
eip=751bdfc1 esp=0012fd58 ebp=0012fdb0 iopl=0
nv up ei pl nz ac pe nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000
efl=00000216
iphlpapi!CreateIpForwardEntry:
751bdfc1 8bff            mov     edi,edi

Looking at the disassembly of CreateIpForwardEntry, it’s clear that this function is now just a stub that forwards the call onto another function that performs the real work:

0:000> u @eip
iphlpapi!CreateIpForwardEntry:
751bdfc1 8bff       mov     edi,edi
751bdfc3 55         push    ebp
751bdfc4 8bec       mov     ebp,esp
751bdfc6 6a01       push    1
751bdfc8 ff7508     push    dword ptr [ebp+8]
751bdfcb e820ffffff call    CreateOrSetIpForwardEntry
751bdfd0 5d         pop     ebp
751bdfd1 c20400     ret     4

So, I pressed onward, stepping into iphlpapi!CreateOrSetIpForwardEntry:

0:000> tc
iphlpapi!CreateIpForwardEntry+0xa:
751bdfcb e820ffffff call    CreateOrSetIpForwardEntry
0:000> t
eax=0012fd6c ebx=00000004 ecx=00000000 edx=00000000
esi=01040a0a edi=00000003
eip=751bdef0 esp=0012fd48 ebp=0012fd54 iopl=0
nv up ei pl nz ac pe nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000
efl=00000216
iphlpapi!CreateOrSetIpForwardEntry:
751bdef0 8bff            mov     edi,edi

Looking at the disassembly, there appears to be only one place where the error code ERROR_BAD_ARGUMENTS is returned (disassembly truncated for better viewing):

0:000> uf @eip
iphlpapi!CreateOrSetIpForwardEntry:
751bdef0 8bff            mov     edi,edi
751bdef2 55              push    ebp
751bdef3 8bec            mov     ebp,esp
751bdef5 83ec48          sub     esp,48h
751bdef8 8365b800        and     dword ptr [ebp-48h],0
751bdefc 56              push    esi
751bdefd 6a2c            push    2Ch
751bdeff 8d45bc          lea     eax,[ebp-44h]
751bdf02 6a00            push    0
751bdf04 50              push    eax
751bdf05 e8f053ffff      call    memset
751bdf0a 8b7508          mov     esi,dword ptr [ebp+8]

[...]

;
; Convert the interface index we passed in with
; the pRoute structure (dwForwardIfIndex, at
; [esi+10h]) into an interface LUID, stored at
; [ebp-30].
;

751bdf36 8d45d0          lea     eax,[ebp-30h]
751bdf39 50              push    eax
751bdf3a ff7610          push    dword ptr [esi+10h]
751bdf3d e86590ffff      call    ConvertInterfaceIndexToLuid
751bdf42 85c0            test    eax,eax
751bdf44 7571            jne     751bdfb7


;
; Get the interface metric for the requested interface,
; and store it at [ebp+8].  We pass in the address of
; the LUID of the requested interface in order to make
; the check.
;

iphlpapi!CreateOrSetIpForwardEntry+0x56:
751bdf46 8d4508          lea     eax,[ebp+8]
751bdf49 50              push    eax
751bdf4a 8d45d0          lea     eax,[ebp-30h]
751bdf4d 50              push    eax
751bdf4e e802f4ffff      call    GetInterfaceMetric

[...]

;
; Load esi with pRoute->dwForwardMetric1
;

751bdf6c 8b7624          mov     esi,dword ptr [esi+24h]
751bdf6f 6a06            push    6
751bdf71 8945e0          mov     dword ptr [ebp-20h],eax
751bdf74 83c8ff          or      eax,0FFFFFFFFh
751bdf77 3b7508          cmp     esi,dword ptr [ebp+8]
751bdf7a 59              pop     ecx
751bdf7b 8d7de8          lea     edi,[ebp-18h]
751bdf7e f3ab            rep stos dword ptr es:[edi]
751bdf80 8945ec          mov     dword ptr [ebp-14h],eax
751bdf83 8945f0          mov     dword ptr [ebp-10h],eax
751bdf86 5f              pop     edi

;
; Check that esi is not less than [ebp+8]
; ... in other words, verify that
; pRoute->dwForwardMetric1 >= InterfaceMetric,
; where InterfaceMetric is set by GetInterfaceMetric()
;

751bdf87 7229            jb      751bdfb2 ; failure

iphlpapi!CreateOrSetIpForwardEntry+0x99:
751bdf89 2b7508          sub     esi,dword ptr [ebp+8]
751bdf8c 6a18            push    18h
751bdf8e 8d45e8          lea     eax,[ebp-18h]
751bdf91 50              push    eax
751bdf92 6a30            push    30h
751bdf94 8d45b8          lea     eax,[ebp-48h]
751bdf97 50              push    eax
751bdf98 6a10            push    10h
751bdf9a 6864331b75      push    751b3364
751bdf9f ff750c          push    dword ptr [ebp+0Ch]
751bdfa2 8975f4          mov     dword ptr [ebp-0Ch],esi
751bdfa5 6a01            push    1
751bdfa7 c645ff01        mov     byte ptr [ebp-1],1

;
; Call the NsiSetAllParameters internal API to create the
; route, and return its return value to the caller.
;

751bdfab e86857ffff      call    NsiSetAllParameters
751bdfb0 eb05            jmp     751bdfb7
[...]

iphlpapi!CreateOrSetIpForwardEntry+0xc2:
;
; Return ERROR_BAD_ARGUMENTS
;
751bdfb2 b8a0000000      mov     eax,0A0h

iphlpapi!CreateOrSetIpForwardEntry+0xc7:
751bdfb7 5e              pop     esi
751bdfb8 c9              leave
751bdfb9 c20800          ret     8

From this annotated disassembly, we can conclude that there are only two possibilities that might result in this behavior. The first is that GetInterfaceMetric(&InterfaceLuid, &InterfaceMetric) is returning an InterfaceMetric greater than the metric we are supplying. The second is that NsiSetAllParameters is returning ERROR_BAD_ARGUMENTS.

To test this theory, we need to examine the comparison at 751bdf87 to determine if that is taking the failure branch, and we need to check the return value of NsiSetAllParameters. This is fairly easy to do with a couple of breakpoints:

0:000> bu 751bdf87 
0:000> bu 751bdfb0 
0:000> g
Breakpoint 1 hit
eax=ffffffff ebx=00000004 ecx=00000000 edx=7707e524
esi=00000000 edi=00000003
eip=751bdf87 esp=0012fcf8 ebp=0012fd44 iopl=0
nv up ei ng nz ac pe cy
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000
efl=00000297
iphlpapi!CreateOrSetIpForwardEntry+0x97:
751bdf87 7229            jb      751bdfb2 [br=1]

Our first breakpoint, the one on the comparison with the “Interface Metric” and the route metric we supplied in pRoute->dwForwardMetric1, was the one that hit first (as expected). Looking at the register context supplied by WinDbg, though, we can clearly see that the program is going to take the branch and head down the code path that returns ERROR_BAD_ARGUMENTS. Problem identified!

There still remains the issue of solving the problem, though. Looking at [ebp+8], it appears that the undocumented iphlpapi!GetInterfaceMetric returned 10:

0:000> ? dwo(@ebp+8)
Evaluate expression: 10 = 0000000a

This makes sense. We supplied a metric of 0, which is obviously less than 10. Unfortunately, now we need a good way to determine whether we should use a zero metric (for previous OS versions) or a different metric (for Vista), assuming we want our route to take precedence for a particular network/mask value.

Unfortunately, MSDN doesn’t turn up any hits on GetInterfaceMetric, and neither does Google. Well, that sucks – it looks like that for Vista, unless I want to hardcode 10, I’ll have to go off into undocumented land to use a publicly documented API. There seems to be something a bit ironic about that to me, but, nonetheless, the problem remains to be solved.

Update: There is a (minimally) documented solution that was very recently made available. See the bottom of the post for details.

So, all that we need to do is reverse engineer the parameters to this undocumented GetInterfaceMetric function and call it, right?

Well, no, not exactly – things actually get worse. It turns out that GetInterfaceMetric is not even exported from iphlpapi.dll – it’s a purely internal function!

The only other option at this point, aside from hardcoding 10 as a minimum metric, is to reimplement all of the functionality of GetInterfaceMetric ourselves. Taking a look at GetInterfaceMetric, things look unfortunately rather complicated:

0:000> uf iphlpapi!GetInterfaceMetric
iphlpapi!GetInterfaceMetric:
751bd355 8bff            mov     edi,edi
751bd357 55              push    ebp
751bd358 8bec            mov     ebp,esp
751bd35a 6a1c            push    1Ch
751bd35c 6a04            push    4
751bd35e ff750c          push    dword ptr [ebp+0Ch]
751bd361 6a00            push    0
751bd363 6a08            push    8
751bd365 ff7508          push    dword ptr [ebp+8]
751bd368 6a07            push    7
751bd36a 6864331b75      push    NPI_MS_IPV4_MODULEID
751bd36f 6a01            push    1
751bd371 e88f5fffff      call    NsiGetParameter
751bd376 5d              pop     ebp
751bd377 c20800          ret     8

NPI_MS_IPV4_MODULEID is a global variable of some sort in iphlpapi:

0:000> db iphlpapi!NPI_MS_IPV4_MODULEID l8
751b3364  18 00 00 00 01 00 00 00  ........

Using the x command with ascending order, we can make an educated guess as to the size of this global by enumerating all symbols in iphlpapi in address space order:

0:000> x /a iphlpapi!*
[...]
751b3364 iphlpapi!NPI_MS_IPV4_MODULEID = <no type information>
751b3381 iphlpapi!NsiAllocateAndGetTable = <no type information>
[...]

So, we know that NPI_MS_IPV4_MODULEID must be no more than 0x1d bytes long. Taking a look around NPI_MS_IPV4_MODULEID, we see that past 0x18 bytes in, there appears to be code (nop instructions), making it likely that the global is 0x18 bytes long.

0:000> db 751b3364 
751b3364  18 00 00 00 01 00 00 00-00 4a 00 eb 1a 9b d4 11
751b3374  91 23 00 50 04 77 59 bc-90 90 90 90 90 ff 25 94

(The repeated 90 90 90 90 bytes are a typical sign of code. 90 is the opcode for the nop instruction on x86, which the compiler typically uses for padding out function start offsets for alignment.)

Given this, we should be able to replicate the behavior of GetInterfaceMetric, as the only function it calls, NsiGetParameter, is exported by nsi.dll (of course, it isn’t documented…). From the above disassembly, we can see that NsiGetParameter takes a ulong-sized argument (constant 0x1), a pointer argument (address of NPI_MS_IPV4_MODULEID), a ulong-sized argument (constant 0x7), a pointer that is the address of the interface LUID (argument 1 of GetInterfaceMetric, which we saw earlier), a ulong-sized argument (constant 0x8), a ulong or pointer-sized argument (constant 0x0), a pointer-sized argument (address of a ULONG containing the “interface metric”), a ulong-sized argument (constant 0x4), and (finally!) a ulong-sized argument (constant 0x1c). I would surmise that the 0x8 and 0x4 constants are the sizes of the LUID and output buffer, though I haven’t bothered to confirm that at this point.

From our knowledge of __stdcall, we can identify NsiGetParameter as __stdcall quickly by looking at the disassembly of GetInterfaceMetric and noticing the behavior after the function call: the caller does not remove arguments from the stack, assuming that the callee (NsiGetParameter) performs that task.

Given all of this, we can make our own function that implements GetInterfaceMetric. Now, just to be clear, I would not recommend actually using this, unless Microsoft fails to provide a documented mechanism to determine the minimum metric permitted for CreateIpForwardEntry (or removes the restriction) prior to Vista RTM. I am going to try to do whatever I can to see what ISVs are supposed to do with this particular problem (and whether it can be fixed before RTM) before this week is up, but in the event that I don’t get anywhere, I’ll have a backup plan (as ugly and hackish as it may be) – better than not being able to manipulate the route table, period, on Vista.

Anyway, the basic idea is that we call ConvertInterfaceIndexToLuid on the InterfaceIndex that we already have from iphlpapi, to convert this into a NET_LUID structure (new to Vista). It does so happen that ConvertInterfaceIndexToLuid is a documented API, which makes that the easy part.

Then, we simply replicate the call that we saw in GetInterfaceMetric inside iphlpapi.dll. For brevity, I am not posting the entire source code for my implementation of GetInterfaceMetric inline; you can, however, download it. With this reverse engineered implementation, all that is left is to call it to get the minimum metric for the interface we are about to add a route on, and place that metric in the MIB_IPFORWARDROW that we pass to CreateIpForwardEntry.

I’ll post back when I hear from Microsoft as to the official word as to how one is to handle this situation; I fully expect that there will be a documented API (or the restriction will go away) before RTM, at this point, given that this is a rather bad compatibility bug that breaks a long-existing documented API in such a way that requires you to go into undocumented hackery to continue to use it (especially since there is no other good way that I know of to replicate the functionality of the API in question).

Update: You can use the GetIpInterfaceEntry routine (new to Vista, in iphlpapi) to find the minimum metric for an interface. Note that you will very likely need to search on MSDN to find information on this function, as it’s not been included in recent SDKs to my knowledge.

(Note: Some of the debugger output was slightly modified or truncated by me to keep the formatting sane.)

Useful WinDbg commands: .formats

October 23rd, 2006

One of the many things that you end up having to do while debugging a program is displaying data types. While you probably know many of the basic commands like db, da, du, and so forth, one perhaps little-used command is useful for displaying a four or eight byte quantity in a number of different data types: the “.formats” command. This command is useful for viewing various primitive/built-in data types, which you cannot display as a structure via the “dt” command.

In particular, you can use .formats to translate a number of different data types into readable values, including floating point or various time formats (time_t if you provide a 32-bit value, or FILETIME if you give a 64-bit value). For instance:

0:001> .formats 41414141
Evaluate expression:
  Hex:     41414141
  Decimal: 1094795585
  Octal:   10120240501
  Binary:  01000001 01000001 01000001 01000001
  Chars:   AAAA
  Time:    Fri Sep 10 01:53:05 2004
  Float:   low 12.0784 high 0
  Double:  5.40901e-315
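
If you are curious where the Float and Chars lines come from, they are just the same 32 bits viewed through a different type. Here is a minimal sketch in portable C of those two reinterpretations. The function names are mine, and the byte order used for the character view (most significant byte first) is an assumption on my part; it does not matter for a value like 41414141.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* View a 32-bit pattern as an IEEE 754 single-precision float. */
static float bits_as_float(uint32_t bits)
{
    float f;
    assert(sizeof f == sizeof bits);
    memcpy(&f, &bits, sizeof f);   /* well-defined way to reinterpret bits */
    return f;
}

/* View a 32-bit pattern as four characters, most significant byte first.
 * Non-printable bytes are shown as '.', as the debugger does. */
static void bits_as_chars(uint32_t bits, char out[5])
{
    assert(out != NULL);
    for (int i = 0; i < 4; i++) {
        unsigned char c = (unsigned char)(bits >> (24 - 8 * i));
        out[i] = (c >= 0x20 && c < 0x7f) ? (char)c : '.';
    }
    out[4] = '\0';
}
```

For 0x41414141, the float view works out to roughly 12.0784 and the character view to "AAAA", in line with the output above.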

The command also supports 64-bit filetime quantities:

0:001> .formats 01010101`01010101
Evaluate expression:
  Hex:     01010101`01010101
  Decimal: 72340172838076673
  Octal:   0004010020040100200401
  Binary:  00000001 00000001 00000001 00000001
           00000001 00000001 00000001 00000001
  Chars:   ........
  Time:    Sun Mar 28 21:14:43.807 1830 (GMT-4)
  Float:   low 2.36943e-038 high 2.36943e-038
  Double:  7.7486e-304
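
The Time line for a 64-bit value is the FILETIME interpretation: a count of 100-nanosecond intervals since January 1, 1601 (UTC). As a sketch of the calculator work that .formats saves you, here is the conversion to a Unix-style second count in portable C. The function names are made up for this example; the 11644473600 constant is the number of seconds between the 1601 and 1970 epochs.

```c
#include <assert.h>
#include <stdint.h>

#define FILETIME_TICKS_PER_SECOND 10000000ULL    /* 100 ns ticks */
#define SECONDS_1601_TO_1970      11644473600LL

/* Convert a FILETIME tick count to seconds relative to the Unix epoch.
 * Dates before 1970 come out negative. */
static int64_t filetime_to_unix_seconds(uint64_t filetime)
{
    assert(filetime <= (uint64_t)INT64_MAX);
    return (int64_t)(filetime / FILETIME_TICKS_PER_SECOND)
           - SECONDS_1601_TO_1970;
}

/* The millisecond part of a FILETIME, as shown after the seconds. */
static unsigned filetime_milliseconds(uint64_t filetime)
{
    return (unsigned)((filetime % FILETIME_TICKS_PER_SECOND) / 10000ULL);
}
```

For 01010101`01010101 this gives a little over 4.4 billion seconds before 1970, which lands in 1830, with a millisecond part of 807 (the ".807" in the debugger output).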

.formats is primarily useful for saving you a bit of time poking around in a calculator to translate times, or convert perhaps an overwritten eip into text if you are examining a stack buffer string overflow. In conjunction with db and dt, you should be able to format most any data you’ll come across in a debugging session into a readable format (provided you have symbols, of course, in the case of complex user-defined data types).

I don’t think that is what they really meant to say…

October 22nd, 2006

While trying to identify just what kind of device Steve’s Mac appears as when plugged into my laptop over a 1394 cable, I ran into this charming result from Google:

“Performance of 1394 devices may decrease after you install Windows …” [support.microsoft.com]

I don’t think that is what they really meant to say. I suppose Google summaries can be bad at times…

This is but one of many strange or poorly designed things I’ve encountered in the past few days…

Annoyances with IE7

October 22nd, 2006

Since installing IE7, I’ve run into a couple of annoyances.

The largest of these is that you can no longer use the trick of launching an instance of iexplore.exe under Run As and then navigating to the Control Panel to get an administrator view of Control Panel if you are logged on as a limited user (for pre-Vista). Now, instead, the admin IE instance will just tell the already-running explorer instance (which is running as your limited user account) to open a window at Control Panel. This is of course not what I want, which leaves me stuck with remembering the names of the individual .cpl files and launching them as an admin. Unfortunately for me, this just made running as a limited user on Windows XP and Windows Server 2003 much more painful; not a good thing from the perspective of a browser that is supposed to make things more secure. (In case you were wondering, you can’t just launch an admin explorer.exe while you already have explorer running under your user account. If you try to do this, the admin explorer instance will tell the already running explorer instance to open a new window, and then exit.) Alternatively, I could configure explorer to use a different process for every window, which does actually allow you to run explorer directly with Run As, but this has the unfortunate side effect of dramatically increasing memory usage if you have multiple explorer folder windows open.

The other things I have run into so far are site compatibility problems, like lists breaking for WordPress. I am not sure if this particular problem is a WordPress one or an IE7 one, having not been particularly inclined to delve into HTML DOM debugging, but WordPress does appear to validate cleanly under the W3C XHTML validator. Some compatibility things are to be expected, of course, but it’s a bit disappointing to see them so glaringly obvious without either WordPress or Microsoft having done something to fix (or even acknowledge) the problem by now. Sigh.

As for tabbed browsing, I’m not sure if I really like this much yet. Up till now, I’ve pretty much always used “old-fashioned”, windowed browsing. I’ll see if tabbed browsing grows on me, but I wish I didn’t have to sacrifice ease of running as non-admin for it…

(Update: a commenter, jpassing, suggested using “explorer.exe /separate” with Run As, which appears to work nicely as a replacement for starting iexplore.exe when IE7 is installed.)

Heading to Redmond…

October 21st, 2006

I’m going out to Redmond for this coming week for a Vista compatibility lab focused on helping ISVs with getting their applications running well on Vista. I’ll try to keep the blog updated with anything interesting that I find out on the way. Besides the standard stuff about UAC and non-admin users, I’m hoping to get some more obscure and difficult things cleared up that I have thus far never really got an answer to, such as how some of the UAC changes to how tokens work will affect the ability to alter network credentials of running processes.

Unordered list items broken on the blog in IE7

October 21st, 2006

Taking a little segue from the usual topics on the blog, today I got around to installing IE7 for the first time. Unfortunately, it seems that I have run into my first site compatibility issue that I really care about: a problem with WordPress.

For some reason, unordered list items appear to be not showing up as bulleted items on the blog when you are using IE7. The list items are still indented, but they don’t have a bullet prefixing them (just whitespace). I haven’t yet spent much time debugging this issue, which is for now just a minor annoyance. It doesn’t seem to be specific to my blog, as IE7 is having this problem for me with other WordPress blogs.

For example, when using IE7, these list items do not have bullet prefixes presently:

  • test list item 1
  • test list item 2

Anyone have a workaround or fix for this particular annoyance? Comment away if so…

Win32 calling conventions: __stdcall in assembler

October 20th, 2006

It’s been a while since my last post, unfortunately, primarily due to my being a bit swamped with work and a couple of other things as of late. With that said, I’m going to start by picking up where I had previously left off with the Win32 calling conventions series. Without further ado, here’s the stuff on __stdcall as you’ll see it in assembler…

Like __cdecl, __stdcall is completely stack-based.  The semantics of __stdcall are very similar to __cdecl, except that the arguments are cleaned off the stack by the callee instead of the caller.  Because the number of arguments removed from the stack is burned into the target function at compile time, there is no support for variadic functions (functions that take a variable number of arguments, such as printf) that use the __stdcall calling convention.  The rules for register usage and return values are otherwise identical to __cdecl.

In practice, this typically means that an __stdcall function call will look much like a __cdecl function call until you examine the ret instruction that returns transfer to the caller at the end of the __stdcall function in question.  (Alternatively, you can look to see if it appears as if stack arguments are cleaned after the function call.  However, the compiler/optimizer sometimes likes to be tricky with __cdecl functions, and defer argument removal until several function calls later, so this method is less reliable.)

Because the callee cleans the arguments off the stack in an __stdcall function, you will always[1] see a ret instruction terminating a __stdcall function.  For most functions, this count is four times the number of arguments to the function, but this can vary if arguments that are larger than 32-bits are passed.  On Win32, this argument count in bytes value is virtually always[2] a multiple of four, as the compiler will always generate code that aligns the stack to at least four bytes for x86 targets.

Given this information, it is usually fairly easy to distinguish an __stdcall function from a __cdecl function, as a __cdecl function will never use an argument to ret.  Note that this does imply, however, that it is generally not possible to distinguish between an __stdcall function and a __cdecl function in the case that both take zero arguments (given nothing but the disassembly); in this special case, the calling conventions have the same semantics.  This also means that if you have a function that does not clean any bytes off the stack with ret, you’ll technically have to examine any callers of the function to see if any pass more than zero arguments (or the actual function implementation itself, to see if it ever expects more than zero arguments) in order to be absolutely sure if the function is __cdecl or __stdcall.
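If you wanted to automate that check, a rough sketch in C might look like the following.  It relies on the x86 encodings of the near return instruction (C3 for a plain ret, C2 followed by a 16-bit little-endian immediate for ret with a byte count).  The function name near_ret_pop_bytes is hypothetical, and the sketch assumes that you have already located the ret with a real disassembler, since scanning raw bytes for C2 or C3 will produce false matches:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: given the bytes of a single instruction, return
   how many argument bytes it removes from the stack on return.
   Returns -1 if the instruction is not a near ret.  A result of 0 is
   ambiguous: __cdecl, or __stdcall with zero arguments. */
int near_ret_pop_bytes(const uint8_t *code, size_t len)
{
    if (len >= 1 && code[0] == 0xC3)        /* ret        */
        return 0;

    if (len >= 3 && code[0] == 0xC2)        /* ret imm16  */
        return code[1] | (code[2] << 8);

    return -1;
}
```

A non-zero result suggests a callee-cleaned convention, while a zero result still requires looking at the callers, as described above.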

Here’s an example of a simple __stdcall function call for the following C function:
 

__declspec(noinline)
int __stdcall StdcallFunction1(int a, int b, int c)
{
 return (a + b) * c;
}

If we call the function like this:

StdcallFunction1(1, 2, 3);

… we can expect to see something like so, for the call:

push    3
push    2
push    1
call    StdcallFunction1

(There will be no add esp instruction after the call.)

This is quite similar to a __cdecl declared function with the same implementation.  The only difference is the lack of an add esp instruction following the call.

Looking at the function implementation, we can see that unlike the __cdecl version of this function, StdcallFunction1 removes the arguments from the stack:

StdcallFunction1 proc near

a= dword ptr  4
b= dword ptr  8
c= dword ptr  0Ch

mov     eax, [esp+8] ; eax = b
mov     ecx, [esp+4] ; ecx = a
add     eax, ecx     ; eax = eax + ecx
imul    eax, [esp+c] ; eax = eax * c
retn    0Ch          ; (return value = eax)
StdcallFunction1 endp

As expected, the only difference here is that the __stdcall version of the function cleans the three arguments off the stack.  The function is otherwise identical to the __cdecl version, with the return value stored in eax.

With all of this information, you should be able to rather reliably identify most __stdcall functions.  The key things to look out for are:

  • All arguments are on the stack.
  • The ret instruction terminating the function has a non-zero argument count if the number of arguments for the function is non-zero.
  • The ret instruction terminating the function has an argument count that is at least four times the number of arguments for the function.  (If the count is less than four times the number of arguments, then the function might be a __fastcall function with three or more arguments.  The __fastcall calling convention passes the first two 32-bit or smaller arguments in registers.)
  • The function does not depend on the initial state of the volatile registers ecx and edx.  (If the function expects these registers to have a meaningful value initially, then the function is probably a __fastcall or __thiscall function, as those calling conventions pass arguments in the ecx and edx registers.)

In the next post in this series, I’ll cover the __fastcall calling convention (and hopefully it won’t be such a long wait this time).  Stay tuned…

 

[1]: For functions declared as __declspec(noreturn) or that otherwise never normally return execution control directly to the caller (e.g. a function that always throws an exception), the ret instruction is typically omitted.  There are a couple of other rare cases where you may see no terminating ret, such as if there are two functions, where one function calls the second, and both have very similar prototypes (such as argument ordering or an additional defaulted argument).  In this case, the compiler may combine the two functions by having one perform minor adjustments to the stack and then “fall through” directly to the second function.

[2]: If you see a function with a ret instruction that does not take a multiple of four as its argument, then the function was most likely hand-written in assembler.  The Microsoft compiler will never, to my knowledge, generate code like this (and neither should any sane Win32 compiler).

DxWnd 1.034 released

September 24th, 2006

I’ve released a new version of DxWnd (requires the VC8SP0 CRT) – version 1.034. This is a minor release that, in addition to fixing a couple of bugs and doing some internal code cleanup and reorganization to build under VC8, adds a new feature: video output rescaling.

I recently got a nice 20.1″ LCD to use as a second monitor for my main laptop at home. Unfortunately, I discovered that a lot of my old favorite classic games tend to do not-so-great things to your desktop color depth when you run them natively (in fullscreen mode). You might not normally care about that, but it turns out to be a real bummer if you have something like an IM client or whatnot up on a second monitor.

So, I turned to a program I had written a couple of years ago – DxWnd. DxWnd is a program that lets you run DirectDraw 7 (or below) programs that only support fullscreen mode in a window. It accomplishes this by hooking various DirectDraw APIs and tricking the program into thinking that it is running at 640×480 (or whatever resolution it wants) fullscreen, when it is in fact running in a plain window at that resolution. Unfortunately, while DxWnd solves the color depth issue, running games at 640×480 on a 1920×1200 desktop is not really the best experience. Thus, I set out to make a couple of minor modifications to DxWnd to support rescaling the output. These are fairly simple in principle:

  • Use StretchBlt instead of BitBlt to copy data from the DirectDraw surface that the program writes to into the GDI device context associated with the actual window I am displaying on screen. The reason why I perform this extra buffering step in the first place is that GDI provides nice automatic palette conversions from DirectDraw surface DCs to plain desktop window DCs. Changing the BitBlt to a StretchBlt simply rescales the current video image to a new resolution as it is copied for display purposes.
  • For programs that call ScreenToClient / ClientToScreen / MapWindowPoints (or deal with mouse cursor coordinates), but do not correctly handle the fact that their program’s client area may not be centered at (0, 0) (after all, the program was written to only run in fullscreen mode, so normally this shortcut can be taken), DxWnd needs to alter the lie it tells in these functions. Previously, DxWnd would “fix up” the coordinates that get returned to a program (or that a program gives to Windows) so that the program only sees things centered at (0, 0). Now, in addition to that, DxWnd needs to scale these coordinates either from the real output resolution to the resolution that the program appears to be running at, or vice versa, depending on whether the coordinates are going “into” or “out of” the program. This does have one unfortunate side effect, which is that relative to a program that natively supports a given resolution, there is a perceived loss of precision when you move the mouse pointer in the rescaled video output window. This is because mouse cursor coordinates must be rescaled to values that are relative to the resolution that the program is expecting to be running at. For example, if you are running at twice the program’s native resolution, and the program draws a custom mouse cursor, then the cursor may only appear to move every two pixels that you move it instead of every one pixel (like you might expect).
  • For programs that use DirectInput for mouse coordinates, these coordinates also need to be scaled so that they are relative to the virtual screen at (0, 0) that the program expects all coordinates to be relative to.
  • Since we are scaling the output of a program, DxWnd can now allow the user to resize, maximize, or restore the window it creates to contain the video data from the program being hooked. For programs where the user has asked DxWnd to capture the mouse to the client area of the video output window, the mouse cursor capture needs to be recalculated if the window size changes (otherwise, you could not move the mouse cursor outside of the original window size).
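
The coordinate rescaling described above can be sketched roughly as follows. This is not DxWnd’s actual code: the type and function names (Point2, Extent2, scale_coord, window_to_virtual, virtual_to_window) are made up for illustration, and the sketch assumes plain truncating integer arithmetic with coordinates that are already relative to the window’s client area.

```c
#include <assert.h>

/* Hypothetical types for illustration; not taken from DxWnd. */
typedef struct { int x, y; } Point2;
typedef struct { int width, height; } Extent2;

/* Scale one coordinate from a source extent to a destination extent.
   Integer division truncates, which is where precision gets lost. */
int scale_coord(int value, int from_extent, int to_extent)
{
    assert(from_extent > 0);
    return (int)(((long long)value * to_extent) / from_extent);
}

/* Coordinates going "into" the program: real window -> virtual screen. */
Point2 window_to_virtual(Point2 p, Extent2 window, Extent2 virt)
{
    Point2 out;
    out.x = scale_coord(p.x, window.width, virt.width);
    out.y = scale_coord(p.y, window.height, virt.height);
    return out;
}

/* Coordinates coming "out of" the program: virtual screen -> real window. */
Point2 virtual_to_window(Point2 p, Extent2 virt, Extent2 window)
{
    Point2 out;
    out.x = scale_coord(p.x, virt.width, window.width);
    out.y = scale_coord(p.y, virt.height, window.height);
    return out;
}
```

With a 1280×960 window showing a 640×480 program, the window points (100, 50) and (101, 51) both map to the virtual point (50, 25), which is the “cursor only moves every two pixels” effect mentioned above.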

With the new DxWnd, I can play some old classics like Master of Orion 2 or Privateer 2 rescaled to my desktop resolution on one monitor while still using a second monitor for things like e-mail or IM – and, without the color depth on my auxiliary display being reduced to 8-bit (or worse). There is some more information about DxWnd on the corresponding topic on the Valhalla Legends forum, if you are interested.