Since I’m at the Microsoft Vista compatibility lab, it only makes sense that I’ve fixed a few Vista compatibility bugs in our product today.
Some of these are real bugs, but one that I ran into is particularly infuriating: a completely undocumented, seemingly arbitrary restriction placed on a publicly documented API that has been around since Windows 98.
In this particular case, one of our products was unable to add routes on Vista. This worked fine on the earlier platforms we supported, so I started treating it as a compatibility problem. First, I narrowed the problem down to the particular API call that was failing.
We have a function that wraps the details of creating a route. It looks approximately like this:
//
// Add a route through the desired gateway.
//
DWORD
AddRoute(
    __in unsigned long Network,
    __in unsigned long Mask,
    __in unsigned long Gateway
    )
{
    MIB_IPFORWARDROW Row;
    DWORD            Status, ForwardType;
    unsigned long    InterfaceIp, InterfaceIndex;
    [...] // (Code to determine the local
          // interface to add the route on)
    //
    // Set up the IP forward row.
    //
    ZeroMemory(&Row,
        sizeof(Row));
    Row.dwForwardDest    = Network;
    Row.dwForwardMask    = Mask;
    Row.dwForwardPolicy  = 0;
    Row.dwForwardNextHop = Gateway;
    Row.dwForwardIfIndex = InterfaceIndex;
    Row.dwForwardType    = ForwardType;
    Row.dwForwardProto   = PROTO_IP_NETMGMT;
    Row.dwForwardAge     = INFINITE;
    Row.dwForwardMetric1 = 0;
    //
    // Create the route.
    //
    if ((Status = CreateIpForwardEntry(&Row))
        != NO_ERROR)
    {
        wprintf(L"Creation failed, %lu.\n",
            Status);
        return Status;
    }
    [...] // (More unrelated boilerplate code)
    return Status;
}
Essentially, the problem was that CreateIpForwardEntry was failing; the logs showed error code 0xA0.
The handy Microsoft error code lookup utility (err.exe) made it easy to determine what this error code means:
C:\>err a0
# for hex 0xa0 / decimal 160 :
INTERNAL_POWER_ERROR bugcodes.h
LLC_STATUS_BIND_ERROR dlcapi.h
SQL_160_severity_15 sql_err
# Rule does not contain a variable.
ERROR_BAD_ARGUMENTS winerror.h
# One or more arguments are not correct.
SCW_E_TOOMUCHDATAIN wpscoserr.mc
# Too much incoming data%0
# 5 matches found for "a0"
The only match that makes sense in this context is ERROR_BAD_ARGUMENTS, which, unfortunately, is not very helpful. The latest MSDN documentation for CreateIpForwardEntry, of course, makes no mention of this error code whatsoever.
Nor did anything else in the Microsoft documentation point to an obvious cause.
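The AddRoute wrapper above prints the status in decimal (%lu), while the err.exe lookup above was done in hex. A small logging helper that emits both forms saves a conversion step. This is a minimal sketch; FormatWin32Status is a hypothetical name, not something from our product or the SDK.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical logging helper: render a Win32 status code in decimal and
   in hex, so the hex form can be passed straight to err.exe ("err a0"). */
static void FormatWin32Status(char *buffer, size_t size, unsigned long status)
{
    snprintf(buffer, size, "%lu (0x%lX)", status, status);
}
```

For the failure here, FormatWin32Status(buf, sizeof(buf), Status) produces "160 (0xA0)".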
The Microsoft people here for the Vista lab offered to put me in touch with someone on the product team who might be able to explain this behavior. In the meantime, though, I decided to dig into the internals of CreateIpForwardEntry myself, in the hope of finding a fix sooner. After searching around on Google without turning up any good explanation, I stepped into iphlpapi!CreateIpForwardEntry in the debugger to see first-hand what was going wrong.
0:000> bu iphlpapi!CreateIpForwardEntry
breakpoint 0 redefined
0:000> g
Breakpoint 0 hit
eax=0012fd6c ebx=00000004 ecx=00000000 edx=00000000
esi=01040a0a edi=00000003
eip=751bdfc1 esp=0012fd58 ebp=0012fdb0 iopl=0
nv up ei pl nz ac pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000
efl=00000216
iphlpapi!CreateIpForwardEntry:
751bdfc1 8bff mov edi,edi
Looking at the disassembly of CreateIpForwardEntry, it’s clear that on Vista this function is just a stub that forwards the call to another function, which performs the real work:
0:000> u @eip
iphlpapi!CreateIpForwardEntry:
751bdfc1 8bff mov edi,edi
751bdfc3 55 push ebp
751bdfc4 8bec mov ebp,esp
751bdfc6 6a01 push 1
751bdfc8 ff7508 push dword ptr [ebp+8]
751bdfcb e820ffffff call CreateOrSetIpForwardEntry
751bdfd0 5d pop ebp
751bdfd1 c20400 ret 4
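In C terms, the stub amounts to roughly the following paraphrase. Because __stdcall pushes arguments right to left, `push 1` followed by `push dword ptr [ebp+8]` makes the caller’s pRoute the first argument and the constant 1 the second. The `ret 8` at the end of CreateOrSetIpForwardEntry further down is consistent with two dword arguments. The parameter types, the meaning of the 1 (presumably "create" as opposed to "set"), and the Sketch/Hypothetical names are all assumptions. The stand-in only records its arguments so the forwarding can be checked.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the undocumented iphlpapi!CreateOrSetIpForwardEntry.
   It records what it was called with instead of touching the route table. */
static const void *g_lastRoute;
static uint32_t g_lastFlag;

static uint32_t HypotheticalCreateOrSetIpForwardEntry(const void *route, uint32_t flag)
{
    g_lastRoute = route;
    g_lastFlag = flag;
    return 0; /* pretend NO_ERROR */
}

/* Paraphrase of the Vista stub: pass the caller's row through unchanged,
   plus a constant 1 as the second argument. */
static uint32_t SketchCreateIpForwardEntry(const void *route)
{
    return HypotheticalCreateOrSetIpForwardEntry(route, 1);
}
```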
So, I pressed onward, stepping into iphlpapi!CreateOrSetIpForwardEntry…
0:000> tc
iphlpapi!CreateIpForwardEntry+0xa:
751bdfcb e820ffffff call CreateOrSetIpForwardEntry
0:000> t
eax=0012fd6c ebx=00000004 ecx=00000000 edx=00000000
esi=01040a0a edi=00000003
eip=751bdef0 esp=0012fd48 ebp=0012fd54 iopl=0
nv up ei pl nz ac pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000
efl=00000216
iphlpapi!CreateOrSetIpForwardEntry:
751bdef0 8bff mov edi,edi
Looking at the disassembly, there appears to be only one place where the error code ERROR_BAD_ARGUMENTS is returned (disassembly truncated for readability):
0:000> uf @eip
iphlpapi!CreateOrSetIpForwardEntry:
751bdef0 8bff mov edi,edi
751bdef2 55 push ebp
751bdef3 8bec mov ebp,esp
751bdef5 83ec48 sub esp,48h
751bdef8 8365b800 and dword ptr [ebp-48h],0
751bdefc 56 push esi
751bdefd 6a2c push 2Ch
751bdeff 8d45bc lea eax,[ebp-44h]
751bdf02 6a00 push 0
751bdf04 50 push eax
751bdf05 e8f053ffff call memset
751bdf0a 8b7508 mov esi,dword ptr [ebp+8]
[...]
;
; Convert the interface index we passed in with
; the pRoute structure (pRoute->dwForwardIfIndex,
; at [esi+10h]) into an interface LUID, stored at
; [ebp-30h].
;
751bdf36 8d45d0 lea eax,[ebp-30h]
751bdf39 50 push eax
751bdf3a ff7610 push dword ptr [esi+10h]
751bdf3d e86590ffff call ConvertInterfaceIndexToLuid
751bdf42 85c0 test eax,eax
751bdf44 7571 jne 751bdfb7
;
; Get the interface metric for the requested interface,
; and store it at [ebp+8]. We pass in the address of
; the LUID of the requested interface in order to make
; the check.
;
iphlpapi!CreateOrSetIpForwardEntry+0x56:
751bdf46 8d4508 lea eax,[ebp+8]
751bdf49 50 push eax
751bdf4a 8d45d0 lea eax,[ebp-30h]
751bdf4d 50 push eax
751bdf4e e802f4ffff call GetInterfaceMetric
[...]
;
; Load esi with pRoute->dwForwardMetric1
;
751bdf6c 8b7624 mov esi,dword ptr [esi+24h]
751bdf6f 6a06 push 6
751bdf71 8945e0 mov dword ptr [ebp-20h],eax
751bdf74 83c8ff or eax,0FFFFFFFFh
751bdf77 3b7508 cmp esi,dword ptr [ebp+8]
751bdf7a 59 pop ecx
751bdf7b 8d7de8 lea edi,[ebp-18h]
751bdf7e f3ab rep stos dword ptr es:[edi]
751bdf80 8945ec mov dword ptr [ebp-14h],eax
751bdf83 8945f0 mov dword ptr [ebp-10h],eax
751bdf86 5f pop edi
;
; Check that esi is not less than [ebp+8]
; ... in other words, verify that
; pRoute->dwForwardMetric1 >= InterfaceMetric,
; where InterfaceMetric is set by GetInterfaceMetric()
;
751bdf87 7229 jb 751bdfb2 ; failure
iphlpapi!CreateOrSetIpForwardEntry+0x99:
751bdf89 2b7508 sub esi,dword ptr [ebp+8]
751bdf8c 6a18 push 18h
751bdf8e 8d45e8 lea eax,[ebp-18h]
751bdf91 50 push eax
751bdf92 6a30 push 30h
751bdf94 8d45b8 lea eax,[ebp-48h]
751bdf97 50 push eax
751bdf98 6a10 push 10h
751bdf9a 6864331b75 push 751b3364
751bdf9f ff750c push dword ptr [ebp+0Ch]
751bdfa2 8975f4 mov dword ptr [ebp-0Ch],esi
751bdfa5 6a01 push 1
751bdfa7 c645ff01 mov byte ptr [ebp-1],1
;
; Call the NsiSetAllParameters internal API to create the
; route, and return its return value to the caller.
;
751bdfab e86857ffff call NsiSetAllParameters
751bdfb0 eb05 jmp 751bdfb7
[...]
iphlpapi!CreateOrSetIpForwardEntry+0xc2:
;
; Return ERROR_BAD_ARGUMENTS
;
751bdfb2 b8a0000000 mov eax,0A0h
iphlpapi!CreateOrSetIpForwardEntry+0xc7:
751bdfb7 5e pop esi
751bdfb8 c9 leave
751bdfb9 c20800 ret 8
From this annotated disassembly, we can conclude that only two things could produce this behavior. Either GetInterfaceMetric(&InterfaceLuid, &InterfaceMetric) is returning an InterfaceMetric greater than the metric we are supplying, or NsiSetAllParameters itself is returning ERROR_BAD_ARGUMENTS.
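The first possibility can be paraphrased in C. The following is a minimal sketch of my reading of the annotated disassembly, not Microsoft’s source. ForwardRowMirror is a hypothetical copy of the documented MIB_IPFORWARDROW field order using fixed-width types. The static asserts confirm that dwForwardIfIndex and dwForwardMetric1 land at the [esi+10h] and [esi+24h] offsets used above. Because `jb` is an unsigned comparison, the check is an unsigned one. Note also `sub esi,dword ptr [ebp+8]` followed by the store to [ebp-0Ch]: if the check passes, the value handed on toward NsiSetAllParameters appears to be the route metric minus the interface metric. That is an inference from the listing, not something documented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of MIB_IPFORWARDROW (documented field order). */
typedef struct {
    uint32_t dwForwardDest;      /* +0x00 */
    uint32_t dwForwardMask;      /* +0x04 */
    uint32_t dwForwardPolicy;    /* +0x08 */
    uint32_t dwForwardNextHop;   /* +0x0C */
    uint32_t dwForwardIfIndex;   /* +0x10, read via [esi+10h] */
    uint32_t dwForwardType;      /* +0x14 */
    uint32_t dwForwardProto;     /* +0x18 */
    uint32_t dwForwardAge;       /* +0x1C */
    uint32_t dwForwardNextHopAS; /* +0x20 */
    uint32_t dwForwardMetric1;   /* +0x24, read via [esi+24h] */
    uint32_t dwForwardMetric2;
    uint32_t dwForwardMetric3;
    uint32_t dwForwardMetric4;
    uint32_t dwForwardMetric5;
} ForwardRowMirror;

_Static_assert(offsetof(ForwardRowMirror, dwForwardIfIndex) == 0x10, "IfIndex offset");
_Static_assert(offsetof(ForwardRowMirror, dwForwardMetric1) == 0x24, "Metric1 offset");

#define SKETCH_ERROR_BAD_ARGUMENTS 0xA0u

/* Sketch of the metric check in CreateOrSetIpForwardEntry. */
static uint32_t SketchValidateMetric(const ForwardRowMirror *row,
                                     uint32_t interfaceMetric,
                                     uint32_t *nsiMetric)
{
    /* cmp esi,[ebp+8] ; jb failure  (unsigned "below") */
    if (row->dwForwardMetric1 < interfaceMetric)
        return SKETCH_ERROR_BAD_ARGUMENTS;

    /* sub esi,[ebp+8] ; mov [ebp-0Ch],esi */
    *nsiMetric = row->dwForwardMetric1 - interfaceMetric;
    return 0;
}
```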
To test this theory, we need to examine the comparison at 751bdf87 to determine if that is taking the failure branch, and we need to check the return value of NsiSetAllParameters. This is fairly easy to do with a couple of breakpoints:
0:000> bu 751bdf87
0:000> bu 751bdfb0
0:000> g
Breakpoint 1 hit
eax=ffffffff ebx=00000004 ecx=00000000 edx=7707e524
esi=00000000 edi=00000003
eip=751bdf87 esp=0012fcf8 ebp=0012fd44 iopl=0
nv up ei ng nz ac pe cy
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000
efl=00000297
iphlpapi!CreateOrSetIpForwardEntry+0x97:
751bdf87 7229 jb 751bdfb2 [br=1]
Our first breakpoint, on the comparison between the “interface metric” and the route metric we supplied in pRoute->dwForwardMetric1, hit first (as expected). The [br=1] annotation in WinDbg’s output shows that the branch is going to be taken, sending execution down the code path that returns ERROR_BAD_ARGUMENTS. Problem identified!
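The [br=1] annotation can also be confirmed from the flags. `jb` is taken exactly when the carry flag (bit 0 of EFLAGS) is set. Here `cmp esi,dword ptr [ebp+8]` computed 0 - 10 as an unsigned subtraction, which borrows and sets CF. That matches efl=00000297 and the "cy" in WinDbg’s flag summary. The earlier efl=00000216 ("nc") has CF clear. The following small sketch decodes those values; the macro and function names are just for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define EFL_CF 0x0001u  /* carry: WinDbg "cy" / "nc" */
#define EFL_ZF 0x0040u  /* zero:  WinDbg "zr" / "nz" */
#define EFL_SF 0x0080u  /* sign:  WinDbg "ng" / "pl" */

/* On x86, jb (a.k.a. jc/jnae) is taken exactly when CF is set. */
static int JbTaken(uint32_t eflags)
{
    return (eflags & EFL_CF) != 0;
}
```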
There still remains the issue of solving the problem, though. Looking at [ebp+8], it appears that the undocumented iphlpapi!GetInterfaceMetric returned 10:
0:000> ? dwo(@ebp+8)
Evaluate expression: 10 = 0000000a
This makes sense: we supplied a metric of 0, which is obviously less than 10. Now, though, we need a good way to decide whether to use a zero metric (for previous OS versions) or some other metric (for Vista), assuming we want our route to take precedence over any other route for the same network/mask.
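Once the interface metric is known, the decision itself is simple. The smallest metric Vista’s check will accept is the interface metric itself, so a "most preferred" route would use exactly that value. The following is a minimal sketch with a hypothetical helper name. It assumes the caller passes an interface metric of 0 on earlier systems, which reproduces the old behavior.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: raise `desired` to the interface metric if needed,
   so that dwForwardMetric1 >= interface metric holds (the Vista check). */
static uint32_t ClampRouteMetric(uint32_t desired, uint32_t interfaceMetric)
{
    return desired < interfaceMetric ? interfaceMetric : desired;
}
```

With the values from the debugger session, ClampRouteMetric(0, 10) yields 10, the lowest (most preferred) metric CreateIpForwardEntry would accept on that interface.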
Unfortunately, neither MSDN nor Google turns up any hits on GetInterfaceMetric. Well, that sucks: it looks like on Vista, unless I want to hardcode 10, I’ll have to wander off into undocumented land just to use a publicly documented API. There seems to be something a bit ironic about that, but, nonetheless, the problem remains to be solved.
Update: There is a (minimally) documented solution that was very recently made available. See the bottom of the post for details.
So, all that we need to do is reverse engineer the parameters to this undocumented GetInterfaceMetric function and call it, right?
Well, no, not exactly; things actually get worse. It turns out that GetInterfaceMetric is not even exported from iphlpapi.dll. It’s a purely internal function!
The only other option at this point, aside from hardcoding 10 as a minimum metric, is to reimplement all of the functionality of GetInterfaceMetric ourselves. Taking a look at GetInterfaceMetric, things unfortunately look rather complicated:
0:000> uf iphlpapi!GetInterfaceMetric
iphlpapi!GetInterfaceMetric:
751bd355 8bff mov edi,edi
751bd357 55 push ebp
751bd358 8bec mov ebp,esp
751bd35a 6a1c push 1Ch
751bd35c 6a04 push 4
751bd35e ff750c push dword ptr [ebp+0Ch]
751bd361 6a00 push 0
751bd363 6a08 push 8
751bd365 ff7508 push dword ptr [ebp+8]
751bd368 6a07 push 7
751bd36a 6864331b75 push NPI_MS_IPV4_MODULEID
751bd36f 6a01 push 1
751bd371 e88f5fffff call NsiGetParameter
751bd376 5d pop ebp
751bd377 c20800 ret 8
NPI_MS_IPV4_MODULEID is a global variable of some sort in iphlpapi:
0:000> db iphlpapi!NPI_MS_IPV4_MODULEID l8
751b3364 18 00 00 00 01 00 00 00 ........
Using the x command with the /a option (sort by ascending address), we can enumerate all symbols in iphlpapi in address order and make an educated guess as to the size of this global:
0:000> x /a iphlpapi!*
[...]
751b3364 iphlpapi!NPI_MS_IPV4_MODULEID = <no type information>
751b3381 iphlpapi!NsiAllocateAndGetTable = <no type information>
[...]
So, we know that NPI_MS_IPV4_MODULEID can be no more than 0x1d bytes long (0x751b3381 - 0x751b3364). Looking at the memory around NPI_MS_IPV4_MODULEID, we see what appears to be code (nop instructions) starting 0x18 bytes in, which makes it likely that the global is 0x18 bytes long.
0:000> db 751b3364
751b3364 18 00 00 00 01 00 00 00-00 4a 00 eb 1a 9b d4 11
751b3374 91 23 00 50 04 77 59 bc-90 90 90 90 90 ff 25 94
(The run of 90 bytes is a typical sign of code: 90 is the opcode for the x86 nop instruction, which the compiler typically uses as padding to align function start addresses.)
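To make the same call ourselves, we need our own copy of this 0x18-byte blob. Below, the bytes are transcribed from the db output above, stopping before the nop padding. kIpv4ModuleIdBytes is a hypothetical name. Reading the first dword in x86 (little-endian) order gives 0x18, the same as the size inferred above. That hints, without proving it, that the structure begins with its own length. The test also re-checks the symbol-gap arithmetic: 0x1d bytes between the two symbols, 0x18 of data, and 5 bytes of 90 padding.

```c
#include <assert.h>
#include <stdint.h>

/* The 0x18 bytes at iphlpapi!NPI_MS_IPV4_MODULEID, copied from the db output;
   the trailing 90 90 90 90 90 padding is excluded. */
static const uint8_t kIpv4ModuleIdBytes[0x18] = {
    0x18, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00,
    0x00, 0x4a, 0x00, 0xeb, 0x1a, 0x9b, 0xd4, 0x11,
    0x91, 0x23, 0x00, 0x50, 0x04, 0x77, 0x59, 0xbc
};

/* Read a little-endian 32-bit value, as stored on x86. */
static uint32_t ReadLe32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```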
Given this, we should be able to replicate the behavior of GetInterfaceMetric, as the only function it calls, NsiGetParameter, is exported by nsi.dll (of course, it isn’t documented…). From the above disassembly (remembering that the arguments are pushed in reverse order), we can see that NsiGetParameter takes (1) a ulong-sized argument (constant 0x1), (2) a pointer argument (address of NPI_MS_IPV4_MODULEID), (3) a ulong-sized argument (constant 0x7), (4) a pointer that is the address of the interface LUID (argument 1 of GetInterfaceMetric, which we saw earlier), (5) a ulong-sized argument (constant 0x8), (6) a ulong- or pointer-sized argument (constant 0x0), (7) a pointer-sized argument (address of a ULONG that receives the “interface metric”, argument 2 of GetInterfaceMetric), (8) a ulong-sized argument (constant 0x4), and (finally!) (9) a ulong-sized argument (constant 0x1c). I would surmise that the 0x8 and 0x4 constants are the sizes of the LUID and output buffer, though I haven’t bothered to confirm that at this point.
Knowing how __stdcall works, we can quickly identify NsiGetParameter as __stdcall from the disassembly of GetInterfaceMetric: after the call, the caller does not remove the arguments from the stack, so the callee (NsiGetParameter) must be doing so.
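Put together, the inferred call looks roughly like the following sketch. The prototype is hypothetical: the parameter names only describe what was observed (constants and roles), not what the values mean. uint64_t stands in for the 8-byte NET_LUID. On a real 32-bit Vista system, the function pointer would presumably come from GetProcAddress on nsi.dll for "NsiGetParameter", on the assumption that it is exported under that name as the symbols suggest. Here a fake takes its place so that the argument plumbing can be checked anywhere.

```c
#include <assert.h>
#include <stdint.h>

/* NsiGetParameter looked like __stdcall on 32-bit x86; elsewhere the macro
   expands to nothing so the sketch still compiles. */
#if defined(_WIN32) && !defined(_WIN64)
#define NSI_API __stdcall
#else
#define NSI_API
#endif

/* Hypothetical prototype inferred from the pushes in GetInterfaceMetric. */
typedef uint32_t (NSI_API *NsiGetParameterFn)(
    uint32_t constant1,      /* 0x1 */
    const void *moduleId,    /* &NPI_MS_IPV4_MODULEID (0x18 bytes) */
    uint32_t constant7,      /* 0x7 */
    const void *luid,        /* interface LUID (argument 1) */
    uint32_t luidSize,       /* 0x8, probably sizeof(NET_LUID) */
    uint32_t constant0,      /* 0x0 */
    void *output,            /* receives the metric (argument 2) */
    uint32_t outputSize,     /* 0x4, probably sizeof(ULONG) */
    uint32_t constant1c);    /* 0x1c */

/* Replays the call made by iphlpapi!GetInterfaceMetric. */
static uint32_t SketchGetInterfaceMetric(NsiGetParameterFn nsiGetParameter,
                                         const void *ipv4ModuleId,
                                         const uint64_t *luid,
                                         uint32_t *metric)
{
    return nsiGetParameter(1, ipv4ModuleId, 7,
                           luid, (uint32_t)sizeof(*luid),
                           0,
                           metric, (uint32_t)sizeof(*metric),
                           0x1c);
}

/* Test double for nsi!NsiGetParameter: verifies the constants and returns
   the metric seen in the debugger session (10). */
static uint32_t NSI_API FakeNsiGetParameter(
    uint32_t constant1, const void *moduleId, uint32_t constant7,
    const void *luid, uint32_t luidSize, uint32_t constant0,
    void *output, uint32_t outputSize, uint32_t constant1c)
{
    (void)moduleId;
    (void)luid;
    if (constant1 != 1 || constant7 != 7 || luidSize != 8 ||
        constant0 != 0 || outputSize != 4 || constant1c != 0x1c)
        return 0xA0; /* ERROR_BAD_ARGUMENTS */
    *(uint32_t *)output = 10;
    return 0;
}
```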
Given all of this, we can write our own function that implements GetInterfaceMetric. Now, just to be clear, I would not recommend actually using it unless Microsoft fails to provide a documented way to determine the minimum metric permitted by CreateIpForwardEntry (or removes the restriction) before Vista RTM. Before this week is up, I am going to do whatever I can to find out what ISVs are supposed to do about this problem (and whether it can be fixed before RTM). In the event that I don’t get anywhere, though, I’ll have a backup plan, as ugly and hackish as it may be. That is better than not being able to manipulate the route table at all on Vista.
Anyway, the basic idea is to call ConvertInterfaceIndexToLuid on the InterfaceIndex that we already have from iphlpapi, converting it into a NET_LUID structure (new to Vista). As it happens, ConvertInterfaceIndexToLuid is a documented API, which makes that the easy part.
Then, we simply replicate the call that we saw in GetInterfaceMetric inside iphlpapi.dll. For brevity, I am not posting the entire source code of my implementation of GetInterfaceMetric inline; you can, however, download it. With this reverse-engineered implementation, all that is left is to call it to get the minimum metric for the interface we are about to add a route on, and to place that metric in the MIB_IPFORWARDROW that we pass to CreateIpForwardEntry.
I’ll post back when I hear the official word from Microsoft on how to handle this situation. At this point, I fully expect that there will be a documented API (or that the restriction will go away) before RTM. This is a rather bad compatibility bug: it breaks a long-existing documented API in a way that forces you into undocumented hackery to keep using it, especially since there is no other good way that I know of to replicate the functionality of the API in question.
Update: You can use the GetIpInterfaceEntry routine (new to Vista, in iphlpapi) to find the minimum metric for an interface. Note that you will very likely need to search MSDN to find information on this function, as to my knowledge it has not been included in recent SDKs.
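The following is a minimal sketch of that documented route. It assumes Vista-era headers, where netioapi.h is pulled in by iphlpapi.h when _WIN32_WINNT is 0x0600 or later, and linking against iphlpapi.lib. GetMinimumRouteMetric is a hypothetical helper name. The claim that MIB_IPINTERFACE_ROW.Metric is the same value CreateIpForwardEntry compares against rests on the update above, not on independent verification. The Windows-specific part is guarded with _WIN32 so the file still compiles elsewhere.

```c
#include <assert.h>

#ifdef _WIN32
#ifndef _WIN32_WINNT
#define _WIN32_WINNT 0x0600 /* GetIpInterfaceEntry requires Vista or later */
#endif
#include <winsock2.h>
#include <ws2ipdef.h>
#include <iphlpapi.h>

/* Hypothetical helper: fetch the IPv4 interface metric, to be used as the
   floor for MIB_IPFORWARDROW.dwForwardMetric1. */
static DWORD GetMinimumRouteMetric(NET_IFINDEX interfaceIndex, ULONG *metric)
{
    MIB_IPINTERFACE_ROW ifRow;
    DWORD status;

    InitializeIpInterfaceEntry(&ifRow);   /* set defaults, clear the key fields */
    ifRow.Family = AF_INET;               /* CreateIpForwardEntry is IPv4-only */
    ifRow.InterfaceIndex = interfaceIndex;

    status = GetIpInterfaceEntry(&ifRow);
    if (status == NO_ERROR)
        *metric = ifRow.Metric;
    return status;
}
#endif /* _WIN32 */
```

In AddRoute, the result would replace the hardcoded zero: call GetMinimumRouteMetric(InterfaceIndex, &Metric) and, on success, set Row.dwForwardMetric1 = Metric before calling CreateIpForwardEntry.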
(Note: Some of the debugger output was slightly modified or truncated by me to keep the formatting sane.)