In Windows Vista, much of the network stack that ships with the OS uses significantly more kernel stack space than in previous versions of the operating system.
In my experience, simply indicating a UDP datagram up to NDIS can require over 4K of available kernel stack on x86; with less than that, you risk taking a double fault and bugchecking the system.
For example, here’s a portion of the stack that I ran into while debugging an unrelated problem at the Vista compatibility lab:
0: kd> k100
ChildEBP RetAddr
818e6bdc 818ad19b RtlpBreakWithStatusInstruction
818e6c2c 818adc08 KiBugCheckDebugBreak+0x1c
818e6fdc 8184845e KeBugCheck2+0x5f4
818e6fdc 81871d35 KiTrap08+0x75
9c9cb084 8186dd14 SepAccessCheck+0x1e0
9c9cb0e0 81887907 SeAccessCheck+0x1a4
9c9cb51c 8715474c SeAccessCheckFromState+0xe4
9c9cb55c 871546d6 CompareSecurityContexts+0x47
9c9cb57c 87153b1a MatchValues+0xd4
9c9cb59c 87153aa7 CheckEqualConditionEnumMatch+0x3f
9c9cb63c 87153a1b MatchConditionOverlap+0x72
9c9cb660 87153774 FilterMatchEnum+0x6c
9c9cb674 8715948b FilterMatchEnumVisible+0x28
9c9cb6ac 87159520 IndexHashFastEnum+0x4d
9c9cb6f8 87158624 IndexHashEnum+0x139
9c9cb724 87159362 FeEnumLayer+0x7a
9c9cb7ac 87159b16 KfdGetLayerActionFromEnumTemplate+0x50
9c9cb7cc 8d6af9e4 KfdCheckAndCacheAcceptBypass+0x27
9c9cb8c4 8d6afc87 CheckAcceptBypass+0x146
9c9cb9a0 8d6b185d WfpAleAuthorizeReceive+0x82
9c9cba08 8d6ad542 WfpAleConnectAcceptIndicate+0x98
9c9cba74 8d6ad432 ProcessALEForTransportPacket+0xc5
9c9cbaf0 8d6ae6b3 ProcessAleForNonTcpIn+0x6f
9c9cbd28 8d6b0df0 WfpProcessInTransportStackIndication+0x2ab
9c9cbd78 8d6b0ae0 InetInspectReceiveDatagram+0x9a
9c9cbdfc 8d6b091c UdpBeginMessageIndication+0x33
9c9cbe44 8d6aecf3 UdpDeliverDatagrams+0xce
9c9cbe90 8d6aec40 UdpReceiveDatagrams+0xab
9c9cbea0 8d6acdd4 UdpNlClientReceiveDatagrams+0x12
9c9cbecc 8d6acba4 IppDeliverListToProtocol+0x49
9c9cbeec 8d6acad3 IppProcessDeliverList+0x2a
9c9cbf40 8d6ab443 IppReceiveHeaderBatch+0x1da
9c9cbfd0 8d6ac61d IpFlcReceivePackets+0xc06
9c9cc04c 8d6abf36 FlpReceiveNonPreValidatedNetBufferListChain+0x6db
9c9cc074 8727b0b0 FlReceiveNetBufferListChain+0x104
9c9cc0a8 8726d737 ndisMIndicateNetBufferListsToOpen+0xab
9c9cc0d0 8726d6ae ndisIndicateSortedNetBufferLists+0x4a
9c9cc24c 871b53c3 ndisMDispatchReceiveNetBufferLists+0x129
9c9cc268 872802c4 ndisMTopReceiveNetBufferLists+0x2c
9c9cc2b4 b0a3fb4d ndisMIndicatePacketsToNetBufferLists+0xe9
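From ndisMIndicatePacketsToNetBufferLists to the point where the system double faulted (in my case, in SepAccessCheck, called from SeAccessCheck), a whopping 4656 bytes (0x1230, the difference between the two frames' ChildEBP values, 9c9cc2b4 and 9c9cb084) of kernel stack were consumed.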
So, now is the time to slim down the stack usage of your NDIS-related drivers, or you might be in for some unpleasant surprises when they are used in conjunction with multiple third-party intermediate (IM) drivers or the like. Better yet, consider moving away from IM drivers to the new NDIS 6.0 filter driver architecture. You should also be especially wary of any code that loops back a packet that might re-enter tcpip.sys in a receive calling context (or any other context where stack space may be limited), as this can prove unexpectedly expensive on Vista (and possibly later versions).
Oh, and a tip for finding stack-hogging functions when tracking down stack overflows: use the ‘f’ flag with the ‘k’ command in WinDbg. For example:
0: kd> knf
 #   Memory  ChildEBP RetAddr
00           818e6bdc 818ad19b RtlpBreakWithStatusInstruction
01       50  818e6c2c 818adc08 KiBugCheckDebugBreak+0x1c
02      3b0  818e6fdc 8184845e KeBugCheck2+0x5f4
03        0  818e6fdc 81871d35 KiTrap08+0x75
[...]
This makes the debugger compute the stack usage (arguments plus locals) of each call frame for you and show it, in hex, in the Memory column, saving you a bit of work with the calculator.
I wonder how much stack space they could have saved with a couple of strategically placed lookaside lists, or whether they could do some passive-level resubmissions from a work item. That would help with the problem. Too late now, I guess.