One of the more useful tools for tracking down memory leaks in Windows is a utility called UMDH, which ships with the WinDbg distribution. I’ve previously covered what UMDH does at a high level and how it functions; in a nutshell, it relies on special instrumentation in the heap manager that is designed to log a stack trace each time a heap operation occurs.
UMDH utilizes the heap manager’s stack trace instrumentation to associate call stacks with outstanding allocations. More specifically, UMDH is capable of taking a “snapshot” of the current state of all heaps in a process, grouping together allocations that share a size and a call stack, and aggregating them in a useful form.
The general principle of operation is that UMDH is typically run two (or more) times. The first run captures a “baseline” snapshot of the process after it has finished initializing, as there are always expected to be a number of outstanding allocations that would not normally be freed until process exit time: for example, any allocations used to build the command line parameter arrays provided to the main function of a C program, or any other application-derived allocations that are expected to remain checked out for the lifetime of the program.
This first “baseline” snapshot is essentially intended to be a means to filter out all of these expected, long-running allocations that would otherwise show up as useless noise if one were to simply take a single snapshot of the heap after the process had leaked memory.
The second (and potentially subsequent) snapshots are intended to be taken after the process has leaked a noticeable amount of memory. UMDH is then run again in a special mode that is designed to essentially do a logical “diff” between the “baseline” snapshot and the “leaked” snapshot, filtering out any allocations that were present in both of them and returning a list of new, outstanding allocations, which would generally include any leaked heap blocks (although there may well be legitimate outstanding allocations as well, which is why it is important to ensure that the “leaked” snapshot is taken only after a non-trivial amount of memory has been leaked, if at all possible).
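For reference, that workflow ends up looking roughly like the following (the image name, process ID, and log file names here are just placeholders); the “+ust” gflags setting enables the heap manager’s stack trace database for the image, and _NT_SYMBOL_PATH should point at symbols so that UMDH can resolve the logged traces:

gflags -i myprogram.exe +ust              (enable the user-mode stack trace database)
umdh -p:1234 -f:baseline.log              (first snapshot, after initialization)
  ... let the process run until it has leaked a noticeable amount ...
umdh -p:1234 -f:leaked.log                (second snapshot)
umdh baseline.log leaked.log > diff.txt   (logical “diff” of the two snapshots)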
Now, this is all well and good, and while UMDH proves to be a very effective tool for tracking down memory leaks with this strategy, taking a “before” and “after” diff of a problem and analyzing the two to determine what’s gone wrong is hardly a new, ground-breaking concept.
While the theory behind UMDH is sound, there are some situations where it works less than optimally. The most common failure case of UMDH in my experience is not actually related so much to UMDH itself as to the heap manager instrumentation code that is responsible for logging stack traces in the first place.
As I have previously discussed, the heap manager stack trace instrumentation logic does not have access to symbols, and on x86, “perfect” stack traces are not generally possible, as there is no metadata attached to a particular function (outside of debug symbols) that describes how to unwind past it.
The typical approach taken on x86 is to assume that no function in the call stack uses the frame pointer omission (FPO) optimization, which allows the compiler to stop maintaining ebp as a frame pointer for a function entirely, or even to repurpose it as a scratch register.
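To illustrate the assumption being made, here is a minimal sketch (not the actual NTDLL implementation) of an ebp-chain walk. It assumes every frame starts with the standard push ebp / mov ebp, esp prologue, so that [ebp] holds the caller’s saved ebp and [ebp+4] holds the return address:

struct STACK_FRAME
{
    STACK_FRAME *Next;          // Saved ebp of the caller
    void        *ReturnAddress; // Return address into the caller
};

__declspec(noinline)
size_t CaptureEbpBackTrace(void **Trace, size_t MaxDepth)
{
    STACK_FRAME *Frame;
    size_t       Depth = 0;

    // x86 MSVC inline asm; this function itself must keep ebp frames (/Oy-).
    __asm mov Frame, ebp

    while (Depth < MaxDepth && Frame && Frame->ReturnAddress)
    {
        Trace[Depth++] = Frame->ReturnAddress;
        Frame = Frame->Next; // Dead-ends as soon as a caller has reused ebp (FPO)
    }

    return Depth;
}

As soon as any function in the chain has repurposed ebp, the walk either stops or wanders off into garbage, which is precisely the failure mode described below.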
Now, most of the libraries that ship with the operating system in recent OS releases have FPO explicitly turned off for x86 builds, with the sole intent of allowing the built-in stack trace instrumentation logic to traverse system-supplied library functions all the way up to application code (after all, if every heap stack trace dead-ended at kernel32!HeapAlloc, the whole concept of heap allocation traces would be fairly useless).
Unfortunately, there happens to be a notable exception to this rule, one that actually came around to bite me at work recently. I was attempting to track down a suspected leak with UMDH in one of our programs, and noticed that all of the allocations were grouped into a single stack trace that dead-ended in a rather spectacularly unhelpful way. Digging in a bit deeper, the individual snapshot dumps from UMDH contained scores of allocations with the following backtrace logged:
00000488 bytes in 0x1 allocations
(@ 0x00000428 + 0x00000018) by: BackTrace01786
7C96D6DC : ntdll!RtlDebugAllocateHeap+000000E1
7C949D18 : ntdll!RtlAllocateHeapSlowly+00000044
7C91B298 : ntdll!RtlAllocateHeap+00000E64
211A179A : program!malloc+0000007A
This particular outcome happened to be rather unfortunate, as in the specific case of the program I was debugging at work, virtually all memory allocations in the program (including the ones I suspected of leaking) happened to ultimately get funneled through malloc.
Obviously, being told that “yes, every leaked memory allocation goes through malloc” isn’t really all that helpful when (most) every allocation in the program in question goes through malloc. The UMDH output raised the question, however, of why exactly malloc was breaking the stack traces. Digging in a bit deeper, I discovered the following gem while disassembling the implementation of malloc:
0:011> u program!malloc
program!malloc [f:\sp\vctools\crt_bld\self_x86\crt\src\malloc.c @ 155]:
211a1720 55              push    ebp
211a1721 8b6c2408        mov     ebp,dword ptr [esp+8]
211a1725 83fde0          cmp     ebp,0FFFFFFE0h
[...]
In particular, it would appear that the default malloc implementation in the statically linked CRT of Visual C++ 2005 not only doesn’t use a frame pointer, but actually trashes ebp as a scratch register (here, using it as an alias register for the first parameter, the count in bytes of memory to allocate). Disassembling the DLL version of the CRT revealed the same problem: ebp is reused as a scratch register.
What does this all mean? Well, anything using malloc that’s built with Visual C++ 2005 won’t be diagnosable with UMDH or anything else that relies on ebp-based stack traces, at least not on x86 builds. Given that many things internally go through malloc, including operator new (at least in the default implementation), this means that in the default configuration, things get a whole lot harder to debug than they should be.
One workaround here would be to build your own copy of the CRT with /Oy- (force frame pointer usage), but I don’t really consider building the CRT a very viable option, as that’s a whole lot of manual work to set up and keep working correctly on every developer’s machine, not to mention all the headaches that rebuilding for every CRT service release would bring with such an approach.
For operator new, it is fortunately fairly straightforward to overload it in a reasonably supported way and implement it against a different allocation strategy. In the case of malloc, however, things don’t really have such a happy ending; one is either forced to re-alias the name using preprocessor macro hackery to a custom implementation that does not suffer from a lack of frame pointer usage, or otherwise change all references to malloc/free to refer to a custom allocator function (perhaps implemented against the process heap directly instead of the CRT heap a-la malloc).
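As a rough sketch of the operator new route (illustrative only; a complete replacement would also need the array and nothrow overloads), one could allocate directly from the process heap so that the FPO-built CRT code is never on the allocation path:

#include <windows.h>
#include <new>

// Route scalar new/delete at the process heap instead of the CRT heap, so
// the heap manager's stack trace instrumentation sees ebp-framed callers.
void *operator new(size_t Size)
{
    void *Block = HeapAlloc(GetProcessHeap(), 0, Size);

    if (!Block)
        throw std::bad_alloc();

    return Block;
}

void operator delete(void *Block) throw()
{
    if (Block)
        HeapFree(GetProcessHeap(), 0, Block);
}

The trade-off is that allocations made this way bypass the CRT debug heap features entirely, though they remain fully visible to the operating system heap and its tracing.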
So, the next time you use UMDH and get stuck scratching your head while trying to figure out why your stack traces are all dead-ending somewhere less than useful, keep in mind that the CRT itself may be to blame, especially if you’re relying on CRT allocators. Hopefully, in a future release of Visual Studio, the folks responsible for turning off FPO in the standard OS libraries can get in touch with the persons responsible for CRT builds and arrange for the same to be done, if not for the entire CRT, then at least for all the code paths in the standard heap routines. Until then, however, these CRT allocator routines remain roadblocks for effective leak diagnosis, at least when using the better tools available for the job (UMDH).
Ugh, never liked this ABI breaking optimization. Death to FPO!
Shouldn’t the heap instrumentation code be more clever and be able to use PDBs (if you have them) for FPO code, at the very least, to get decent x86 stack traces? After all, DbgHelp is right there for it to use.
I found DebugDiag more reliable and informative than UMDH.
Speaking of future releases of Visual Studio, how does it look for VS2008?
From “Umdhtools.exe: How to use Umdh.exe to find memory leaks” (http://support.microsoft.com/kb/268343):
Note Because the malloc function in the C run-time (CRT) module uses the frame pointer omission (FPO) on the original release version of Windows Server 2003, you may not see the complete stack information of the malloc function by using the UMDH tool. This issue is fixed in the CRT module of Windows Server 2003 Service Pack 1 (SP1). Therefore, you can see the complete stack information of the malloc function in Windows Server 2003 SP1.
Nice find.
I believe this just applies to MSVCRT.dll that ships with the OS. MSVCR80.dll and the VC8 CRT are isolated from the OS MSVCRT however and as far as I know are controlled by the VC/tools group and not the core/OS group.
Soren: I haven’t tried VS2008 yet, so I’m not sure. It’s on the list of things to do at some point :)
Koby: Well, yes and no. The architectural purity question there is that the heap tracing code lives in NTDLL, and having NTDLL depend upon DbgHelp is an undesirable thing. It would certainly be technically possible to make that work in some fashion or another, but it’d be an ugly hack at best, I’d imagine. Some code that uses the same stack tracing logic lives in kernel mode (handle operation tracing), and that code would have an even harder time taking advantage of symbols. Granted, that’s not so much related to UMDH, but it is simply another thing that will be broken without ebp frames.
Hello,
Enabling FPO for the 8.0 CRT was not deliberate. The Visual Studio 2008 CRT (9.0) does NOT have FPO enabled, and UMDH should function normally.
For 8.0, an alternative to UMDH would be to use LeakDiag. LeakDiag will actually instrument memory allocators to obtain stack traces. This makes it more versatile than UMDH as it can hook several different allocator types at different granularities (Ranging from the c runtime to raw virtual memory allocations).
By default, LeakDiag simply walks the stack base pointers, but it can be modified to use the DbgHelp StackWalk API to resolve FPO data. This will produce full stacks, though the performance penalty is higher. On the flip side, you can customize the stack walking behavior to only go to a certain depth, etc., to minimize the perf penalty.
Please find LeakDiag here:
ftp://ftp.microsoft.com/PSS/Tools/Developer%20Support%20Tools/LeakDiag/leakdiag125.msi
Thanks,
Mark
For what it’s worth, another workaround would be to use an LD_PRELOAD style mechanism to bypass the crt malloc assuming you have an alternative implementation available. Detours is certainly capable of it (http://research.microsoft.com/sn/detours/) and a quick google shows a couple of others that are more lightweight.
For an alternative implementation of malloc, well…
ptmalloc from glibc is pretty straightforward in terms of code and is dual licensed under BSD, but while I’d expect it to compile on Windows easily, I’ve never actually tried.
I believe that’s what leakdiag (that Mark Roberts mentioned above) does in principle.
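To sketch what the Detours route might look like (the names here are hypothetical, and a real hook would also need to cover free, realloc, and calloc consistently so that blocks allocated by the replacement are never handed back to the CRT’s free), the attach sequence is roughly:

#include <windows.h>
#include <stdlib.h>
#include <detours.h>   // Requires linking against detours.lib

// Pointer to the real CRT malloc; Detours redirects this to a trampoline.
static void *(__cdecl *Real_malloc)(size_t) = malloc;

// Hypothetical replacement that allocates from the process heap, keeping
// the allocation path out of the FPO-built CRT code.
static void *__cdecl Hooked_malloc(size_t Size)
{
    return HeapAlloc(GetProcessHeap(), 0, Size);
}

void InstallMallocHook()
{
    DetourTransactionBegin();
    DetourUpdateThread(GetCurrentThread());
    DetourAttach(&(PVOID&)Real_malloc, Hooked_malloc);
    DetourTransactionCommit();
}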
I have been trying to solve this problem since we migrated our code base to VS2005. My first take at detouring msvcr80!malloc and its relatives was not successful due to intricacies in the calloc implementation. I don’t remember the details – the CRT source did not help much, and neither did the reversing. After a while I settled on a simplified approach in which I only detoured operator new/delete (and the vector variants) to go over the system-provided msvcrt.dll.
This worked like a charm for me. Not sure if it’d be helpful to anyone, or whether it would work without changes in any environment other than the one I tested it on (it relies on ordinals in the case of mfc80.dll, which might differ between VS service packs).
Hi,
I know this article is fairly old, but I used UMDH quite a lot a while ago, and was confused why some stacks stopped at malloc. Now I know. Thanks for the info.
Rich
Although this may be inferred from some of the above comments, I thought it may be helpful to explicitly say that enabling DebugDiag for memory leak detection does patch around the FPO-enabled malloc in msvcr80.dll.
While it may seem strange to do so, I have used this characteristic of DebugDiag to work around the FPO problem while still doing the actual leak detection by other means (UMDH and/or my own debugger extension that reads the user stack trace database directly).
Amazing how useful this article has been to me. I’ve been trying to use windbg’s !heap commands to determine the cause of a heap overflow in an app that uses MSVCR80 and have had absolutely no luck in determining from that command the allocation stack trace (yeah, thanks a lot windbg for telling me that malloc was part of the problem).
I’ve had to reverse engineer the program from what I know, which has obviously been a less desirable way to track the issue.
I guess I will have to look at leakdiag and see if that helps.
Hey Ken, I actually looked into leakdiag, and I don’t think it’s going to help me with my problem… seems to have issues with a crashing process in that it won’t log the results… any other ideas?
Pretty classic heap corruption, but I hate the idea of having to reverse engineer all of this out if I don’t have to.