Last week, I described how the loader handles implicit TLS (as of Windows Server 2003). Although the loader’s support for implicit TLS works out well enough for what it was originally designed for, there are some cases where things do not turn out so happily. If you’ve been following along closely so far, you’ve probably already noticed some of the deficiencies relating to the design of implicit TLS. These defects in the design and implementation of TLS eventually spurred Microsoft to significantly revamp the loader’s implicit TLS support in Windows Vista.
The primary problem with respect to how Windows Server 2003 and earlier Windows versions support implicit TLS is that it just plain doesn’t work at all with DLLs that are dynamically loaded (via LoadLibrary, or LdrLoadDll). In fact, the way that implicit TLS fails if you try to dynamically load a DLL written to utilize it is actually rather spectacularly catastrophic.
What ends up happening is that the new DLL will have no TLS processing by the loader happen whatsoever. With our knowledge of how implicit TLS works at this point, the unfortunate consequences of this behavior should be readily apparent.
When a DLL using implicit TLS is loaded, because the loader doesn’t process the TLS directory, the _tls_index value is not initialized by the loader, nor is there space allocated for module’s TLS data in the ThreadLocalStoragePointer arrays of running threads. The DLL continues to load, however, and things will appear to work… until the first access to a __declspec(thread) variable occurs, that is.
The compiler typically initializes _tls_index to zero by default, so the value retains the value zero in the case where an implicit TLS using DLL is loaded after process initialization time. When an access to a __declspec(thread) variable occurs, the typical implicit TLS variable resolution process occurs. That is, ThreadLocalStoragePointer is fetched from the TEB and is indexed by _tls_index (which will always be zero), and the resultant pointer is assumed to be a pointer to the current thread’s thread local variables. Unfortunately, because the loader didn’t actually set _tls_index to a valid value, the DLL will reference the thread local variable storage of whichever module was legitimately assigned TLS index zero. This is typically going to be the main process executable, although it could be a DLL if the main process executable doesn’t use TLS but is static linked to a DLL that does use TLS.
This results in one of the absolute worst possible kinds of problems to debug. Now you’ve got one module trampling all over another module’s state, with the guilty module under the (mistaken) belief that the state that it’s referencing is really the guilty module’s own state. If you’re lucky, the process has no implicit TLS using at all (at process initialization time), and the ThreadLocalStoragePointer will not be allocated for the current thread and the initial access to a __declspec(thread) variable will simply result in an immediate null pointer dereference. More common, however, is the case that there is somebody in the process already using implicit TLS, in which case the module owning TLS index zero will have its thread local variables corrupted by the newly loaded module.
In this situation, the actual crash is typically long delayed until the first module finally gets around to using its thread local variable stage and fails due to the fact that it’s been overwritten, far after the fact. It is also possible that you’ll get lucky and the newly loaded module’s TLS variables will be much larger in size than the module with TLS index zero, in which case the initial access to the __declspec(thread) variable might immediately fault if it is sufficiently beyond the length of the heap allocation used for the already loaded module’s TLS variable storage. Of course, the offset of the variable accessed might be somewhere in between the edge of the current heap segment (page) and the end of the allocation used for the original module’s TLS variable storage, in which case heap corruption will occur instead of original module’s TLS variables for the current thread. (The loader uses the process heap to satisfy module TLS variable block allocations.)
Perhaps the only saving grace of the loader’s limitation with respect to implicit TLS and demand loaded DLLs is that due to the fact that the loader’s support for this situation has (not) operated correctly for so long now, many programmers know well enough to stay away from implicit TLS when used in conjunction with DLLs (or so I would hope).
These dire consequences of demand loading a module using __declspec(thread) variables are the reason for the seemingly after-the-fact warning about using implicit TLS with demand loaded DLLs in the LoadLibrary documentation on MSDN:
Windows Server 2003 and Windows XP: The Visual C++ compiler supports a syntax that enables you to declare thread-local variables: _declspec(thread). If you use this syntax in a DLL, you will not be able to load the DLL explicitly using LoadLibrary on versions of Windows prior to Windows Vista. If your DLL will be loaded explicitly, you must use the thread local storage functions instead of _declspec(thread). For an example, see Using Thread Local Storage in a Dynamic Link Library.
Clearly, the failure mode of demand loaded DLLs using implicit TLS is far from acceptable from a debugging perspective. Furthermore, this restriction puts a serious crimp in the practical usefulness of the otherwise highly useful __declspec(thread) support that has been baked into the compiler and linker, at least with respect to its usage in DLLs.
Fortunately, the Windows Vista loader takes some steps to address this problem, such that it becomes possible to use __declspec(thread) safely on Windows Vista and future operating system versions. The new loader support for implicit TLS in demand loaded DLLs is fairly complicated, though, due to some unfortunate design consequences of how implicit TLS works.
Next time, I’ll go in to some more details on just how the Windows Vista loader supports this scenario, as well as some of the caveats behind the implementation that is used in the loader going forward with Vista.