Last time, I discussed the mechanisms by which so-called explicit TLS operates (the TlsGetValue, TlsSetValue and other associated supporting routines).
Although explicit TLS is certainly fairly heavily used, many of the more “interesting” pieces about how TLS works in fact relate to the work that the loader does to support implicit TLS, or __declspec(thread) variables (in CL). While both TLS mechanisms are designed to provide a similar effect, namely the capability to store information on a per-thread basis, many aspects of the implementations of the two different mechanisms are very different.
When you declare a variable with the __declspec(thread) extended storage class, the compiler and linker cooperate to allocate storage for the variable in a special region in the executable image. By convention, all variables with the __declspec(thread) storage class are placed in the .tls section of a PE image, although this is not technically required (in fact, the thread local variables do not even really need to be in their own section, merely contiguous in memory, at least from the loader’s perspective). On disk, this region of memory contains the initializer data for all thread local variables in a particular image. However, this data is never actually modified and references to a particular thread local variable will never refer to an address within this section of the PE image; the data is merely a “template” to be used when allocating storage for thread local variables after a thread has been created.
The compiler and linker also make use of several special variables in the context of implicit TLS support. Specifically, a variable by the name of _tls_used (of the type IMAGE_TLS_DIRECTORY) is created by a portion of the C runtime that is static linked into every program to represent the TLS directory that will be used in the final image (references to this variable should be extern “C” in C++ code for name decoration purposes, and storage for the variable need not be allocated as the supporting CRT stub code already creates the variable). The TLS directory is a part of the PE header of an executable image which describes to the loader how the image’s thread local variables are to be managed. The linker looks for a variable by the name of _tls_used and ensures that in the on-disk image, it overlaps with the actual TLS directory in the final image.
The source code for the particular section of C runtime logic that declares _tls_used lives in the tlssup.c file (which comes with Visual Studio), making the variable pseudo-documented. The standard declaration for _tls_used is as so:
_CRTALLOC(".rdata$T") const IMAGE_TLS_DIRECTORY _tls_used = { (ULONG)(ULONG_PTR) &_tls_start, // start of tls data (ULONG)(ULONG_PTR) &_tls_end, // end of tls data (ULONG)(ULONG_PTR) &_tls_index, // address of tls_index (ULONG)(ULONG_PTR) (&__xl_a+1), // pointer to callbacks (ULONG) 0, // size of tls zero fill (ULONG) 0 // characteristics };
The CRT code also provides a mechanism to allow a program to register a set of TLS callbacks, which are functions with a similar prototype to DllMain that are called when a thread starts or exits (cleanly) in the current process. (These callbacks can even be registered for a main process image, where there is no DllMain routine.) The callbacks are typed as PIMAGE_TLS_CALLBACK, and the TLS directory points to a null-terminated array of callbacks (called in sequence).
For a typical image, there will not exist any TLS callbacks (in practice, almost everything uses DllMain to perform per-thread initialization tasks). However, the support is retained and is fully functional. To use the support that the CRT provides for TLS callbacks, one needs to declare a variable that is stored in the specially named “.CRT$XLx” section, where x is a value between A and Z. For example, one might write the following code:
#pragma section(".CRT$XLY",long,read) extern "C" __declspec(allocate(".CRT$XLY")) PIMAGE_TLS_CALLBACK _xl_y = MyTlsCallback;
The strange business with the special section names is required because the in-memory ordering of the TLS callback pointers is significant. To understand what is happening with this peculiar looking declaration, it is first necessary to understand a bit about the compiler and linker organize data in the final PE image that is produced.
Non-header data in a PE image is placed into one or more sections, which are regions of memory with a common set of attributes (such as page protection). The __declspec(allocate(“section-name”)) keyword (CL-specific) tells the compiler that a particular variable is to be placed in a specific section in the final executable. The compiler additionally has support for concatenating similarly-named sections into one larger section. This support is activated by prefixing a section name with a $ character followed by any other text. The compiler concatenates the resulting section with the section of the same name, truncated at the $ character (inclusive).
The compiler alphabetically orders individual sections when concatenating them (due to the usage of the $ character in the section name). This means that in-memory (in the final executable image), a variable in the “.CRT$XLB” section will be after a variable in the “.CRT$XLA” section but before a variable in “.CRT$XLZ” section. The C runtime uses this quirk of the compiler to create an array of null terminated function pointers to TLS callbacks (with the pointer stored in the “.CRT$XLZ” section being the null terminator). Thus, in order to ensure that the declared function pointer resides within the confines of the TLS callback array being referenced by _tls_used, it is necessary place in a section of the form “.CRT$XLx“.
The creation of the TLS directory is, however, only one portion of how the compiler and linker work together to support __declspec(thread) variables. Next time, I’ll discuss just how the compiler and linker manage accesses to such variables.
Update: Phil mentions that this support for TLS callbacks does not work before the Visual Studio 2005 release. Be warned if you are still using an old compiler package.