Moving is a pain

September 6th, 2006

Finally done moving stuff to my new place, as of yesterday. What a pain. Moving mattresses without a truck is certainly fun stuff . . .

In any case, I should now have some time to get back to the regularly scheduled blog posts – sorry for the gap in content there.

Status of the Blog

August 29th, 2006

Apologies for the lack of posts the last few days. I’ve not had sufficient free time to do really do justice to some articles yet, hence the apparent lack of activity. I should have some new content (incl. a further continuation of the calling convention series) up shortly.

That being said, there is going to be some short planned downtime of the blog for a short duration between this week and next week while I move apartments (as the blog is hosted on a box in my closet, this means it’ll be offline for a short while starting sometime in the next couple days until I get my landline Internet connection established at new place of residence). If the downtime is going to be for more than a day or so, I’ll point the domain at a static mirror of the blog until the regular blog server has an Internet connection again.

The system call dispatcher on x86

August 23rd, 2006

The system call dispatcher on x86 NT has undergone several revisions over the years.

Until recently, the primary method used to make system calls was the int 2e instruction (software interrupt, vector 0x2e). This is a fairly quick way to enter CPL 0 (kernel mode), and it is backwards compatible with all 32-bit capable x86 processors.

With Windows XP, the mainstream mechanism used to do system calls changed; From this point forward, the operating system selects a more optimized kernel transition mechanism based on your processor type. Pentium II and later processors will instead use the sysenter instruction, which is a more efficient mechanism of switching to CPL 0 (kernel mode), as it dispenses with some needless (in this case) overhead of usual interrupt dispatching.

How is this switch accomplished? Well, starting with Windows XP, the system service call stubs do not hardcode a particular instruction (say, int 2e) anymore. Instead, they indirect through a field in the KUSER_SHARED_DATA block (“SystemCall”). The meaning of this field changed in Windows XP SP2 and Windows Server 2003 SP1; in prior versions, the SystemCall field held the actual code used to make the system call (and was filled in at runtime with the proper values). In XP SP2 and Srv03 SP1, in the interests of reducing system attack surface, the KUSER_SHARED_DATA region was marked non-executable, and SystemCall becomes a pointer to a stub residing in NTDLL (with the pointer value being adjusted at runtime based on the processor type, to refer to an appropriate system call stub).

What this means for you today is that on modern systems, you can expect to see a sequence like so for system calls:

0:001> u ntdll!NtClose
ntdll!ZwClose:
7c821138 b81b000000       mov     eax,0x1b
7c82113d ba0003fe7f       mov     edx,0x7ffe0300
7c821142 ff12             call    dword ptr [edx]
7c821144 c20400           ret     0x4
7c821147 90               nop

0x7ffe0300 is +0x300 bytes into KUSER_SHARED_DATA. Looking at the structure definition, we can see that this is “SystemCall”:

0:001> dt ntdll!_KUSER_SHARED_DATA
   +0x000 TickCountLowDeprecated : Uint4B
   +0x004 TickCountMultiplier : Uint4B
   +0x008 InterruptTime    : _KSYSTEM_TIME
   [...]
   +0x300 SystemCall       : Uint4B
   +0x304 SystemCallReturn : Uint4B
   +0x308 SystemCallPad    : [3] Uint8B
   [...]

Since my system is Srv03 SP1, SystemCall is a pointer to a stub in NTDLL.

0:001> u poi(0x7ffe0300)
ntdll!KiFastSystemCall:
7c82ed50 8bd4             mov     edx,esp
7c82ed52 0f34             sysenter
ntdll!KiFastSystemCallRet:
7c82ed54 c3               ret

On my system, the system call dispatcher is using sysenter. You can look at the old int 2e dispatcher if you wish, as it is still supported for compatibility with older processors:

0:001> u ntdll!KiIntsystemCall
ntdll!KiIntSystemCall:
7c82ed60 8d542408         lea     edx,[esp+0x8]
7c82ed64 cd2e             int     2e
7c82ed66 c3               ret

The actual calling convention used by the system call dispatcher is thus:

  • eax contains the system call ordinal.
  • edx points to either the argument array of the system call on the stack (for int 2e), or the return address plus argument array (for sysenter).

For most of the time, though, you’ll probably not be dealing directly with the system call dispatching mechanism itself. If you are, however, now you know how it works.

Beware of the sign bit

August 22nd, 2006

Signedness can trip you up in unexpected ways.

One example of this is the fairly innocuous islower function in the C standard library. This function takes an int and returns a value indicating to you whether it represents a lowercase or uppercase character.

The prototype for this function is:

int islower(
   int c 
);

Unfortunately, due to how C works, this code is likely to introduce subtle bugs depending on the input you give it. For instance, if you have a char array and iterate through it checking if each character is lowercase, you might naively write a program like so:

char ch[] = "...";
for (int = 0; ch[i]; i++)
{
 if (islower(ch[i]))
 {
   ...
 }
}

This code has a nasty bug in it, though, in that if your compiler defaults to char as an 8-bit signed value (most every mainstream compiler on mainstream platforms does nowadays), and if you are given a character value that has more than 7 significant bits (say, 150), you will go off into undefined-behavior-land because the compiler will sign extend ch[i] to a negative int value of -150 instead of an int value of 150. Depending on the implementation of islower, this could have various different (bad) effects; for the Microsoft C implementation, islower indexes into an array based on the given argument, so you’ll underrun an array and get garbage results back.

“Find all references” in VS 8

August 21st, 2006

There is a cool new feature in Visual Studio 8’s editor UI that is pretty useful if you are trying to determine the scope of a particular change or crossreference a variable or function to determine where it is used: “find all references”.

You can activate this by right clicking a symbol in the source code editor and selecting “find all references”.  A window will open near the build output that lists all source references to the symbol in the current solution (read: workspace).  This is also pretty useful in debugging, if you want to determine where a particular variable might be modified in a particular way.  For example, I used this recently to pull up a list of all locations that might set a particular class pointer to null when debugging a null pointer dereference bug.

Another good use for this is if you are working in unfamiliar code and want to get a feel for the scope of a change you make.  Of course, this utility has some limitations (it only searches within the current workspace), but it can be very handy to quickly gauge how impactful a change might be.

You might be using unhandled exception filters without even knowing it.

August 18th, 2006

In a previous posting, I discussed some of the pitfalls of unhandled exception filters (and how they can become a security problem for your application). I mentioned some guidelines you can use to help work around these problems and minimize the risk, but, as I alluded to earlier, the problem is actually worse than it might appear on the surface.

The real gotcha about unhandled exception filters is that you have probably used them before in programs or DLLs and not even known that you were using them, which makes it very hard to not use them in dangerous situations. How can this be, you might ask? Well, it turns out that the Microsoft C runtime library uses an unhandled exception filter to catch unhandled C++ exceptions and call the terminate handler registered by set_terminate.

This unhandled exception filter is setup by the internal CRT functions _cinit (via _initterm_e). If you have the CRT source handy, this lives in crt0dat.c. The call looks like:

/*
* do initializations
*/
initret = _initterm_e( __xi_a, __xi_z );

Here, “__xi_a” and “__xi_z” define the bounds of an array of function pointers to initializers called during the CRT’s initialization. There is a pointer to a function (_CxxSetUnhandledExceptionFilter) that sets up the unhandled exception filter for C++ exceptions in this array. Unfortunately, source code for the function used to setup _CxxUnhandledExceptionFilter is not present, but you can find it by looking at the CRT in a disassembler.

push    offset CxxUnhandledExceptionFilter
call    SetUnhandledExceptionFilter
mov     lpTopLevelExceptionFilter, eax
xor     eax, eax
retn

This is pretty standard; it is just saving away the old exception filter and registering its new exception filter. The unhandled exception filter itself checks for a C++ exception – if found, it calls terminate, otherwise it tries to verify that the previous exception filter points to executable code, and if so, it will call it.

push    esi
mov     esi, [esp+arg_0]
mov     eax, [esi]
cmp     dword ptr [eax], 0E06D7363h
jnz     short not_cpp_except
cmp     dword ptr [eax+10h], 3
jnz     short not_cpp_except
mov     eax, [eax+14h]
cmp     eax, 19930520h
jz      short is_cpp_except
cmp     eax, 19930521h
jnz     short not_cpp_except 

is_cpp_except:
call    terminate

not_cpp_except:
mov     eax, lpTopLevelExceptionFilter
test    eax, eax
jz      short old_filter_unloaded
push    eax             ; lpfn
call    _ValidateExecute
test    eax, eax
pop     ecx
jz      short old_filter_unloaded
push    esi
call    lpTopLevelExceptionFilter
jmp     short done

old_filter_unloaded:
xor     eax, eax

done:
pop     esi
retn    4

The problem with the latter validation is there is no way to tell if the code is part of a legitimate DLL, or part of the heap or some other allocation that has moved over where a DLL had previously been unloaded, which is where the security risk is introduced.

So, we have established that the CRT potentially does bad things by installing an unhandled exception filter – so what? Well, if you link to the DLL version of the CRT, you are probably fine. The CRT DLL is unlikely to be unloaded during the process lifetime and will only be initialized once.

The kicker is if you linked to the static (non-DLL) version of the CRT. This is where things start to get dicey. The dangerous combination here is that each image linked to the static version of the CRT will have its own copy of _cinit, and its own copy of _CxxSetUnhandledExceptionFilter, its own copy of _CxxUnhandledExceptionFilter, and soforth. What this boils down to is that every image linked to the static version of the Microsoft C runtime installs an unhandled exception filter. So, if you have a DLL (say one that hosts an ActiveX object) which links to the static CRT (which is pretty attractive, as for plugin type DLLs you don’t want to have to write a separate installer to ensure that end users have that cumbersome msvcr80.dll), then you’re in trouble. Since this is an especially common scenario (plugin DLL linking to the static CRT), you have probably ended up using an unhandled exception filter without knowing it (and probably without realizing the implications of doing so) – simply by making an ActiveX control usable by Internet Explorer, for example. This really turns into a worst case scenario when it comes to DLLs that host ActiveX objects. These are DLLs that are going to be frequently loaded and unloaded, are controllable by untrusted script, and are very likely to link to the static CRT to get out of the headache of having to manage installation of the DLL CRT version. If you put all of these things together and throw in any kind of crash bug, you’ve got a recipie for remote code execution. What is even worse is that this isn’t just quick-fixable with a patch to the CRT, as the vulnerable CRT version is compiled into your binaries and not in its own hotfixable standalone DLL.

So, in order to be truly safe from the dangers of unhandled exception filters, you also need to rid your programs of the static CRT. Yes, it does make setup more of a pain, but the DLL CRT is superior in many ways (not to mention that it doesn’t suffer from this security problem!).

Win32 calling conventions: __cdecl in assembler

August 17th, 2006

Continuing on the series about Win32 calling conventions, the next topic of discussion is how the various calling conventions look from an assembler level.

This is useful to know for a variety of reasons; if you are reverse engineering (or debugging) something, one of the first steps is figuring out the calling convention for a function you are working with, so that you know how to find the arguments for it, how it deals with the stack, and soforth.

For this post, I’ll concentrate primarily on __cdecl.  Future posts will cover the other major calling conventions.

As I have previously described, __cdecl is an entirely stack based calling convention, in which arguments are cleaned off the stack by the caller.  Given this, you can expect to see all of the arguments for a function placed onto the stack before a function call is main.  If you are using CL, then this is almost always done by using the “push” instruction to place arguments on the stack.

Consider the following simple example function:

__declspec(noinline)
int __cdecl CdeclFunction1(int a, int b, int c)
{
 return (a + b) * c;
}

First, we’ll take a look at what calls to a __cdecl function look like. For example, if we look at a call to the function described above like so:

CdeclFunction1(1, 2, 3);

… we’ll see something like this:

; 119  : 	int v = CdeclFunction1(1, 2, 3);

  00000	6a 03		 push	 3
  00002	6a 02		 push	 2
  00004	6a 01		 push	 1
  00006	e8 00 00 00 00	 call	 CdeclFunction1
  0000b	83 c4 0c	 add	 esp, 12

There are basically three different things going on here.

  1. Setting up arguments for the target function. This is what the three different “push” instructions do. Note that the arguments are pushed in reverse order – you’ll always see them in reverse order if they are placed on the stack via push.
  2. Making the actual function call itself. After all the arguments are in place, the “call” instruction is used to transfer execution to the target. Remember that on x86, the call instruction implicitly pushes the return address on the stack. After the function call returns, the return value of the function is stored in the eax (or edx:eax) registers, typically.
  3. Cleaning arguments off the stack after the function returns. This is the purpose of the “add esp, 0xc” instruction following the “call” instruction. Since the target function does not adjust the stack to remove arguments after the call, this is up to the calller. Sometimes, you may see multiple __cdecl function calls be made in rapid succession, with the compiler only cleaning arguments from the stack after all of the function calls have been made (turning many different “add esp” instructions into just one “add esp” instruction).

It is also worth looking at the implementation of the function to see what it does with the arguments passed in and how it sets up a return value. The assembler for CdeclFunction1 is as so:

CdeclFunction1 proc near

a= dword ptr  4
b= dword ptr  8
c= dword ptr  0Ch

mov     eax, [esp+8]    ; eax = b
mov     ecx, [esp+4]    ; ecx = a
add     eax, ecx        ; eax = eax + ecx
imul    eax, [esp+0Ch]  ; eax = eax * c
retn                    ; (return value = eax)
CdeclFunction1 endp

This function is fairly straightforward. Since __cdecl is stack based for argument passing, all of the parameters are on the stack. Recall that the “call” instruction pushes the return address onto the stack, so the stack will begin with the return value at [esp+0] and have the first argument at [esp+4]. A graphical view of the stack layout (relative to “esp”) of this function is thus:

+00000000  r              db 4 dup(?)      ; (Return address)
+00000004 a               dd ?
+00000008 b               dd ?
+0000000C c               dd ?

In this case, there is no frame pointer in use, so the function accesses all of the arguments directly relative to “esp”. The steps taken are:

  1. The function fills eax with the value of the second argument (b), located at [esp+8] according to our stack layout.
  2. Next, the function loads ecx with the value of the first argument (a), which is located at [esp+4].
  3. Next, the function adds to eax the value of the first argument (a), now stored in the ecx register.
  4. Finally, the function multiplies eax by the value of the third argument (c), located at [esp+c].
  5. After finishing with all of the computations needed to implement the function, it simply returns with a “retn” instruction. Since the caller cleans the stack, the “retn” intruction (with a stack adjustment) is not used here; __cdecl functions never use “retn <displacement>“, only “retn”. Additionally, because the result of the “mul” instruction happened to be stored in the eax register here, no extra instructions are needed to set up the return value, as it is already stored in the return value register (eax) at the end of the function.

Most __cdecl function calls are very similar to the one discussed above, although there will typically be much more code to the actual function and the function call (if there are many arguments), and the compiler may play some optimization tricks (such as deferring cleaning the stack across several function calls). The basic things to look for with a __cdecl function are:

  • All arguments are on the stack.
  • The return instruction is “retn” and not “retn <displacement>“, even when there are a non-zero number of arguments.
  • Shortly after the function call returns, the caller cleans the stack of arguments pushed. This may be deferred later, depending on how the compiler assembled the caller.

Note that if you have been paying attention, given the above criteria, you’ve probably noticed that a __cdecl function with zero arguments will look identical to an __stdcall function with zero arguments. If you don’t have symbols or decorated function names, there is no way to tell the two apart when there are no arguments, as the semantics are the same in that special case.

That’s all for a basic overview of __cdecl from an assembler perspective. Next time: more on the other calling conventions at an assembly level.

Beware of custom unhandled exception filters in DLLs

August 16th, 2006

Previously, I had discussed some techniques for debugging unhandled exception filters.  There are some more gotchas relating to unhandled exception filters than just debugging them, though.

The problem with unhandled exception filters is that they are broken by design.  The API (SetUnhandledExceptionFilter) used to install them allows you to build a chain of unhandled exception filters as multiple images within a process install their own filter.  While this may seem fine in practice, it actually turns out to be a serious flaw.  The problem is that there is no support for removing these unhandled exception filters out of order.  If you do so, you often end up with a previous unhandled exception filter pointer used by some DLL that points to a now-unloaded DLL, because some DLL with an unhandled exception filter was unloaded, but the unhandled exception filter registered after it still has a pointer to the previous filter in the now unloaded DLL.

This turns out to (at best) cause your custom crash handling logic to appear to randomly fail to operate, and at worst, introduce serious security holes in your program.  You can read more about the security hole this introduces in the paper on Uninformed.org, but the basic idea is that if unhandled exception filters are unregistered out of order, you have a “dangling” function pointer that points to no-mans-land.  If an attacker can fill your process address space with shell code and then cause an exception (perhaps an otherwise “harmless” null pointer dereference that would cause your program to crash), he or she can take control of your process and run arbitrary code.

Unfortunately, there isn’t a good way to fix this from an application perspective.  I would recommend just not ever calling the previous unhandled exception filter, as there is no way to know whether it points to the real code that registered it or malicious code that someone allocated all over your process address space (called “heap spraying” in exploitation terminology).

You still have to deal with the fact that someone else might later install an unhandled exception filter ahead of yours, though, and then cause the unhandled exception filter chain to be broken upstream of you.  There is no real good solution for this; you might investigate patching SetUnhandledExceptionFilter or UnhandledExceptionFilter to always call you, but you can imagine what would happen if two functions try to do this at the same time.

So, the moral of the story is as follows:

  1. Don’t trust unhandled exception filters, as the model is brittle and easily breaks in processes that load and unload DLLs frequently.
  2. If you must register an unhandled exception filter, do it in a DLL that is never unloaded (or even the main .exe) to prevent the unhandled exception filter from being used as an attack vector.
  3. Don’t try to call the previous unhandled exception filter pointer that you get back from SetUnhandledExceptionFilter, as this introduces a security risk.
  4. Don’t install an unhandled exception filter from within a plugin type DLL that is loaded in a third party application, and especially don’t install an unhandled exception filter in a plugin type DLL that gets unloaded on the fly.

Unfortunately, it turns out to be even harder than this to not get burned by unhandled exception filter chaining.  More on that in a future posting.

Win32 calling conventions: Usage cases

August 15th, 2006

Last time, I talked about some of the general concepts behind the varying calling conventions in use on Win32 (x86).  This posting focuses on the implications and usage cases behind each of the calling conventions, in an effort to provide a better understanding as to when you’ll see them used.

When looking at the different calling conventions, we can see that there are a number of differences between them.  Stack usage vs register parameters, caller vs callee cleans stack, member function calls vs “plain C” function calls, and soforth.  These differences lend each calling convention to specific cases where they are best suited.

To begin with, consider the __cdecl calling convention.  We know that it is a stack based calling convention where the caller cleans the stack.  Furthermore, we know that it is the default calling convention for CL (big hint as to when you’ll see this being used!).  These attributes make it well suited for a couple of cases:

  • Variadic functions, or functions with an ellipsis (…) terminating the argument list.  These functions have a variable number of arguments, which is not known at compile time of the callee.  __cdecl is useful for these functions because the compiler needs to implement a stack displacement to clean arguments off of the stack, but it doesn’t know how many arguments there are.  By leaving the argument-disposal up to the caller, who does know the number of arguments at compile time, the compiler doesn’t need special help from the programmer to correctly adjust the stack when the variadic function is going to return – it “just works”.  __cdecl is the only calling convention on Win32 x86 that supports variadic functions when used with CL.
  • Old-style C functions without prototypes.  For compatibility with legacy C code, the C compiler needs to support making function calls to unprototyped functions.  These must be treated as if they were variadic functions, because the compiler doesn’t know whether the function takes a fixed number of arguments or not (because there is no prototyped argument list).
  • Any other case where the programmer does not explicitly override the calling convention.  The default Visual Studio build environment will use the compiler default calling convention if you do not explicitly tell it otherwise, and this goes to __cdecl.  Some build environments (the DDK/build.exe platform in particular) default to different calling conventions, but Visual Studio built programs will always default to __cdecl if you are using CL.

Next, we’ll take a look at __stdcall.  This calling convention is the standard for Win32 APIs; virtually all system APIs are __stdcall (typically decorated as “WINAPI”, “NTAPI”, or “CALLBACK” in the headers, which are macros that expand to __stdcall).  Here are the typcial usage cases for __stdcall (and the “why” behind them):

  • Library functions.  Excepting the C runtime libraries, virtually all Microsoft-shipped Win32 libraries use __stdcall.  The main reason for this (if you discount the “that’s the way it has always been”) is that you save some instruction code space by using __stdcall and not __cdecl for library functions.  The reason for this is that for __cdecl functions, the caller typically needs to adjust the stack pointer after every “call” instruction to a __cdecl function (which takes up instruction code space – typically an “add esp, imm8” opcode).  For __stdcall functions, you only pay this penality once, in the “retn imm16” opcode at the end of the function (as opposed to once for every caller).  For frequently called functions (say, ReadFile), this begins to add up.  You also theoretically save a bit of processor time and cache space, as there is one less instruction to be executed per “call”.
  • COM functions.  COM uses __stdcall with the “this” pointer being the first argument, which is a required part of the COM API contract for publicly accessible functions.
  • Functions that need to be called from a language other than C/C++.  This also ties back into the COM and library function purposes, but of all of the calling conventions discussed here, only __stdcall has practically universal support among non-Microsoft or non-C/C++ compilers for x86 Win32 (such as Visual Basic, or Delphi).  As a result, it is advantageous to use __stdcall if you are expecting to be called from other languages.
  • Microsoft-built programs.  Microsoft defaults their programs to __stdcall and not __cdecl virtually everywhere, even in images that don’t export functions, or in internally, non-exported funtions within a system library.  This also applies to Microsoft kernel mode code, such as the HAL and the kernel itself.
  • Programs built with the DDK.  The DDK defaults to __stdcall and not __cdecl.
  • NT kernel drivers.  These are always (or at least should always be!) built with the DDK, which again, defaults to __stdcall.

There yet remains __fastcall to discuss.  This calling convention is not used as extensively as the other two (no Microsoft build environment that I am aware of defaults to it), so most of the cases for it being used are the result of a programmer explicitly requesting it.

  • Functions that do not call other functions (“leaf functions”).  These are good candidates for __fastcall because the register arguments are passed in volatile registers, so there is a penalty associated with __fastcall functions that call subfunctions and need to use their arguments across those function calls, as this requires the __fastcall function to save arguments to somewhere nonvolatile (i.e. the stack), and that defeats the whole purpose of __fastcall entirely.
  • Functions that do not use their arguments after the first subfunction call.  These can still benefit from __fastcall without the penalty mentioned above relating to preserving arguments across function calls.
  • Short functions that call other functions and then return.  If you can make both functions __fastcall, then sometimes the compiler can be clever and not need to re-load the argument registers when a __fastcall function calls a __fastcall subfunction.  This can be useful for “wrapper” functions in some cases.
  • Functions that interface with assembly code.  Sometimes it can be more convenient to make a C function called by assembler code __fastcall, because this can save you the work of manually tracking stack displacements.  It can also sometimes be more convenient to make assembler functions called by C code __fastcall as well, for similar reasons.

In general, __fastcall is relatively rare.  There are a couple of kernel functions that use it (KfRaiseIrql, for instance).  A couple of software vendors (such as Blizzard Entertainment) seem to like to ship things compiled with __fastcall as the default calling convention, but this not the common case, and usually not a good idea.

Finally, there is __thiscall.  This calling convention is only used if you are using the default calling convention for member functions.  Note that for member functions that are not accessible cross-program (e.g. not exported somehow), the compiler will sometimes replace ecx with ebx for the “this” pointer as a custom calling convention, depending on your optimization settings.

That’s all for this installment.  In the next posting, I’ll discuss what the various calling conventions look like at a low level (assembly), and what this means to you.

Win32 calling conventions: Concepts

August 14th, 2006

If you have done debugging work on Win32 (x86) for any length of time, you probably know that there are many different calling conventions.  While I have already covered the Win64 (X64) calling convention, I haven’t yet gone into details about how to work with the various calling conventions present on x86 Windows.  This miniseries is going to go into some detail relating to how the calling conventions work from the perspective of debugging and reverse engineering.

There are three major calling conventions in use in modern Win32 (on x86, anyway): __stdcall, __cdecl, and __fastcall.  All three have different characteristics that are important if you are debugging or reverse engineering something, and it is important to be able to recognize and work with functions written against all three calling conventions – even if you don’t have access to symbols.  To start off, it’s probably best to get a feel for what all of the different calling conventions do, how they work, and soforth.

Most of the calling conventions have some things in common.  All of the calling conventions have the same set of volatile registers, which are not required to remain the same across a call site: eax, ecx, and edx.  Additionally, all three calling conventions use eax for 32-bit return values, and eax:edx for 64-bit return values (high 32 bits in edx).  For large return values (e.g. functions that return a structure, and not a pointer to a structure), a hidden argument is passed to a hidden local variable in the caller’s stack frame that represents the large return value, and the return value is filled in using the hidden argument (that is, large return values are actually implemented as a hidden pointer parameter).

  • __cdecl is the default calling convention for the Microsoft C/C++ compiler.  This is an entirely stack based calling convention for parameter passing; the caller cleans the arguments off of the stack when the call returns.
  • __stdcall has the same semantics as __cdecl, except that the callee (target function) cleans the arguments off of the stack instead of the caller.
  • __fastcall passes the first two register-sized arguments in ecx and edx, with the remaining arguments passed on the stack like __stdcall.  If any stack based arguments were present, the callee cleans them off of the stack.

Additionally, there is a fourth calling convention implemented by CL, known as __thiscall, which is like __stdcall except that there is a hidden pointer argument representing a class object (“this” in C++) that is passed in ecx.  As you might imagine, __thiscall is used for non-static class member function calls (by default).  If you explicitly override the calling convention of member functions, then “this” becomes a hidden (first) argument that is passed according to the conventions of the overriding calling convention.  In general, __thiscall is not really used in the Windows API.

That’s a general overview of the basic concepts behind all of the major Win32 x86 calling convention.  In some upcoming posts, I’ll explore the implications behind the different attributes of these calling conventions, their usage cases, and how they apply to you when you are debugging or reverse engineering something.