It’s been a while since my last post, unfortunately, primarily because I’ve been a bit swamped with work and a couple of other things lately. With that said, I’m going to pick up where I left off with the Win32 calling conventions series. Without further ado, here’s the stuff on __stdcall as you’ll see it in assembler…
Like __cdecl, __stdcall is completely stack-based. The semantics of __stdcall are very similar to __cdecl, except that the arguments are cleaned off the stack by the callee instead of the caller. Because the number of arguments removed from the stack is burned into the target function at compile time, there is no support for variadic functions (functions that take a variable number of arguments, such as printf) that use the __stdcall calling convention. The rules for register usage and return values are otherwise identical to __cdecl.
In practice, this typically means that an __stdcall function call will look much like a __cdecl function call until you examine the ret instruction that returns control to the caller at the end of the __stdcall function in question. (Alternatively, you can look to see if it appears as if stack arguments are cleaned after the function call. However, the compiler/optimizer sometimes likes to be tricky with __cdecl functions, and defer argument removal until several function calls later, so this method is less reliable.)
Because the callee cleans the arguments off the stack in an __stdcall function, you will always[1] see a ret instruction with a byte-count operand terminating a __stdcall function that takes arguments. For most functions, this count is four times the number of arguments to the function, but it can be larger if arguments wider than 32 bits are passed. On Win32, this byte count is virtually always[2] a multiple of four, as the compiler will always generate code that aligns the stack to at least four bytes for x86 targets.
Given this information, it is usually fairly easy to distinguish an __stdcall function from a __cdecl function, as a __cdecl function will never use an argument to ret. Note that this does imply, however, that it is generally not possible to distinguish between an __stdcall function and a __cdecl function in the case that both take zero arguments (with no information other than the disassembly); in this special case, the calling conventions have the same semantics. This also means that if you have a function that does not clean any bytes off the stack with ret, you’ll technically have to examine any callers of the function to see if any pass more than zero arguments (or the actual function implementation itself, to see if it ever expects more than zero arguments) in order to be absolutely sure whether the function is __cdecl or __stdcall.
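To make the byte-count arithmetic concrete, here is a rough sketch in C. The helper name stdcall_ret_bytes is made up for illustration, and it assumes every argument occupies a whole number of four-byte stack slots, which is what you would normally expect from a 32-bit x86 compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of any real toolchain): estimates the
   operand of the ret instruction for a 32-bit __stdcall function, given
   the size in bytes of each argument. Assumes each argument is padded
   up to a multiple of four bytes on the stack. */
static unsigned stdcall_ret_bytes(const size_t *arg_sizes, size_t count)
{
    unsigned total = 0;

    assert(arg_sizes != NULL || count == 0);

    for (size_t i = 0; i < count; i++) {
        /* Round this argument up to the next four-byte boundary. */
        total += (unsigned)((arg_sizes[i] + 3u) & ~(size_t)3u);
    }
    return total;
}
```

Three int arguments give 12 (0Ch), while an int followed by a 64-bit integer also gives 12, which is why the count is not always exactly four times the number of arguments.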
Here’s an example of a simple __stdcall function call for the following C function:
__declspec(noinline)
int __stdcall StdcallFunction1(int a, int b, int c)
{
 return (a + b) * c;
}
If we call the function like this:
StdcallFunction1(1, 2, 3);
… we can expect to see something like so, for the call:
push   3
push   2
push   1
call   StdcallFunction1
(There will be no add esp instruction after the call.)
This is quite similar to a __cdecl declared function with the same implementation. The only difference is the lack of an add esp instruction following the call.
Looking at the function implementation, we can see that unlike the __cdecl version of this function, StdcallFunction1 removes the arguments from the stack:
StdcallFunction1 proc near
a= dword ptr 4
b= dword ptr 8
c= dword ptr 0Ch
mov    eax, [esp+8] ; eax = b
mov    ecx, [esp+4] ; ecx = a
add    eax, ecx    ; eax = eax + ecx
imul   eax, [esp+0Ch] ; eax = eax * c
retn   0Ch         ; (return value = eax)
StdcallFunction1 endp
As expected, the only difference here is that the __stdcall version of the function cleans the three arguments off the stack. The function is otherwise identical to the __cdecl version, with the return value stored in eax.
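If it helps to see the stack bookkeeping spelled out, the following is a toy model in C of the call shown above. Everything in it (ToyStack, simulate_call, the return address value) is invented for illustration; it only mimics how esp moves and is not how a real debugger or CPU represents this.

```c
#include <assert.h>
#include <stdint.h>

/* Toy 32-bit stack: esp is a byte offset into a small array. */
typedef struct {
    uint32_t slots[16];
    uint32_t esp;
} ToyStack;

typedef struct {
    uint32_t eax;       /* simulated return value */
    uint32_t leftover;  /* argument bytes still on the stack after ret */
} CallResult;

static void toy_push(ToyStack *s, uint32_t value)
{
    assert(s->esp >= 4);
    s->esp -= 4;
    s->slots[s->esp / 4] = value;
}

static uint32_t toy_read(const ToyStack *s, uint32_t offset)
{
    assert(s->esp + offset < sizeof s->slots);
    return s->slots[(s->esp + offset) / 4];
}

/* Models StdcallFunction1(1, 2, 3) returning with "ret ret_operand". */
static CallResult simulate_call(uint32_t ret_operand)
{
    ToyStack s;
    CallResult r;
    uint32_t esp_before;

    s.esp = (uint32_t)sizeof s.slots;
    esp_before = s.esp;

    toy_push(&s, 3);            /* push 3 */
    toy_push(&s, 2);            /* push 2 */
    toy_push(&s, 1);            /* push 1 */
    toy_push(&s, 0x401000u);    /* call: made-up return address */

    r.eax  = toy_read(&s, 8);   /* mov  eax, [esp+8]   ; b */
    r.eax += toy_read(&s, 4);   /* add  eax, [esp+4]   ; a */
    r.eax *= toy_read(&s, 12);  /* imul eax, [esp+0Ch] ; c */

    /* ret imm: pop the return address, then discard imm bytes. */
    assert(s.esp + 4 + ret_operand <= sizeof s.slots);
    s.esp += 4 + ret_operand;

    r.leftover = esp_before - s.esp;
    return r;
}
```

With an operand of 0Ch the stack is back where it started when the callee returns; with a plain ret (operand zero), 12 bytes of arguments are left behind, which is exactly what the add esp in a __cdecl caller would clean up.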
With all of this information, you should be able to rather reliably identify most __stdcall functions. The key things to look out for are:
- All arguments are on the stack.
- The ret instruction terminating the function has a non-zero argument count if the number of arguments for the function is non-zero.
- The ret instruction terminating the function has an argument count that is at least four times the number of arguments for the function. (If the count is less than four times the number of arguments, then the function might be a __fastcall function with three or more arguments. The __fastcall calling convention passes the first two 32-bit or smaller arguments in registers.)
- The function does not depend on the state of the ecx and edx volatile registers. (If the function expects these registers to have a meaningful value initially, then the function is probably a __fastcall or __thiscall function, as those calling conventions pass arguments in the ecx and edx registers.)
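The checklist above can be sketched as a small decision function in C. This is only a rough heuristic under my own assumptions: the FunctionFacts fields and the guess_convention name are hypothetical, and the facts themselves would have to be gathered by hand (or by a tool) from the disassembly and the function's callers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    GUESS_CDECL_OR_STDCALL,  /* zero arguments: the two are indistinguishable */
    GUESS_CDECL,
    GUESS_STDCALL,
    GUESS_REGISTER_BASED     /* probably __fastcall or __thiscall */
} ConventionGuess;

/* Hypothetical summary of what was observed for one function. */
typedef struct {
    unsigned ret_operand_bytes;      /* operand of the final ret, 0 if none */
    unsigned arg_count;              /* total arguments, from callers or the body */
    bool reads_ecx_or_edx_on_entry;  /* uses ecx/edx before writing them */
} FunctionFacts;

static ConventionGuess guess_convention(const FunctionFacts *f)
{
    assert(f != NULL);

    /* Depending on ecx/edx at entry points at register-passed arguments. */
    if (f->reads_ecx_or_edx_on_entry)
        return GUESS_REGISTER_BASED;

    /* A plain ret: __cdecl, unless there are no arguments at all. */
    if (f->ret_operand_bytes == 0)
        return f->arg_count == 0 ? GUESS_CDECL_OR_STDCALL : GUESS_CDECL;

    /* Fewer bytes popped than four per argument suggests that some
       arguments never went through the stack. */
    if (f->ret_operand_bytes < 4 * f->arg_count)
        return GUESS_REGISTER_BASED;

    return GUESS_STDCALL;
}
```

For StdcallFunction1, the facts would be a ret operand of 12, three arguments, and no dependence on ecx or edx, which lands on GUESS_STDCALL. Treat the result as a starting point rather than a verdict, since hand-written assembler and the oddities covered in the footnotes can defeat it.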
In the next post in this series, I’ll cover the __fastcall calling convention (and hopefully it won’t be such a long wait this time). Stay tuned…
[1]: For functions declared as __declspec(noreturn) or that otherwise never normally return execution control directly to the caller (e.g. a function that always throws an exception), the ret instruction is typically omitted. There are a couple of other rare cases where you may see no terminating ret, such as if there are two functions, where one function calls the second, and both have very similar prototypes (such as argument ordering or an additional defaulted argument). In this case, the compiler may combine the two functions by having one perform minor adjustments to the stack and then “fall through” directly to the second function.
[2]: If you see a function with a ret instruction that does not take a multiple of four as its argument, then the function was most likely hand-written in assembler. The Microsoft compiler will never, to my knowledge, generate code like this (and neither should any sane Win32 compiler).