Last time, I talked about some of the basic differences you’ll see when switching to an x64 system if you are doing debugging using the Debugging Tools for Windows package. In this installment, I’ll run through some of the other differences with debugging that you’ll likely run into – in particular, how changes to the x64 calling convention will make your life much easier when debugging.
Although the x64 architecture is in many respects very similar to x86, many of the conventions of x86-Win32 that you might be familiar with have changed. Microsoft took the opportunity to “clean house” with many aspects of Win64, since for native x64 programs, there is no concern of backwards binary compatibility.
One of the major changes that you will quickly discover is that the calling conventions that x86 used (__fastcall, __cdecl, __stdcall) are not applicable to x64. Instead of many different calling conventions, x64 unifies everything into a single calling convention that all functions use. You can read the full details of the new calling convention on MSDN, but I’ll give you the executive summary as it applies to debugging programs here.
- The first four arguments of a function are passed in registers: rcx, rdx, r8, and r9, respectively. Subsequent arguments are passed on the stack.
- The caller allocates the space on the stack for parameter passing, as with __stdcall on x86. However, the caller must allocate at least 32 bytes of stack space for the callee to use as a “register home space” for the first four parameters (or as scratch space). This must be done even if the callee has no arguments or fewer than four arguments.
- The caller always cleans the stack of arguments passed (like __cdecl on x86) if necessary.
- Stack unwinding and exception handling are significantly different on x64; more details on that later. The new stack unwinding model is data-driven rather than code-driven (like on x86).
- Except for dynamic stack adjustments (like _alloca), all stack space must be allocated in the prologue. Effectively, for most functions, the stack pointer will remain constant throughout the execution process.
- The rax register is used for return values. For return values larger than 64 bits, a hidden pointer argument is used. There is no more spillover into a second register for large return values (like edx:eax, on x86).
- The rax, rcx, rdx, r8, r9, r10, and r11 registers are volatile; all other registers must be preserved. For floating point usage, the xmm0, xmm1, xmm2, xmm3, xmm4, and xmm5 registers are volatile, and the other registers must be preserved.
- For floating point arguments, the xmm0 through xmm3 registers are used for the first four arguments, after which stack spillover is performed.
- The instructions permitted in function prologues and epilogues are highly restricted to a very small subset of the instruction set to facilitate unwinding operations.
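The parameter-passing rules above can be modeled in a few lines of C. This is only an illustrative sketch, not anything the compiler or debugger exposes: the names `arg_location` and `outgoing_area_bytes` and both enums are made up for this example. It also assumes that register slots are assigned by argument position, so a floating point value in the second position uses xmm1 rather than xmm0, and it ignores variadic functions and arguments larger than 8 bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names, used only for this illustration. */
typedef enum { ARG_INT, ARG_FLOAT } arg_class;
typedef enum {
    LOC_RCX, LOC_RDX, LOC_R8, LOC_R9,
    LOC_XMM0, LOC_XMM1, LOC_XMM2, LOC_XMM3,
    LOC_STACK
} arg_loc;

/* Where the argument in 1-based `position` is passed.
 * Assumption: slots are positional, so position 2 is rdx or xmm1. */
arg_loc arg_location(unsigned position, arg_class cls)
{
    static const arg_loc int_regs[4] = { LOC_RCX, LOC_RDX, LOC_R8, LOC_R9 };
    static const arg_loc fp_regs[4]  = { LOC_XMM0, LOC_XMM1, LOC_XMM2, LOC_XMM3 };

    assert(position >= 1);
    if (position > 4)
        return LOC_STACK;               /* argument 5 and later spill to the stack */
    return cls == ARG_FLOAT ? fp_regs[position - 1] : int_regs[position - 1];
}

/* Bytes the caller reserves for outgoing arguments: one 8-byte slot per
 * argument, but never less than the 32-byte register home space. */
size_t outgoing_area_bytes(unsigned arg_count)
{
    unsigned slots = arg_count < 4 ? 4 : arg_count;
    return (size_t)slots * 8;
}
```

For example, a call with no arguments still reserves 32 bytes, while a call with six arguments reserves 48 bytes (the four home slots plus two stack arguments). Real compilers additionally keep the stack 16-byte aligned at the call, which this sketch leaves out.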
The main takeaways here from a debugging perspective are thus:
- Even though a register calling convention like __fastcall is used, the register arguments are often spilled to the “home area” and so are typically visible in call stacks, especially in debug builds.
- Due to the nature of parameter passing on x64, the “push” instruction is seldom used for setting up arguments. Instead, the compiler allocates all space up front (like for local variables on x86) and uses the “mov” instruction to write stack parameters onto the stack for function calls. This also means that you typically will not see an “add rsp” (or equivalent) after each function call, despite the fact that the caller cleans the stack space.
- The first stack argument (argument 5) will appear at [rsp+28h] on function entry instead of [rsp+08h], because of the mandatory register home area. This is a departure from how __fastcall worked on x86, where the first stack argument would be at [esp+04h].
- Because of the data-driven unwind semantics, you will see perfect stack unwinding even without symbols. This means that even if you don’t have any symbols at all for a third party binary, you should always get a complete stack trace all the way back to the thread start routine. As a side effect, this means that the stack traces captured by PageHeap or handle traces will be much more reliable than on x86, where they tended to break at the first function that did not use ebp (because those stack traces never used symbols).
- Because of the restrictions on the prologue and epilogue instruction usage, it is very easy to recognize where the actual important function code begins and the boilerplate prologue/epilogue code ends.
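To make the [rsp+28h] figure concrete, here is a small C sketch of the arithmetic. The function names `entry_slot_offset` and `is_home_slot` are hypothetical, and the offsets only hold at the very first instruction of the callee, before the prologue has adjusted rsp: at that point [rsp] holds the return address, and the 8-byte argument slots follow it in order.

```c
#include <assert.h>
#include <stddef.h>

/* Offset from rsp, at function entry, of the slot belonging to the
 * 1-based argument number. [rsp+0] is the return address, so slot n
 * lives at 8 + 8*(n-1) = 8*n. Hypothetical helper for illustration. */
size_t entry_slot_offset(unsigned arg_number)
{
    assert(arg_number >= 1);
    return (size_t)arg_number * 8;
}

/* Arguments 1 through 4 arrive in registers; their slots are the
 * "home area" the callee may or may not have written to. */
int is_home_slot(unsigned arg_number)
{
    return arg_number >= 1 && arg_number <= 4;
}
```

With this model, argument 1's home slot is at [rsp+08h], argument 4's is at [rsp+20h], and argument 5 (the first true stack argument) is at [rsp+28h]. In the debugger, with a breakpoint on the first instruction of the function, something like `dq @rsp+28 L1` should then show the fifth argument. Keep in mind that the home slots for arguments 1 through 4 only contain meaningful values if the callee has actually spilled the registers there.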
If you’ve been debugging on x86 for a long time, then you are probably pretty excited about the features of the new calling convention. Because of the perfect unwind semantics and the fact that the stack pointer stays constant throughout function execution, debugging code that you don’t have symbols for (and using the built-in heap and handle verification utilities) is much more reliable than on x86. Additionally, compiler generated code is usually easier to understand, because you don’t have to manually track the value of the stack pointer changing throughout the function like you often did with x86 functions compiled with frame pointer omission (FPO) optimizations.
There are some exceptions to the rules I laid out above for the x64 calling convention. For functions that do not call any other functions (called “leaf” functions), it is permissible to utilize custom calling conventions so long as the stack pointer (rsp) is not modified. If the stack pointer is modified, then regular calling convention semantics are required.
Next time, I’ll go into more detail on how exception handling and unwinding is different on x64 from the perspective of what the changes mean to you if you are debugging programs, and how you can access some of the metadata associated with unwinding/exception handling and use it to your advantage within the debugger.
Hi,
Thanks for sharing this useful information.
I am currently porting a 64-bit application with C and assembly code (x86-64) from Linux to Windows.
I have modified the assembly files as per the 64-bit ABI specification for Windows, and I do get the expected results when I use the “RTCs” flag, which is meant for stack pointer verification. This flag also requires optimization flags to be turned off.
The problem starts when I don’t use “RTCs” (which is the case in the release version): it tries to access initial locations (0x04a0) in memory in the C code.
I suspect that it has something to do with a calling convention mismatch when control gets transferred from assembly to the C function.
I would really appreciate it if you have any pointers.
With warm regards,
KEDAR
Can you paste a (small) fragment of assembler in question that you were having the problem with? It’s not clear from your description just what you’re doing here and what’s going wrong.
Great article! Can you please let us know if there are going to be more articles on 64-bit debugging sometime soon?