Yesterday, we learned how the standard action service call path operates in the MSIL JIT backend for the NWScript JIT engine. This time, we’ll examine the ‘fast’ action service call path.
As I alluded to last time, the fast action service call path attempts to cut down on the overhead of making multiple managed/native transitions for each action service handler call. While a standard path action service call may need to make multiple managed/native transitions depending on the count of arguments to a particular action service call, a fast action service call makes only one managed/native transition.
The fast action service call interface has two components:
- An extension, INWScriptActions::OnExecuteActionFromJITFast, to the C++-level interface that NWNScriptJIT.dll (and the interpretive NWScriptVM) use to communciate with the script host. This extension comes in the form of a new interface API that takes an action service ordinal to invoke, a count of source-level arguments to the action, and a list of commands and parameters. The commands and parameters describe a set of push or pop operations to perform on the VM stack in order to set up a call/return pair to the action service handler. These operations all happen entirely in native code, embedded in the script host.
- A new JIT intrinsic on the INWScriptProgram interface, Intrinsic_ExecuteActionServiceFast, which returns the action service handler’s return value (boxed), if any, takes an array of (boxed) arguments to pass to the action service handler.<.li>
It’s important to note that the current version of the fast action service call interface isn’t quite as fast as one would hope, due to in no small part the fact that it sticks to verifiable IL. In fact, it’s not always faster than the standard path, which is why it’s currently only used if there are six or more VMStackPush/Pop
Internally, Intrinsic_ExecuteActionServiceFast essentially looks at a set of data tables provided by the script host which describe the effective prototype of each action handler. Based on this information, it translates the managed parameter array into a command and parameter array to pass to the C++-level INWScriptActions::OnExecuteActionFromJITFast API and calls the script host.
Next, the script host then does all of the associated operations (pushing items onto the VM stack, calling the action service handler, and popping the return value, if any, off the VM stack) “locally”. Finally, Intrinsic_ExecuteActionServiceFast repackages any return value into its managed equivalent and returns back to the JIT’d program code.
If all of that sounded like a mouthful, it certainly was — there is extra overhead here; the fast action service mechanism is competing with the overhead of managed/native code.
Before we continue, let’s look at how this all plays out in the underlying IL. Here’s the same “Hello, world” subroutine we had before:
void PrintHello() { PrintString( "Hello, world (from NWScript)." ); }
If I were to override the cost/benefit heuristics in the JIT engine and force it to always use the fast action service handler call interface, we will see the following IL emitted:
IL_0025: ldstr "Hello, world (from NWScript)." IL_002a: stloc.1 IL_002b: ldarg.0 IL_002c: ldfld m_ProgramJITIntrinsics IL_0031: ldc.i4 0x1 IL_0036: conv.u4 IL_0037: ldc.i4 0x1 IL_003c: conv.u4 IL_003d: ldc.i4 0x1 IL_0042: newarr [mscorlib]System.Object IL_0047: stloc.2 IL_0048: ldloc.2 IL_0049: ldc.i4 0x0 IL_004e: ldloc.1 IL_004f: stelem.ref IL_0050: ldloc.2 IL_0051: callvirt instance object Intrinsic_ExecuteActionServiceFast(uint32, uint32, object[]) IL_0056: ldnull IL_0057: stloc.2 IL_0058: pop
We have the following operations going on here:
String ^ s = "Hello, world (from NWScript)"; array< Object ^ > ^ a = gcnew array< Object ^ >{ s }; m_ProgramJITIntrinsics->ExecuteActionServiceFast( 1, 1, a );
Clearly, the fast action service path as it is implemented today is a tradeoff. When there are a large number of parameters and return values (this isn’t as uncommon as you think when you consider that NWScript passes and returns structures, such as ‘vector’ (3 floats), by value), the overhead of the fast action service call mechanism appears to be less than that of many managed/native switches (at least under .NET 4.0 on amd64).
However, when fewer intrinsic calls (leading to managed/native switches) are involved, then the standard path ends up winning out.
Now, there are some improvements that could be made here on the JIT side of things, above and beyond the fast action call mechanism. If we look at the generated logic and examine it under the profiler, the bulk of the overhead involved in the fast action service call interface as it’s implemented in its prototype stage today comes from the need to allocate an array of object GC pointers, box arguments up to place them into the array, unboxing the array contents when copying the array contents to create the command table for OnExecuteActionFromJIT, and boxing/unboxing the return value from Intrinsic_ExecuteActionFast.
All of these are limitations of the JIT (intrinsic) interface and not the C++-level interface; furthermore, essentially all of these steps could be eliminated if the JIT backend could avoid the usage of the object GC pointer array in the variadic intrinsic call. While I was unable to find a clean way to do this in verifiable IL (without interposing a large amount of automatically generated C++/CLI code emitted by some other generation program), it is possible to circumvent much of this overhead — if we are willing to emit non-verifiable IL.
This leads us to the next topic, direct fast action service handler calls, which we’ll discuss in detail in the next post.
Tags: NWN2
[…] Nynaeve Adventures in Windows debugging and reverse engineering. « NWScript JIT engine: JIT intrinsics, and JIT’d action service handler calls, part 3: Fast acti… […]