Previously, I explained how the ‘fast’ action service call interface worked, and why it doesn’t always live up to its name.
This time, we’ll examine the no-holds-barred, non-verifiable direct fast action call path. This action service call mechanism is designed for maximum performance at the expense of type-safe, verifiable IL; as you’ll see, a number of liberties are taken in the name of performance here.
The direct fast action call mechanism operates on a similar principle to the regular fast action call mechanism that we saw previously. However, instead of packaging parameters into a boxed array and performing the final conversion to native types generically at runtime, the direct fast action call system takes a different approach: it deals with these tasks at compile time, using static typing.
In both cases, we’ll end up calling through the OnExecuteActionFromJITFast C++ virtual interface function on the INWScriptActions interface, but how we get there is quite different with the direct fast call interface.
Now, recall again that the OnExecuteActionFromJITFast interface is essentially structured so as to combine every VM stack manipulation operation and the actual call to the action service handler into a single call to native code. This is accomplished by passing two arrays to OnExecuteActionFromJITFast: a “command” (ULONG) array, describing the underlying operations to perform, and a “command parameter” (uintptr_t) array, describing the data to perform the operations upon.
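To make the parallel-array idea concrete, here is a minimal native C++ sketch of a host walking a command array and a parameter array. Everything in it is an assumption made for illustration: the CMD_* codes, the toy integer stack, and the ExecuteFast and DoubleTop names are hypothetical and are not the real NWFASTACTION_CMD values or the real OnExecuteActionFromJITFast signature. It also assumes that only commands which need an operand consume a parameter slot.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical command codes, for illustration only.
enum FastActionCmd : uint32_t { CMD_PUSHINT, CMD_CALL, CMD_POPINT };

// Walks the command array once. Commands that take an operand consume the
// next slot of the parameter array; `call` stands in for the action handler.
inline bool ExecuteFast(const uint32_t *cmds, std::size_t cmdCount,
                        const uintptr_t *params, std::size_t paramCount,
                        void (*call)(std::vector<int> &stack))
{
    std::vector<int> stack;   // toy VM stack
    std::size_t p = 0;        // cursor into the parameter array

    for (std::size_t i = 0; i < cmdCount; ++i) {
        switch (cmds[i]) {
        case CMD_PUSHINT:
            if (p >= paramCount) return false;
            stack.push_back(static_cast<int>(params[p++]));
            break;
        case CMD_CALL:
            call(stack);      // no parameter slot is consumed
            break;
        case CMD_POPINT:
            if (p >= paramCount || stack.empty()) return false;
            // The parameter slot holds the address that receives the result.
            *reinterpret_cast<int *>(params[p++]) = stack.back();
            stack.pop_back();
            break;
        default:
            return false;     // unknown command
        }
    }
    return true;
}

// Toy "action service handler": doubles the value on top of the stack.
inline void DoubleTop(std::vector<int> &stack)
{
    assert(!stack.empty());
    stack.back() *= 2;
}
```

A caller would fill in three commands (push, call, pop) and two parameters (the integer argument and the address of the result), then make a single ExecuteFast call; that one call replaces what would otherwise be several separate transitions into native code.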
Where the direct fast action service call mechanism differs from the (normal) fast action call service mechanism is in how these two arrays are built. In the direct fast mechanism, the JIT’d code actually packages parameters up itself without relying on the intrinsic — no more boxing or array allocations.
In order to accomplish this, the direct call interface creates a custom value type for each action service call. This value type, named something like NWScript.JITCode.<ScriptName>.DirectActionServiceCmdDescriptors.ExecuteActionService_<ServiceName>, serves a dual purpose. It represents both the “command” and “command parameter” arrays that will be used to call OnExecuteActionFromJITFast. At the same time, each of the individual fields in the value type needs to remain strongly typed so that it can be accessed by generated code without involving boxing or other low-performance constructs.
Essentially, the value type is constructed so that it can be accessed through strongly typed individual fields in .NET, but as two arrays (one of ULONGs and one of uintptr_ts) in native code. Let’s look at an example:
Say we have an action that we would like to call, with the following source-level prototype in NWScript:
string IntToString(int nInteger);
The command and parameter arrays that we’ll want to set up for a call to OnExecuteActionFromJITFast would be as follows:
Cmds (NWFASTACTION_CMD) | CmdParams (uintptr_t) | Description |
---|---|---|
NWFASTACTION_PUSHINT | (nInteger value) | Push nInteger on the stack |
NWFASTACTION_CALL | (None) | Invoke OnAction_IntToString |
NWFASTACTION_POPSTRING | &ReturnString | Pop return value string from the stack |
Cmds and CmdParams are parallel arrays from the point of view of the native code in OnExecuteActionFromJITFast. The data structure that the direct fast action call mechanism uses to represent these two arrays would thus be akin to the following:
    [StructLayout(LayoutKind::Sequential)]
    value struct CmdDesc
    {
        // &Cmd_0 represents the
        // "Cmds" array:

        // NWFASTACTION_PUSHINT
        System::UInt32 Cmd_0;
        // NWFASTACTION_CALL
        System::UInt32 Cmd_1;
        // NWFASTACTION_POPSTRING
        System::UInt32 Cmd_2;

        // Padding for alignment. If
        // there were an odd number of
        // commands, we must introduce
        // an alignment field here on
        // 64-bit platforms.
    #ifdef _WIN64
        System::UInt32 CmdPadding_Tail;
    #endif

        // &CmdParam_0 represents
        // the "CmdParams" array:

        // nInteger
        System::UInt64 CmdParam_0;
        // ReturnString
        NeutralString * CmdParam_Ret_1;

        // Floating point fields are
        // represented as a System::Single
        // with an optional System::Int32
        // padding field on 64-bit systems.

        // Remaining fields are storage
        // for strings if we had any.

        // CmdParam_Ret_1 points to
        // StringStorage_0.
        NeutralString StringStorage_0;
    };
The NeutralString type represents the data format for a string that is passed cross-module to and from the script host; internally, it is simply a pair of (char * String, size_t Length), allocated from the process heap. A set of JIT intrinsics are used to allocate and delete NeutralStrings should they be referenced for an action service call.
From a .NET perspective, the following wrapper suffices for the NeutralString (layout-compatible with the actual C++ structure):
    [StructLayout(LayoutKind::Sequential)]
    public value struct NeutralString
    {
        System::IntPtr StrPtr;
        System::IntPtr Length;
    };
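The layout trick can be checked from the native side. The following is a rough sketch in plain C++, not the actual host code: CmdDescNative, NeutralStringNative, ReadCmd and ReadParam are hypothetical names, and the parameter slots are declared as uintptr_t on the assumption that each slot is pointer-sized. The static_asserts spell out the layout properties the scheme depends on, and the readers go through memcpy so that the sketch stays within defined behavior instead of indexing past a single field.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical native mirror of the managed NeutralString wrapper.
struct NeutralStringNative {
    char       *String;
    std::size_t Length;
};

// Hypothetical native mirror of the descriptor for IntToString.
struct CmdDescNative {
    uint32_t Cmd_0;
    uint32_t Cmd_1;
    uint32_t Cmd_2;
#if UINTPTR_MAX > 0xFFFFFFFFu
    uint32_t CmdPadding_Tail;   // odd command count: realign on 64-bit
#endif
    uintptr_t            CmdParam_0;       // nInteger
    NeutralStringNative *CmdParam_Ret_1;   // points at StringStorage_0
    NeutralStringNative  StringStorage_0;
};

static_assert(std::is_standard_layout<CmdDescNative>::value,
              "offsetof needs a standard-layout type");
static_assert(offsetof(CmdDescNative, Cmd_2) == 2 * sizeof(uint32_t),
              "command fields must be contiguous");
static_assert(offsetof(CmdDescNative, CmdParam_0) % alignof(uintptr_t) == 0,
              "parameter block must be pointer-aligned");
static_assert(offsetof(CmdDescNative, CmdParam_Ret_1) ==
                  offsetof(CmdDescNative, CmdParam_0) + sizeof(uintptr_t),
              "parameter fields must be contiguous");
static_assert(sizeof(NeutralStringNative *) == sizeof(uintptr_t),
              "each parameter slot must be pointer-sized");

// Reads slot `index` of the command "array" that starts at Cmd_0.
inline uint32_t ReadCmd(const CmdDescNative &d, std::size_t index)
{
    uint32_t value;
    std::memcpy(&value,
                reinterpret_cast<const unsigned char *>(&d) +
                    offsetof(CmdDescNative, Cmd_0) + index * sizeof(value),
                sizeof(value));
    return value;
}

// Reads slot `index` of the parameter "array" that starts at CmdParam_0.
inline uintptr_t ReadParam(const CmdDescNative &d, std::size_t index)
{
    uintptr_t value;
    std::memcpy(&value,
                reinterpret_cast<const unsigned char *>(&d) +
                    offsetof(CmdDescNative, CmdParam_0) + index * sizeof(value),
                sizeof(value));
    return value;
}
```

If a field were reordered, or the tail padding were dropped on a 64-bit build, one of the static_asserts would fail at compile time, which is roughly the guarantee the generated value type has to provide for every action service.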
With this structure layout in place, the backend generates IL instructions to load the appropriate constants into each of the Cmd_[n] fields. Then, the CmdParam_[n] fields are set up, followed by the CmdParam_Ret_[n] fields.
(If a NeutralString is referenced, intrinsic calls to translate to and from System::String ^’s are made as necessary.)
Finally, the backend generates a call to OnExecuteActionFromJITFast. One interesting optimization that is performed here is a de-virtualization of the function call.
Normally, calling OnExecuteActionFromJITFast involves loading a this pointer from a storage location, then loading the virtual function table entry for the target function. However, the backend takes advantage of the fact that the INWScriptActions object associated with a particular script cannot go away while the script’s code can still be used. Instead of making a normal virtual function call, the backend hardwires the this pointer and the address of the OnExecuteActionFromJITFast virtual function into the emitted IL as immediate constant operands.
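As a loose analogy in portable C++, the difference between the two call styles can be sketched as follows. Portable C++ cannot read a virtual function table slot directly, so this sketch resolves the target once through a non-virtual thunk and stores the object pointer and function pointer together. The IActions, CountingActions, BoundCall and Bind names are all hypothetical, and the real backend embeds raw addresses in IL, which this sketch does not attempt to reproduce.

```cpp
#include <cassert>

// Hypothetical stand-in for an actions interface with one virtual entry point.
struct IActions {
    virtual ~IActions() = default;
    virtual bool OnExecuteFast(int arg) = 0;
};

struct CountingActions final : IActions {
    int Total = 0;
    bool OnExecuteFast(int arg) override { Total += arg; return true; }
};

// Ordinary path: the object pointer is loaded and the vtable is consulted
// on every call.
inline bool CallVirtual(IActions *actions, int arg)
{
    return actions->OnExecuteFast(arg);
}

// Non-virtual thunk: the qualified call below bypasses virtual dispatch.
inline bool DirectThunk(void *self, int arg)
{
    return static_cast<CountingActions *>(self)->CountingActions::OnExecuteFast(arg);
}

// "Hardwired" path: both the object and the target are fixed when the call
// site is created, much like constants baked into generated code.
struct BoundCall {
    void *Self;
    bool (*Target)(void *self, int arg);
    bool operator()(int arg) const { return Target(Self, arg); }
};

inline BoundCall Bind(CountingActions *actions)
{
    assert(actions != nullptr);
    return BoundCall{ actions, &DirectThunk };
}
```

The bound call is only valid while the object it captured is alive, which mirrors the lifetime assumption the backend relies on for the INWScriptActions object.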
(This does make the generated assembly specific to the process that it executes within; the resultant assembly can still be disassembled for debugging purposes, however.)
After the OnExecuteActionFromJITFast call returns, IL is generated to check whether the action call failed. If so, an exception is raised. (Unlike the standard action call interface, the script abort flag on the NWScriptProgram is not tested here, for performance reasons. Instead, OnExecuteActionFromJITFast must return false to abort the script.)
IL code is then emitted to move any return value data from its storage locations in the value structure to the appropriate IL local variable(s), if any.
Finally, if any strings were involved in the action parameters or return values, the emitted IL code is wrapped in an exception handler that releases any allocated native strings and then rethrows the exception upwards.
Due to the amount of code generated for a direct fast action service call, all of the logic I have outlined is placed into a stub routine (similar to how one might see a system call stub for a conventional operating system). Calls to the stub are then made whenever an I_ACTION instruction is encountered, assuming that the call does not involve any engine structures.
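The control flow of such a stub can be sketched in native C++ for the IntToString example: make the call, raise an exception on failure, copy the return value out, and release any native string on both the normal and the exceptional path. This is only an illustration under stated assumptions. NeutralStr, FreeNeutral, HostCall, Stub_IntToString and the two fake hosts are hypothetical stand-ins, the real stub is emitted as IL, and the single function pointer here replaces the command and parameter arrays for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <stdexcept>
#include <string>

// Hypothetical native string: heap-allocated buffer plus length.
struct NeutralStr {
    char       *String;
    std::size_t Length;
};

// Hypothetical stand-in for the intrinsic that deletes a native string.
// Safe to call more than once on the same value.
inline void FreeNeutral(NeutralStr &s)
{
    std::free(s.String);
    s.String = nullptr;
    s.Length = 0;
}

// Hypothetical stand-in for the host call; returns false on failure.
using HostCall = bool (*)(int arg, NeutralStr *ret);

inline std::string Stub_IntToString(HostCall host, int nInteger)
{
    NeutralStr ret = { nullptr, 0 };
    try {
        if (!host(nInteger, &ret))
            throw std::runtime_error("action service call failed");

        // Move the return value out of its storage location.
        std::string result =
            ret.String ? std::string(ret.String, ret.Length) : std::string();
        FreeNeutral(ret);
        return result;
    } catch (...) {
        FreeNeutral(ret);   // release native strings, then rethrow upwards
        throw;
    }
}

// Fake host that succeeds: formats the integer into a malloc'd buffer.
inline bool FakeIntToString(int arg, NeutralStr *ret)
{
    const std::string s = std::to_string(arg);
    ret->String = static_cast<char *>(std::malloc(s.size() + 1));
    if (ret->String == nullptr) return false;
    std::memcpy(ret->String, s.c_str(), s.size() + 1);
    ret->Length = s.size();
    return true;
}

// Fake host that always reports failure.
inline bool FailingHost(int, NeutralStr *) { return false; }
```

Keeping this logic in one stub per action service means each I_ACTION call site only needs to emit a call to the stub, at the cost of one extra call frame.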
Overall, the direct fast action call interface provides superior performance to the other two action call mechanisms; even in worst-case scenarios, such as repeated action service calls involving a small number of string parameters, profiling has shown execution times on the order of 79% of those for a script assembly emitted with the standard action service call system. In most cases, the performance improvement is even greater.