Last time, I outlined the MSIL JIT backend at a high level and described some of its external interface.
While knowing how the MSIL JIT backend works from the outside is all well and good, most of the interesting parts are in the internals. This time, let’s dig in deeper and see how the MSIL code generation process in the JIT backend functions (and what a generated script assembly might look like).
Script assemblies
As I mentioned, the backend generates a new .NET assembly for each script passed to NWScriptGenerateCode. This API creates a new NWScriptProgram object, which represents an execution environment for the JIT’d script program.
When a NWScriptProgram object is created, it consumes the IR representation of a script program and begins to create the MSIL version of that script, contained within a single .NET assembly tied to that NWScriptProgram instance. Each script assembly contains a single module; that module in turn contains the series of classes used in the MSIL representation of the script. The NWScriptProgram object internally maintains references to the script assembly and exposes an API that allows the user to invoke the script.
Main program class
Each generated NWScript program contains a main class, with a name of the form NWScript.JITCode.<script name>.ScriptProgram. This class, generated via Reflection (System.Reflection.Emit), implements a standard interface (NWScript.IGeneratedScriptProgram), which exports a set of standard APIs used to call a script:
```cpp
//
// Define the interface that a
// JIT'd program implements.
//

public interface class IGeneratedScriptProgram
{

    //
    // Execute the script and return the
    // entry point return value, if any.
    //

    Int32
    ExecuteScript(
        __in UInt32 ObjectSelf,
        __in array< Object ^ > ^ ScriptParameters,
        __in Int32 DefaultReturnCode
        );

    //
    // Execute a script situation (resume label).
    //

    void
    ExecuteScriptSituation(
        __in UInt32 ScriptSituationId,
        __in array< Object ^ > ^ Locals
        );

};
```
When it comes time to execute the script, the NWScriptProgram object calls the IGeneratedScriptProgram::ExecuteScript method on the script’s main class. A set of parameters may be passed to the script in boxed form; these parameters are the .NET type equivalents of the NWScript IR parameters to the script’s entry point symbol.
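For readers less familiar with C++/CLI, the calling convention can be approximated in standard C++. The sketch below is an illustrative analogue, not the backend's code: std::any stands in for a boxed Object ^, and ExampleScriptProgram, its EntryPoint helper, and the choice to fall back to DefaultReturnCode when the parameters do not match are all hypothetical.

```cpp
#include <any>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Standard C++ analogue of IGeneratedScriptProgram; std::any models a boxed
// .NET Object ^ (an assumption made for illustration only).
struct IGeneratedScriptProgram {
    virtual ~IGeneratedScriptProgram() = default;
    virtual std::int32_t ExecuteScript(std::uint32_t objectSelf,
                                       const std::vector<std::any>& scriptParameters,
                                       std::int32_t defaultReturnCode) = 0;
    virtual void ExecuteScriptSituation(std::uint32_t scriptSituationId,
                                        const std::vector<std::any>& locals) = 0;
};

// Hypothetical generated program whose NWScript entry point is
// "int main(int nValue, string sName)".
class ExampleScriptProgram : public IGeneratedScriptProgram {
public:
    std::int32_t ExecuteScript(std::uint32_t objectSelf,
                               const std::vector<std::any>& scriptParameters,
                               std::int32_t defaultReturnCode) override {
        // Hypothetical policy: wrong parameter count -> default return code.
        if (scriptParameters.size() != 2) return defaultReturnCode;
        // Unbox each parameter to its .NET-equivalent type (Int32, String).
        const auto* nValue = std::any_cast<std::int32_t>(&scriptParameters[0]);
        const auto* sName = std::any_cast<std::string>(&scriptParameters[1]);
        if (nValue == nullptr || sName == nullptr) return defaultReturnCode;
        return EntryPoint(objectSelf, *nValue, *sName);
    }

    void ExecuteScriptSituation(std::uint32_t, const std::vector<std::any>&) override {
        // Resume labels are not modeled in this sketch.
    }

private:
    // Stand-in for the JIT'd entry point subroutine.
    std::int32_t EntryPoint(std::uint32_t, std::int32_t nValue, const std::string& sName) {
        return nValue + static_cast<std::int32_t>(sName.size());
    }
};
```

A host would box the entry point's arguments into the vector (here an Int32 and a String) and call ExecuteScript through the interface, never touching the generated class directly.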
Variable types
Each NWScript IR type has an associated, distinct (strongly typed) .NET type. The NWScript IR only deals with scalar (non-aggregate) types, so mapping IR types to .NET types is simple. The following mapping is defined for that purpose:
NWScript Type | IR Type | .NET Type |
---|---|---|
int | ACTIONTYPE_INT | System.Int32 |
float | ACTIONTYPE_FLOAT | System.Single |
object | ACTIONTYPE_OBJECT | System.UInt32 |
string | ACTIONTYPE_STRING | System.String |
void | ACTIONTYPE_VOID | System.Void |
Engine structs (event, talent, etc.) | ACTIONTYPE_ENGINE_0 … ACTIONTYPE_ENGINE_9 | NWScript.NWScriptEngineStructure0 … NWScript.NWScriptEngineStructure9 |
At the IR level, user-defined structures do not exist; they are simply individual scalar variables, each drawn from one of the fundamental types above. (The NWScript.NWScriptEngineStructure[0-9] types simply wrap a C++ reference-counted pointer to a script-host-defined data structure. There's a bit more to it, but for the most part, they can be thought of in that fashion.)
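Because the mapping is a fixed, one-to-one table, a code generator can express it as a simple lookup. The following C++ sketch returns the .NET type name for each IR type; the ActionType enumeration and its ordering are assumptions for illustration, not the real header definitions.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical enumeration mirroring the IR type names in the table above;
// the real numeric values are not reproduced here.
enum class ActionType {
    Int, Float, Object, String, Void,
    Engine0, Engine1, Engine2, Engine3, Engine4,
    Engine5, Engine6, Engine7, Engine8, Engine9
};

// Return the fully qualified .NET type name for a scalar IR type.
std::string DotNetTypeName(ActionType type) {
    switch (type) {
    case ActionType::Int:    return "System.Int32";
    case ActionType::Float:  return "System.Single";
    case ActionType::Object: return "System.UInt32";  // object ids are 32-bit unsigned values
    case ActionType::String: return "System.String";
    case ActionType::Void:   return "System.Void";
    default: break;
    }
    // Engine structures 0..9 map, in order, to NWScriptEngineStructure0..9.
    int index = static_cast<int>(type) - static_cast<int>(ActionType::Engine0);
    if (index < 0 || index > 9) throw std::invalid_argument("unknown IR type");
    return "NWScript.NWScriptEngineStructure" + std::to_string(index);
}
```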
Subroutine structure
The JIT backend turns each IR-level subroutine into a class instance method on the main program type during code generation. IR parameters and return values translate directly to .NET parameters and return values, such that a .NET subroutine equivalent simply takes parameters and returns values as one would naturally expect.
If there was a script debug symbol table available during the IR generation phase, the .NET subroutines are even given recognizable names corresponding to their source level names (note that reading the NWScript symbol table is optional; a script can still be JIT’d even without symbol names). For example, consider the following NWScript source level function:
```
void PrintMessage(string s) { ... }
```
The backend emits the following method prototype for this NWScript function (as seen when disassembling the resulting assembly with ILDasm):
```
.method private instance void NWScriptSubroutine_PrintMessage(string A_1) cil managed
[…]
```
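The naming rule can be sketched as follows. The NWScriptSubroutine_ prefix comes from the ILDasm listing above; the function name, its parameters, and the hex-address fallback used when no symbol table is available are hypothetical.

```cpp
#include <cassert>
#include <optional>
#include <sstream>
#include <string>

// Build a .NET method name for an IR subroutine. With a debug symbol, the
// source-level name is appended to the "NWScriptSubroutine_" prefix.
// The fallback (hex address of the subroutine) is a hypothetical scheme.
std::string SubroutineMethodName(const std::optional<std::string>& symbolName,
                                 unsigned int address) {
    if (symbolName && !symbolName->empty())
        return "NWScriptSubroutine_" + *symbolName;
    std::ostringstream name;
    name << "NWScriptSubroutine_" << std::hex << std::uppercase << address;
    return name.str();
}
```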
There is one catch with this model of directly converting parameters and return types to their .NET equivalents: if a script subroutine returns a structure at the source level, this turns into multiple scalar return values in the NWScript IR. However, a .NET method cannot return more than one value.
If an IR subroutine does return more than one value, the backend generates a .NET structure type to contain the return value. The fields on the structure correspond to the return values. When it comes time to return a value from such a subroutine, the backend generates code to load the return value variables into the return structure fields, then returns an instance of the return structure.
Similarly, when a subroutine returning multiple values is invoked, the caller immediately unpacks the structure's contents into the corresponding local variables.
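As an illustration of the packing and unpacking described above, the following C++ sketch models a hypothetical subroutine MakePair that returns an int and a float. The MakePair_ReturnValue type and its field names are invented for this example; the names the backend actually generates may differ.

```cpp
#include <cassert>

// Hypothetical source-level NWScript:
//   struct Pair { int a; float b; };  struct Pair MakePair(int x);
// In the IR this becomes two scalar return values, so a single return
// structure carries both (modeled here as a plain C++ struct).
struct MakePair_ReturnValue {
    int   ReturnValue0;  // first IR return value  (System.Int32)
    float ReturnValue1;  // second IR return value (System.Single)
};

// Callee: compute the results into locals, then pack them into the
// return structure's fields and return that single instance.
MakePair_ReturnValue MakePair(int x) {
    int a = x * 2;
    float b = static_cast<float>(x) / 2.0f;
    MakePair_ReturnValue ret;
    ret.ReturnValue0 = a;
    ret.ReturnValue1 = b;
    return ret;
}

// Caller: invoke the subroutine, then immediately unpack the fields
// into the caller's own local variables.
float CallerUsingMakePair(int x) {
    MakePair_ReturnValue tmp = MakePair(x);
    int a = tmp.ReturnValue0;
    float b = tmp.ReturnValue1;
    return static_cast<float>(a) + b;
}
```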
Globals, locals, and other variables
There are several classes of IR variables that the backend concerns itself with; a variable's class describes how it is stored. The backend maps each class to one of several storage mechanisms, as outlined in the following table:
IR variable class | .NET variable storage mechanism |
---|---|
NWScriptVariable::Global | Instance member variable on program class |
NWScriptVariable::Local | IL local (LocalBuilder ^) |
NWScriptVariable::CallParameter | IL local (LocalBuilder ^) |
NWScriptVariable::CallReturnValue | IL local (LocalBuilder ^) |
NWScriptVariable::ReturnValue | IL local (LocalBuilder ^) (the type may be an aggregate return structure, as described above) |
NWScriptVariable::Constant | Immediate constant operand |
NWScriptVariable::Parameter | IL argument slot (the first parameter is slot 1, since slot 0 holds the this pointer) |
Except for aggregate return types (as noted above), IR variables always take on their direct .NET equivalents for their corresponding IR types.
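A helper that loads an IR variable onto the evaluation stack has to pick a different IL sequence for each storage class in the table above. Here is a rough sketch that returns textual IL; the IrVar layout and the global_N field naming are hypothetical, and a real implementation would emit through an IL generator rather than build strings.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Storage classes from the table above.
enum class VarClass { Global, Local, CallParameter, CallReturnValue,
                      ReturnValue, Constant, Parameter };

// Hypothetical description of an IR variable after storage assignment.
struct IrVar {
    VarClass cls;
    int index;            // IL local index, 0-based IR parameter index, or global id
    std::string literal;  // constant operand text (Constant class only)
    char constKind;       // 'i' = int/object, 'f' = float, 's' = string
};

// Return the IL needed to push the variable's value onto the evaluation stack.
std::vector<std::string> EmitLoad(const IrVar& v) {
    switch (v.cls) {
    case VarClass::Global:
        // Globals are instance fields: push 'this' (argument 0), then load the field.
        return {"ldarg.0", "ldfld global_" + std::to_string(v.index)};
    case VarClass::Local:
    case VarClass::CallParameter:
    case VarClass::CallReturnValue:
    case VarClass::ReturnValue:
        return {"ldloc " + std::to_string(v.index)};
    case VarClass::Parameter:
        // Argument 0 is 'this', so IR parameter n lives in argument slot n + 1.
        return {"ldarg " + std::to_string(v.index + 1)};
    case VarClass::Constant:
        if (v.constKind == 'f') return {"ldc.r4 " + v.literal};
        if (v.constKind == 's') return {"ldstr \"" + v.literal + "\""};
        return {"ldc.i4 " + v.literal};
    }
    return {};
}
```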
Translating IR instructions to MSIL instructions
With this mapping in place, translating IR instructions to MSIL instructions becomes more or less straightforward: one need only load an IR instruction's operand variables onto the IL evaluation stack, emit the appropriate IL opcode, and then store the top of the evaluation stack to the IR instruction's result variable.
For example, the I_XOR instruction can be mapped to MSIL by generating a load for the two IR parameter variables (using a helper that emits the correct code depending on the class of variable), then generating an OpCodes::Xor instruction, and finally, generating a store (again using a helper that emits the correct code for the destination variable class) to the IR result variable. A similar process can be performed for most data-processing IR instructions to create their MSIL equivalents.
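The load/operate/store pattern can be checked end to end with a toy stack machine. In this sketch every operand lives in an IL local (a simplification; globals, parameters, and constants would go through helpers like the one described above), and TranslateBinary and Run are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// One IL-like instruction for a toy evaluation-stack machine.
struct Il { std::string op; int arg; };

// Translate "result = lhs OP rhs" into load, load, op, store.
std::vector<Il> TranslateBinary(const std::string& ilOpcode,
                                int lhsLocal, int rhsLocal, int resultLocal) {
    return { {"ldloc", lhsLocal}, {"ldloc", rhsLocal},
             {ilOpcode, 0}, {"stloc", resultLocal} };
}

// Minimal interpreter so the translation can be verified by running it.
void Run(const std::vector<Il>& code, std::vector<std::int32_t>& locals) {
    std::vector<std::int32_t> stack;
    for (const Il& ins : code) {
        if (ins.op == "ldloc") {
            stack.push_back(locals.at(ins.arg));
        } else if (ins.op == "stloc") {
            locals.at(ins.arg) = stack.back();
            stack.pop_back();
        } else if (ins.op == "xor" || ins.op == "and" || ins.op == "or") {
            std::int32_t rhs = stack.back(); stack.pop_back();
            std::int32_t lhs = stack.back(); stack.pop_back();
            std::int32_t r = ins.op == "xor" ? (lhs ^ rhs)
                           : ins.op == "and" ? (lhs & rhs)
                                             : (lhs | rhs);
            stack.push_back(r);
        } else {
            throw std::runtime_error("unsupported opcode: " + ins.op);
        }
    }
}
```

The same four-instruction shape serves any binary data-processing instruction; only the middle opcode changes.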
Local variable management
While it would be possible to simply create every IL-level variable corresponding to a “local-like” IR variable up front at the start of every subroutine, the MSIL backend avoids doing this so as to conserve local variable slots. Instead, a local variable pool is maintained while generating MSIL code for IR instructions. The local variable pool can be thought of as a stack of LocalBuilder instances, grouped by their associated types, which are available for use.
IR variables that the code analysis phase has flagged as local to a particular control flow (meaning that their lifetimes are constrained to a single control flow) are eligible to be allocated from the current local variable pool when it comes time to instantiate them (in the form of an I_CREATE IR instruction).
If a free IL local of the given type is available in the local pool, that IL local is checked out of the pool and used for the lifetime of the given IR variable (until an I_DELETE IR instruction causes the IL local to be returned to the pool).
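The check-out/check-in cycle can be sketched as a per-type stack of free slots. Here integer slot indices stand in for LocalBuilder handles, and declaring a new local when the pool has none of the requested type is an assumption about the fallback path, which the description above does not spell out.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Per-type pool of free IL local slots (slot indices model LocalBuilder
// handles). Declaring a new local on an empty pool is an assumption.
class LocalPool {
public:
    // I_CREATE of a poolable IR variable: reuse a free local of this type if possible.
    int Acquire(const std::string& type) {
        std::vector<int>& available = free_[type];
        if (!available.empty()) {
            int slot = available.back();  // most recently released local
            available.pop_back();
            return slot;
        }
        int slot = static_cast<int>(slotTypes_.size());
        slotTypes_.push_back(type);       // declare a brand-new IL local
        return slot;
    }

    // I_DELETE: return the IL local to its type's free stack.
    void Release(int slot) {
        assert(slot >= 0 && slot < static_cast<int>(slotTypes_.size()));
        free_[slotTypes_[slot]].push_back(slot);
    }

    // Total number of IL locals declared so far.
    int DeclaredCount() const { return static_cast<int>(slotTypes_.size()); }

private:
    std::map<std::string, std::vector<int>> free_;
    std::vector<std::string> slotTypes_;
};
```

With this scheme, ten short-lived Int32 temporaries created and deleted one after another share a single IL local.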
Only “local-like” variables that are constrained within a single control flow may be pooled in this fashion; other variables have fixed assignments (either at first use, or in the case of a variable created in multiple control flows and then merged, up front at subroutine entry). Temporary, internal variables created by code generation but not present in the IR generally also fall into the category of poolable IL variables.
This restriction is in place to ensure that IR variables always map to consistent IL locals when merging between control flows. For example, consider two divergent control flows which both create an IR variable (call it variable V), and then merge together later, causing the IR variable V to be merged across the control flows. In this case, for the merge to be seamless, the IR variable V must be allocated to the same IL local (call it local L) in all such merging branches. The simplest way to ensure this is to not pool IR variables that aren’t local to a control flow (and thus do not participate in merging).
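The two allocation policies can be combined in a small allocator. In this sketch (types ignored, IR variable ids hypothetical), a variable that participates in a merge is given a fixed slot at subroutine entry, while flow-local temporaries come and go through the pool.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Merged variables get a fixed slot reserved up front, so every branch
// resolves them to the same IL local; flow-local variables are pooled.
class SlotAllocator {
public:
    // Up-front assignment at subroutine entry for a merged variable.
    void ReserveFixed(int irVarId) { fixed_[irVarId] = NewSlot(); }

    // I_CREATE: merged variables always resolve to their fixed slot.
    int OnCreate(int irVarId, bool flowLocal) {
        if (!flowLocal) return fixed_.at(irVarId);
        if (!free_.empty()) { int s = free_.back(); free_.pop_back(); return s; }
        return NewSlot();
    }

    // I_DELETE: only pooled (flow-local) slots go back to the pool.
    void OnDelete(int slot, bool flowLocal) { if (flowLocal) free_.push_back(slot); }

private:
    int NewSlot() { return next_++; }
    std::map<int, int> fixed_;
    std::vector<int> free_;
    int next_ = 0;
};
```

Had V been pooled instead, a branch that first created and released a temporary would have reused that temporary's slot for V, while a branch without the temporary could have been handed a different slot, leaving the two paths disagreeing about where V lives at the merge point.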
Fortunately, many temporary variables created during calculations tend to be local to a control flow, thus allowing for notable savings from variable pooling.
Pooling of locals is important because the code analysis phase might not remove every temporary variable, and NWScript programs generate large quantities of temporaries (in the form of copies to the top of the stack for use as NWScript instruction operands).
At this point, most of the basics of the MSIL code generator have been covered at a high level (with the exception of action service handler calls — we’ll get to those later). Next time, we’ll look at an example subroutine (in NWScript source text and then MSIL forms) in order to see how everything fits together. Stay tuned! (There is, in fact, light at the end of the NWScript tunnel.)