Yesterday, I outlined some of the general principles behind how guest to host communication in VMs works, and why the virtual serial port isn’t really a great way to talk to the outside world from a VM. Keeping this information in mind, it should be possible to do much better in a VM, but it is first necessary to develop a way to communicate with the outside world from within a VMware guest.
It turns out, as previously mentioned, that there happen to be a lot of things already built in to VMware that need to escape from the guest in order to notify the host of some special event. Aside from the enhanced (virtualization-aware) hardware drivers that ship with VMware Tools (the VMware virtualization-aware addon package for VMware guests), for example, there are a number of “convenience features” that utilize specialized back-channel communication interfaces to talk to code running in the VMM.
While not publicly documented by VMware, these interfaces have been reverse engineered and pseudo-documented publicly by enterprising third parties. It turns out that VMware has a generalized interface (a “fake” I/O port) that can be accessed to essentially call a predefined function running in the VMM, which performs the requested task and returns to the VM. This “fake” I/O port does not behave like other I/O ports (in particular, additional registers are used). Virtually all (no pun intended) of the VMware Tools “convenience features”, from mouse pointer tracking to host to guest time synchronization, use the VMware I/O port to perform their magic.
Because there is already information publicly available regarding the I/O port, and because many of the tasks performed using it are relatively easy to locate host-side in terms of the code that runs, the I/O port is an attractive target for a communication mechanism. The mechanisms by which to use it guest-side have been documented publicly enough to be fairly easy to use from a code standpoint. However, there’s still the problem of what happens once the I/O port is triggered, as there isn’t exactly a built-in command that takes data and magically sends it to the kernel debugger.
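Guest-side, a call to this interface looks something like the following. This is only a minimal sketch based on the third-party documentation referred to above, assuming 32-bit MSVC inline assembly; the magic constants are the publicly documented ones, while the wrapper function itself is simply made up for illustration:

    #define VMWARE_MAGIC 0x564D5868 /* 'VMXh', selects the backdoor interface */
    #define VMWARE_PORT  0x5658     /* 'VX', the "fake" I/O port              */

    /* Issue one backdoor command and return the value the VMM leaves in EAX. */
    static unsigned long VmBackdoorCall(unsigned long Command, unsigned long Parameter)
    {
        unsigned long Result;

        __asm
        {
            mov   eax, VMWARE_MAGIC
            mov   ebx, Parameter
            mov   ecx, Command
            mov   edx, VMWARE_PORT
            in    eax, dx            ; intercepted by the VMM, never hits real hardware
            mov   Result, eax
        }

        return Result;
    }

The command number placed in ECX selects which predefined function in the VMM gets run; the other registers carry whatever arguments (and return values) that particular command uses.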
For this, as alluded to previously, it is necessary to do a bit of poking around in the VMware VMM in order to locate a handler for an I/O port command that would be feasible to replace for purposes of shuttling data in and out of the VM to the kernel debugger. Although the VMware Tools I/O port interface is likely not designed (VMM-side, anyway) for high performance, high speed data transfers (at least compared to the mechanisms that, say, the virtualization-aware NIC driver might use), it is at the very least orders of magnitude better than the virtual serial port, certainly enough to provide serious performance improvements with respect to kernel debugging, assuming all goes according to plan.
Looking through the list of I/O port commands that have been publicly documented (if somewhat unofficially), there are one or two that could possibly be replaced without any real negative impacts on the operation of the VM itself. One of these commands (0x12) is designed to pop up the “Operating System Not Found” dialog. This command is actually used by the VMware BIOS code in the VM if it can’t find a bootable OS, and not typically by VMware Tools itself. Since any VM that anyone would possibly be kernel debugging must by definition have a bootable operating system installed, axing the “OS Not Found” dialog is certainly no great loss for that case. As an added bonus, because this I/O port command displays UI and accesses string resources, the handler for it happened to be fairly easy to locate in the VMM code.
In terms of the VMM code, the handler for the OS Not Found dialog command looks something like so:
    int __cdecl OSNotFoundHandler()
    {
        if (!IsVMInPrivilegedMode()) /* CPL=0 check */
        {
            log error message;
            return -1;
        }

        load string resources;
        display message box;

        return somefunc();
    }
Our mission here is really just to patch out the existing code with something that knows how to take data from the guest and move it to the kernel debugger, and vice versa. A naive approach might be to try to access the guest’s registers and use them to convey the data words to transfer (it would appear that many of the I/O port handlers do have access to the guest’s registers, as many of the I/O port commands modify the guest’s register data), but this approach would incur a large number of VM exits and would therefore be suboptimal.
A better approach would be to create some sort of shared memory region in the VM and then simply use the I/O port command as a signal that data is ready to be sent or that the VM is now waiting for data to be received. (The execution of the VM, or at least the current virtual CPU, appears to be suspended while the I/O port handler is running. In the case of the kernel debugger, all but one CPU would be halted while a KdSendPacket or KdReceivePacket call is being made, making the call essentially one that blocks execution of the entire VM until it returns.)
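To make the idea concrete, the shared memory region could carry a small header along the lines of the following. This layout is purely illustrative (it is not the actual VMKD wire format); all of the names and field choices here are made up for the example:

    #define VMKD_SIGNATURE 0x564B4D44UL /* arbitrary magic value used to locate the region */

    typedef struct _VMKD_SHARED_REGION
    {
        unsigned long Signature; /* known value that the host-side code scans for */
        unsigned long Direction; /* 0 = guest -> debugger, 1 = debugger -> guest  */
        unsigned long Length;    /* number of valid bytes in Data                 */
        unsigned char Data[4096 - 3 * sizeof(unsigned long)];
    } VMKD_SHARED_REGION, *PVMKD_SHARED_REGION;

The guest fills in Data and Length and then issues the repurposed I/O port command as a doorbell; the host-side code uses Direction and Length to decide what to move where.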
There’s a slight problem with this approach, however. There needs to be a way to communicate the address of the shared memory region from the guest to the modified VMM code, and then the modified VMM code needs to be able to translate the address supplied by the guest to an address in the VMM’s address space host-side. While the VMware VMM most assuredly has some sort of capability to do this, finding it and using it would make the (already somewhat invasive) patches to the VMM even more likely to break across VMware versions, making such an address translation approach undesirable from the perspective of someone doing this without the help of the actual vendor.
There is, however, a more unsophisticated approach that can be taken: The code running in the guest can simply allocate non-paged physical memory, fill it with a known value, and then have the host-side code (in the VMM) simply scan the entire VMM virtual address space for the known value set by the guest in order to locate the shared memory region in host-relative virtual address space. The approach is slow and about the farthest thing from elegant, but it does work (and it only needs to be done once per boot, assuming the VMM doesn’t move pinned physical pages around in its virtual address space). Even if the VMM does occasionally move pages around, it is possible to compensate for this and, assuming such moves are infrequent, still achieve acceptable performance.
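Guest-side, the discovery step then amounts to something along these lines. Again, this is only a sketch reusing the illustrative structure above; MmAllocateContiguousMemory is one way to get memory that is nonpaged and stays physically put, though real code would have to be careful about how early in boot the KD transport gets initialized:

    #include <ntddk.h>

    static PVMKD_SHARED_REGION VmkdpRegion = NULL;

    static NTSTATUS VmkdpCreateSharedRegion(VOID)
    {
        PHYSICAL_ADDRESS Highest;

        Highest.QuadPart = MAXULONG; /* keep the region below 4GB for simplicity */

        /* Nonpaged, physically contiguous memory that will not be paged out. */
        VmkdpRegion = (PVMKD_SHARED_REGION)MmAllocateContiguousMemory(
            sizeof(VMKD_SHARED_REGION),
            Highest);

        if (VmkdpRegion == NULL)
            return STATUS_INSUFFICIENT_RESOURCES;

        RtlZeroMemory(VmkdpRegion, sizeof(VMKD_SHARED_REGION));

        /* The known value that the host-side code hunts for when it scans the
           VMM's virtual address space. A real implementation would want a much
           longer (and more unique) pattern to avoid false positives. */
        VmkdpRegion->Signature = VMKD_SIGNATURE;

        return STATUS_SUCCESS;
    }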
The astute reader might note that this introduces a slight hole whereby a user mode caller in the VM could spoof the signature used to locate the shared memory block and trick the VMM-side code into talking to it instead of the KD logic running in kernel mode (after creating the spoofed signature, a malicious user mode process would wait for the kernel mode code to try and contact the VMM, and hope that its spoofed region would be located first). This could certainly be solved by tighter integration with the VMM (and could be quite easily eliminated by having the guest code pass an address in a register which the VMM could translate instead of doing a string hunt through virtual address space), but in the interest of maintaining broad compatibility across VMware VMMs, I have not chosen to go this route for the initial release.
As it turns out, spoofing the link with the kernel debugger is not really all that much of a problem here, as due to the way VMKD is designed, it is up to the guest-side code to actually act on the data that is moved into the shared memory region, and a non-privileged user mode process would have limited ability to do so. It could certainly attempt to confuse the kernel debugger, however.
After the guest-side virtual address of the shared memory region is established, the guest and the host-side code running in the VMM can communicate by filling the shared memory region with data. The guest can then send the I/O port command in order to tell the host-side code to transmit the data in the shared memory region, and/or wait for and copy in data from the remote kernel debugger destined for the code running in the guest.
With this model, the guest is entirely responsible for driving the kernel debugger connection in that the VMM code is not allowed to touch the shared memory region unless it has exclusive access (which is true if and only if the VM is currently waiting on an I/O port call to the patched handler in the VMM). However, as the low-level KD data transmission model is synchronous and not event-driven, this does not pose a problem for our purposes, thus allowing a fairly simple and yet relatively performant mechanism to connect the KD stub in the kernel to the actual kernel debugger.
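Put together, the guest-side send path ends up looking roughly like the sketch below, with VMKD_CMD_OSNOTFOUND standing in for the repurposed command number (0x12) and the helpers being the illustrative ones from the earlier sketches, not the actual VMKD code:

    #define VMKD_CMD_OSNOTFOUND 0x12 /* repurposed "OS Not Found" backdoor command */

    /* Copy one outbound KD packet fragment into the shared region and block in
       the patched VMM handler until it has been sent on to the remote debugger.
       Length is assumed to fit within the Data buffer. */
    static void VmkdpSendData(const void *Buffer, unsigned long Length)
    {
        RtlCopyMemory(VmkdpRegion->Data, Buffer, Length);
        VmkdpRegion->Length    = Length;
        VmkdpRegion->Direction = 0; /* guest -> debugger */

        /* The VM (or at least this VCPU) stays suspended inside the handler
           while the host-side code drains the buffer, so by the time this
           returns the data has been handed off. */
        VmBackdoorCall(VMKD_CMD_OSNOTFOUND, 0);
    }

The receive path is symmetric: the guest issues the command with the direction flag set the other way and copies out whatever the host-side code placed into the region once the call returns.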
Now that data can be both received from the guest and sent to the guest by means of the I/O port interface combined with the shared memory region, all that remains is to interface the debugger (DbgEng.dll) with the patched I/O port handler running in the VMM.
It turns out that there are a couple of twists relating to this final step that make it more difficult than one might initially expect, however. Expect to see details on that (and more) in the next installment of the VMKD series…
“While not publicly documented by VMware, these interfaces have been reverse engineered and pseudo-documented publicly by enterprising third parties.”…
VMware Tools are now open source, see http://open-vm-tools.sourceforge.net/
Nice find. I didn’t know about that :)
Definitely worth a look, and great to hear that VMware is opening that stuff up. Already had something written about that point in an upcoming post as well, FWIW.
[Disclaimer: I’m an engineer at VMware.]
Hey, just wanted to note a couple of things you might want to investigate which are new to VMware Workstation 6.0 that you might find interesting:
(1) VMCI (Virtual Machine Communication Interface)
This is a generic shared memory/datagram protocol interface that you can use to communicate with something running in your guest. It’s documented here:
http://pubs.vmware.com/vmci-sdk/VMCI_intro.html
(2) Workstation 6.0 includes an experimental debugging feature that allows you to attach to individual processes within the guest. Basically our VMM exports a stub that speaks the remote gdb protocol and interprets enough guest state to present the necessary interface to your debugger. We’re working on making this feature a lot better and more complete (e.g. so it works with Windows guests).
One of the lead engineers on this project has a blog:
http://stackframe.blogspot.com/
Yeah, a friend (Steve Dispensa) pointed out VMCI to me after I had started posting the series. It would have definitely been handy to know about that from the start, but at that point the code was already written (I don’t think I could use the VMware-supplied guest side logic, but it could be reverse engineered and reimplemented to be workable, and the host-side stuff could probably be reused – more on that later).
Also, VMCI doesn’t appear to be supported on VMware Server, judging from that documentation, which is a big downer for me. (IMO, VMware Server is far superior from a dev/testing perspective; remote console access, in all the dev stuff I’ve been involved in, has been *far* more useful than multiple snapshots. But I suppose different people have different opinions of it.)
I actually had a list of things that I thought would be nice to have in terms of stuff VMware could do in an upcoming post. VMCI support on VMware Server would be great (although I’m sure that you guys view that as more of a business decision than a technical one).
Other things that I’d really like to see would be:
– The ability to interface with VMCI from kernel mode at high IRQL from the guest. Definitely required for what VMKD does in any case, especially in the highly restricted environment of being in the KD communication path.
– The ability to suspend the VM for a call to host-side code, much like how the repurposed OSNotFound handler that VMKD uses operates. In cases like VMKD, it’s not like I can just slap in a “Sleep” at high IRQL in kernel mode, so suspending the VM is a better choice there.
– The ability to access registers of a guest across a call to host-side code, and the ability to tell if a guest-to-host call originated from CPL=0 or user mode.
– The ability to read/write physical addresses, and ideally built-in support to translate virtual addresses using the current page table.
– If possible, a sanctioned OS-independent (i.e. assembly language) interface to make the call to the VMM instead of using the guest-side VMCI driver. Not everything interesting out there is doable from user mode, and the restrictions in various parts of kernel mode make it questionable whether it is always going to be practical to IOCTL to another driver (at least in terms of Windows) in order to make the necessary calls to the VMM. In fact, many of the most interesting things with respect to improvements that can be made in VMs involve kernel mode drivers that provide a better communication channel than hardware interfaces designed for physical systems, and the VMCI API (guest-side) doesn’t seem all that conducive to such things right now.
VMCI is definitely a start, but without some of the above features it would be kind of tricky to use it for what VMKD needs with the existing support that I see documented in the SDK.