Detecting hardware-assisted hypervisors without external timing
I’ll admit right now that this entry is a tease, because I can't tell you how I did it. However, I'll start by saying that there are some people out there claiming that hardware-assisted hypervisors are completely undetectable, and others claiming that they are not.
The people claiming that hardware-assisted hypervisors are undetectable base their argument on several things. First, the sensitive instructions that allow detection of software-based VMMs are trapped by a hardware-assisted hypervisor so that they can be emulated appropriately, if necessary. Second, some registers already have hardware-backed shadow copies; so, as an example, trying to leave paged protected mode (which is not permitted, not even in root mode) might seem like it worked, but it didn't really, because the hypervisor will simply switch the guest into v86 mode and the shadow CR0 will be lying to you. Third, the delivery of physical memory can be intercepted and empty pages can be returned. Finally, APIs can be hooked transparently because the real breakpoint registers aren't visible to the guest (i.e., all the things that the guest would typically like to see to make a determination can be hidden from that guest).
Now, the people claiming that hardware-assisted hypervisors are detectable are assuming several things. One such assumption is that calling VMRUN a second time will fail; but there's no reason why a hypervisor can't emulate it. The response to the emulation claim, so far, has always been "but no hypervisor will emulate VMX properly". That's simply untrue. If someone really wants to hide it, then someone will write it. It's just a matter of time.
Speaking of time, we all understand that hypervisor execution will be slower than native execution, so the people claiming that hardware-assisted hypervisors are detectable point to execution time as “the way to do it”. How do you measure it? With a stopwatch or a time server (i.e., external time sources). The problem with that is that when a user first brings home their shiny new VMX-capable machine, they're not going to run baseline tests to see how long 10,000,000 iterations of VMEXIT take to run; it's just not going to happen. So, while it's easy to say "yes, it will be slower", what if your machine is compromised from day one? You're never going to know, because it's that slow already.

Even if it's not compromised from day one and a user did run a baseline test to determine before and after speeds, does that really mean anything? What if the CPU load is extremely high because Windows Update is downloading an entire site? Yeah, that's going to be slow, too. What if you have no clock available? No network connection? Out of luck? Oops. In an article posted by one proponent of the “detectability” camp, it’s stated that this method needs a bit of “busy work” to get a rough idea of the (hopefully clean) tick resolution. However, RDTSC isn't used for thread scheduling, so a hypervisor can mess with it all it wants. The same article also tells us to "throw away that external timing device” and not to sweat it unless we “had time to take a nap and go out for dinner”, at which point we could worry. That all sounds like external timing to me.
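To make the timing argument concrete, here's a rough sketch of the kind of in-guest probe the "detectability" camp has in mind; it is my own illustration, not the technique this entry teases. It times CPUID with RDTSC, since CPUID unconditionally causes a VM exit on both VT-x and AMD-V. The catch is exactly what I described above: the number it produces means nothing without a trusted bare-metal baseline, and TSC offsetting lets a hypervisor skew RDTSC anyway (x86 with GCC/Clang inline assembly assumed).

```c
#include <stdint.h>

/* Read the time-stamp counter. A hypervisor can trap or offset this
 * counter, which is precisely why RDTSC-based checks are unreliable. */
static inline uint64_t rdtsc(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
}

/* Minimum observed CPUID latency, in TSC ticks. Under a hardware-assisted
 * hypervisor each CPUID is a VM exit, so this figure balloons relative to
 * bare metal -- but "balloons relative to what?" is the unanswered part. */
uint64_t min_cpuid_ticks(int iterations)
{
    uint64_t best = UINT64_MAX;
    for (int i = 0; i < iterations; i++) {
        uint32_t eax = 0, ebx, ecx, edx;
        uint64_t t0 = rdtsc();
        __asm__ __volatile__("cpuid"
                             : "+a"(eax), "=b"(ebx), "=c"(ecx), "=d"(edx));
        uint64_t t1 = rdtsc();
        if (t1 - t0 < best)
            best = t1 - t0;
    }
    return best;
}
```

On bare metal the minimum typically comes back small; under a hypervisor it is usually much larger. But without a baseline measured on the same machine before any compromise, you can't tell which bucket you're in, which is the whole problem.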
Anyway, I found something that you can do in the guest that the hypervisor can't see until after it's happened, so it can't hide the side effects. It doesn't need a network connection and it doesn't need a user to time anything. It's also quick (it executes in one timeslice), but I still can't tell you how I did it.