BattleChip (misc, 495)
Last updated
Last updated
BattleChip was a misc challenge that leant towards what I think could have been categorized as a fun mixture of hardware & pwn.
We were given a CHIP-8 emulator implementation in Python. CHIP-8 is an interpreted programming language designed in the 1970s meant to be run on a virtual machine, that was primarily used to create videogames on old 8-bit machines.
The emulator runs on a remote server, which we can connect to and supply a ROM in hexadecimal. A few peripherals are implemented, including display, which is sent back to the client, and timers. (keyboard is not implemented, we can't play pong, or even better, Ventriglisse :()
Example of a Ventriglisse session (Slippery Slope) on CHIP-8
BattleChip actually had a nice little prequel challenge called Chip & Fish to introduce us to CHIP-8 and communicating with the remote server.
We are told the flag is hidden in the 16 first bytes of the stack. Indeed, let us take a look at the given challenge1.py
:
Memory.O_STACK
holds the address of the beginning of the stack, which is 0xEA0
, as in most implementations.
Therefore, all we need to do is write a tiny program that will dump the contents of these 16 bytes, between 0xEA0
and 0xEB0
.
Disclaimer: I wrote all my programs directly in hexadecimal opcodes, but there are many comments to help understand them.
Let's have a look at the opcode table on Wikipedia. We notice a few instructions that will be of great help:
We also learn that I
is a general purpose 16-bit register for memory addresses.
With that in mind, we can already create our first very simple program:
We submit our ROM to the server: 68006900aea0d89fffff
and receive the display!
Convert this to binary and then hex, and we have our first flag!
FCSC{e50f91bc6418a7a79b0c3fe74bb5e600}
Now for the real challenge. We are told the implementation of the CHIP-8 virtual machine has several additional elements in its architecture.
BattleChip introduces a Secure Element which holds a secret key, and which adds a new execution context ("privileged") that can interface with it. Here's a summary of what it changes:
A 10-byte secret key is initialized at boot time (when we connect to the server)
An LRU cache (Least Recently Used) has been implemented in the ALU (Arithmetic Logic Unit). It is described as: operations in the ALU usually take up 2 cycles, but when they are cached, they only take up 1, which saves time.
Three new opcodes are added:
0000
encrypts 10 input bytes (at I
) with the secret key
0001
verifies if our input is the secret key
00E1
clears the ALU cache
When context is switched, memory is copied from unsafe space to safe space, but not the other way around (yes, we can't even know what is the result of the encryption with the 0000
opcode). Only the VF register is returned to unsafe space (to carry a return value).
What is particularly nice about this challenge is that we do not have to focus at all on the Python implementation (the source files), as all the elements we need to trust in are :
the CHIP-8 specifications (Wikipedia is largely sufficient) ;
the additional challenge specifications as they are described.
However, we are still going to have to reverse the pre-compiled routines for the new opcodes, because we don't know what they exactly do.
Basically, when the emulator is instanciated, two contexts are created; an untrusted one (the normal one) and a trusted one.
When the trusted context is created, it generates the random secret key, and sets the two pre-compiled routines encrypt
and verify
, where some bytes are dynamically replaced with bytes from the key.
These routines are hardcoded in the emulator:
Fun fact, back then I totally didn't notice what was written about the number of cycles needed being returned to the insecure context. So we are going to pretend we never read this, as we can solve the challenge without it :^)
We can see in each routine there are 10 {:02X}
, which are placeholders for the secret key bytes.
Octo (https://johnearnest.github.io/Octo/) is a slick online emulator for CHIP-8 that can also compile code and disassemble code in human readable assembly. Let's replace the placeholder secret key bytes with dummy ones (0x42), and disassemble the two routines.
As described earlier, the encrypt
routine does indeed encrypt our input with the secret key using simple XOR operations, one byte at a time.
The verify
routine is quite similar to the encrypt
one. The main change is that the XOR results are ORed together: if the final result is 0, we know the comparison was successful.
Finally, here is what happens when the verify
routine returns 1 in VF
(cu.py
, control unit code):
If we call verify with the secret key, the flag is copied in unsafe space memory and we will be able to read it. Our goal is thus to find a way to retrieve the secret key!
As there is only one processor unit, the ALU cache is shared between the two contexts.
This means the number of cycles elapsed during an ALU operation can act as a binary side channel. If we measure only one cycle instead of two during, for instance, a XOR operation, we know it means it is currently cached.
Just to be sure, let's check out how the cache is implemented inside the emulator (alu.py
):
An actual LRU cache from functools
is used, with max size 16. The ALU has two possible states: AVAILABLE
or PENDING
. In cu.py
:
If the ALU is in the PENDING
state, the PC
register is rolled back two bytes, in other words the instruction is executed twice, and the next time it'll have switched to the AVAILABLE
state. However, when the operation is cached, the ALU doesn't enter the PENDING
state.
With this in mind, it is fairly easy to leak information about the secret key during the encrypt
routine.
Denote P[k] = (0, ..., 0xFF, ..., 0)
(10 bytes) with 0xFF
at the k-th position.
We are going to ask the trusted context to perform P xor S
, where S
is the secret key, by putting P
at position I
and calling 0000
. In particular, the operation 0xFF xor S[k]
will be performed and put in cache. We can then bruteforce S[k]
by computing 0xFF xor i
for i
in [0, 255] and counting the number of cycles each of these XOR operations lasted.
Of course, we should reset the cache with the 00E1
instruction at each iteration to avoid any problems. This allows us to recover the secret key in 2560 encryptions worst case, but it will run pretty fast anyways since everything is run server-side (we'll just send our ROM exploit once).
CHIP-8 has two useful instructions for counting cycles, thanks to the timer:
The delay timer is simply a value that is decremented each cycle (and stays at 0 once it reaches it). So if we set the delay timer to some arbitrary value before performing a XOR operation, and then retrieve the value of the delay timer right after, we can use this latter value to know whether the operation was cached.
We have everything to build our exploit ROM. The following code is my commented exploit.
Exploit ROM in hexadecimal : aea061016000f055f11e6000f055f11e6000f055f11e6000f055f11e6000f055f11e6000f055f11e6000f055f11e6000f055f11e6000f055f11e6000f055f11e6200aea0f21e60fff055630000e1aea00000650866fff5158633f007af80f05568006900d89f4006127073013300124caf00f21e8030f055aea0f21e6000f0557201320a1242af000001af146800690ad89f
Send it to the server and enjoy the show!