Cubik'cipher was a challenge from the @HackDay Qualifications 2022 in the hardware category, with 3 solves. The challenge in itself was a mixture of hardware and crypto and was a fun introduction to VHDL, as well as an opportunity for me to learn playing with some HDL-specific tools, which was out of my comfort zone.
Description
A month ago, a spy managed to infiltrate the secret lair of an enemy!This one was developing his own data encryptor!Since then, it seems that the enemy has finished it, so it's up to us to build the decryptor that will allow us to spy on him without him noticing!The flag is of the form HACKDAY{...}
We were given several files:
cubik_cipher.vhd, cubik_pkg.vhd and key_randomize.vhd: VHDL sources for a cryptographic algorithm implementation, which included encryption only.
test_vector.txt: a file containing an example case of plaintext, key and corresponding ciphertext.
flag.txt: a file containing a ciphertext and a key.
The goal of the challenge is therefore clear: we have to implement the algorithm that will decrypt the flag. We also understand that this is not an actual crypto challenge — in the sense that, even though the algorithm may be original, it is not expected to find weaknesses in it in order to solve the challenge, as the key is given.
Globally, there are two main routes one can choose from:
Implement decryption directly in VHDL.
Translate the whole algorithm to another language, use the test vector to make sure encryption works correctly, and implement decryption.
Since I didn't feel comfortable enough to go for the first option, I rewrote the algorithm in Python with Sage. But first, we need to look at the sources and understand the algorithm.
Understanding the cryptography
Prior to anything, let's take a look at test_vector.txt and flag.txt.
data : 4841434B4441590000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
key : e637e7147b2911da7a812269f24da4ba853a8d07087aeea84d6c50e2914f2f8adcea672ebe45de8e458fced9a0db7559eb83b3548dc91aa612cdb6062edd6c9ac993ed607e3fc38c6e27c81cd5666b6ea147c460f3c46565de8905ebc964b683430fdcb151ec3cf2127445b56fd0079c1614677a866f17989c021cc5b53b97ca
data_ciphered : 065A9D041EA0ABB6A38764BA3DCB6B13EFE3FC67DF249BFF149A8EFEA0984D52AC295403103023537198AFDE0C64D7AC5D23F2F25BE941C1AA8149E9FC174995D491F150E518AF26
Before encryption, the plaintext is padded with null bytes to have a 576-bits block (72 bytes). The ciphertext is also a 576-bits block. The key, however, is much larger (1024 bits).
Main entity
The "main" file in the VHDL sources is cubik_cipher.vhd:
We can see it needs a clock signal clk, an input key key (1024 bits), and input data d_i which is the plaintext. We will see that d_v_i dictates when the encryption process should start. The outputs are d_c, the ciphertext, and d_v_c, a bit that notifies the encryption process has terminated.
Then, the architecture for cubik_cipher is described at RTL (Register Transfer Level), which is a way to represent a circuit at a higher level of abstraction, using registers acting as signals.
This process keeps track of a round counter incremented at each clock rising edge. We learn that when the counter reaches 9, the end_ctr signal is set to 1. This hints at the algorithm being a block cipher with around 9 rounds. However, with the way VHDL works and how difficult it is to have a clear idea in mind of how logic is sequentially unfolded, I wasn't sure at this point that the number of rounds was exactly 9, and not something like 8 or 10.
This process performs, at each clock rising edge, a round of encryption. At the beginning (d_v_i = 1), the reg_cipher signal is loaded with the plaintext (d_i). Then, each iteration, the round function is called on the reg_cipher, with a rkey parameter. The d_c signal, which is the output of the entity, is linked to the reg_cipher, and therefore contains the output ciphertext when all rounds have been computed.
This portion of code maps inputs and outputs for another entity in another file key_randomize.vhd. The rkey, which we understand means "round key", is an output of this key_randomize entity. Let's check it out.
At first, when load = 1 (d_v_i = 1), the key is loaded in the reg register, which acts as an internal state for a round key derivation function. At each clock rising edge, this internal state is updated by rotating it 1 bit to the left and flipping a few specific bits.
The output round key (key_r) is then a projection of the internal state, key_w being 576 bits. Therefore, even though the key is 1024 bits, the round keys are always 576 bits (least significant bits of the internal state).
Core cryptographic entity
Let's now focus on the core part of the algorithm: the round function in cubik_pkg.vhd.
In order to perform calculations, the input data vector is converted (slv2cubix) to another representation, called cubix, and converted back to normal at the end (cubix2slv).
A cubix is composed of four 4×4 matrices which coefficients are 9-bit integers, which I will call nonets. Indeed, even though the data vectors are 72 bytes long, 576 bits is divisible by 9, which gives 64 nonets. These nonets are split in 4 groups of 16 nonets, each group populating a square matrix.
Denoting (m0,…,m63) the input nonets, the cubix is therefore:
If the add operation is merely a XOR, the double operation is more intricate. A comment at the beginning of the file gives out a big hint:
-- Primitive polynomial = D^9+D^4+1 -- GF(512)
Operations are actually carried out in a Galois Field, GF(512), with the primitive polynomial D9+D4+1. Roughly, this means a nonet can be seen as a polynomial modulo D9+D4+1 (so max degree 8), whose coefficients are in Z/2Z (so each coefficient is a bit of the nonet).
Matrices can be defined over this space since it is a field, and thus operations such as addition, multiplication, inverse... exist as well.
Round cubix. This second step is easier to understand. It simply performs a bitwise XOR between tmp and the round key:
We should have everything needed to implement the encryption algorithm. As for the Galois Field part, SageMath makes it easy.
But obviously, it didn't work at first try. As I presented it here, the algorithm seems rather unambiguous, but when you are in the process of reversing it, there are a lot of elements for which you have uncertainties. Indeed, many issues could happen at different levels:
Actual bugs in my Python code
Misunderstanding of the VHDL algorithm
Uncertainty about the number of rounds
Uncertainty about endianness at many steps
Uncertainty about indices
Is the key randomization function applied before or after the round function?
Did I understand the key randomization function properly? (especially with the fact that in VHDL, signals are updated at the end of a process)
Do the times and times2 functions even do what I think they do?
Is my mathematical interpretation correct?
Is my translation to Sage correct?
...
With so many doubts, it felt mandatory to run the original VHDL sources with the test vector and observe the state of the signals at each iteration, in order to debug my code.
Creating a testbench for debugging
I read some documentation and wrote a testbench to interact with the cubik_cipher component:
In order to simulate the circuit, I used GHDL, an open-source compiler and simulator for VHDL.
I ran the simulation with ghdl -r testbench --vcd=out.vcd --stop-time=50ns. This generated a trace file in the VCD format, which I viewed with GTKWave. Here is what it looks like:
Thanks to this, I was able to determine that there were indeed 9 rounds and that the key started to be randomized after the first round. I could also debug my script step by step and compare my values to the signals, to locate where I made mistakes.
This allowed to have a working implementation of the encryption algorithm.
Implementing decryption
The only thing that remained was to implement decryption. This should not be too hard — we only need to generate the round keys and follow the reverse steps for the rounds.
The swap rows and round cubix steps are trivial to invert (inverse permutation, and XOR the round key). As for the mix cubix step, its inverse it is actually the same operation, but with the inverse matrices for mi (which is also trivial to do thanks to Sage).
Solution script
Here is the full script that implements encryption and decryption in Python and Sage.
It appears that the cipher is linear (or at least affine). Indeed, each round and each step is linear in the plaintext and the key. This means that for a given key, the entire encryption can be seen as a single linear map, and thus very easily reversed for an attacker that doesn't know the key but knows enough plaintext/ciphertext couples.
This is why in such block ciphers, it is crucial to have a non-linear operation inside rounds. For instance, in AES, if you remove the S-BOX operation (sub bytes), the entire cipher becomes linear and easily broken if you have access to an encryption oracle, by encrypting a basis to determine the underlying linear map uniquely.