# Cubik'cipher

**Cubik'cipher**was a challenge from the @HackDay Qualifications 2022 in the

**hardware**category, with 3 solves. The challenge in itself was a mixture of hardware and crypto and was a fun introduction to VHDL, as well as an opportunity for me to learn playing with some HDL-specific tools, which was out of my comfort zone.

A month ago, a spy managed to infiltrate the secret lair of an enemy!This one was developing his own data encryptor!Since then, it seems that the enemy has finished it, so it's up to us to build the decryptor that will allow us to spy on him without him noticing!The flag is of the form HACKDAY{...}

We were given several files:

`cubik_cipher.vhd`

,`cubik_pkg.vhd`

and`key_randomize.vhd`

: VHDL sources for a cryptographic algorithm implementation, which included encryption only.`test_vector.txt`

: a file containing an example case of plaintext, key and corresponding ciphertext.`flag.txt`

: a file containing a ciphertext and a key.

The goal of the challenge is therefore clear: we have to implement the algorithm that will decrypt the flag. We also understand that this is not an

*actual*crypto challenge — in the sense that, even though the algorithm may be original, it is not expected to find weaknesses in it in order to solve the challenge, as the key is given.Globally, there are two main routes one can choose from:

- Implement decryption directly in VHDL.
- Translate the whole algorithm to another language, use the test vector to make sure encryption works correctly, and implement decryption.

Since I didn't feel comfortable enough to go for the first option, I rewrote the algorithm in Python with Sage. But first, we need to look at the sources and understand the algorithm.

Prior to anything, let's take a look at

`test_vector.txt`

and `flag.txt`

.data : 4841434B4441590000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

key : e637e7147b2911da7a812269f24da4ba853a8d07087aeea84d6c50e2914f2f8adcea672ebe45de8e458fced9a0db7559eb83b3548dc91aa612cdb6062edd6c9ac993ed607e3fc38c6e27c81cd5666b6ea147c460f3c46565de8905ebc964b683430fdcb151ec3cf2127445b56fd0079c1614677a866f17989c021cc5b53b97ca

data_ciphered : 065A9D041EA0ABB6A38764BA3DCB6B13EFE3FC67DF249BFF149A8EFEA0984D52AC295403103023537198AFDE0C64D7AC5D23F2F25BE941C1AA8149E9FC174995D491F150E518AF26

data_ciphered : 79EEEF596B960C42262DFD1D0A2DB218FA3C71C681963F0CC389D3F0F5234C8023CA79D315186AF55621289F92AD6D9B657D999E074C84E13BFDAEDC94A3BA4FCB95B4013BFC40E5

key : f0ae2e1abee8afbe3ea424cc71f4ce17455a21d5df15cc4f6362e3af095cfb6da7188a9777c2c875ab39145a88a2142aea7b5411607110d70cd3d37c20f259b1920031990709d8e0e8d661b1a05fe8b5719aab6569835b3e52be738982608fda36549fd1e3398c725190356fbe97998b79f84f0ef23c4dea63898b52319a47a2

Before encryption, the plaintext is padded with null bytes to have a 576-bits block (72 bytes). The ciphertext is also a 576-bits block. The key, however, is much larger (1024 bits).

The "main" file in the VHDL sources is

`cubik_cipher.vhd`

:library ieee;

use ieee.std_logic_1164.all;

use ieee.numeric_std.all;

use work.cubik_pkg.all;

entity cubik_cipher is

generic (round_div_2 : positive := 8);

port (

resetn : in std_logic;

clk : in std_logic;

-- key

key : in std_logic_vector(1023 downto 0);

-- d in

d_v_i : in std_logic;

d_i : in std_logic_vector(data_width-1 downto 0);

-- dout

d_c : out std_logic_vector(data_width-1 downto 0);

d_v_c : out std_logic

);

end entity;

architecture rtl of cubik_cipher is

constant key_w : positive := 576;

type state is (idle, engine);

signal current_state,next_state : state;

signal ctr_round : natural range 0 to 9;

signal end_ctr,cmd_ctr : std_logic;

signal rkey : std_logic_vector(key_w-1 downto 0);

signal reg_cipher : std_logic_vector(d_i'range);

begin

process (clk, resetn) is

begin

if resetn = '0' then

ctr_round <= 0;

elsif rising_edge(clk) then

if cmd_ctr = '0' then

ctr_round <= 0;

else

ctr_round <= ctr_round + 1;

end if;

end if;

end process;

end_ctr <= '1' when ctr_round = 9 else '0';

process (clk, resetn) is

begin

if resetn = '0' then

current_state <= idle;

elsif rising_edge(clk) then

current_state <= next_state;

end if;

end process;

process (clk, resetn) is

begin

if resetn = '0' then

reg_cipher <= (others => '0');

elsif rising_edge(clk) then

if d_v_i = '1' and ctr_round = 0 then

reg_cipher <= d_i;

else

reg_cipher <= round(reg_cipher, rkey);

end if;

end if;

end process;

d_c <= reg_cipher;

process (current_state, end_ctr, d_v_i)

begin

d_v_c <= '0';

case current_state is

----------------------------

when idle =>

if d_v_i = '1' then

next_state <= engine;

end if;

cmd_ctr <= '0';

----------------------------

when engine =>

if end_ctr = '1' then

next_state <= idle;

d_v_c <= '1';

cmd_ctr <= '0';

else

cmd_ctr <= '1';

end if;

----------------------------

end case;

end process;

key_engine : entity work.key_randomize

generic map(key_w => key_w)

port map (

resetn => resetn,

clk => clk,

load => d_v_i,

key => key,

key_r => rkey

);

end architecture;

Note: again, I am not well-versed in hardware and HDL, so some of my explanations or the terminology that I use may be incorrect at times.

The following part describes the inputs and outputs of the

`cubik_cipher`

entity:entity cubik_cipher is

generic (round_div_2 : positive := 8);

port (

resetn : in std_logic;

clk : in std_logic;

-- key

key : in std_logic_vector(1023 downto 0);

-- d in

d_v_i : in std_logic;

d_i : in std_logic_vector(data_width-1 downto 0);

-- dout

d_c : out std_logic_vector(data_width-1 downto 0);

d_v_c : out std_logic

);

end entity;

We can see it needs a clock signal

`clk`

, an input key `key`

(1024 bits), and input data `d_i`

which is the plaintext. We will see that `d_v_i`

dictates when the encryption process should start. The outputs are `d_c`

, the ciphertext, and `d_v_c`

, a bit that notifies the encryption process has terminated.Then, the architecture for

`cubik_cipher`

is described at RTL (*Register Transfer Level*), which is a way to represent a circuit at a higher level of abstraction, using registers acting as signals.architecture rtl of cubik_cipher is

constant key_w : positive := 576;

type state is (idle, engine);

signal current_state,next_state : state;

signal ctr_round : natural range 0 to 9;

signal end_ctr,cmd_ctr : std_logic;

signal rkey : std_logic_vector(key_w-1 downto 0);

signal reg_cipher : std_logic_vector(d_i'range);

begin

[...]

The architecture comprises several processes.

process (clk, resetn) is

begin

if resetn = '0' then

ctr_round <= 0;

elsif rising_edge(clk) then

if cmd_ctr = '0' then

ctr_round <= 0;

else

ctr_round <= ctr_round + 1;

end if;

end if;

end process;

end_ctr <= '1' when ctr_round = 9 else '0';

This process keeps track of a

*round counter*incremented at each clock rising edge. We learn that when the counter reaches 9, the`end_ctr`

signal is set to 1. This hints at the algorithm being a block cipher with around 9 rounds. However, with the way VHDL works and how difficult it is to have a clear idea in mind of how logic is sequentially unfolded, I wasn't sure at this point that the number of rounds was *exactly*9, and not something like 8 or 10.process (clk, resetn) is

begin

if resetn = '0' then

reg_cipher <= (others => '0');

elsif rising_edge(clk) then

if d_v_i = '1' and ctr_round = 0 then

reg_cipher <= d_i;

else

reg_cipher <= round(reg_cipher, rkey);

end if;

end if;

end process;

d_c <= reg_cipher;

This process performs, at each clock rising edge, a round of encryption. At the beginning (

`d_v_i = 1`

), the `reg_cipher`

signal is loaded with the plaintext (`d_i`

). Then, each iteration, the `round`

function is called on the `reg_cipher`

, with a `rkey`

parameter. The `d_c`

signal, which is the output of the entity, is linked to the `reg_cipher`

, and therefore contains the output ciphertext when all rounds have been computed.Let's see what is this

`rkey`

signal:key_engine : entity work.key_randomize

generic map(key_w => key_w)

port map (

resetn => resetn,

clk => clk,

load => d_v_i,

key => key,

key_r => rkey

);

This portion of code maps inputs and outputs for another entity in another file

`key_randomize.vhd`

. The `rkey`

, which we understand means "round key", is an output of this `key_randomize`

entity. Let's check it out.library ieee;

use ieee.std_logic_1164.all;

entity key_randomize is

generic (key_w : positive := 32);

port (

resetn : in std_logic;

clk : in std_logic;

load : in std_logic;

key : in std_logic_vector(1023 downto 0);

key_r : out std_logic_vector(key_w-1 downto 0)

);

end entity;

architecture rtl of key_randomize is

signal reg : std_logic_vector(key'range);

begin

process (clk,resetn) is

begin

if resetn = '0' then

reg <= (others => '1');

elsif rising_edge(clk) then

if load = '1' then

reg <= key;

else

reg <= reg(reg'length-2 downto 0) & reg(reg'high);

reg(24) <= reg(23) xor reg(reg'high);

reg(421) <= reg(420) xor reg(reg'high);

reg(476) <= reg(475) xor reg(reg'high);

reg(545) <= reg(544) xor reg(reg'high);

reg(923) <= reg(922) xor reg(reg'high);

end if;

end if;

end process;

key_r <= reg(key_r'range);

end architecture;

At first, when

`load = 1`

(`d_v_i = 1`

), the key is loaded in the `reg`

register, which acts as an internal state for a round key derivation function. At each clock rising edge, this internal state is updated by rotating it 1 bit to the left and flipping a few specific bits.The output round key (

`key_r`

) is then a *projection*of the internal state,`key_w`

being 576 bits. Therefore, even though the key is 1024 bits, the round keys are always 576 bits (least significant bits of the internal state).Let's now focus on the core part of the algorithm: the

`round`

function in `cubik_pkg.vhd`

.function round(data : std_logic_vector(data_width-1 downto 0) ; key : std_logic_vector(575 downto 0)) return std_logic_vector is

variable tmp : cubix;

variable data_out : std_logic_vector(data'range);

begin

tmp := slv2cubix(data);

tmp := mixcubix(tmp, (m_0,m_1,m_2,m_3));

tmp := roundcubix(tmp, key);

tmp := swap_rows(tmp);

data_out := cubix2slv(tmp);

return data_out;

end round;

In order to perform calculations, the input data vector is converted (

`slv2cubix`

) to another representation, called **cubix**, and converted back to normal at the end (`cubix2slv`

).A

**cubix**is composed of four$4 \times 4$

matrices which coefficients are **9-bit**integers, which I will call*nonets*. Indeed, even though the data vectors are 72 bytes long, 576 bits is divisible by 9, which gives**64 nonets**. These nonets are split in 4 groups of 16 nonets, each group populating a square matrix.Denoting

$(m_0, \ldots, m_{63})$

the input nonets, the cubix is therefore:

$\begin{bmatrix}d_{63} & d_{62} & d_{61} & d_{60} \\ d_{59} & d_{58} & d_{57} & d_{56} \\ d_{55} & d_{54} & d_{53} & d_{52} \\ d_{51} & d_{50} & d_{49} & d_{48}\end{bmatrix} \: \ldots \: \begin{bmatrix} d_{15} & d_{14} & d_{13} & d_{12} \\ d_{11} & d_{10} & d_{9} & d_{8} \\ d_{7} & d_{6} & d_{5} & d_{4} \\ d_{3} & d_{2} & d_{1} & d_{0} \end{bmatrix}$

Then, the actual encryption part is carried through three different steps.

**Mix cubix.**This step uses four constant matrices

$m_0, m_1, m_2, m_3$

, defined as follows.

$\begin{bmatrix} 3 & 0 & 2 & 6 \\ 6 & 3 & 0 & 2 \\ 2 & 6 & 3 & 0 \\ 0 & 2 & 6 & 3 \end{bmatrix}, \begin{bmatrix} 6 & 4 & 3 & 0 \\ 0 & 6 & 4 & 3 \\ 3 & 0 & 6 & 4 \\ 4 & 3 & 0 & 6 \end{bmatrix}, \begin{bmatrix} 4 & 0 & 9 & 3 \\ 3 & 4 & 0 & 9 \\ 9 & 3 & 4 & 0 \\ 0 & 9 & 3 & 4 \end{bmatrix}, \begin{bmatrix} 2 & 4 & 0 & 9 \\ 9 & 2 & 4 & 0 \\ 0 & 9 & 2 & 4 \\ 4 & 0 & 9 & 2 \end{bmatrix}$

function mixcubix(c : cubix; mt : matrix_t) return cubix is

variable tmp : cubix;

begin

for k in cubix'range loop

tmp(k) := mixmatrix(c(k),shiftmt(mt,k));

end loop;

return tmp;

end mixcubix;

Each matrix in the cubix is mixed using 4 matrices. More especially, if

$(C_0, C_1, C_2, C_3)$

is the cubix:

$C_0 = \text{mixmatrix}(C_0, (m_0, m_1, m_2, m_3)) \\ C_1 = \text{mixmatrix}(C_1, (m_1, m_2, m_3, m_0)) \\ C_2 = \text{mixmatrix}(C_2, (m_2, m_3, m_0, m_1)) \\ C_3 = \text{mixmatrix}(C_3, (m_3, m_0, m_1, m_2))$

As for the

`mixmatrix`

function, it basically multiplies each row of the input matrix with the $m_i$

matrices:

$\text{mixmatrix}(C, (m_0, m_1, m_2, m_3)) = \begin{bmatrix} C^{(0)} m_0 \\ C^{(1)} m_1 \\ C^{(2)} m_2 \\ C^{(3)} m_3 \end{bmatrix}$

function mixrow(r : row; m : matrix_c) return row is

variable tmp : row;

begin

for i in row'range loop

tmp(i) := times(m(i,0),r(0)) xor times(m(i,1),r(1)) xor times(m(i,2),r(2)) xor times(m(i,3),r(3));

end loop;

return tmp;

end mixrow;

function mixmatrix(m : matrix; mt : matrix_t) return matrix is

variable tmp : matrix;

begin

for j in matrix'range loop

tmp(j) := mixrow(m(j),mt(j));

end loop;

return tmp;

end mixmatrix;

However, the multiplications are carried out in a specific mathematical space. The function

`times`

implements a *double-and-add*algorithm.function times2(n : nonaire) return nonaire is

variable tmp :nonaire;

begin

tmp := n(7 downto 4) & (n(3) xor n(8)) & n(2 downto 0) & n(8);

return tmp;

end times2;

function times(n1: nonaire; n2: nonaire) return nonaire is

variable tmp : nonaire;

begin

tmp := (others => '0');

for i in n1'range loop

if n1(i) = '1' then

tmp := tmp xor n2;

end if;

if i > 0 then

tmp := times2(tmp);

end if;

end loop;

return tmp;

end times;

If the

*add*operation is merely a XOR, the*double*operation is more intricate. A comment at the beginning of the file gives out a big hint:-- Primitive polynomial = D^9+D^4+1 -- GF(512)

Operations are actually carried out in a Galois Field,

$\text{GF}(512)$

, with the primitive polynomial $D^9 + D^4 + 1$

. Roughly, this means a nonet can be seen as a polynomial modulo $D^9 + D^4 + 1$

(so max degree 8), whose coefficients are in $\mathbb{Z}/2\mathbb{Z}$

(so each coefficient is a bit of the nonet).Matrices can be defined over this space since it is a field, and thus operations such as addition, multiplication, inverse... exist as well.

**Round cubix.**This second step is easier to understand. It simply performs a bitwise XOR between

`tmp`

and the round key:function roundrow(r : row; key : std_logic_vector(35 downto 0)) return row is

variable tmp : row;

begin

for i in row'range loop

tmp(i) := r(i) xor key(9*(i+1)-1 downto 9*i);

end loop;

return tmp;

end roundrow;

function roundmatrix(m : matrix; key : std_logic_vector(143 downto 0)) return matrix is

variable tmp : matrix;

begin

for j in matrix'range loop

tmp(j) := roundrow(m(j),key(36*(j+1)-1 downto 36*j));

end loop;

return tmp;

end roundmatrix;

function roundcubix(c : cubix; key : std_logic_vector(575 downto 0)) return cubix is

variable tmp : cubix;

begin

for k in cubix'range loop

tmp(k) := roundmatrix(c(k),key(144*(k+1)-1 downto 144*k));

end loop;

return tmp;

end roundcubix;

**Swap rows.**The final step is also quite simple — it only permutes the coefficients of the matrix a certain way:

function swap_rows (c: cubix) return cubix is

variable tmp :cubix;

begin

tmp (0)(0) := c(1)(1);

tmp (0)(1) := c(3)(3);

tmp (0)(2) := c(2)(2);

tmp (0)(3) := c(2)(1);

tmp (1)(0) := c(3)(1);

tmp (1)(1) := c(2)(0);

tmp (1)(2) := c(0)(1);

tmp (1)(3) := c(1)(2);

tmp (2)(0) := c(1)(0);

tmp (2)(1) := c(0)(3);

tmp (2)(2) := c(2)(3);

tmp (2)(3) := c(3)(0);

tmp (3)(0) := c(3)(2);

tmp (3)(1) := c(0)(0);

tmp (3)(2) := c(1)(3);

tmp (3)(3) := c(0)(2);

return tmp;

end swap_rows;

We should have everything needed to implement the

*encryption*algorithm. As for the Galois Field part, SageMath makes it easy.But obviously, it didn't work at first try. As I presented it here, the algorithm seems rather unambiguous, but when you are in the process of reversing it, there are a lot of elements for which you have uncertainties. Indeed, many issues could happen at different levels:

- Actual bugs in my Python code
- Misunderstanding of the VHDL algorithm
- Uncertainty about the number of rounds
- Uncertainty about endianness at many steps
- Uncertainty about indices
- Is the key randomization function applied before or after the round function?
- Did I understand the key randomization function properly? (especially with the fact that in VHDL, signals are updated at the end of a process)
- Do the
`times`

and`times2`

functions even do what I think they do? - Is my mathematical interpretation correct?
- Is my translation to Sage correct?
- ...

With so many doubts, it felt mandatory to run the original VHDL sources with the test vector and observe the state of the signals at each iteration, in order to debug my code.

I read some documentation and wrote a testbench to interact with the

`cubik_cipher`

component:library ieee;

use ieee.std_logic_1164.all;

use work.cubik_pkg.all;

entity testbench is

end testbench;

architecture behavior of testbench is

component cubik_cipher is

port (

resetn : in std_logic;

clk : in std_logic;

-- key

key : in std_logic_vector(1023 downto 0);

-- d in

d_v_i : in std_logic;

d_i : in std_logic_vector(data_width-1 downto 0);

-- dout

d_c : out std_logic_vector(data_width-1 downto 0);

d_v_c : out std_logic

);

end component;

signal resetn : std_logic := '0';

signal input : std_logic_vector(data_width-1 downto 0);

signal output : std_logic_vector(data_width-1 downto 0);

signal inputkey : std_logic_vector(1023 downto 0);

signal d_v_i : std_logic := '0';

signal d_v_c : std_logic := '0';

signal clk : std_logic := '0';

constant clk_period : time := 1 ns;

begin

uut: cubik_cipher port map (

resetn => resetn,

clk => clk,

key => inputkey,

d_v_i => d_v_i,

d_i => input,

d_c => output,

d_v_c => d_v_c

);

clk_process : process

begin

clk <= '0';

wait for clk_period;

clk <= '1';

wait for clk_period;

end process;

stim_process: process

begin

wait for 1 ns;

resetn <= '1';

inputkey <= "1110011000110111111001110001010001111011001010010001000111011010011110101000000100100010011010011111001001001101101001001011101010000101001110101000110100000111000010000111101011101110101010000100110101101100010100001110001010010001010011110010111110001010110111001110101001100111001011101011111001000101110111101000111001000101100011111100111011011001101000001101101101110101010110011110101110000011101100110101010010001101110010010001101010100110000100101100110110110110000001100010111011011101011011001001101011001001100100111110110101100000011111100011111111000011100011000110111000100111110010000001110011010101011001100110101101101110101000010100011111000100011000001111001111000100011001010110010111011110100010010000010111101011110010010110010010110110100000110100001100001111110111001011000101010001111011000011110011110010000100100111010001000101101101010110111111010000000001111001110000010110000101000110011101111010100001100110111100010111100110001001110000000010000111001100010110110101001110111001011111001010";

input <= "010010000100000101000011010010110100010001000001010110010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000";

d_v_i <= '1';

wait for 1 ns;

d_v_i <= '0';

wait for 11 ns;

end process;

end;

I ran the simulation with

`ghdl -r testbench --vcd=out.vcd --stop-time=50ns`

. This generated a trace file in the VCD format, which I viewed with GTKWave. Here is what it looks like:First, the key and the plaintext are loaded. The first round key is the 576 least significant bits of the key.

Next, the first round has been performed, and the next round key has been computed.

Thanks to this, I was able to determine that there were indeed 9 rounds and that the key started to be randomized

*after*the first round. I could also debug my script step by step and compare my values to the signals, to locate where I made mistakes.This allowed to have a working implementation of the encryption algorithm.

The only thing that remained was to implement decryption. This should not be too hard — we only need to generate the round keys and follow the reverse steps for the rounds.

The

**swap rows**and**round cubix**steps are trivial to invert (inverse permutation, and XOR the round key). As for the**mix cubix**step, its inverse it is actually the same operation, but with the**inverse matrices**for$m_i$

(which is also trivial to do thanks to Sage).Here is the full script that implements encryption and decryption in Python and Sage.

F.<D> = GF(2)[]

K.<x> = GF(512, name='x', modulus=D^9+D^4+1)

m0_ = [[3,0,2,6],[6,3,0,2],[2,6,3,0],[0,2,6,3]]

m1_ = [[6,4,3,0],[0,6,4,3],[3,0,6,4],[4,3,0,6]]

m2_ = [[4,0,9,3],[3,4,0,9],[9,3,4,0],[0,9,3,4]]

m3_ = [[2,4,0,9],[9,2,4,0],[0,9,2,4],[4,0,9,2]]

def swap_rows(c):

tmp = [[0] * 4 for i in range(4)]

tmp[0][0], tmp[0][1], tmp[0][2], tmp[0][3] = c[1][1], c[3][3], c[2][2], c[2][1]

tmp[1][0], tmp[1][1], tmp[1][2], tmp[1][3] = c[3][1], c[2][0], c[0][1], c[1][2]

tmp[2][0], tmp[2][1], tmp[2][2], tmp[2][3] = c[1][0], c[0][3], c[2][3], c[3][0]

tmp[3][0], tmp[3][1], tmp[3][2], tmp[3][3] = c[3][2], c[0][0], c[1][3], c[0][2]

return tmp

def inv_swap_rows(c):

tmp = [[0] * 4 for i in range(4)]

tmp[1][1], tmp[3][3], tmp[2][2], tmp[2][1] = c[0][0], c[0][1], c[0][2], c[0][3]

tmp[3][1], tmp[2][0], tmp[0][1], tmp[1][2] = c[1][0], c[1][1], c[1][2], c[1][3]

tmp[1][0], tmp[0][3], tmp[2][3], tmp[3][0] = c[2][0], c[2][1], c[2][2], c[2][3]

tmp[3][2], tmp[0][0], tmp[1][3], tmp[0][2] = c[3][0], c[3][1], c[3][2], c[3][3]

return tmp

def decimal2bin(d):

return [int(_) for _ in f"{d:09b}"]

def bin2nonet(d):

return K(d[::-1])

def nonet2bin(N):

L = N.polynomial().list()

while len(L) != 9:

L.append(0)

return L[::-1]

m0 = [[bin2nonet(decimal2bin(m0_[j][i])) for i in range(4)] for j in range(4)]

m1 = [[bin2nonet(decimal2bin(m1_[j][i])) for i in range(4)] for j in range(4)]

m2 = [[bin2nonet(decimal2bin(m2_[j][i])) for i in range(4)] for j in range(4)]

m3 = [[bin2nonet(decimal2bin(m3_[j][i])) for i in range(4)] for j in range(4)]

def slv2row(data):

return [bin2nonet(data[9 * i:9 * (i + 1)]) for i in range(3, -1, -1)]

def slv2matrix(data):

return [slv2row(data[36 * j:36 * (j + 1)]) for j in range(3, -1, -1)]

def slv2cubix(data):

return [slv2matrix(data[144 * k:144 * (k + 1)]) for k in range(3, -1, -1)]

def row2slv(r):

tmp = []

for i in range(4):

tmp = nonet2bin(r[i]) + tmp

return tmp

def matrix2slv(m):

tmp = []

for j in range(4):

tmp = row2slv(m[j]) + tmp

return tmp

def cubix2slv(c):

tmp = []

for k in range(4):

tmp = matrix2slv(c[k]) + tmp

return tmp

def mixrow(r, m):

return [

(m[i][0] * r[0]) + (m[i][1] * r[1]) + (m[i][2] * r[2]) + (m[i][3] * r[3])

for i in range(4)

]

def inv_mixrow(r, m):

m_ = matrix(K, m)^(-1)

return [

(m_[i][0] * r[0]) + (m_[i][1] * r[1]) + (m_[i][2] * r[2]) + (m_[i][3] * r[3])

for i in range(4)

]

def shiftmt(mt, p):

return [mt[(l + p) % 4] for l in range(4)]

def mixmatrix(m, mt):

return [mixrow(m[j], mt[j]) for j in range(4)]

def inv_mixmatrix(m, mt):

return [inv_mixrow(m[j], mt[j]) for j in range(4)]

def mixcubix(c, mt):

return