Antirdroid was a fun Android reverse engineering challenge split in 3 steps. I am not that familiar with Android RE and especially with the specific tools for dynamic analysis and debugging, hence why after desperately and unsuccessfully trying to patch the application and make it work on an emulator I decided to give up. I only believe in one god, and its name is static analysis.
Description
In this challenge, you need to find three flags.
Each flag starts with ECW_ and will be displayed in the Android logcat together with a tag indicating the flag number.
This seems to read files called step_1.dex, step_2.dex and step_3.dex from raw resources. Speaking of which, if you unzip the apk and check in the res/raw/ folder, you will find these files, but they do not look like valid dex files. Perhaps are they encrypted?
Definitely some interesting stuff going on here, we know for sure there's crypto involved now. Some base64 string is decoded then decrypted using AES CBC, but the key seems derived from a certain variable, that is the a field in a shared preferences object called flag. Shared preferences allow to read and save key/value pairs on device storage, which can also be used to keep a global state in the application. We understand that we'll probably get to this bit later in the challenge.
In the d/c/a/d/ folder, there is an interesting b.class file:
I greatly pruned the code because there were a lot of try/catches and stuff that heavily impacted readability.
Here's what we can understand from this piece:
Some string var290 is hashed with MD5 and compared to b71985397688d6f1820685dde534981b
var290 is used to derived an AES key
The file step_1.dex is decrypted with this key
A few ciphertexts are decrypted too (in base64: j04vGcW35ZUg23JsqQ+/YA==, WOtre8ObMy2nnFbqn2Kb6w== and J9vFCBjTjE6YoMI1wVDwjg==)
A longer ciphertext (bjmQcWsAN3k8NxmaYYWvy6L+SDvu3ZlDFMSFvepIycxwZLgw5qGRB5ggJLHpDvW3) is decrypted and put in the a field of the flag shared preferences object
All of this is quite approximative, but it doesn't matter; it's enough to make progress.
The md5 reverses to "jean". I found an implementation of PBKDF2WithHmacSHA256 in Python, which I used to decrypt all the ciphertexts:
from hashlib import pbkdf2_hmacfrom Crypto.Cipher import AESfrom base64 import b64decodesalt = [56,-35,119,-111,71,113,-83,70,-119,122,-92,22,124,23,-83,110]salt =list(map(lambdau: u %256, salt))salt =bytes(salt)iv = [-101,105,-107,-118,-65,117,-35,92,-47,-112,-102,-76,40,-21,69,93]iv =list(map(lambdau: u %256, iv))iv =bytes(iv)defdecrypt(blob,passwd): key =pbkdf2_hmac( hash_name='sha256', password=passwd, salt=salt, iterations=65536, dklen=32, ) aes = AES.new(key, AES.MODE_CBC, iv)return aes.decrypt(blob)C ="""j04vGcW35ZUg23JsqQ+/YA==WOtre8ObMy2nnFbqn2Kb6w==J9vFCBjTjE6YoMI1wVDwjg==bjmQcWsAN3k8NxmaYYWvy6L+SDvu3ZlDFMSFvepIycxwZLgw5qGRB5ggJLHpDvW3""".split('\n')C =list(map(b64decode, C))for c in C:print(decrypt(c, b'jean'))open('step_1_decoded.dex', 'wb').write(decrypt(open('step_1.dex', 'rb'), b'jean'))
Okay, so the first three plaintexts are not that interesting. Maybe the second one will be the path to the class in the next step. On the other hand, the fourth plaintext looks very interesting. It looks like some kind of key.
Now let's decompile the newly decrypted .dex file!
Cool, we made it to step1. We can see some weird stuff going on with the flag shared preferences object: the value "k" is affected to the key "w"...
In a/a/a/d.class, we can see a few potential new ciphertexts:
publicstaticfinalString a ="ZnoETjqJ0h3VUtdPQnzkWsqrDFtvsK4BQ+1NJGx38YHXq9QxUEmztU9CsN4vCTbI";publicstaticfinalString b ="tvEf77LVcQcHX2FtkIoSBQ==";publicstaticfinalString c ="TbQSB6aY7Ye++tVv84UPIA==";publicstaticfinalString d ="biPW3PPcH5wQHBNdE6eP2Pg4K9UAZT8guUhpNLV44RzWdYVT91LcP8WgtY+9QrUUKWfW0FIyKHVg3P7AKS9vIQ==";publicstaticfinalString e ="n/CG6W9Ilu8muE8UGJM29S/2JV4hw2O/IX8IPBartj7qvWP0MasL7ZujCyHYH1ERYd+NP+IzVaTuRwT+TbCoSA==";publicstaticfinalString f ="7wwCGcbnGp/EAusByZQYcYsxSfBxiEHP4GZPjsAHjGLYVryk6yS9xTo6GmF1J6Z6rDvp8XnuBCZ97DmURQx+lvAvrebYDXPEbiVOcSANTk4=";publicstaticfinalString g ="EIW2q6l3m0ZvO1G6+QgXDVqiFcGj5tDV9tEtCRHJ6ALV2bwYxBzUvY4S5LuERqdrqm4RGDU3xHXOJr6+buDwIg==";
In a/a/a/e/c.class, there's a new interesting piece of code:
Looks like it's the same crypto as before, but with a different key which is the string contained in the a field of flag. Luckily, we might know what this key is. Let's try it out:
C ="""ZnoETjqJ0h3VUtdPQnzkWsqrDFtvsK4BQ+1NJGx38YHXq9QxUEmztU9CsN4vCTbItvEf77LVcQcHX2FtkIoSBQ==TbQSB6aY7Ye++tVv84UPIA==biPW3PPcH5wQHBNdE6eP2Pg4K9UAZT8guUhpNLV44RzWdYVT91LcP8WgtY+9QrUUKWfW0FIyKHVg3P7AKS9vIQ==n/CG6W9Ilu8muE8UGJM29S/2JV4hw2O/IX8IPBartj7qvWP0MasL7ZujCyHYH1ERYd+NP+IzVaTuRwT+TbCoSA==7wwCGcbnGp/EAusByZQYcYsxSfBxiEHP4GZPjsAHjGLYVryk6yS9xTo6GmF1J6Z6rDvp8XnuBCZ97DmURQx+lvAvrebYDXPEbiVOcSANTk4=EIW2q6l3m0ZvO1G6+QgXDVqiFcGj5tDV9tEtCRHJ6ALV2bwYxBzUvY4S5LuERqdrqm4RGDU3xHXOJr6+buDwIg==""".split('\n')C =list(map(b64decode, C))for c in C:print(decrypt(c, b'LuKXSGlN5(%:Vk=alEbl9khIEPBo=mXu;hR7Ez7E'))
Result : password_for_step1_is:py5B<S9aT1WrbZU.VtV%bjM*'RCbO7
Really cool, what if we try this password as a key for the remaining ciphertexts that we weren't able to decrypt earlier?
C ="""ZnoETjqJ0h3VUtdPQnzkWsqrDFtvsK4BQ+1NJGx38YHXq9QxUEmztU9CsN4vCTbItvEf77LVcQcHX2FtkIoSBQ==TbQSB6aY7Ye++tVv84UPIA==""".split('\n')C =list(map(b64decode, C))K =b'py5B<S9aT1WrbZU.VtV%bjM*\'RCbO7'for c in C:print(decrypt(c, K))
If we try to decrypt step_2.dex with the last key (py5B<S9aT1WrbZU.VtV%bjM*'RCbO7), we do get a valid dex file again. Let's decompile it.
The a/a/a/d.class file contains three new ciphertexts, we're used to it at that point.
publicstaticfinalString a ="5sxJURBMWadPV+Qfj2g/WFVWcaLbXoUxyXeiIvpa4pu1SjSj0nqneJeN0tNkKbJx";publicstaticfinalString b ="gkZ6pGuoDU6Lz5bc23Y/5ZfI9XPcJd/r1PRrsE1epqc=";publicstaticfinalString c ="rbfA5lkSHq0eL4dmwH4gHg==";
A huge pyramid of conditions on some string called var1!
Let's take a look at the first one:
if (f.a(var1,4,0,2) *f.a(var1,6,0,2) ==4840)
I removed the "(Object)null" arguments which are probably useless decompilation artifacts. What does this f.a function do now? Let's take a look at the f.class file:
Wonderful, now let's say this is a key and try to decrypt the three given ciphertexts:
K = [118,118,62,75,98,86,99,79,106,121,98,40,114,81,66,84]K =bytes(K[::-1])print(K)C ="""5sxJURBMWadPV+Qfj2g/WFVWcaLbXoUxyXeiIvpa4pu1SjSj0nqneJeN0tNkKbJxgkZ6pGuoDU6Lz5bc23Y/5ZfI9XPcJd/r1PRrsE1epqc=rbfA5lkSHq0eL4dmwH4gHg==""".split('\n')C =list(map(b64decode, C))for c in C:print(decrypt(c, K))
Our intuition (since we still have no clue whatsoever what the application looks like at this point, let's remember that 😃) is that there is a way to input hand-drawn digits, and a classifier is used to recognize them. The method getNumber returns the most probable digit that was last input, and the pin will be a concatenation of these digits... or at least, those who pass the verifyNext test. But what does verifyNext do?
This is starting to get spicy. Our digit is converted into a bitmap and fed to an interpreter of a certain index. We can see that this.interpreters is initialized here:
So this is where all of this comes from...! The mnist-letter.tflite files that we noticed at the beginning are used to create an array of 12 interpreters.
For the i-th digit, we load the i-th interpreter, we run it on the bitmap and we get a result var4. We can guess the output is 2-dimensions, and we're verifying whether the first scalar is greater than the second:
if (var4[0][1] > var4[0][0]) { var5 =true;}
...which probably means these outputs are like "probability that the digit is something" and "probability that it is not". It might then very be that each of these files are models that are trained to recognize one specific digit!
We can also notice some of these files are the same, which means they are models for the same digits. This makes the search space smaller if we ever want to bruteforce the pin (don't make fun of me, I tried bruteforce for hours because I had the wrong password but we'll get to this later).
The idea now is to download a test set of images and labels from the MNIST handwritten digit database, and for each interpreter, see which letter matches the best. I am not very familiar with tensorflow, but all it takes is some copy/pasting and tweaks:
import argparse, time, sysimport numpy as npfrom PIL import Imageimport tflite_runtime.interpreter as tflitedefload_labels(filename):withopen(filename, 'r')as f:return [line.strip()for line in f.readlines()]f =open('plouf/t10k-images-idx3-ubyte', 'rb').read()f = f[4+4+4+4:]imgs = []for i inrange(500): imgs.append( np.expand_dims( np.reshape( np.array([(np.float32(x) /255) for x in f[28*28* i:28*28* (i +1)]], dtype=np.float32), (28, 28) ), axis=0 ) )f =open('plouf/t10k-labels-idx1-ubyte', 'rb').read()f = f[4+4:]labels = []for i inrange(10000): labels.append((f[i]))for letter in'abcdefghijkl': interpreter = tflite.Interpreter(model_path='mnist-%s.tflite'% letter) interpreter.allocate_tensors() input_details = interpreter.get_input_details() output_details = interpreter.get_output_details() height = input_details[0]['shape'][1] width = input_details[0]['shape'][2] scores = [0] *10for (img, label) inzip(imgs, labels): interpreter.set_tensor(input_details[0]['index'], img) interpreter.invoke() output_data = interpreter.get_tensor(output_details[0]['index']) results = np.squeeze(output_data) top_k = results.argsort()if top_k[1]==1: scores[label]+=1print(letter, scores)
All is left now is to find password. I spent a lot of time on this part because I didn't realize Bytecode Viewer had failed to decompile a few methods, which made me miss very important pieces that were used to construct this variable, because these were not exported when you asked the software to export all the classes.
Either way, here's where this.password is generated:
We're looking for keys in the flag object, from 'a' to 'z'. If the key exists, we append the value associated with this key in this.password (which is initialized as an empty string).
So now we need to retrace our own steps and find all the places where the flag object was edited (not only putString, but also delete).
The issue is, there are often edits in which the context is a bit hard to tell, and that thus might not be relevant. For instance, this class from step 2:
We can guess putString in called with key "q", but the value argument is pushed by aload1. If we take a look a bit higher up in the code, we can see where it is stored:
I do have to admit static analysis was a pain in the ass at some times, but it still allowed me to go pretty fast: it took me an hour to solve the first part and 20 minutes for the second part. If I had realized sooner that I was missing some crucial code in my decompilation export, I would have solved the last part in roughly an hour too.
It's a tradeoff between how much time it'll take you to setup a working debug environment (I tried but I failed, I couldn't get past the first "trick", I even tried to patch the apk but then I got a blank screen...) and how confident you feel about going full blind, as it takes a lot of rigor and intuition.
All in all, a solid challenge with a fun Machine Learning twist at the end!