Meet Snake, Perseus and Uroboros: A Lesson In Cryptography

Why do I love working in cybersecurity? Well, every single day brings a new puzzle to solve, and every day I have to bring together skills…

Meet Snake, Perseus and Uroboros: A Lesson In Cryptography

Why do I love working in cybersecurity? Well, every single day brings a new puzzle to solve, and every day I have to bring together skills of coding, cryptography and networking to understand and solve new puzzles. And, so the release of news of the disabling some malware provides a new puzzle to understand, and it's one where sophisticated code is let down by some novice mistakes. And, when I say novice, I mean that I would not even expect an undergraduate student to implement such trivally cracked code.

Meet Snake, Uroboros and Turla

This week the FBI announced that they had disabled some stealthy malware — Snake —and then uploaded a payload to disable it. This malware has been used for over a decade and which often focused on nation-state activity.

Snake — thought to have been created in 2003 — grew from the Uroboros malware. If you are interested, Uroboros comes from the ancient symbol that represented a serpent eating its own tail. In fact, a low-resolution image of Uroboros was — at one time — used as a backdoor into the Turla malware platform.

Figure 1: Low-resolution image of a historical illustration of an Uroboros by the German philosopher and theologian Jakob Böhme [ref]

Overall, Snake manages to run on most types of operating systems and is highly modular in building targeted code. A typical exploit allows Snake to install the Turla malware platform onto an infected device, and which then exfiltrates data. Once operated, Snake uses a P2P (Peer-to-peer) encryption method to hide its commands. This uses a network of hop-points of infected machines — and which makes it difficult to detect in operation.

Breaking the Snake

The modular coding and embedded encryption often make it extremely difficult to detect Snake with virus scanners. But, there were fundamental weaknesses in its construction:

  • The usage of an encryption key which was only 128 bits long and which used a Diffie-Hellman key exchange. These days DH is hardly ever used, and where we typically use ECDH (Elliptic Curve Diffie-Hellman), as it is more secure and efficient. But, anyone who studies cryptography will know that the cracking of a 128-bit discrete log problem is fairly trival to crack, and where we typically use over 1,024 bits as a starting point. Once the key was discovered, analysts were able to decrypt the messages that were sent between infected hosts.
  • The code was not obfuscated before it was released. Normally a problem would encrypt the strings that it uses, and obfuscate the flow of the code, but this was not the case with Snake, and where analysts could see strings in plain text. As we see in Figure 2, the comments clearly have been included in the code.
  • There was no authentication in the key exchange, which meant that once the key exchange method was cracked, the Snake nodes do not know if they are communicating with a fake bot or a real one. Normally in key exchange methods, we integrate some form of digital signing, such as with RSA or ECDSA, and which is used to authenicate the nodes we are communicating with — unfortunately Snake nodes just communicated with any node that was able to perform the key exchange method. As it was a peer-to-peer network, though, it is actually quite difficult to properly authenicate each node to each other.
Figure 2: Snake code

In the analysis, the FBI could see the machines which were affected, and created a search warrant to investigate some of the ones they found:

Figure 3: Affidavid [here]

This included computers in several states in the US:

Figure 4: Affidavit [here]

The monster slayer: Perseus

The FBI even created a fake Snake program — code-named PERSEUS— which awoke Snake implants on other systems. In Greek methology, Perseus — the son of Zeus and half-brother to Heracles — was one of the heroes who slayed monsters, such as Gorgon Medusa and Cetus. PERSEUS allowed for commands to be relayed to infected nodes, and one that allowed for the internal code to be overwritten — and this disable the malware.

Figure 5: Perseus — the monster slayer

Overall, the affidavit is a great example of explaining the operation of the malware in a way that a judge could understand. It shows how a Diffie-Hellman key exchange is used to create a session key, and which is used to encrypt the session.

The Diffie-Hellman method works with Bob creating a secret (b), and Alice creating another secret (a). Bob computes B=g^b (mod p), and where p is a large prime number. Alice computes A=g^a (mod p). They exchange these values, and Bob takes Alice’s value (A) and computes A^b (mod p) and Alice computes B^a (mod p). They should both end up with the same shared secret of K=g^{ab} (mod p) — as illustrated in Figure 6.

Figure 6: The Diffie-Hellman method [here]

This all works well, but we need to make sure we use a large enough prime number (p) to make sure that Eve cannot determine a or b from the A and B values, respectively. Overall, we pick values of the generator (g) and the prime number (p) which are safe. For this, we get group numbers of [here]:

  • Group 1. Generated with a 768-bit prime number.
  • Group 2. Generated with a 1,024-bit prime number.
  • Group 5. Generated with a 1,536-bit prime number.
  • Group 14. Generated with a 2,048-bit prime number.

Most cybersecurity professionals would know that you need to start with Group 2, but where it is highly recommended to usea at least Group 5. But, Snake used the unbelievably trival prime number of just 128 bits. If you are interested, the difference in security between using 1,536 bits and 128 bits is:

7083271603906078757501937199839970337237042072752024044272935252133021449487672037753077279976540494138725768234495819230178699615482523444525217326451885295083794903354995236859949814158861106339793498427520403285563504697437506153794873484561573759163876734171825622767620957924878367969185009358384129424236514293956457858505582263102570156257796764049361854053114869322098725721757629614488965634874516982012228724064256

more times difficult! Your electricity bill for cracking single key for 1,536 bits will be billions and billions of dollars, and not even the FBI can afford that! For 128-bits, the amount of computation required is almost trival.

Solving a discrete log problem: Baby-Step-Giant-Step

Thus, solving the discrete log problem for a 128-bit is trival, and involves finding x for g^x (mod p). A typical method for this is the Baby-Step-Giant-Step method.

Within normal logarithms we define:

h=a^x

So if we want to find the value of x, we use:

x=log_a (h)

So 10⁴ is 10,000, and the inverse log is log_10(10,000) is 4. Within discrete logarithms we introduce a finite field with a prime number. For this we have:

h=g^x (mod p)

and where p is the prime number. It is thus a difficult task to find the value of x which has been used, even if we know h, g and p. We use discrete logarithms with the Diffie-Hellman key exchange method and in ElGamal encryption.

Let’s start with an example:

20=5^x (mod 53)

In this case we have g= 5, h= 20 and p= 53, and want to find x. We first determine the square root of p-1, and we will round it up to the nearest integer. In this case it will be:

N =Ceiling(√(p-1)) = Ceiling(√(52)) = 7

Next we will pre-compute from 1 to N with the baby table. These will store g^i (mod p) and then store them in the form of {g^i (mod p), i}:

{1: 0, 3: 7, 5: 1, 51: 5, 42: 4, 43: 6, 19: 3, 25: 2}

For example:

If i=0, we get 50 (mod53) gives 1 {1:0}.

For i=1 we get 51(mod 53) gives 5 {5:1}

For i=2 we get 52(mod 53) gives 5 {25:2}

We now have a list of pairs from 0 to the square root of p-1. We now compute gN to give :

c=g^{N(p−2)} (mod p)

We then search through values of:

hc^x ( mod p)

until we find a match in the table. We then take this value and multiply it by N and add the value we have found:

for j in range(N):
y =
(h * pow(c, j, p)) % p
if y in t:
return j * N + t[y]

The completed code is [here]:

def solve(g, h, p):
N = int(math.ceil(math.sqrt(p - 1)))
print "N=",N
t = {}
# Baby step.
for i in range(N):
t[pow(g, i, p)]=i
print "Baby step",t
# Fermat's Little Theorem
c = pow(g, N * (p - 2), p)

for j in range(N):
y = (h * pow(c, j, p)) % p
if y in t:
return j * N + t[y]
return None

Now let’s try and example. If we try 22=5^x (mod 53) we get [here]:

g= 5
h= 22
p= 53
22 = 5 ^x (mod 53 )
==============
N= 8
Baby step {1: 0, 3: 7, 5: 1, 51: 5, 42: 4, 43: 6, 19: 3, 25: 2}
x= 9
Checking for h= 22

This gives a solution of x=9, and when we try it back in 59 (mod53) , we get a result of 22.

In real-life examples our prime numbers are likely to be extremely large, so we will need to store the square root of the number of values. For a 128-bit prime number we will need to store [here]:

>>> p=2**1281
>>> int(math.ceil(math.sqrt(p — 1)))
18446744073709551616L

In ElGamal the recommended prime number size is 2,048 bits which would fill all the memory on the plant and much more. For just 512 bit prime numbers we get [here]:

>>> p=2**5121
>>> int(math.ceil(math.sqrt(p — 1)))
115792089237316195423570985008 687907853269984665640564039457584007913129639936L

Which is a vast amount of memory.

We reduce the complexity of the problem by storing a square root of the prime number used. So for a prime number of 14,947, we now only need to 123 values. Unfortunately as our prime numbers get larger the amount of memory we need becomes vast.

And so discrete logarithms continue to be a hard task for a computer. If someone uses a 512-bit prime number, we will need to store 115,792, 089,237,316,195,423,570,985, 008,687,907,853,269,984,665,640, 564,039,457,584,007,913,129 ,639,936 values in our baby table. This would be much more than all the memory on the planet. For 2,048 bit prime numbers we would need vastly more memory. Unfortunately quantum computers do not struggle as much with discrete logarithms, and will be able to crack them in a reasonable amount of time.

Using Python, can you solve these?

1,849,836=11^x (mod 2,097,151)

711,087,309=9^x (mod 1,073,741,823)

https://asecuritysite.com/encryption/baby

and here is the code:

https://replit.com/@billbuchanan/baby

Conclusions

I love when something that I teach, comes alive in a real-life application. Go learn some cryptography:

https://asecuritysite.com/