Creating a Base Cipher

How can malware avoid detection? One way is to convert the strings that identify it (such as those in its code) into a number format that hides the string, and then have the malware convert it back again. The skill of malware analysis often lies in piecing together different encoding methods in order to recover the code and/or data.

If you are into cybersecurity, you will know that our common number bases are 2 (binary) and 16 (hex). So, let’s look at converting strings into other bases, and then decoding them.

Base forms

To create a base, we first need to define the character set that will represent each of the characters in the base. For Base 3, we need three characters, and Base 5 needs five characters:

if (base==2): chars="01"
if (base==3): chars="123"
if (base==5): chars="01234"
if (base==10): chars="0123456789"
if (base==11): chars="0123456789A"
if (base==26): chars="ABCDEFGHIJKLMNOPQRSTUVWXYZ"
if (base==36): chars="0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
if (base==58): chars="123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"
if (base==62): chars="0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
if (base==63): chars="0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz_"
if (base==66): chars="0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz_.!~"  # this set holds 66 characters
if (base==85): chars="!\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstu"  # 85 characters, "!" to "u"
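Since each base needs exactly as many characters as its number, a length check is a cheap way to catch a mislabelled set. The following is a minimal sketch of a table-driven alternative to the chain of if statements. The names CHARSETS and charset_for are hypothetical and are not taken from the original program, and only a few of the bases are included.

```python
# Hypothetical table-driven version of some of the character sets above.
CHARSETS = {
    2: "01",
    5: "01234",
    10: "0123456789",
    26: "ABCDEFGHIJKLMNOPQRSTUVWXYZ",
    58: "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz",
    62: "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz",
}


def charset_for(base):
    """Return the character set for a base, checking that it is usable."""
    chars = CHARSETS[base]
    # A set shorter than the base would fail when a large remainder is looked up
    if len(chars) != base:
        raise ValueError(f"base {base} needs {base} characters, got {len(chars)}")
    # Duplicate characters would make the encoding ambiguous
    if len(set(chars)) != len(chars):
        raise ValueError(f"base {base} character set contains duplicates")
    return chars


print(charset_for(58))
```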

Next, we encode:

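def encode(msg, charset, base):
    # Interpret the UTF-8 bytes of the message as one big-endian integer
    val = int.from_bytes(bytes(msg, 'utf-8'), "big")
    i = val
    r = ''
    # Repeatedly divide by the base; each remainder selects one character
    while i > 0:
        i, c = divmod(i, base)
        # Remainders arrive least significant first, so prepend each one
        r = charset[c] + r
    return r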

Here, we take all of the characters, convert them into bytes, and then interpret those bytes as a single big-endian integer. For example, if we have “aa”, then the byte pattern will be:

a         a
01100001 01100001
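The byte pattern and its integer value can be checked with a few lines of Python. This is a minimal sketch, and the variable names are illustrative rather than taken from the original program.

```python
msg = "aa"
raw = msg.encode("utf-8")             # two bytes, 0x61 and 0x61
val = int.from_bytes(raw, "big")      # most significant byte first

print([format(b, "08b") for b in raw])  # ['01100001', '01100001']
print(val)                              # 24929
print(val == 0x61 * 256 + 0x61)         # True
```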

As an integer, the binary value of 01100001 01100001 is 24,929. Now for Base 5, we continually divide by 5, and note the remainder:

5 | 24929
4985 r 4
997 r 0
199 r 2
39 r 4
7 r 4
1 r 2
0 r 1

We read it in reverse, so that we get “1244204”. A sample run is [here]:

Message:	 aa
Type: base5
Encoding: 1244204
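The repeated division can be traced step by step. The sketch below uses a hypothetical helper, division_steps, which is not part of the original program, to list each quotient and remainder for 24,929 in Base 5.

```python
def division_steps(val, base):
    """Return the (quotient, remainder) pairs from repeated division."""
    steps = []
    while val > 0:
        val, rem = divmod(val, base)
        steps.append((val, rem))
    return steps


steps = division_steps(24929, 5)
for quotient, rem in steps:
    print(quotient, "r", rem)

# Remainders come out least significant first, so reverse them.
digits = "".join(str(rem) for _, rem in reversed(steps))
print(digits)  # 1244204
```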

If we have Base-26, we can use a character set of all the uppercase letters. For “aa”, we convert to an integer as 24,929. Next, we continually divide by 26:

26 | 24929
958 r 21
36 r 22
1 r 10
0 r 1

The result is 1, 10, 22 and 21. This maps to “BKWV”. A sample run is [here]:

Input: aa
Base: 26
Chars: ABCDEFGHIJKLMNOPQRSTUVWXYZ
Converted to int: 24929
958 21
36 22
1 10
0 1

Base 26 encoding: BKWV
Base 26 decoding: aa
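The sample run also shows a decoding step. One way this could be done is sketched below, assuming the same character set and base are known to the decoder. The function decode is an illustration and may differ from the one in the final program.

```python
def decode(text, charset, base):
    """Reverse the base encoding: characters -> integer -> UTF-8 string."""
    val = 0
    for ch in text:
        # charset.index gives the digit value of each character
        val = val * base + charset.index(ch)
    # Work out how many bytes are needed to hold the integer
    length = (val.bit_length() + 7) // 8
    return val.to_bytes(length, "big").decode("utf-8")


print(decode("BKWV", "ABCDEFGHIJKLMNOPQRSTUVWXYZ", 26))  # aa
```

Because the message is treated as a single integer, any leading zero bytes in the original input would not survive the round trip with this approach.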

A popular base is Base58, which is used in Bitcoin. With this, we pick 58 characters:

123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz

Notice that “0”, “I”, “O” and “l” are missing, as they are easily confused with each other. Again we will use “aa”, which, as an integer, is represented by 24,929:

58 | 24929
429 r 47
7 r 23
0 r 7

The result is 7, 23 and 47, which maps to the Base58 characters of “8Qp”. A sample run is [here]:

Input: aa
Base: 58
Chars: 123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz
Converted to int: 24929
429 47
7 23
0 7
Base 58 encoding: 8Qp
Base 58 decoding: aa
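To see which alphanumeric characters the Base58 set leaves out, and how the three remainders map onto it, a short check like the following can be used. This is only a sketch, and the names BASE58 and missing are illustrative.

```python
import string

BASE58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"
alphanumeric = string.digits + string.ascii_uppercase + string.ascii_lowercase

# Characters left out of the Base58 set
missing = [ch for ch in alphanumeric if ch not in BASE58]
print(missing)  # ['0', 'I', 'O', 'l']

# Map the remainders from the worked example (most significant first)
print("".join(BASE58[d] for d in (7, 23, 47)))  # 8Qp
```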

Now we can try Base62, which uses all of the digits, along with the uppercase and lowercase letters:

0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz

Again, we will use “aa”, with a value of 24,929. We now divide by 62, and note the remainder:

62 | 24929
402 r 5
6 r 30
0 r 6

The result is then 6, 30 and 5, which maps to “6U5”. A sample run is [here]:

Input: aa
Base: 62
Chars: 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
Converted to int: 24929
402 5
6 30
0 6

Base 62 encoding: 6U5
Base 62 decoding: aa

Base2

Again, we will use “aa”, with a value of 24,929. We now divide by 2, and note the remainder:

2 | 24929
12464 r 1
6232 r 0
3116 r 0
1558 r 0
779 r 0
389 r 1
194 r 1
97 r 0
48 r 1
24 r 0
12 r 0
6 r 0
3 r 0
1 r 1
0 r 1

A sample run is [here]:

Input: aa
Base: 2
Chars: 01
Converted to int: 24929
12464 1
6232 0
3116 0
1558 0
779 0
389 1
194 1
97 0
48 1
24 0
12 0
6 0
3 0
1 1
0 1

Base 2 encoding: 110000101100001
Base 2 decoding: aa
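Note that the Base 2 encoding has 15 digits, while the original byte pattern has 16 bits. The leading zero is dropped because the message is handled as an integer. The sketch below compares the two, using Python's built-in formatting rather than the article's encoder.

```python
val = int.from_bytes("aa".encode("utf-8"), "big")

bits = format(val, "b")           # same digits as the Base 2 encoding
print(bits, len(bits))            # 110000101100001 15

# Pad to a whole number of bytes to recover the original bit pattern
padded = bits.zfill(-(-len(bits) // 8) * 8)
print(padded)                     # 0110000101100001
```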

Here is the final program:

https://asecuritysite.com/cipher/mybase