One way of hiding information is to encode it with a different character set and a different base. For example, Base-3 encoding can use a character set of "123", and Base-5 can use "01234".
Theory
To create a base, we first need to define the character set that will represent each digit in that base. Base 3 needs three characters, and Base 5 needs five:
if (base==2): chars="01"
if (base==3): chars="123"
if (base==5): chars="01234"
if (base==10): chars="0123456789"
if (base==11): chars="0123456789A"
if (base==26): chars="ABCDEFGHIJKLMNOPQRSTUVWXYZ"
if (base==36): chars="0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
if (base==58): chars="123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"
if (base==62): chars="0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
if (base==63): chars="0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz_"
if (base==67): chars="0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz_.!~"
if (base==81): chars="!\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstu"
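As an alternative layout, the same lookup could be kept in a dictionary keyed by base, which also makes it easy to check that each character set has the right number of unique characters. This is only a sketch: the names CHARSETS and charset_for are illustrative and not part of the original code, and only a few of the bases are shown.

```python
# Illustrative table (not from the original code): base -> character set.
CHARSETS = {
    2: "01",
    3: "123",
    5: "01234",
    26: "ABCDEFGHIJKLMNOPQRSTUVWXYZ",
    58: "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz",
    62: "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz",
}


def charset_for(base: int) -> str:
    """Return the character set for base, checking its size and uniqueness."""
    if base not in CHARSETS:
        raise ValueError(f"unsupported base: {base}")
    charset = CHARSETS[base]
    if len(charset) != base or len(set(charset)) != base:
        raise ValueError(f"character set for base {base} is malformed")
    return charset


print(charset_for(5))  # 01234
```

A table like this keeps the base and its character set in one place, so adding a new base is a one-line change.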
Next, we encode:
def encode(str, charset,base):
    val=int.from_bytes( bytes(str, 'utf-8'), "big")
    i=val
    r=''
    while i > 0:
        i, c = divmod(i, base)
        r = charset[c] + r
    return r
In this case, we take all of the characters, convert them into bytes, and then interpret those bytes as a single big-endian integer. For example, if we have "aa", then the byte pattern will be:
a        a
01100001 01100001
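As a quick check of this step, the following minimal sketch converts the string to bytes and then to an integer with Python's built-in int.from_bytes. The helper name text_to_int is an illustrative choice and does not appear in the original code.

```python
def text_to_int(text: str) -> int:
    """Interpret the UTF-8 bytes of text as one big-endian integer."""
    return int.from_bytes(text.encode("utf-8"), "big")


value = text_to_int("aa")
bits = format(value, "016b")   # zero-padded to two full bytes
print(bits[:8], bits[8:])      # 01100001 01100001
print(value)                   # 24929
```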
As an integer, the binary value of 01100001 01100001 is 24,929. Now for Base 5, we repeatedly divide by 5, and note the remainder each time:
5 | 24929
    4985 r 4
    997 r 0
    199 r 2
    39 r 4
    7 r 4
    1 r 2
    0 r 1
We read the remainders in reverse, so that we get "1244204". A sample run gives:
Message: aa
Type: base5
Encoding: 1244204
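The repeated division can be traced step by step with a short sketch such as the one below. The function name division_steps is hypothetical and is used here only to print the same quotient and remainder pairs as the worked example.

```python
def division_steps(value: int, base: int) -> list[tuple[int, int]]:
    """Return the (quotient, remainder) pair from each division by base."""
    steps = []
    while value > 0:
        value, remainder = divmod(value, base)
        steps.append((value, remainder))
    return steps


steps = division_steps(24929, 5)
for quotient, remainder in steps:
    print(f"{quotient} r {remainder}")

# Reading the remainders from last to first gives the encoded digits.
digits = "".join("01234"[r] for _, r in reversed(steps))
print(digits)  # 1244204
```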
If we have Base-26, we can use a character set of all the uppercase letters. For "aa", we again convert to the integer 24,929. Next, we repeatedly divide by 26:
26 | 24929
     958 r 21
     36 r 22
     1 r 10
     0 r 1
The result is 1, 10, 22 and 21. With "A" as 0, this maps to "BKWV". A sample run gives:
Input: aa
Base: 26
Chars: ABCDEFGHIJKLMNOPQRSTUVWXYZ
Converted to int: 24929
958 21
36 22
1 10
0 1

Base 26 encoding: BKWV
Base 26 decoding: aa
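Decoding reverses the process: each character is mapped back to its position in the character set, and the positions are weighted by powers of 26. The sketch below works through "BKWV" by hand. The variable names are illustrative, and the two-byte length is an assumption that holds for this particular value.

```python
CHARSET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

# Look up each character's position in the character set.
digits = [CHARSET.index(ch) for ch in "BKWV"]
print(digits)  # [1, 10, 22, 21]

# Weight each digit by a power of 26 (the last digit has weight 26**0).
value = sum(d * 26**i for i, d in enumerate(reversed(digits)))
print(value)   # 24929

# Two bytes are enough here because 24929 < 65536.
print(value.to_bytes(2, "big").decode("utf-8"))  # aa
```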
A popular base is Base58, which is used in Bitcoin. With this we pick 58 characters:
123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz
Notice that "0", "I", "O" and "l" are missing. Again we will use "aa", which, as an integer, is represented by 24,929. We now divide by 58, and note the remainder:
58 | 24929
     429 r 47
     7 r 23
     0 r 7
The result is 7, 23 and 47, which maps to the Base58 characters "8Qp". A sample run gives:
Input: aa
Base: 58
Chars: 123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz
Converted to int: 24929
429 47
7 23
0 7

Base 58 encoding: 8Qp
Base 58 decoding: aa
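The properties of the Base58 character set can be checked directly. The sketch below compares it against the full set of digits and letters to list the omitted characters, and then maps the three remainders from the division above onto the character set. The variable names are illustrative.

```python
import string

BASE58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

# Compare against every digit and letter to find the omitted characters.
alphanumerics = string.digits + string.ascii_uppercase + string.ascii_lowercase
missing = sorted(set(alphanumerics) - set(BASE58))
print(len(BASE58), missing)  # 58 ['0', 'I', 'O', 'l']

# Map the remainders (most significant first) onto the character set.
encoded = "".join(BASE58[d] for d in (7, 23, 47))
print(encoded)  # 8Qp
```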
Now we can try Base62, which uses all the digits along with the uppercase and lowercase letters:
0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
Again, we will use "aa", with a value of 24,929. We now divide by 62, and note the remainder:
62 | 24929
     402 r 5
     6 r 30
     0 r 6
The result is then 6, 30 and 5, which maps to "6U5". A sample run gives:
Input: aa
Base: 62
Chars: 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
Converted to int: 24929
402 5
6 30
0 6

Base 62 encoding: 6U5
Base 62 decoding: aa
Base2
Again, we will use "aa", with a value of 24,929. We now divide by 2, and note the remainder:
2 | 24929
    12464 r 1
    6232 r 0
    3116 r 0
    1558 r 0
    779 r 0
    389 r 1
    194 r 1
    97 r 0
    48 r 1
    24 r 0
    12 r 0
    6 r 0
    3 r 0
    1 r 1
    0 r 1
A sample run gives:
Input: aa
Base: 2
Chars: 01
Converted to int: 24929
12464 1
6232 0
3116 0
1558 0
779 0
389 1
194 1
97 0
48 1
24 0
12 0
6 0
3 0
1 1
0 1

Base 2 encoding: 110000101100001
Base 2 decoding: aa
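Note that the Base 2 result has 15 bits rather than 16, because the division loop stops once the value reaches zero and so never emits a leading zero. The sketch below reproduces this with Python's built-in format function and shows how padding to a whole number of bytes restores the original byte pattern. The variable names are illustrative.

```python
raw = "aa".encode("utf-8")
value = int.from_bytes(raw, "big")

unpadded = format(value, "b")          # same digits as the division loop
padded = unpadded.zfill(8 * len(raw))  # restore the leading zero bit
print(unpadded)          # 110000101100001
print(padded)            # 0110000101100001
print(int(unpadded, 2))  # 24929
```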
Base4
Again, we will use "aa", with a value of 24,929. We now divide by 4, and note the remainder:
4 | 24929
    6232 r 1
    1558 r 0
    389 r 2
    97 r 1
    24 r 1
    6 r 0
    1 r 2
    0 r 1
In this case, we then map the digits 0 to 3 onto the characters "1234", so that each digit appears one higher than its value. A sample run gives:
Input: aa
Base: 4
Chars: 1234
Converted to int: 24929
6232 1
1558 0
389 2
97 1
24 1
6 0
1 2
0 1

Base 4 encoding: 23122312
Base 4 decoding: aa
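To see the effect of the character set on its own, the sketch below encodes the same integer with two Base 4 character sets. The function encode_int is a hypothetical helper that takes the base from the length of the character set. The "0123" set is included only for comparison and is not used in the original code.

```python
def encode_int(value: int, charset: str) -> str:
    """Encode value using len(charset) as the base."""
    base = len(charset)
    out = ""
    while value > 0:
        value, digit = divmod(value, base)
        out = charset[digit] + out
    return out


print(encode_int(24929, "0123"))  # 12011201 (the raw Base 4 digits)
print(encode_int(24929, "1234"))  # 23122312 (each digit shifted up by one)
```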
Final code
The final code is:
import sys
from bitstring import BitArray

def encode(str, charset,base):
    # Treat the UTF-8 bytes of the string as one big-endian integer
    val=int.from_bytes( bytes(str, 'utf-8'), "big")
    i=val
    print(f"Converted to int: {i}")
    r=''
    # Repeatedly divide by the base, prepending the character for each remainder
    while i > 0:
        i, c = divmod(i, base)
        r = charset[c] + r
        print (i,c)
    return r

def decode(v, chars,base):
    # Rebuild the integer from the position of each character
    long_value = 0
    for i, c in enumerate(v[::-1]):
        pos = chars.find(c)
        assert pos != -1
        long_value += pos * (base**i)
    # Split the integer back into bytes (assumes single-byte characters such as ASCII)
    result=''
    while long_value >= 256:
        div, mod = divmod(long_value, 256)
        result = chr(mod) + result
        long_value = div
    result=chr(long_value) + result
    return(result)

def encodeBase2(str):
    val=int.from_bytes( bytes(str, 'utf-8'), "big")
    return (bin(val)[2:].rjust(8*len(str),"0"))

def decodeBase2(str):
    # Use the unsigned value so that a leading 1 bit is not read as a sign bit
    value = BitArray(bin=str).uint
    val=int.to_bytes( value,byteorder= "big",length=len(str)//8)
    return (val.decode())

def encodeBase8(str):
    val=int.from_bytes( bytes(str, 'utf-8'), "big")
    # Strip the "0o" prefix that oct() adds
    return (oct(val)[2:].rjust(2*len(str),"0"))

base=3
mystr="aaa"
chars="123"

if (len(sys.argv)>1):
    mystr=str(sys.argv[1])
if (len(sys.argv)>2):
    base=int(sys.argv[2])

if (base==2): chars="01"
if (base==3): chars="123"
if (base==4): chars="1234"
if (base==5): chars="01234"
if (base==10): chars="0123456789"
if (base==11): chars="0123456789A"
if (base==26): chars="ABCDEFGHIJKLMNOPQRSTUVWXYZ"
if (base==36): chars="0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
if (base==58): chars="123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"
if (base==62): chars="0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
if (base==63): chars="0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz_"
if (base==67): chars="0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz_.!~"

print(f"Input: {mystr}")
print(f"Base: {base}")
print(f"Chars: {chars}")

r= encode(mystr,chars,base)
print(f"\nBase {base} encoding: {r}")

res=decode(r,chars,base)
print(f"Base {base} decoding: {res}")
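The decode function above rebuilds the text with chr() one byte at a time, which suits ASCII input such as "aa" but would not reassemble characters that take more than one UTF-8 byte. A possible variant, sketched below, converts the integer back to bytes with int.to_bytes and decodes them in one step. This is an assumption about how the code might be extended rather than part of the original program. It still shares the limitation that leading zero bytes in the input are lost.

```python
def encode(text: str, charset: str) -> str:
    """Encode the UTF-8 bytes of text using len(charset) as the base."""
    base = len(charset)
    value = int.from_bytes(text.encode("utf-8"), "big")
    out = ""
    while value > 0:
        value, digit = divmod(value, base)
        out = charset[digit] + out
    return out


def decode(encoded: str, charset: str) -> str:
    """Reverse encode(): rebuild the integer, then the UTF-8 bytes."""
    base = len(charset)
    value = 0
    for ch in encoded:
        value = value * base + charset.index(ch)
    # Use the smallest whole number of bytes that holds the value.
    raw = value.to_bytes((value.bit_length() + 7) // 8, "big")
    return raw.decode("utf-8")


BASE62 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
for message in ("aa", "Hello", "caf\u00e9"):
    token = encode(message, BASE62)
    print(message, token, decode(token, BASE62) == message)
```

Each line of output should end in True, showing that the message survives the round trip, including the accented character in the last message.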