Base64 Magic!

I was just surfing Twitter, when I found this TIL (‘Today I Learned’):

With this, we take a Base64-encoded string, decode it to a byte array, and print the result. Next, we take this byte array, convert it back into a Base64 string, and print that. And, as if by magic, the regenerated Base64 string differs from the original. This round trip should give back exactly the string we started with, so I felt a slight panic coming on.

And so I grabbed some code, and tried it with a range of values:

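When I ran the program, it showed that encoding back into Base64 gave a different string from the original most of the time, with a “w” or a “Q” replacing the second Base64 character. Each row shows the original string and its decoded byte (in hex), followed by the re-encoded string and the same byte value: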

00== d3 0w== d3
10== d7 1w== d7
20== db 2w== db
30== df 3w== df
40== e3 4w== e3
50== e7 5w== e7
60== eb 6w== eb
70== ef 7w== ef
80== f3 8w== f3
90== f7 9w== f7
a0== 6b aw== 6b
b0== 6f bw== 6f
c0== 73 cw== 73
d0== 77 dw== 77
e0== 7b ew== 7b
f0== 7f fw== 7f
01== d3 0w== d3
11== d7 1w== d7
21== db 2w== db
31== df 3w== df
41== e3 4w== e3
...

ee== 79 eQ== 79
fe== 7d fQ== 7d
0f== d1 0Q== d1
1f== d5 1Q== d5
2f== d9 2Q== d9
3f== dd 3Q== dd
4f== e1 4Q== e1
5f== e5 5Q== e5
6f== e9 6Q== e9
7f== ed 7Q== ed
8f== f1 8Q== f1
9f== f5 9Q== f5
af== 69 aQ== 69
bf== 6d bQ== 6d
cf== 71 cQ== 71
df== 75 dQ== 75
ef== 79 eQ== 79
ff== 7d fQ== 7d

So I wondered if it was a Golang problem, and tried some Python code:
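
The original listing is not reproduced here. A minimal Python 3 sketch of the same round trip, assuming the standard-library base64 module and the sample string “00==”, might look like this (Python 3 prints the bytes slightly differently from the output shown below):

```python
import base64

original = "00=="

# Decode the Base64 text into raw bytes
raw = base64.b64decode(original)

# Encode those bytes straight back into Base64 text
again = base64.b64encode(raw).decode("ascii")

print(raw, again)
```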

And magically, it is wrong again (D3 is 211 in decimal):

(‘\xd3’, ‘0w==’)

Why?

Basically, what is happening here relates to the conversion from Base64 into bytes. With Base64, we take six bits at a time and encode them as one character from the Base64 alphabet. If we do it long-hand, a “0” has the Base64 value 52 (0x34, or 110100b), and so we get:

“00==” is 110100 110100 (12 bits)

Here “=” is just padding to bring the string up to a multiple of four characters.

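But in the conversion to a byte array, the Base64 decoder doesn’t like not having a multiple of eight bits, and so it chops off the 4 bits at the end. Only the first eight bits (11010011) survive, and encoding that byte back into Base64 fills the missing bits with zeros, which makes the second character a “w” (110000b). We get (where “_” marks a truncated bit):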

“0w==” is 110100 11 _ _ _ _

This is 211 in decimal (0xd3), and is the value returned. We thus only get back a single byte (rather than the 12 bits of the original).
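
The bit arithmetic above can be traced by hand in a few lines of Python. This is only an illustrative sketch: the helper name sextets and the variable names are made up for this example, and it walks the bits manually rather than showing how any real Base64 library is implemented.

```python
import string

# Standard Base64 alphabet: A-Z, a-z, 0-9, +, /
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def sextets(text):
    """Hypothetical helper: the 6-bit groups of a Base64 string, joined, ignoring '='."""
    return "".join(format(ALPHABET.index(ch), "06b") for ch in text.rstrip("="))

bits = sextets("00==")               # 12 bits from the two '0' characters
kept = bits[: len(bits) // 8 * 8]    # only whole bytes survive decoding
dropped = bits[len(kept):]           # the leftover bits that are chopped off
value = int(kept, 2)                 # the single byte that is returned

# Re-encoding pads the surviving bits with zeros up to a multiple of six
padded = kept + "0" * (-len(kept) % 6)
chars = "".join(ALPHABET[int(padded[i:i + 6], 2)] for i in range(0, len(padded), 6))
reencoded = chars + "=" * (-len(chars) % 4)

print(bits, kept, dropped, value, reencoded)
```

The dropped bits are the low four bits of the second character, which is why only that character changes in the re-encoded string.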

And so, it’s solved.

Conclusions

Don’t just implement Base64 signatures in your code and assume that they will work. A signature, once applied, cannot be undone. Make sure that you are always dealing with multiples of eight bits in your code, and that you test your code. Any bit string whose length is not a multiple of 8 bits will most likely give the wrong Base64 string when converted back.
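
One way to test for this is to check that a Base64 string survives a decode and re-encode unchanged. The sketch below assumes Python’s standard base64 module; round_trips is a hypothetical helper name, and the sample strings are the two-hex-digit values from the table above.

```python
import base64

def round_trips(text):
    """Hypothetical helper: True if decoding then re-encoding returns the same text."""
    decoded = base64.b64decode(text)
    return base64.b64encode(decoded).decode("ascii") == text

# Rebuild the sample strings from the table: two hex digits plus "==" padding
hex_digits = "0123456789abcdef"
samples = [a + b + "==" for b in hex_digits for a in hex_digits]

# Collect every sample whose re-encoded form differs from the original
mismatches = [s for s in samples if not round_trips(s)]

print(len(mismatches), "of", len(samples), "samples do not round-trip")
print(round_trips("0w=="), round_trips("00=="))
```

A check like this could sit in front of any code that compares or signs Base64 text, so that strings with stray trailing bits are rejected rather than silently changed.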