Can I Create A Hash Collision In An Instant? Well, Yes!

A hash collision occurs when two different inputs of data produce the same hash. One way of creating one is to take two data elements and keep adding random data to them until their hashes match. With GPUs, and with the MD5 method, it is now possible to take two images and eventually create the same hash value for them.
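To get a feel for this kind of search, here is a minimal sketch that looks for a collision on a deliberately weakened hash: only the first three bytes (24 bits) of the MD5 digest. The truncation, the `message-` prefix, the use of a counter instead of random data, and the function names are all assumptions made for illustration. A brute-force search against the full 128-bit digest would need on the order of 2⁶⁴ attempts, which is why practical tools such as hashclash rely on weaknesses in MD5 rather than on blind searching.

```python
import hashlib
import itertools


def truncated_md5(data: bytes, n_bytes: int = 3) -> bytes:
    """Return only the first n_bytes of the MD5 digest (a toy, weakened hash)."""
    return hashlib.md5(data).digest()[:n_bytes]


def find_truncated_collision(prefix: bytes = b"message-", n_bytes: int = 3):
    """Try prefix + counter until two different inputs share a truncated digest."""
    seen = {}  # truncated digest -> the input that produced it
    for counter in itertools.count():
        candidate = prefix + str(counter).encode()
        tag = truncated_md5(candidate, n_bytes)
        if tag in seen:
            # A different, earlier input already produced this truncated digest
            return seen[tag], candidate, tag
        seen[tag] = candidate


first, second, tag = find_truncated_collision()
print("Input 1:", first)
print("Input 2:", second)
print("Shared 24-bit prefix of MD5:", tag.hex())
```

With a 24-bit target the search typically needs only a few thousand attempts, in line with the birthday bound of roughly the square root of the number of possible hash values.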

MD5 produces a 128-bit hash, and so has 2¹²⁸ possible hash values. Unfortunately, it doesn’t take long to create a collision, where different content produces the same hash. Recently, Nat McHugh showed that he could produce the same hash signature for two different images using hashclash, for just 65 cents on the Amazon GPU Cloud and with just 10 hours of processing. He created two images which generate the same hash signature (Figure 1). If we check the hash signatures we get:

C:\openssl>openssl md5 hash01.jpg
MD5(hash01.jpg)= e06723d4961a0a3f950e7786f3766338
C:\openssl>openssl md5 hash02.jpg
MD5(hash02.jpg)= e06723d4961a0a3f950e7786f3766338
Figure 1: Images
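The same check can be made without OpenSSL. The following is a minimal sketch using Python's standard `hashlib`; the helper names `md5_of_file` and `same_md5` are hypothetical, and it assumes the two image files are available locally.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b""
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_md5(path_a: str, path_b: str) -> bool:
    """True if both files produce the same MD5 digest."""
    return md5_of_file(path_a) == md5_of_file(path_b)
```

Calling `same_md5("hash01.jpg", "hash02.jpg")` on the two images should then agree with the OpenSSL output above.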

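But can we find instant collisions? Well, we can if we start from a natural collision. If we have two data elements a and b for which H(a)=H(b), we can also create a collision with H(a || c) = H(b || c), where “||” is concatenation. This works because MD5 processes data in 64-byte blocks and carries an internal state from one block to the next: the colliding messages here have the same length, fill a whole number of blocks, and leave MD5 in the same internal state, so any data appended to both is processed identically. An example of a collision in MD5 is: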

0e306561559aa787d00bc6f70bbdfe3404cf03659e704f8534c00ffb659c4c8740cc942feb2da115a3f4155cbb8607497386656d7d1f34a42059d78f5a8dd1ef
0e306561559aa787d00bc6f70bbdfe3404cf03659e744f8534c00ffb659c4c8740cc942feb2da115a3f415dcbb8607497386656d7d1f34a42059d78f5a8dd1ef

If we now append “hello” to both messages, we get:

b'0e306561559aa787d00bc6f70bbdfe3404cf03659e704f8534c00ffb659c4c8740cc942feb2da115a3f4155cbb8607497386656d7d1f34a42059d78f5a8dd1ef' 
Hex: cee9a457e790cf20d4bdaa6d69f01e41
 b'0e306561559aa787d00bc6f70bbdfe3404cf03659e744f8534c00ffb659c4c8740cc942feb2da115a3f415dcbb8607497386656d7d1f34a42059d78f5a8dd1ef' 
Hex: cee9a457e790cf20d4bdaa6d69f01e41

Adding:  hello
 b'0e306561559aa787d00bc6f70bbdfe3404cf03659e704f8534c00ffb659c4c8740cc942feb2da115a3f4155cbb8607497386656d7d1f34a42059d78f5a8dd1ef68656c6c6f' 
Hex: 4d0c8baa8a036cff537f00d6e26bbef5
 b'0e306561559aa787d00bc6f70bbdfe3404cf03659e744f8534c00ffb659c4c8740cc942feb2da115a3f415dcbb8607497386656d7d1f34a42059d78f5a8dd1ef68656c6c6f' 
Hex: 4d0c8baa8a036cff537f00d6e26bbef5

We see that the two original messages give the same MD5 hash value (cee9a457e790cf20d4bdaa6d69f01e41), and when we append the string “hello” to each, we again get a collision (4d0c8baa8a036cff537f00d6e26bbef5).
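The property is not specific to “hello”. As a rough sketch, the check below appends several arbitrary suffixes to the same colliding pair, and then contrasts that with prepending data and with switching to SHA-256. The helper `md5_hex` and the chosen suffixes are assumptions for illustration only.

```python
import hashlib
from binascii import unhexlify

# The colliding 64-byte pair shown above
m1 = unhexlify('0e306561559aa787d00bc6f70bbdfe3404cf03659e704f8534c00ffb659c4c8740cc942feb2da115a3f4155cbb8607497386656d7d1f34a42059d78f5a8dd1ef')
m2 = unhexlify('0e306561559aa787d00bc6f70bbdfe3404cf03659e744f8534c00ffb659c4c8740cc942feb2da115a3f415dcbb8607497386656d7d1f34a42059d78f5a8dd1ef')


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of data as a hex string."""
    return hashlib.md5(data).hexdigest()


# Appending the same suffix to both messages: H(a || c) against H(b || c)
for suffix in (b"", b"hello", b"goodbye", b"\x00" * 1000):
    same = md5_hex(m1 + suffix) == md5_hex(m2 + suffix)
    print("suffix of", len(suffix), "bytes, MD5 digests match:", same)

# Prepending shifts the block boundaries and changes the state going in
prefix = b"hello"
print("prefix, MD5 digests match:", md5_hex(prefix + m1) == md5_hex(prefix + m2))

# A different hash function does not share this particular collision
print("SHA-256 digests match:",
      hashlib.sha256(m1).hexdigest() == hashlib.sha256(m2).hexdigest())
```

The suffix checks are expected to report a match every time, while the prefix and SHA-256 checks are expected to report a mismatch: the collision depends on MD5 reaching the two messages in its starting state.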

An outline of the code is:

import hashlib
import sys
from binascii import hexlify, unhexlify

# Colliding 64-byte pair used for the output shown above.
# The two messages differ in only a few hex digits (...9e70... vs ...9e74... and ...4155c... vs ...415dc...).
m1 = unhexlify('0e306561559aa787d00bc6f70bbdfe3404cf03659e704f8534c00ffb659c4c8740cc942feb2da115a3f4155cbb8607497386656d7d1f34a42059d78f5a8dd1ef')
m2 = unhexlify('0e306561559aa787d00bc6f70bbdfe3404cf03659e744f8534c00ffb659c4c8740cc942feb2da115a3f415dcbb8607497386656d7d1f34a42059d78f5a8dd1ef')

# Alternative 64-byte pair (uncomment to try):
# m1 = unhexlify('4dc968ff0ee35c209572d4777b721587d36fa7b21bdc56b74a3dc0783e7b9518afbfa200a8284bf36e8e4b55b35f427593d849676da0d1555d8360fb5f07fea2')
# m2 = unhexlify('4dc968ff0ee35c209572d4777b721587d36fa7b21bdc56b74a3dc0783e7b9518afbfa202a8284bf36e8e4b55b35f427593d849676da0d1d55d8360fb5f07fea2')

# Alternative 128-byte pair (uncomment to try):
# m1 = unhexlify('d131dd02c5e6eec4693d9a0698aff95c2fcab58712467eab4004583eb8fb7f8955ad340609f4b30283e488832571415a085125e8f7cdc99fd91dbdf280373c5bd8823e3156348f5bae6dacd436c919c6dd53e2b487da03fd02396306d248cda0e99f33420f577ee8ce54b67080a80d1ec69821bcb6a8839396f9652b6ff72a70')
# m2 = unhexlify('d131dd02c5e6eec4693d9a0698aff95c2fcab50712467eab4004583eb8fb7f8955ad340609f4b30283e4888325f1415a085125e8f7cdc99fd91dbd7280373c5bd8823e3156348f5bae6dacd436c919c6dd53e23487da03fd02396306d248cda0e99f33420f577ee8ce54b67080280d1ec69821bcb6a8839396f965ab6ff72a70')

# Data to append: "hello" by default, or the first command-line argument
word = "hello"
if len(sys.argv) > 1:
    word = str(sys.argv[1])
a = word.encode()

# The two original messages hash to the same value
print(hexlify(m1), "\nHex:", hashlib.md5(m1).hexdigest())
print(hexlify(m2), "\nHex:", hashlib.md5(m2).hexdigest())

# Appending the same data to both messages preserves the collision
print("\nAdding: ", word)
print(hexlify(m1 + a), "\nHex:", hashlib.md5(m1 + a).hexdigest())
print(hexlify(m2 + a), "\nHex:", hashlib.md5(m2 + a).hexdigest())
