The Hashlib, OpenSSL and Length Extension Attacks

In Python 2.7, crptography grew up and supported the mighty Hashlib. Up to that point, secure hashing was supported with Python library…

Hashlib, OpenSSL and Length Extension Attacks

In Python 2.7, crptography grew up and supported the mighty Hashlib. Up to that point, secure hashing was supported with Python library integrations that could never be fully trusted. Overall, the “ground zero” of cryptography has nearly always been OpenSSL, and so Hashlib provides a way to access OpenSSL and exposes it to Python. The core integration was written by Gregory P Smith:

Overall, it is the FIPS 180–4 standard that defines the core methods which can be used in a production environment, and these include SHA1, SHA224, SHA256, SHA384, and SHA512. These hashes are based on the Merkle-Damgard construction and which can be vulnerable to length extension attacks ( SHA1, SHA256, and SHA512). In order to create a hashing method which avoided the weaknesses of previous methods, NIST defined SHA-3 as a new standard. Along with this, BLAKE2b/BLAKE2s are also free from length extension attacks.

The main constructors for hash algorithms are sha1(), sha224(), sha256(), sha384(), sha512(), sha3_224(), sha3_256(), sha3_384(), sha3_512(), shake_128(), shake_256(), blake2b(), and blake2s(). With SHAKE128 and SHAKE256 we can get a variable length hash [here]:

import hashlib
import sys

s="Nobody inspects the spammish repetition"
hex=False
showhex="No"

if (len(sys.argv)>1):
s=str(sys.argv[1])

if (len(sys.argv)>2):
showhex=str(sys.argv[2])

print("Message: ",s)

if (showhex=="yes"): hex=True

s1=s.encode()
try:
if (hex==True): s1 = binascii.a2b_hex(s)
else: s1=s.encode()

print("\nMD5:\t",hashlib.md5(s1).hexdigest())
print("SHA1:\t",hashlib.sha1(s1).hexdigest())
print("SHA224:\t",hashlib.sha224(s1).hexdigest())

print("SHA256:\t",hashlib.sha256(s1).hexdigest())
print("SHA384:\t",hashlib.sha384(s1).hexdigest())
print("SHA512:\t",hashlib.sha512(s1).hexdigest())

print("SHA3-224:\t",hashlib.sha3_224(s1).hexdigest())
print("SHA3-256:\t",hashlib.sha3_256(s1).hexdigest())
print("SHA3-384:\t",hashlib.sha3_384(s1).hexdigest())
print("SHA3-512:\t",hashlib.sha3_512(s1).hexdigest())

print("Blake2s:\t",hashlib.blake2s(s1).hexdigest())
print("Blake2b:\t",hashlib.blake2b(s1).hexdigest())

print("SHAKE128:\t",hashlib.shake_128(s1).hexdigest(16))
print("SHAKE128:\t",hashlib.shake_128(s1).hexdigest(32))
print("SHAKE128:\t",hashlib.shake_128(s1).hexdigest(64))

print("SHAKE256:\t",hashlib.shake_256(s1).hexdigest(16))
print("SHAKE256:\t",hashlib.shake_256(s1).hexdigest(32))
print("SHAKE256:\t",hashlib.shake_256(s1).hexdigest(64))
except Exception as e:
print(e)

A sample run [here]:

Message:  fred

MD5: 570a90bfbf8c7eab5dc5d4e26832d5b1
SHA1: 31017a722665e4afce586950f42944a6d331dabf
SHA224: b280f6be0fb6617394fd0db9a5ed7861820e27b816d141872f8dca69
SHA256: d0cfc2e5319b82cdc71a33873e826c93d7ee11363f8ac91c4fa3a2cfcd2286e5
SHA384: 428024be236d042b4dca337ec8090410be82863b545452834ca1eb225c7f62a207ac31bb4f1d9e2616c13ca017d26a39
SHA512: 3566c33c35c59ba2587bac2a81526cf33ea0928111ed9e1616aa43fcffbc3f5d07e058c80898cd286095b7587ad5edd3511fd943fd7d7743b1ded724262026f3
SHA3-224: 47ff7f7707e40de7e531bbf75fd5f6dfa1b1ae9a3033fae0b4c1bece
SHA3-256: 901e5b95a7c6f4c25f1dbb31931585a1aac6cac21eb1f09a39411f5ba4e710d6
SHA3-384: a0fc5aee37444122b8aec531e132633da6ed746779af999d028d9ecbba7214a453b0ca13c9cbfbda481a1d7f74341f2c
SHA3-512: 9d15efc1b71e0143a4daad34d5bb2e97d4968d80269e49c633aac69cc13d990b25295685eacb5b29eb584a1f6dd92a8e91e257c0c493a869310bee0b8a2ef440

Blake2s: 6552aced3ee8c909b537bf61057745d85f17b0ad52457091c1e508105ef912ea
Blake2b: 817bdde172952d439c9ccb1a4d91476c93cd4578038715f139d9095a9f7a18026463d67e3c666733a8e4e733195414e7888631057b3857418f51a61848467d10

SHAKE128: ed8642727f1405d9473884d2626c08ac
SHAKE128: ed8642727f1405d9473884d2626c08ac0897dc80b29519997bbd43716ecbc84d
SHAKE128: ed8642727f1405d9473884d2626c08ac0897dc80b29519997bbd43716ecbc84d24332f4aeb0bc1270d00a08a8f300091570a0be9daea95e744dd61b94fe1f375

SHAKE256: 834a5e9fefb7a7501bc9b7d7b58b1ebb
SHAKE256: 834a5e9fefb7a7501bc9b7d7b58b1ebbe68962e7007af7a3b9af3cf1c70854e5
SHAKE256: 834a5e9fefb7a7501bc9b7d7b58b1ebbe68962e7007af7a3b9af3cf1c70854e5d2e95a3b5a1b51758c30b111306eb8fc2ae3de073fdbbb5527b00b102381a622

Length attack

The foundation of trust in cybersecurity is created by the simple concept of data hashing, and where we take data and create a fixed-length hash for the data. If we cannot trust our hashing methods, we are in trouble. When we create the perfect message hash, we need to make sure we have:

  • Collision resistance. This is where it is extremely difficult to find two messages which have the same hash. Thus we should not be able to find the has of two messages (M1, and M2) that are the same within a reasonable time: H(M1)=H(M2).
  • Pre-image resistance. If we already have a hash value (h), it should be extremely difficult to find a message which will give the same hash. Thus for a given hash (h), it is difficult to find a message (M1) for H(M1)=h.

The original hash methods were often based on the Merkle-Damgård (MD) construction. With this, we create a hash function using blocks of data. Based on the MD construct, Ron Rivest created the MD5 hashing method, and it was widely adopted in the industry. It works by taking a static initialisation vector (IV) and then feeding this into a one-way function (f), along with a block of the message. We feed this output into the next stage and so on until we get to a message pad at the end:

The one-way function (f) will generally compress the data and produce fewer bits out than are fed in. Unfortunately, the MD construct has many weaknesses, and one of the most serious is the length extension attack. With this, an adversary (Eve) can take a hash for an unknown message and then add additional data to produce a new valid hash.

So Bob could take a hash of a password that he and Alice know (“qwerty123”) and append it with a message (“hello”) to produce:

H(Password || Message)

where “||” identifies the appending of one string onto another. Thus, when Bob sends a message to Alice, she will prepend the message with the shared password, and generate the same hash. In this way, Bob has proven the message and that he knows the secret password. This is a message authentication code (MAC) and validates that Bob knows a shared secret and the message. But, the MD method is flawed as it is possible for Eve to take a previous hash for a known message and then append a new method to produce:

H(Password || Original Message || New Message)

In this way, Eve does not know the password but can still generate a valid hash, and add her message onto it. An outline of the code is:

import hashpumpy
import hashlib
import sys
password=b'password'
message= b'message'
addition = b'addition'
if (len(sys.argv)>1):
password=(sys.argv[1]).encode()
if (len(sys.argv)>2):
message=(sys.argv[2]).encode()
if (len(sys.argv)>3):
addition=(sys.argv[3]).encode()

# Compute a previous hash for H(Password || Message)
m = hashlib.sha1()
m.update((password+message))
rtn=m.hexdigest()
print ("Previous hash: ",rtn)

# Compute a hash for H(Password || Message || Addition)
rtn = hashpumpy.hashpump(rtn, message, addition, len(password))
print ("New hash: ",rtn[0])
print ("New message: ",rtn[1])
m = hashlib.sha1()
m.update(password+rtn[1])
rtn=m.hexdigest()
print ("Computing new hash (password+newdata): ",rtn)

A sample run for a message of “message” and with the addition of the text of “addition” is:

Previous hash:  22583ca8f00efff6296b4b571b9c2e1bcf22a99a
New hash: dd448d0874b738ca1b85bc00e151fbf16393ce4a
New message: b'message\x80\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00xaddition'Computing new hash (password+newdata): dd448d0874b738ca1b85bc00e151fbf16393ce4a

In this case, the hash of H(Password || “message”) is ‘22583ca8f00efff6296b4b571b9c2e1bcf22a99a’, and Eve can now use this to generate a new valid hash, without the knowledge of the password. We can see that Eve can generate some extra bytes in the message, and then add a new message and create a valid hash.

The HMAC method overcomes the problems identified in the length extension attack.