The Wonderful World of Hashing … Some Biba, Entropy Calculations, and Virtually Every Hashing…

At the core of cybersecurity is … trust. This can be human trust within a system, such as seeing that there’s a green bar on an HTTPs URL…

Photo by Alex Motoc on Unsplash

The Wonderful World of Hashing … Some Biba, Entropy Calculations, and Virtually Every Hashing Method Under The Sun

At the core of cybersecurity is … trust. This can be human trust within a system, such as seeing that there’s a green bar on an HTTPs URL, or that a user is reassured by fingerprint recognition on their banking app. But human trust is only part of it, as we need digital trust. For this, we need to make sure that our data is what it should be and that no one has changed it from the state that it is meant to be in. And for this, we turn to one magical method .. hashing.

With hashing we want to build in some form of integrity, and where we can check that the data has not been modified. This might relate to a file that you want to download, a driver on your system, or in the text in an email. At the core of this checking is a simple hashing function.

With hashing we perform a one-way function, that takes us from the data, and then produces a hashed value. We should be able to mathematically do the reverse, as we don’t want someone to reverse our hashes. But, at times, Eve can actually reverse the hash if she has already computed the match between the data and the generated hash. And so we also add a bit of salt, and where the salt value changes the hash for a given input.

So, here’s a bit of background on hashing:

Balls in bins approach (Biba)

We can analyse a hash function by looking at the balls in bins problem. In this, we imagine we are in a fairground and have a number of balls to throw in bins. We then throw the balls randomly, and they will go into one of the bins. If we get two balls in a bin we will win. So take five bins and determine the probability of us getting two balls in a bin at any given throw [here]:

Number of bins: 5
Average number of throws 3.28
Throw Prob of winning
2 0.200
3 0.520
4 0.808
5 0.962
6 1.000

We can use this as an analogy for hashing and where we create a collision for different data elements leading to the same hashed value. In this case with five hits, we will generate a collision every 3.28 attempts. With 65,536 bins (64-bits), we have an average of 321.51 attempts before we will get two balls in a bin [here]:

Number of bins: 65536
Average number of throws 321.515493347

And, what about birthdays? Well, I often do a live demo to my class around collisions within birthdays, and that you just need a class of around 25 students to have a 50% chance of two students having the same birthday. This is equivalent to having 365 bins and throwing the balls into the bin in a random way. For this, on average, we will take 24.61 throws to get two balls in a bin [here]:

Number of bins: 365
Average number of throws 24.6165858946

So, the Biba concept allows us to understand for a given number of bits that we have to represent a hash value, that we can determine how often a collision of hash values will occur.

Entropy

The true measure of the actual information within a hashed value is measured by entropy. This determines the actual amount of information contained in the data, and it is measured in bits. If the hash is truly random, the number of bits in the hash will be the entropy value. Otherwise, if we use a 32-bit hash, and only have 1,024 different values to hash, the entropy will be 10 bits.

To calculate the entropy we take the number of possible input values for our hashing (count) and compute:

Entropy=log10(count)/log10(2)

This is basically a log2() function but converted into log10 (as not many calculators have log2() on them). For a five-character password for lower case, we have 26⁵ different possible passwords, so the key entropy is:

Entropy = log10(26⁵)/log10(2) = 23.5 bits

Overall 23.5 bits is a low value, as the state-of-the-art in hash cracking is around 72-bits. Now let’s try for different password lengths [here]:

We can see now, that we get 56.41 for a 12 character password, but this is not strong. If we try upper and lower case, it improves to 68.41 bits, and which is near the limits of hash cracking [here]:

The reason we can crack for 12 characters, is that we can employ parallel processing with GPUs or ASICs. In order to measure the strength of the hash of a password, we thus turn to the true measure of its strength, the key entropy value.

Other hashing methods

And so, what about the hashing methods? We can authenticate messages, have lightweight versions of hashes, and so much more. Here are most of the main methods:

  • cryptographic hashes. These are fast and secure, such as SHA-256, Blake 3 and SHA-3. Examples here.
  • message authentication codes (MACs) — where we can have a secret key between Bob and Alice and which they sign hashes with. Examples here.
  • light-weight hashes. These run well on resource-constrained devices, such as for sensors and IoT devices. Examples here.
  • non-crpyto hashing. These are ultra-fast, and not as secure as the cryptography hashes. Examples include Farm and xxHash. Examples here.
  • one-time passwords. These can be generated so that a unique hash is generated by a count or by time. Examples here.
  • quantum robust hashes. These allow digital signatures to scale into an era with quantum computers and used hash-based signatures. Examples here.
  • similarity hashes. These look to match similar objects, such as similar words or similar images. Examples here.
  • key derivation function (KDF) hashes. These generate encryption keys from passwords. Examples here.