JP + djb = SipHash

Two of the best cybersecurity researchers around are: JP Aumasson [here]; and Daniel J Bernstein (djb) [here]. JP is famous for co-creating…

JP + djb = SipHash

Two of the best cybersecurity researchers around are: JP Aumasson [here]; and Daniel J Bernstein (djb) [here]. JP is famous for co-creating the fast hashing methods of BLAKE2 [here] and BLAKE3, and in the co-creation of the GPU-busting hashing method of Argon 2. With djb, we see a long line of innovation, including in creating ChaCha20, Salsa20 and Curve 25519.

Around 2011/2012, JP Aumasson and Daniel J Bernstein (djb) actually worked together to identify some major risks around existing hashing methods [here]:

For this, JP and djb outlined how a DoS (Denial of Service) could be created on a hashtable by sending the same hash (a multicollision) to a server. This results in a worst-case insert time and can result in a DoS against the hashing method. The example given is to send 2MB of POST data consisting of 200,000 of the same 10-byte string and which results in 40,000,000,000 string comparisons — having an impact of a computation time of around 10 seconds on a 2 GHz machine. They found MurmurHash2/3 and Google’s CityHash as being vulnerable to these attacks. Google has since released FarmHash in order to address the weakness.

But for JP and djb it was not all about outlining the weakness, but where they fixed the problem with SipHash [1]. This addressed some of the issues caused by DoS attacks against hashing methods [here]:

It contained the bold statement of:

“We propose that hash tables switch to the SipHash as a hash function”.

Overall SipHash is a key-based hashing function, and where we can use a secret key to produce a hashed value which cannot be reversed. The hash is thus generated from a key (K_in) to give SipHash(D_in,K_in). Without the knowledge of the key, it will be extremely difficult to determine the input data value.

SipHash is seen as one of the best non-cryptography hashes around and is generally free from security issues. It uses a tree hash approach, and where we start by breaking up the input data and then hashing these together in a tree structure to eventually provide a root hash value. SipHash uses Intel’s SIMD (Single Instruction, Multiple Data) extensions, and uses the 16 SIMD registers to perform fast multiplications. Overall, as with SipHash, it is a key-hashed method, and where we can apply a secret key to extend the range of hashes that are possible for a given input. SipHash uses a 128-bit key for the hash. Overall for MACs, SipHash is much faster than HMAC.

Siphash

The SipHash produces a hash function which is optimised for speed on short messages. It is typically used in networking applications and is simpler than cryptographic hashing methods. It produces a 64-bit output and can produce Gigahashes/sec. It is not quite as fast as xxHash, but is considerably faster than MD5 and SHA-1. A sample run is [here]:

======Siphash=========
Key:		0123456789ABCDE0
Message:	hello
Hash:		10442178880121868220

The performance of the Siphash is fairly good, but nowhere near as fast as xxHash64 Cityhash64, Spooky and Farm64:

The code is:

C++ implementation

In the following, we will use a key of ‘0123456789ABCDEF’, and which is 16 bytes (128 bits):

https://asecuritysite.com/hash/smh_siphash?val1=a

I have checked this against this [here] and for a key of ‘0123456789ABCDEF’ we should get 12398370950267227270 (0xac0fdcb09c334c86).

The following shows a C++ version of the code:

A sample run with a key of ‘0123456789ABCDEF’ is:

String: test
Seed: 0
=== SipHash ===
SipHash 64		5661403847363678525 Hex: 4e9158cdc424513d
SipHash13		10175878653605156576 Hex: 8d37fbb8a99d96e0
HalfSipHash		693022438 Hex: 294eaee6

Conclusions

Too often in cybersecurity, we have many people who can find vulnerabilities, but not actually fix them. In this case, two great security researchers found a vulnerability and were then able to fix it.