FalkHash (Exotic Probably Shitty Hash) Makes Me Smile

Well, some of the cybersecurity research papers might make your head spin, but underneath there’s something that is perhaps a simple method…

https://asecuritysite.com/encryption/smh_siphash

FalkHash (Exotic Probably Shitty Hash) Makes Me Smile

Well, some of the cybersecurity research papers might make your head spin, but underneath there’s something that is perhaps a simple method that can be explained easily. Obviously, it is important to have scientific rigour, formal proofs, extensive evaluation, and so on. But, in crypto, what’s wrong with saying spin round 10 times, move things around, and shuffle things up a bit, or taking existing methods and scrambling them together, and make them fast. If there’s a test around to check the goodness of the method you have employed, then test is against that, until it passes. It could thus be argued that it was the test that was at fault if we found that things were not quite designed as they should.

In cybersecurity, we need hashing methods to provide the integrity of data. And for this, we turn to SHA-1, SHA-256 and SHA-3. These have been designed so that they do not have too many security weaknesses (and where an adversary could modify data to produce a collision to two different data sets. But, in many applications, such as in hash tables and Bloom filters, we just need a way to match some data to an index value, and where the hash can be the index. As it’s a look-up table approach, the hashing method does not have to have the same security levels, and where it obviously needs to be fast, as we do not way a delay in looking up data. And, so, FalkHash is a blisteringly fast non-cryptography hashing method that beats virtually every single hashing method around for speed:

In this test, it is around 20 times faster than the fastest cryptography hashing method (Blake 3), and beats xxh3 (known to be one of the fastest and most secure methods for non-cryptography hashes). The GitHub source code is honest and opens up on the way the hash was design, and where it fixed a problem [here]:

The tests are performed on smhasher, and the designer states that they continually updated it until it passed the smhasher tests. As we can see, smhasher has perhaps caught up a bit, and now shows that it has problems with LongNeighours and Sparse. It also is machine-specific, but in a world dominated by x64 architecture within Cloud systems, it is not a major problem.

FalkHash — to give it, its other name: Exotic Probably Shitty Hash - produces a 128-bit hash. It was designed for the falkervisor code base. I love FalkHash as it’s a few lines of code, and basically dives into the AES-NI instruction set, and use all those 128-bit registers in a way they were meant to. In a 32-bit and 64-bit processing world, the AES-NI instruction set opens up 128-bit registers, and with AES integration. These commands include _mm_loadu_si128 (loading data into 128-bit registers), _mm_xor_si128 (XOR a 128 bit value with a registers), and _mm_aesenc_si128 (performs a one round AES encryption using a round key). To make use of these instructions, in Visual Studio, we can compile with an enhanced instruction set and use SIMD (Single Instruction, Multiple Data) 2 extensions and compile for x64 architectures:

Figure 3: Building for x64 (64-bit architecture)

In the following, I have missed out the padding of the data buffer. We start with the data (buf) and a 128-bit seed value, and end up with a 128-bit hash value:

uint8_t* buf = (uint8_t*)pbuf;

uint64_t iv[2];

__m128i hash, seed;

/* Load the IV into a __m128i */
 seed = _mm_loadu_si128((__m128i*)iv);

/* Hash starts out with the seed */
 hash = seed;

/* Load up the data into __m128is */
  piece[0] = _mm_loadu_si128((__m128i*)(buf + 0 * 0x10));
  piece[1] = _mm_loadu_si128((__m128i*)(buf + 1 * 0x10));
  piece[2] = _mm_loadu_si128((__m128i*)(buf + 2 * 0x10));
  piece[3] = _mm_loadu_si128((__m128i*)(buf + 3 * 0x10));
  piece[4] = _mm_loadu_si128((__m128i*)(buf + 4 * 0x10));

/* xor each piece against the seed */
  piece[0] = _mm_xor_si128(piece[0], seed);
  piece[1] = _mm_xor_si128(piece[1], seed);
  piece[2] = _mm_xor_si128(piece[2], seed);
  piece[3] = _mm_xor_si128(piece[3], seed);
  piece[4] = _mm_xor_si128(piece[4], seed);

/* aesenc all into piece[0] */
  piece[0] = _mm_aesenc_si128(piece[0], piece[1]);
  piece[0] = _mm_aesenc_si128(piece[0], piece[2]);
  piece[0] = _mm_aesenc_si128(piece[0], piece[3]);
  piece[0] = _mm_aesenc_si128(piece[0], piece[4]);

/* Finalize piece[0] by aesencing against seed */
  piece[0] = _mm_aesenc_si128(piece[0], seed);

/* aesenc the piece into the hash */
  hash = _mm_aesenc_si128(hash, piece[0]);

buf += 0x50;
  len -= 0x50;
 }

/* Finalize hash by aesencing against seed four times */
 hash = _mm_aesenc_si128(hash, seed);
 hash = _mm_aesenc_si128(hash, seed);
 hash = _mm_aesenc_si128(hash, seed);
 hash = _mm_aesenc_si128(hash, seed);

Its security strength involves using the AES method for one round and in XOR’ing with the seed value. In the following I just use the seed value of zero:

puts("\n===FalkHash===");

__m128i h128 = falkhash((void *)buff, strlen(buff), 0);
  
  
  uint64_t n[2];

_mm_storeu_si128((__m128i*)n, h128);

printf("\nFalkHash128\t\t%llx%llx\n", n[1], n[0]);

and here is a sample run:

String: test
Seed: 0

===FalkHash===

FalkHash128  aaf0292a839b7393 8fb1dce8405f53fa

Here is the method running:

https://asecuritysite.com/encryption/smh_falkhash

Conclusions

Okay. As a researcher, you read papers based on scientific methods, such as for the SipHash method and produced by Jean-Philippe Aumasson and Daniel J. Bernstein in 2012 [1]. In their paper, they analysed problems in some of the existing hashing methods and produced a new one that addressed these weaknesses [here]. Overall the paper has a core methodology, and there is formal proof of security levels.

But, on the other hand, don’t dismiss those contributions which just experiment with some code, and pass the test. The peer review nature then becomes open to other to test and evaluate, and move away from the more formal peer review process used in academic journals. If someone finds a bug, the code can be easily updated to address. There is, though, space for both types of methods, and especially for academic researchers to open up their scientific areas and engage with others in industry, as when industry has a problem, it finds a solution. It might not be the best solution, but it fixes the problem. Sometimes, though, in academic research, we find a solution that is looking for a problem.