Which is the fastest hashing method … and has O(1) complexity?

Well, the fastest hash might be just taking the first eight bytes of our data to produce a hash. But that is likely to create lots of collisions, as any two inputs that share their first eight bytes, such as “Edinburgh123” and “Edinburgh99”, produce the same hash. A stronger method is to sample bytes from several locations, and merge these into a 64-bit hash. The complexity is then O(1): it doesn’t matter how much data we have, as we only read bytes at a fixed number of locations.
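To make the first-eight-bytes idea concrete, here is a minimal sketch in C. The function names first8_hash and first8_collide are made up for this illustration and are not part of any library; keys shorter than eight bytes are assumed to be zero-padded.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: use (up to) the first eight bytes of the key as the hash. */
uint64_t first8_hash(const void *key, size_t len) {
    assert(key != NULL);
    uint64_t h = 0;
    memcpy(&h, key, len < 8 ? len : 8);  /* shorter keys are zero-padded */
    return h;
}

/* Returns 1 when two strings produce the same first8_hash value. */
int first8_collide(const char *a, const char *b) {
    return first8_hash(a, strlen(a)) == first8_hash(b, strlen(b));
}
```

Calling first8_collide("Edinburgh123", "Edinburgh99") returns 1, as both strings start with the eight bytes "Edinburg".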

One method of sampling bytes is o1hash, which was created by Wang Yi. It is a quick-and-dirty approach that can be used within fast hash tables. The data object can be of any size, as we just sample the bytes at given locations and produce a hash value for our lookup.

Overall, o1hash samples the first, middle and last four bytes of the key in order to produce the hash. Each four-byte sample is converted into an unsigned integer. The first and last values are then added, and the sum is multiplied by the middle value to produce a 64-bit hash:

first=first four bytes of key;
middle=middle four bytes of key;
last=last four bytes of key;
hash= (uint64_t)(first+last)*middle;
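As a rough illustration of where the three samples come from, the sketch below computes the sample offsets for a key of a given length, using the same offset arithmetic as the C listing in this article. The names middle_offset, last_offset and show_samples are invented for this example, and it assumes a key of at least four bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Offset of the middle four-byte sample: (len / 2) - 2. */
size_t middle_offset(size_t len) { return (len >> 1) - 2; }

/* Offset of the last four-byte sample. */
size_t last_offset(size_t len) { return len - 4; }

/* Print the three four-byte windows that are sampled from a string key. */
void show_samples(const char *key) {
    size_t len = strlen(key);
    assert(len >= 4);  /* shorter keys take a different path */
    printf("first=%.4s middle=%.4s last=%.4s\n",
           key, key + middle_offset(len), key + last_offset(len));
}
```

For a 12-byte key such as "aaaabbbbcccc", the samples start at offsets 0, 4 and 8, so show_samples prints the windows "aaaa", "bbbb" and "cccc". For very short keys the windows overlap.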

A collision is not difficult to create, as we just have to make sure that the first four bytes, the middle four bytes and the last four bytes are the same in both inputs. The following shows a C version of the code:

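#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>

// Stand-in for the _o1r4 helper from o1hash.h, so that this listing builds
// on its own: reads four bytes as an unsigned integer (assumes native byte order).
static inline unsigned _o1r4(const uint8_t *p) {
    unsigned v;
    memcpy(&v, p, 4);
    return v;
}

static inline uint64_t o1hash(const void *key, size_t len) {
    const uint8_t *p = (const uint8_t *)key;
    if (len >= 4) {
        // Sample the first, middle and last four bytes
        unsigned first = _o1r4(p), middle = _o1r4(p + (len >> 1) - 2), last = _o1r4(p + len - 4);
        return (uint64_t)(first + last) * middle;
    }
    if (len) {
        // Fewer than four bytes: pack the first, middle and last byte
        uint64_t tail = ((((unsigned)p[0]) << 16) | (((unsigned)p[len >> 1]) << 8) | p[len - 1]);
        return tail * 0xa0761d6478bd642full;
    }
    return 0;
}

int main(int argc, char **argv)
{
    printf("o1Hash\n");

    if (argc == 2) {
        printf("Input: %s\n", argv[1]);
        uint64_t res = o1hash(argv[1], strlen(argv[1]));
        // Print all 64 bits as 16 zero-padded hex digits
        printf("Hash: %016llX\n", (unsigned long long)res);
    }
    return 0;
}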

A sample run:

o1Hash
Input: Testing 1, 2, 3
Hash: B5719420287501

Notice that the hash is a 64-bit (eight-byte) value, so it takes up to 16 hex characters to print. The o1hash method, though, is not secure, and can reveal information about the bytes that it samples. Here is the code:

https://asecuritysite.com/encryption/o1?val1=aaaazbbbzcccc
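One way to see how data can leak is the short-key path (one to three bytes), where the packed bytes are simply multiplied by an odd constant. Multiplication by an odd number is reversible modulo 2^64, so the bytes can be recovered from the hash. The sketch below is only an illustration of that point: the names o1hash_short, inverse_mod_2_64 and recover_tail are invented here, and the constant is the one that appears in the listing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define O1_MULT 0xa0761d6478bd642full  /* multiplier used for keys under four bytes */

/* Short-key branch only (1 to 3 bytes): pack first, middle and last byte, then multiply. */
uint64_t o1hash_short(const uint8_t *p, size_t len) {
    assert(len >= 1 && len <= 3);
    uint64_t tail = (((unsigned)p[0]) << 16) | (((unsigned)p[len >> 1]) << 8) | p[len - 1];
    return tail * O1_MULT;
}

/* Multiplicative inverse of an odd number modulo 2^64, by Newton iteration. */
uint64_t inverse_mod_2_64(uint64_t a) {
    uint64_t x = a;              /* a*a = 1 (mod 8), so 3 bits are already correct */
    for (int i = 0; i < 5; i++)  /* each step doubles the number of correct bits */
        x *= 2 - a * x;
    return x;
}

/* Recover the packed 24-bit value (first, middle, last byte) from a short-key hash. */
uint32_t recover_tail(uint64_t hash) {
    return (uint32_t)(hash * inverse_mod_2_64(O1_MULT));
}
```

For the three-byte key "abc", recover_tail applied to the hash gives back 0x616263, which is the three original bytes.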

It is also fairly easy to create collisions, as we just have to make sure that the first four bytes, the middle four bytes and the last four bytes are the same in each input. Here is a collision:

  • message = “aaaabbbbcccc”
  • message = “aaaa bbbb cccc”
  • message = “aaaazbbbbzcccc”
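The three messages above can be checked with a small sketch that repeats the sampling step for keys of four bytes or more. The names read4, o1hash_sketch and hash_str are invented for this example; read4 stands in for the _o1r4 helper and assumes the bytes are read in native byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for _o1r4: read four bytes as an unsigned 32-bit value. */
static uint32_t read4(const uint8_t *p) {
    uint32_t v;
    memcpy(&v, p, 4);
    return v;
}

/* Same sampling as the listing, restricted to keys of at least four bytes. */
uint64_t o1hash_sketch(const void *key, size_t len) {
    const uint8_t *p = (const uint8_t *)key;
    assert(len >= 4);
    uint32_t first = read4(p);
    uint32_t middle = read4(p + (len >> 1) - 2);
    uint32_t last = read4(p + len - 4);
    /* The 32-bit sum wraps around before it is widened to 64 bits. */
    return (uint64_t)(uint32_t)(first + last) * middle;
}

/* Convenience wrapper for C strings. */
uint64_t hash_str(const char *s) {
    return o1hash_sketch(s, strlen(s));
}
```

All three messages sample “aaaa”, “bbbb” and “cccc”, so hash_str returns the same value for each of them, while changing a sampled byte (for example “aaaabbbbcccd”) gives a different value.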

smhasher is one tool that can assess hashes for their performance and their security weaknesses. Basically, o1hash fails all the tests, but comes out as the fastest hashing method.

Conclusions

If you need an ultra-fast hash table, where the hashing time does not depend on the amount of data you are hashing, o1hash is a contender. But it fails most of the serious tests for a strong hashing function.