Moving Into The 21st Century for Data: Scrypt, Tokenization and Python

Photo by Markus Spiske on Unsplash

We live in a 20th Century world, with our spreadsheets and databases. Very little trust is built into our data, and very little protection is ever applied. This leads to a world where cybercriminals can thrive, as it is easy to spoof data elements. A better approach is to tokenize data: we encrypt it, and apply keyed hashing so that the receiver can check who created it and that it has not been changed. Fernet tokens are one of the easiest ways to encapsulate data in the form of a token.

Fernet is a symmetric encryption method which makes sure that an encrypted message cannot be read or manipulated without the key. The key is 32 bytes in a URL-safe Base64 encoding, of which one half is used for signing and the other half for encryption. Fernet uses 128-bit AES in CBC mode with PKCS7 padding, and HMAC with SHA-256 for authentication. The IV is created from os.urandom(). All of this is the kind of thing that good software needs. AES is top-drawer encryption, and SHA-256 avoids the collision weaknesses of MD5 and SHA-1 (whose hash values are too short). With CBC (Cipher Block Chaining), a random IV acts much like a salt, so that the same message encrypted twice gives a different ciphertext. And with HMAC, anyone who holds the key can check that the token was created by another holder of the key and that it has not been modified. In this case, we will use scrypt to generate the encryption key from a salt value and a password. To regenerate the same encryption key we need both the salt value and the password.
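To see the integrity check at work, the following is a minimal sketch which assumes a recent release of the cryptography library. It uses a random key from Fernet.generate_key() rather than a password-derived one, and the byte offset that is changed (30) is just an illustrative choice that falls within the ciphertext:

```python
import base64

from cryptography.fernet import Fernet, InvalidToken

# A random Fernet key (32 bytes, URL-safe Base64 encoded)
key = Fernet.generate_key()
f = Fernet(key)

token = f.encrypt(b"hello world")

# Decode the token, flip a single bit in the ciphertext, and re-encode it
raw = bytearray(base64.urlsafe_b64decode(token))
raw[30] ^= 0x01  # illustrative offset: the ciphertext starts at byte 25
tampered = base64.urlsafe_b64encode(bytes(raw))

# The HMAC no longer matches, so decryption should be refused
try:
    f.decrypt(tampered)
    tamper_detected = False
except InvalidToken:
    tamper_detected = True

print("Original decrypts to:", f.decrypt(token).decode())
print("Tampering detected:", tamper_detected)
```

The HMAC is checked before any decryption takes place, so a modified token is rejected rather than decrypted into corrupted plaintext.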

The token has a version number, a timestamp, the IV, the ciphertext and an HMAC signature:

  • Version: 8 bits
  • Timestamp: 64 bits (the number of seconds between January 1, 1970 UTC and the time of the encryption).
  • IV: 128 bits
  • Ciphertext: variable length (a multiple of 128 bits)
  • HMAC: 256 bits

Here is an example:

67 4141414141426346 3745716c4c45323343566871445a6447 48623743347a6477395f5034643730634c796a434e485a42534d396b79724e526d4743325a573030433862 57355364776348447731673178636c4d704c7953674764416c626b6a53773d3d

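Splitting this using the field sizes above gives the following. Note that these hex values are the ASCII codes of the URL-safe Base64 text of the token (0x67 is the character "g"), and so they only approximate the field boundaries. Once the Base64 is decoded, the version byte of a Fernet token is 0x80:

67 - Version (8 bits)
4141414141426346 - Timestamp (64 bits)
3745716c4c45323343566871445a6447 - IV (128 bits)
48623743347a6477395f5034643730634c796a434e485a42534d396b79724e526d4743325a573030433862 - Ciphered message
57355364776348447731673178636c4d704c7953674764416c626b6a53773d3d - HMAC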
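To read the fields exactly, the token can be Base64-decoded first and then sliced at the byte offsets given by the field sizes. The following is a minimal sketch using only the standard library. The function name parse_fernet_token is a hypothetical helper, and the token is the one from the sample run given later in this article:

```python
import base64
import struct
from datetime import datetime, timezone


def parse_fernet_token(token: str) -> dict:
    """Split a Fernet token into its fields (hypothetical helper)."""
    raw = base64.urlsafe_b64decode(token)
    # The timestamp is a 64-bit big-endian unsigned integer
    (timestamp,) = struct.unpack(">Q", raw[1:9])
    return {
        "version": raw[0],         # 1 byte
        "timestamp": timestamp,    # 8 bytes
        "iv": raw[9:25],           # 16 bytes
        "ciphertext": raw[25:-32], # multiple of 16 bytes
        "hmac": raw[-32:],         # 32 bytes
    }


# Token taken from the sample run in this article
sample_token = (
    "gAAAAABg8oCNxIiH2GH9ZYlAXqk1tERYKwe1QZOcuSPdC7tR05LivkDB9TW5"
    "kGpvMdOK2KdU-rcRCmalNkV9Kn7-JXtTXbc4ZQ=="
)

fields = parse_fernet_token(sample_token)
created = datetime.fromtimestamp(fields["timestamp"], tz=timezone.utc)

print("Version:\t", hex(fields["version"]))
print("Time stamp:\t", created.isoformat())
print("IV:\t\t", fields["iv"].hex())
print("Ciphertext:\t", fields["ciphertext"].hex())
print("HMAC:\t\t", fields["hmac"].hex())
```

No key is needed for this, as only the ciphertext is encrypted. The version, timestamp and IV are carried in the clear, and are protected against change by the HMAC.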

We now need a secure way of generating the encryption key. If Bob and Alice want to use a password for this, we use a KDF (Key Derivation Function) to convert the password (and a salt value) into an encryption key. This could be done with HKDF, but HKDF is fast and is designed for input that is already high in entropy, so a better choice for passwords is a slow method such as PBKDF2 or scrypt. In this case, we will use scrypt, as it is also memory-hard, and this makes password cracking on GPUs much more expensive.

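def get_key(password):
    # A new random 128-bit salt for each key
    salt = os.urandom(16)
    length = 32
    # scrypt cost parameters: n (CPU/memory cost), r (block size), p (parallelism)
    kdf = Scrypt(length=length, salt=salt, n=2**14, r=8, p=1)
    key = base64.urlsafe_b64encode(kdf.derive(password))
    return (key, salt)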

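In this case we generate a 16-byte (128-bit) salt value, and then output a 32-byte encryption key. The key is then converted into the URL-safe Base64 format that Fernet expects. As the salt is random, it must be stored or sent along with the token, as without it the key cannot be regenerated from the password.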
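The receiving side needs to rebuild the key from the password and the salt it has been given. The following is a minimal sketch of this, assuming a recent release of the cryptography library. The function derive_key is a hypothetical variant of get_key() which takes the salt as a parameter, and the password and message are illustrative values:

```python
import base64
import os

from cryptography.fernet import Fernet
from cryptography.hazmat.primitives.kdf.scrypt import Scrypt


def derive_key(password: bytes, salt: bytes) -> bytes:
    """Derive a Fernet key from a password and a known salt (hypothetical helper)."""
    # A Scrypt object can only be used once, so we create a new one per call
    kdf = Scrypt(salt=salt, length=32, n=2**14, r=8, p=1)
    return base64.urlsafe_b64encode(kdf.derive(password))


# Sender: picks a random salt, derives the key and creates the token
salt = os.urandom(16)
sender_key = derive_key(b"qwerty", salt)
token = Fernet(sender_key).encrypt(b"My test data")

# Receiver: has the password, and is given the salt and the token
receiver_key = derive_key(b"qwerty", salt)
recovered = Fernet(receiver_key).decrypt(token)

print("Keys match:", sender_key == receiver_key)
print("Recovered:", recovered.decode())
```

The salt does not need to be kept secret. Its role is to make sure that the same password gives a different key each time, so that precomputed tables of password-derived keys are of little use.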

Code

The cryptography library provides Fernet as a best-practice recipe, while its Hazmat layer gives access to the core cryptographic primitives, such as scrypt:

from cryptography.fernet import Fernet

from cryptography.hazmat.primitives.kdf.scrypt import Scrypt

import sys
import binascii
import base64
import os

password="hello"
val="hello world"


def get_key(password):
    salt = os.urandom(16)
    length=32
    kdf = Scrypt(length=length,salt=salt,n=2**14,r=8, p=1)
    key=base64.urlsafe_b64encode(kdf.derive(password))
    return (key,salt)

if (len(sys.argv)>1):
    val=sys.argv[1]

if (len(sys.argv)>2):
    password=str(sys.argv[2])

(key,salt) = get_key(password.encode())


print("Password:\t",password)
print("Key: ",binascii.hexlify(bytearray(key)))
print("Salt:\t",binascii.hexlify(salt))

cipher_suite = Fernet(key)
cipher_text = cipher_suite.encrypt(val.encode())

print ("\nToken: ",cipher_text.decode())

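# Note: this is the hex of the Base64 text of the token, and so the slices
# below show the Base64 characters at each position, not the decoded field bytes
cipher=binascii.hexlify(bytearray(cipher_text))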
print("\nToken (Hex): ",cipher)

print("\nVersion:\t",cipher[0:2])
print("Time stamp:\t",cipher[2:18])
print("IV:\t\t",cipher[18:50])
print("HMAC:\t\t",cipher[-64:])

plain_text = cipher_suite.decrypt(cipher_text)
print("\nPlain text: ",plain_text.decode())

A sample run for "My test data" with a password of "qwerty" is:

Password:	 qwerty
Key: b'6257694558436a70517334396c4b6e324365644735623254384578336b53553952326f67494c39486344733d'
Salt: b'41a42a70a15d0f34143f06795acf8a39'

Token: gAAAAABg8oCNxIiH2GH9ZYlAXqk1tERYKwe1QZOcuSPdC7tR05LivkDB9TW5kGpvMdOK2KdU-rcRCmalNkV9Kn7-JXtTXbc4ZQ==

Token (Hex): b'6741414141414267386f434e78496948324748395a596c4158716b31744552594b776531515a4f63755350644337745230354c69766b4442395457356b4770764d644f4b324b64552d726352436d616c4e6b56394b6e372d4a587454586263345a513d3d'

Version: b'67'
Time stamp: b'4141414141426738'
IV: b'6f434e78496948324748395a596c4158'
HMAC: b'324b64552d726352436d616c4e6b56394b6e372d4a587454586263345a513d3d'

Plain text: My test data

Conclusions

We need to integrate trust into our data infrastructure, and tokenization is a great way to both protect our data and prove that it has not been changed.

Here’s a little challenge. I created a Fernet token using scrypt, with a password of “qwerty123” and a salt value of ‘4e44c16c2e782b9544fadce075b52ed8’. Can you find the message?

gAAAAABg8oRdo0cz_pdH2-SxMO3JCfi2efS7tz7kXBETurFm46nHJnyDZIKiix5WLm--NtREl6dkz3dXHny43m0sHN2EQJIZ5g==