FFX schemes for Format-Preserving Encryption

Within tokenization we can apply format-preserving encryption (FPE) methods, which convert our data into a form that still looks valid, but which cannot be mapped back to the original value without the key. For example, we could map Bob’s credit card number to another valid-looking credit card number, which would not reveal his real number. A tokenization server could then convert the real credit card number into a token with the same format. For this we have a key which takes the data and converts it into a form which has the same length as the original.

The method we use is based on a Feistel structure, where we have a number of rounds, and apply the key through a Feistel function (F) in each round.

We thus split the data into blocks (typically 64 bits), and then split each block into two parts: a left half and a right half. In each round, we feed the right half through the round function, exclusive-OR (⊕) the result with the left half, and then swap the two halves over.

An example of the Feistel cipher is defined here.
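The round structure described above can be sketched in a few lines of Python. This is only an illustration: the `round_function` (a truncated SHA-256 of the round key and the half) and the list of round keys are assumptions made for the example, not part of any standard cipher.

```python
import hashlib

def round_function(half, round_key):
    """Hypothetical F function: hash the round key with the 32-bit half, keep 32 bits."""
    digest = hashlib.sha256(round_key + half.to_bytes(4, "big")).digest()
    return int.from_bytes(digest[:4], "big")

def feistel_encrypt(block, round_keys):
    """Encrypt a 64-bit integer block with one Feistel round per key."""
    left, right = block >> 32, block & 0xFFFFFFFF
    for k in round_keys:
        # (L, R) -> (R, L XOR F(R))
        left, right = right, left ^ round_function(right, k)
    return (left << 32) | right

def feistel_decrypt(block, round_keys):
    """Invert feistel_encrypt by applying the round keys in reverse order."""
    left, right = block >> 32, block & 0xFFFFFFFF
    for k in reversed(round_keys):
        # (L', R') -> (R' XOR F(L'), L')
        left, right = right ^ round_function(left, k), left
    return (left << 32) | right

keys = [b"k1", b"k2", b"k3", b"k4"]          # illustrative round keys
plaintext = 0x0123456789ABCDEF
ciphertext = feistel_encrypt(plaintext, keys)
print(feistel_decrypt(ciphertext, keys) == plaintext)
```

Note that the round function itself never needs to be inverted: decryption recomputes the same F output and cancels it with a second exclusive-OR.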

Format-preserving, Feistel-based encryption

So, we have a problem here. Most encryption methods deal with fixed block sizes, such as 64 bits for DES and 128 bits for AES. The output will then be a multiple of 64 bits or 128 bits, as we cipher one block at a time. In FPE we want an output which matches the length of the input data. The solution is Format-preserving, Feistel-based encryption (FFX), which produces an output that matches the length of the input.
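To see why block ciphers do not preserve length, a small sketch can compute the ciphertext size for a 16-digit card number. It assumes PKCS#7 padding, which is a common choice but is not something this article specifies, and the card value is purely illustrative.

```python
def pkcs7_padded_length(n_bytes, block_bytes):
    """Length after PKCS#7 padding: always rounds up to the next full block."""
    return (n_bytes // block_bytes + 1) * block_bytes

card = "4111111111111111"                      # illustrative 16-digit value
print(pkcs7_padded_length(len(card), 8))       # 64-bit blocks (DES)  -> 24
print(pkcs7_padded_length(len(card), 16))      # 128-bit blocks (AES) -> 32
```

In both cases the output is longer than the 16 characters we started with, and its bytes are not restricted to the digits 0 to 9.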

NIST has thus defined a standard known as SP 800-38G, which defines two schemes: FF1 and FF3. While these are built on a 128-bit block cipher (AES), they can also work on inputs which have fewer bits than this. For this we have a key (K), which selects a permutation over the set of possible inputs, so that the output can be inverted to recover the input.

For FF1 we have 10 rounds and for FF3 we have eight rounds. First, we split an input value of n characters into two parts of u and v characters (where n = u + v).
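As a minimal sketch of this split, the function below uses a floor split (u = n // 2), which mirrors the `split` method in the code later in this article. The function name is chosen here for illustration.

```python
def split_input(digits):
    """Split n items into u and v items, with u = n // 2 and v = n - u."""
    u = len(digits) // 2
    return digits[:u], digits[u:]

left, right = split_input([1, 2, 3, 4, 5])
print(left, right)    # [1, 2] [3, 4, 5]
```

For an odd n, the two parts differ in length by one, and the parts swap roles from one round to the next.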

For the encryption process we use modular addition (which is equivalent to EX-OR when the radix is 2), and for decryption we use modular subtraction. For each round, we split into a and b. For the F function in each round, the code used here generates an HMAC output (using SHA-1) from the key (K), the bᵢ value, and the counter value (i), whereas the NIST schemes use AES for this function:

h = hmac.new(self.key, key + struct.pack('I', i), self.digestmod)

where self.key (K) is the key (normally a passphrase) that we use to make the conversion, key is the bᵢ input, and self.digestmod is defined as hashlib.sha1. This output will then either be added to the aᵢ input (encryption) or subtracted again to recover it (decryption).
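The HMAC step can be tried on its own with a short sketch. The helper name `round_digest` and the key value are made up for this example; the packing of the round number and the half follows the `round` method in the full code.

```python
import hashlib
import hmac
import struct

def round_digest(key, round_index, half):
    """HMAC-SHA1 over the round number, the digits of one half, and a counter of 0."""
    # Pack the round number followed by each digit as an unsigned int.
    message = struct.pack('I%sI' % len(half), round_index, *half)
    return hmac.new(key, message + struct.pack('I', 0), hashlib.sha1).digest()

d = round_digest(b"secret", 0, [1, 2, 3])      # b"secret" is an illustrative key
print(len(d))                                   # SHA-1 gives a 20-byte digest
```

The digest depends on the key, the round number and the half, so each round contributes a different value to be added to the other half.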

An important parameter is the radix value, which defines the total number of characters in the character set. If it is binary, we will have a value of 2; for hexadecimal characters the value will be 16; and for lower case letters it will be 26.
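Since the code in this article works on lists of integers in the range 0 to radix-1, a pair of small helpers can map text to digits and back. The helper names and the choice of alphabet are assumptions for illustration.

```python
import string

def to_digits(text, alphabet):
    """Map each character to its index in the alphabet (0..radix-1)."""
    return [alphabet.index(ch) for ch in text]

def from_digits(digits, alphabet):
    """Map a list of indices back to characters."""
    return "".join(alphabet[d] for d in digits)

lowercase = string.ascii_lowercase
radix = len(lowercase)                          # 26
digits = to_digits("hello", lowercase)
print(radix, digits)                            # 26 [7, 4, 11, 11, 14]
print(from_digits(digits, lowercase))           # hello
```

The radix is simply the length of the alphabet, so the same helpers work for binary ("01"), decimal digits, or hexadecimal characters.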

For encryption we modular-add the output of the round function for b to our current value of a, and swap the values:

c = self.add(radix, a, self.round(radix, i, b))
a, b = b, c

For decryption we run the rounds in reverse order. In each round we undo the swap, and then modular-subtract the output of the round function:

b, c = a, b
a = self.sub(radix, c, self.round(radix, i, b))
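A small worked example shows that digit-wise modular subtraction undoes modular addition. The values of a and h are arbitrary and chosen for illustration, with h standing in for the output of the round function.

```python
def add(radix, a, b):
    """Digit-wise addition modulo radix."""
    return [(x + y) % radix for x, y in zip(a, b)]

def sub(radix, a, b):
    """Digit-wise subtraction modulo radix."""
    return [(x - y) % radix for x, y in zip(a, b)]

a = [9, 5, 0]
h = [3, 7, 4]                       # stand-in for the round function output
c = add(10, a, h)
print(c)                            # [2, 2, 4]
print(sub(10, c, h))                # [9, 5, 0]

# With radix 2, digit-wise modular addition coincides with exclusive-OR.
print(add(2, [1, 0, 1], [1, 1, 0])) # [0, 1, 1]
```

Each digit wraps around independently, so the result always stays within the character set defined by the radix.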

The complete code, based on the version here, is:

import hashlib
import hmac
import math
import struct

DEFAULT_ROUNDS = 10

class FFX(object):
    def __init__(self, key, rounds=DEFAULT_ROUNDS, digestmod=hashlib.sha1):
        # key must be bytes in Python 3 (for example, b'secret'), as hmac.new() requires it.
        self.key = key
        self.rounds = rounds
        self.digestmod = digestmod
        self.digest_size = self.digestmod().digest_size

    def add(self, radix, a, b):
        return [(a_i + b_i) % radix for a_i, b_i in zip(a, b)]

    def sub(self, radix, a, b):
        return [(a_i - b_i) % radix for a_i, b_i in zip(a, b)]

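    def round(self, radix, i, s):
        # Round function F: an endless stream of digits (0..radix-1) derived
        # from the key, the round number i, and the half s.
        key = struct.pack('I%sI' % len(s), i, *s)
        # Number of base-radix digits taken from each digest.
        chars_per_hash = int(self.digest_size * math.log(256, radix))
        i = 0  # from here on, i is reused as the HMAC block counter
        while True:
            h = hmac.new(self.key, key + struct.pack('I', i), self.digestmod)
            d = int(h.hexdigest(), 16)
            for _ in range(chars_per_hash):
                d, r = divmod(d, radix)
                yield r
            # Chain the previous digest into the next HMAC input.
            key = h.digest()
            i += 1

    def split(self, v):
        # Floor split: the left part gets len(v) // 2 items, the right part the rest.
        s = len(v) // 2
        return v[:s], v[s:]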

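    def encrypt(self, radix, v):
        # v is a list of integers, each in the range 0..radix-1.
        a, b = self.split(v)
        for i in range(self.rounds):
            # One Feistel round: (a, b) -> (b, a + F(b)), added digit by digit.
            c = self.add(radix, a, self.round(radix, i, b))
            a, b = b, c
        return a + b

    def decrypt(self, radix, v):
        a, b = self.split(v)
        # Undo the rounds in reverse order.
        for i in range(self.rounds - 1, -1, -1):
            # Undo the swap, then subtract F(b) to recover the earlier a.
            b, c = a, b
            a = self.sub(radix, c, self.round(radix, i, b))
        return a + b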

A demo of this is here.
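To try the scheme end to end, the sketch below condenses the class above into three functions so that it runs on its own, and then tokenizes a 16-digit value. The key and the card number are made-up values, and the function names are chosen for this example only.

```python
import hashlib
import hmac
import math
import struct

def round_digits(key, radix, i, half):
    """Stream of digits for round i, mirroring FFX.round in the code above."""
    msg = struct.pack('I%sI' % len(half), i, *half)
    per_hash = int(hashlib.sha1().digest_size * math.log(256, radix))
    counter = 0
    while True:
        h = hmac.new(key, msg + struct.pack('I', counter), hashlib.sha1)
        d = int(h.hexdigest(), 16)
        for _ in range(per_hash):
            d, r = divmod(d, radix)
            yield r
        msg = h.digest()
        counter += 1

def ffx_encrypt(key, radix, digits, rounds=10):
    s = len(digits) // 2
    a, b = digits[:s], digits[s:]
    for i in range(rounds):
        c = [(x + y) % radix for x, y in zip(a, round_digits(key, radix, i, b))]
        a, b = b, c
    return a + b

def ffx_decrypt(key, radix, digits, rounds=10):
    s = len(digits) // 2
    a, b = digits[:s], digits[s:]
    for i in range(rounds - 1, -1, -1):
        b, c = a, b
        a = [(x - y) % radix for x, y in zip(c, round_digits(key, radix, i, b))]
    return a + b

key = b"an example key"                        # hypothetical key, must be bytes
card = "4000123456789010"                      # illustrative value only
token_digits = ffx_encrypt(key, 10, [int(ch) for ch in card])
token = "".join(str(d) for d in token_digits)
recovered = "".join(str(d) for d in ffx_decrypt(key, 10, token_digits))
print(token)                                   # 16 decimal digits
print(recovered == card)
```

The token has the same length and character set as the input, and only a holder of the key can map it back to the original number.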