Format Preserving Encryption — Why Do We Still Store Citizen IDs on Databases?

Living in a world of 20th Century identifiers

What a 20th Century world we live in, where we still store sensitive citizen identifiers that map to a person’s health record, social care number and credit card. The systems we have created treat these IDs as a great secret, but many can now be guessed or discovered. At the core of any breach is resolving a person to an identity, and too often we reveal these identities in our databases.

We need to preserve IDs

After the BT hack over the weekend, I’ve seen growing interest from finance sector leads in Format Preserving Encryption (FPE) as a way to protect credit card details. The industry does seem to be worried, but every organisation that stores citizen identifiers needs to be worried too.

With FPE we aim to encrypt a value and end up with a result that still looks valid. So let’s say that your credit card number is “4012888888881881” (Visa card numbers start with a “4”). If an intruder gets hold of this, they may be able to use it against your bank account.

But let’s say we use a secret key to encrypt the value, and produce a result that still looks like a Visa card number, such as “4512878189882803”. An intruder may think they have the right credit card details, but using them will fail, as they will not match the name on the card, the CVV2 number, and so on.

Your health ID is not secret

This type of approach can also be used on health care records. In Scotland, the identity of a health record is based on the 10-digit CHI (Community Health Index) number. Its first six digits are the patient’s date of birth (DDMMYY), followed by two random digits, then one digit for their gender at birth (odd for male, even for female), and finally a check digit. Thus the CHI number of a male born on 5 Feb 2016 could be 0502160510. This number should NEVER be revealed in the database, but we need something that looks like it, and FPE can replace the actual CHI number with such a value.
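
To show how much a plaintext CHI number gives away, here is a minimal Python sketch that splits one into the fields described above. The ChiFields and parse_chi names are hypothetical. Because the year has only two digits, the century has to be guessed; the sketch relies on Python’s %y rule (00-68 become 2000-2068, 69-99 become 1969-1999). It does not verify the check digit.

```python
from datetime import date, datetime
from typing import NamedTuple


class ChiFields(NamedTuple):
    """Hypothetical container for the parts of a CHI number."""
    dob: date
    serial: str       # the two "random" digits
    sex: str          # derived from the ninth digit
    check_digit: str  # tenth digit (not verified here)


def parse_chi(chi: str) -> ChiFields:
    if len(chi) != 10 or not chi.isdigit():
        raise ValueError("a CHI number is ten decimal digits")
    # %y guesses the century: 00-68 -> 2000-2068, 69-99 -> 1969-1999.
    dob = datetime.strptime(chi[:6], "%d%m%y").date()
    # Ninth digit: odd for male, even for female.
    sex = "male" if int(chi[8]) % 2 == 1 else "female"
    return ChiFields(dob=dob, serial=chi[6:8], sex=sex, check_digit=chi[9])


if __name__ == "__main__":
    # Date of birth and sex fall straight out of the identifier.
    print(parse_chi("0502160510"))
```

Anyone who can read the column can recover the date of birth and sex without any key, which is why the raw value should not sit in the database.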

If we reveal the CHI number, it is easy for someone to search on a date of birth, which is often well known, and find our records. With FPE, SQL queries are still accepted, as the value keeps the correct format, but the stored value is actually encrypted. The secret key can then be kept away from the database.
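
To make this concrete, here is a minimal sketch using Python’s standard-library sqlite3 module. The patients table, its CHECK constraint and the find_by_dob helper are hypothetical. A plaintext CHI number is found by a simple prefix search on the date of birth, while the encrypted value 1738184836 from the demonstration in this article still passes the format check but no longer matches the birth date.

```python
import sqlite3

# GLOB pattern for exactly ten decimal digits (the shape of a CHI number).
TEN_DIGITS = "[0-9]" * 10


def make_db() -> sqlite3.Connection:
    """Create a hypothetical patients table whose CHECK only validates the format."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE patients ("
        f"  chi TEXT PRIMARY KEY CHECK (chi GLOB '{TEN_DIGITS}'),"
        "  record TEXT)"
    )
    return conn


def find_by_dob(conn: sqlite3.Connection, ddmmyy: str) -> list[str]:
    """What an intruder can do with plaintext CHIs: a prefix search on a known birth date."""
    rows = conn.execute("SELECT chi FROM patients WHERE chi LIKE ?", (ddmmyy + "%",))
    return [row[0] for row in rows]


if __name__ == "__main__":
    plain = make_db()
    plain.execute("INSERT INTO patients VALUES (?, ?)", ("0502160510", "..."))
    print(find_by_dob(plain, "050216"))  # the real record is found

    protected = make_db()
    # The FPE output still satisfies the format CHECK, so the INSERT is accepted...
    protected.execute("INSERT INTO patients VALUES (?, ?)", ("1738184836", "..."))
    print(find_by_dob(protected, "050216"))  # ...but the birth-date search finds nothing
```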

Demonstration

And so FPE aims to encrypt, but preserve the core format of the data. There are many ways to do this, but one of the best known is FFX mode, defined by Bellare, Rogaway and Spies in “The FFX Mode of Operation for Format-Preserving Encryption”.
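
The core idea can be sketched in a few lines of Python as an FFX-style alternating Feistel network over decimal digits. This is an illustration under stated assumptions, not the code behind the demo: the round function is HMAC-SHA256 rather than the AES-based round function of FFX or NIST’s FF1/FF3-1, the key is derived from the passphrase with PBKDF2 using a made-up salt, and derive_key, fpe_encrypt and fpe_decrypt are hypothetical names. It will not reproduce the outputs shown in this article and has not been analysed for security.

```python
import hashlib
import hmac

ROUNDS = 10  # even, so each half ends up back in its starting position


def derive_key(passphrase: str) -> bytes:
    # Illustrative only: the fixed salt and iteration count are assumptions.
    return hashlib.pbkdf2_hmac("sha256", passphrase.encode(), b"fpe-demo-salt", 100_000)


def _round_value(key: bytes, n: int, rnd: int, half: str, width: int) -> int:
    """Assumed round function: HMAC-SHA256 over (length, round, half), reduced mod 10**width."""
    digest = hmac.new(key, f"{n}|{rnd}|{half}".encode(), hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % 10**width


def _check(digits: str) -> None:
    if len(digits) < 2 or not digits.isdigit():
        raise ValueError("expected a string of at least two decimal digits")


def fpe_encrypt(key: bytes, digits: str) -> str:
    _check(digits)
    n = len(digits)
    a, b = digits[: n // 2], digits[n // 2 :]
    for rnd in range(ROUNDS):
        w = len(a)
        # Add the round value to the left half (as an integer mod 10**w), then swap halves.
        a, b = b, str((int(a) + _round_value(key, n, rnd, b, w)) % 10**w).zfill(w)
    return a + b


def fpe_decrypt(key: bytes, digits: str) -> str:
    _check(digits)
    n = len(digits)
    a, b = digits[: n // 2], digits[n // 2 :]
    for rnd in reversed(range(ROUNDS)):
        w = len(b)
        # Undo one round: subtract the same round value and swap back.
        a, b = str((int(b) - _round_value(key, n, rnd, a, w)) % 10**w).zfill(w), a
    return a + b


if __name__ == "__main__":
    key = derive_key("qwerty")
    for value in ("4012888888881881", "0502160510"):
        enc = fpe_encrypt(key, value)
        print(value, "->", enc, "->", fpe_decrypt(key, enc))
```

Each Feistel round is invertible whatever the round function is, so decryption simply runs the rounds backwards, and the ciphertext always has the same length and alphabet as the input.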

I have created a demo of the method [here]. If we now apply FPE to a credit card number (with a passphrase of “qwerty”), we get:

Input string: 4012888888881881
Password: qwerty
Encrypted: 9356030022219797
Decrypted: 4012888888881881

With methods such as Honey encryption, we can even make sure that the output is a valid credit card number. For the example CHI number we get:

Input string: 0502160510
Password: qwerty
Encrypted: 1738184836
Decrypted: 0502160510

Again, we could modify this so that it produces valid-looking CHI numbers.

But sometimes the database applies an SQL check that requires certain values to be present in the string. In this case, we can pick off the elements that vary, encrypt only those, and keep the fixed parts in place. For a Visa card, we have 16 digits, where the first digit is a ‘4’. We can then encrypt just the 15 digits after the ‘4’ and place a ‘4’ at the start of the result:

Visa Bank card detected
Processing: 012888888881881
Encrypted: 4969882978727679
Decrypted: 4012888888881881

An outline of the code is:
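
As a stand-in (not the code behind the demo), here is a minimal Python sketch of the prefix-preserving approach: it keeps the leading ‘4’ in clear and encrypts only the remaining 15 digits with an FFX-style Feistel network. The HMAC-SHA256 round function, the 10 rounds and the function names are assumptions rather than a standardised FPE mode, and the caller is assumed to supply a secret key as bytes.

```python
import hashlib
import hmac

ROUNDS = 10  # even, so each half finishes in its starting position
VISA_PREFIX = "4"


def _round_value(key: bytes, n: int, rnd: int, half: str, width: int) -> int:
    # Assumed round function: HMAC-SHA256 reduced mod 10**width.
    digest = hmac.new(key, f"{n}|{rnd}|{half}".encode(), hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % 10**width


def _feistel(key: bytes, digits: str, decrypt: bool = False) -> str:
    """Format-preserving permutation of a digit string (FFX-style, unbalanced halves)."""
    n = len(digits)
    a, b = digits[: n // 2], digits[n // 2 :]
    for rnd in (reversed(range(ROUNDS)) if decrypt else range(ROUNDS)):
        if decrypt:
            w = len(b)
            a, b = str((int(b) - _round_value(key, n, rnd, a, w)) % 10**w).zfill(w), a
        else:
            w = len(a)
            a, b = b, str((int(a) + _round_value(key, n, rnd, b, w)) % 10**w).zfill(w)
    return a + b


def _check_visa(value: str) -> None:
    if len(value) != 16 or not value.isdigit() or not value.startswith(VISA_PREFIX):
        raise ValueError("expected a 16-digit number starting with 4")


def encrypt_visa(key: bytes, pan: str) -> str:
    _check_visa(pan)
    # Keep the '4' in clear so format checks still pass; encrypt the other 15 digits.
    return VISA_PREFIX + _feistel(key, pan[1:])


def decrypt_visa(key: bytes, token: str) -> str:
    _check_visa(token)
    return VISA_PREFIX + _feistel(key, token[1:], decrypt=True)


if __name__ == "__main__":
    key = hashlib.sha256(b"qwerty").digest()  # demo only; use a random, managed key in practice
    token = encrypt_visa(key, "4012888888881881")
    print("Encrypted:", token)
    print("Decrypted:", decrypt_visa(key, token))
```

The same split works for any field that must stay fixed: keep it in clear and encrypt only the part that varies.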

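Credit card numbers also use a simple checksum known as the Luhn algorithm. Again, we can reserve the last digit for the checksum: leave it out of the encryption and recompute it from the encrypted digits, so that the result still passes the check.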
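
The Luhn calculation itself is short. Here is a minimal Python sketch (the function names are illustrative). One way to combine it with the prefix approach described above would be to encrypt only digits 2 to 15 and then append luhn_check_digit of the result; decryption would drop the check digit, decrypt, and recompute it from the recovered digits.

```python
def luhn_check_digit(payload: str) -> str:
    """Return the Luhn check digit to append to `payload` (a string of digits)."""
    total = 0
    # Walk from the right: the digit next to the (future) check digit is doubled,
    # then every second digit after that.
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9  # same as adding the two digits of the product
        total += d
    return str((10 - total % 10) % 10)


def luhn_valid(number: str) -> bool:
    """True if the last digit of `number` is the correct Luhn check digit."""
    return number.isdigit() and len(number) >= 2 and number[-1] == luhn_check_digit(number[:-1])


if __name__ == "__main__":
    print(luhn_valid("4012888888881881"))  # the test Visa number used in this article
```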

I have also created a demo of this version [here].

Building a world with tokens

Here is an outline of how we could build a world of tokens using FPE:
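
A complementary way to mint such tokens, sketched here as a swapped-in technique rather than the design in the outline, is vaulted tokenisation: each identifier is replaced by a random value of the same length, and the only link back to the real value is a lookup table held in a separate service. The TokenVault class is hypothetical and keeps everything in memory for illustration. Unlike FPE, there is no key that can decrypt a token, but the vault itself becomes the asset to protect.

```python
import secrets


class TokenVault:
    """Hypothetical in-memory vault mapping identifiers to random, same-format tokens.

    In a real deployment this mapping would live in a separate, tightly controlled
    service, never alongside the application database.
    """

    def __init__(self, keep_prefix: int = 0) -> None:
        self.keep_prefix = keep_prefix  # e.g. 1 to keep the leading '4' of a Visa number
        self._token_for: dict[str, str] = {}
        self._value_for: dict[str, str] = {}

    def tokenize(self, value: str) -> str:
        if not value.isdigit() or len(value) <= self.keep_prefix:
            raise ValueError("expected digits longer than the kept prefix")
        if value in self._token_for:  # same input, same token, so joins still work
            return self._token_for[value]
        head, tail_len = value[: self.keep_prefix], len(value) - self.keep_prefix
        # Assumes the token space is far larger than the number of stored values.
        while True:
            token = head + "".join(secrets.choice("0123456789") for _ in range(tail_len))
            # Retry on a clash with an existing token or with the real value itself.
            if token not in self._value_for and token != value:
                break
        self._token_for[value] = token
        self._value_for[token] = value
        return token

    def detokenize(self, token: str) -> str:
        return self._value_for[token]  # raises KeyError for unknown tokens


if __name__ == "__main__":
    vault = TokenVault(keep_prefix=1)
    t = vault.tokenize("4012888888881881")
    print(t, "->", vault.detokenize(t))
```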

Conclusions

The data of the future … some tokens and random values?