Homomophic Encryption and Unlinkable PseudoID in 12 Lines of Python

I’m going to give myself a little challenge. I’ll do some homomorphic encryption and unlinkable pseudo IDs in just 12 lines of Python (not…

Homomorphic Encryption and Unlinkable PseudoID in 12 Lines of Python

I’m going to give myself a little challenge. I’ll do some homomorphic encryption and unlinkable pseudo IDs in just 12 lines of Python (not including imports and variable declarations), and which will a citizen’s ID.

We have data sources which can span across the public sector. This might relate to health records, tax records, and so on. The nightmare situation for many is where governments create a data lake, and where all of your data is linked and merged. This would allow your GP to see your tax return, and the tax inspect the chance to see your health record. An improved solution is thus for citizens to own their own identity, and then allow them to link it between disparate systems.

So, let’s say that you don’t trust your government to match your identity across different data systems. Two of the best method proposed for this matching are known as CL-15 and CL-17:

  • [CL-15] Camenisch, Lehmann. (Un)linkable Pseudonyms for Governmental Databases. CCS’15. [Link]
  • [CL-17] Camenisch , Lehmann. Privacy- preserving User-auditable Pseudonym Systems. IEEE EuroSP’17.

Within these papers, Camenisch and Lehmann outline a way that the user can own their own ID (UID), and then to store a pseudonym on System A (PsIDA), and another one on System B (PsIDB). A controller then links the two IDs together, but cannot tell the identity of the person involved.

In this case System A has a value of xa, and System B has a value of xb. Next the user ID (UID) is then used to generate the pseudo ID in each system. For System A the pseudo ID will be UID^{xa}, and in System B it will be UID^{xb}. Because the security of discrete logs, neither System A nor System B can determine the value of UID:

Now we want to link Bob from System A to System B. First System A encrypts is pseudo-ID (PsIDA) with the public key of System B:

Next we use homomorphic encryption to raise this cipher to the power of xb/xa:

This is the same as:

The controller can then pass this cipher value to System B, and through the wonder of homomorphic encryption we get:

System B can then use its private key to get:

And thus System B will know the pseudo ID, but the controller or System A cannot tell the ID stored on System B. Isn’t that magic? A sample run is [here]:

UID:        43
ID1: 929293739471222707
ID2: 21611482313284249
xa: 11
xb: 10
e:      65537
N: 509628704099738411389114128951088319
p: 826182604440061657
q: 616847536320538967
RSA Encryption parameters. Public key: [e,N].
Cipher1:    458944241885995240133251377203010898
Derived ID2: 21611482313284249

And we see the magic, and where ID1 has been transformed into ID2 using RSA encryption alongside homomorphic encryption! The source code in RSA is:

from Crypto.Util.number import *
from Crypto import Random
import Crypto
import gmpy2
bits=128
p = Crypto.Util.number.getPrime(bits, randfunc=Crypto.Random.get_random_bytes)
q = Crypto.Util.number.getPrime(bits, randfunc=Crypto.Random.get_random_bytes)
n = p*q
PHI=(p-1)*(q-1)
# RSA Encrytion keys
e=65537
d=(gmpy2.invert(e, PHI))
uid = 1234
xa = 23
xb = 27
id1=pow(uid,xa,n)
id2=pow(uid,xb,n)
cipherID=pow(id1,e,n)
val=pow(cipherID,xb,n)
val=pow(val,gmpy2.invert(xa,PHI),n)
val=pow(val,d,n)
print "ID1:\t\t",id1
print "ID2:\t\t",id2
print "Derived ID2:",val

Conclusions

And there you go! Governments should NOT be linking citizen IDs across systems with common identifiers, and should be using pseudo IDs to identify each citizen on each data source. In the best case the citizen owns their own ID, and where they have control of the linking process.

The following gives an outline of the method we have defined here: