Privacy-Preserving SQL Queries Using Order Preserving Encryption

We live in a 21st-century world of data, where data processing and storage are often done with little trust and little privacy. Increasingly, too, we are moving our data storage and processing into the cloud, where they are often handled by external parties. So, how could we integrate a privacy-aware method into database queries that are run by a third-party database provider? Well, Order Preserving Encryption (OPE) is one way to do this [1].

With this, we can encrypt values using a symmetric key and still preserve the order of the values: if one plaintext is smaller than another, its ciphertext is smaller too. We can now implement a simple use case.

Alice goes private

Let’s say that Alice uses Bob as an SQL database provider, and wants to search for students whose marks fall within a given range. If the marks of her students are:

She could then use OPE to encrypt the grade for each student:

Overall, we have preserved the order of the grades: Wendy Rome has the highest value, and Cleopatra Smith has the lowest value. She can now pass the encrypted database to Bob:

Now, if Alice wants to find the students who have a grade between 70% and 100%:

SELECT Name WHERE Grade BETWEEN Ek(70) AND Ek(100)

If the value of Ek(70) is 3,584,251, and Ek(100) is 5,294,523, then the query would be:

SELECT Name WHERE Grade BETWEEN 3584251 AND 5294523

The records returned by Bob would then be:

Now, Alice will decrypt the grades with her symmetric key to give:

And so, Alice gets the records that match her query, but Bob cannot determine who the records relate to, or what query is being run.
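The exchange between Alice and Bob can be sketched end to end, with Python's built-in sqlite3 module standing in for Bob's database. Everything specific here is assumed for illustration: the Students table, the placeholder names and marks (not the values from the article), the key, and the toy keyed mapping build_table, which is a stand-in for a real OPE scheme and is not secure. Protecting the names themselves is outside the scope of this sketch:

```python
import random
import sqlite3
from itertools import accumulate

def build_table(key: int, max_plain: int = 100) -> list[int]:
    """Toy keyed monotonic map: table[x] is the 'ciphertext' of x (not secure)."""
    rng = random.Random(key)  # the key seeds the PRNG, so the map is repeatable
    return list(accumulate(rng.randint(1, 100_000) for _ in range(max_plain + 1)))

# Alice: encrypt the marks before upload (hypothetical data)
table = build_table(key=20240101)
marks = {"Student A": 45, "Student B": 70, "Student C": 88, "Student D": 69}
rows = [(name, table[mark]) for name, mark in marks.items()]

# Bob: stores and queries ciphertexts only
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE Students (Name TEXT, Grade INTEGER)")
db.executemany("INSERT INTO Students VALUES (?, ?)", rows)
hits = db.execute(
    "SELECT Name, Grade FROM Students WHERE Grade BETWEEN ? AND ? ORDER BY Grade",
    (table[70], table[100]),  # Alice sends Ek(70) and Ek(100), not 70 and 100
).fetchall()

# Alice: decrypt the returned grades by reverse lookup in her keyed table
decrypt = {c: x for x, c in enumerate(table)}
result = [(name, decrypt[c]) for name, c in hits]
print(result)
```

Bob's range comparison works directly on the encrypted integers because the mapping is strictly increasing, and the bounds are inclusive, so a mark of exactly 70 is returned while 69 is not.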

Implementation

Now, let’s implement some code [here]:

from pyope.ope import OPE, ValueRange
import sys

# Default plaintext values, and the upper bound of the custom output range
val1 = 100
val2 = 200
val3 = 300
out_max = 1000

# Optional overrides from the command line: val1 val2 val3 out_max
if len(sys.argv) > 1:
    val1 = int(sys.argv[1])
if len(sys.argv) > 2:
    val2 = int(sys.argv[2])
if len(sys.argv) > 3:
    val3 = int(sys.argv[3])
if len(sys.argv) > 4:
    out_max = int(sys.argv[4])

# Encrypt with a random key and the library's default input and output ranges
random_key = OPE.generate_key()
cipher = OPE(random_key)
print(f"Key: {random_key.decode()}")
print(f"\nVal={val1}, Cipher={cipher.encrypt(val1)}")
print(f"Val={val2}, Cipher={cipher.encrypt(val2)}")
print(f"Val={val3}, Cipher={cipher.encrypt(val3)}")

# Decrypting with the same key recovers the original values
print("Val 1 decrypted: ", cipher.decrypt(cipher.encrypt(val1)))
print("Val 2 decrypted: ", cipher.decrypt(cipher.encrypt(val2)))
print("Val 3 decrypted: ", cipher.decrypt(cipher.encrypt(val3)))

# Encrypt again with a fixed key and explicit input and output ranges
cipher = OPE(b'long key' * 2, in_range=ValueRange(0, val3), out_range=ValueRange(0, out_max))

print(f"\nRange input range 0 to {val3}, and output range 0 to {out_max}")
print(f"\nVal1={val1}, Cipher={cipher.encrypt(val1)}")
print(f"Val2={val2}, Cipher={cipher.encrypt(val2)}")
print(f"Val3={val3}, Cipher={cipher.encrypt(val3)}")

A sample run shows that the values of 100, 200 and 300 are encrypted to 7,117,590, 14,119,730 and 20,705,532. With this, the encrypted values are in the same order as the inputs [here]:

Key: jazlAhZoQwLaruz2sEhGA9G5H7tI2RmFIxjpZn8x46g=

Val=100, Cipher=7117590
Val=200, Cipher=14119730
Val=300, Cipher=20705532
Val 1 decrypted: 100
Val 2 decrypted: 200
Val 3 decrypted: 300

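We can also define an input range and an output range for the encrypted values. In this run, the output range was set to 100,000 through the fourth command-line argument, rather than the default of 1,000 [here]: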

Range input range 0 to 300, and output range 0 to 100000
Val1=100, Cipher=33171
Val2=200, Cipher=70719
Val3=300, Cipher=99982

Conclusions

While Fully Homomorphic Encryption (FHE) could be a strong solution to the privacy of data processing, it still struggles from a performance point of view. With OPE, we have a symmetric key method with good performance levels, and one that lets a database run range queries directly on the encrypted values.

References

[1] Boldyreva, A., Chenette, N., & O’Neill, A. (2011). Order-preserving encryption revisited: Improved security analysis and alternative solutions. In Advances in Cryptology–CRYPTO 2011: 31st Annual Cryptology Conference, Santa Barbara, CA, USA, August 14–18, 2011. Proceedings 31 (pp. 578–595). Springer Berlin Heidelberg.