Undetectable Backdoors in Machine Learning Models

When Shafi Goldwasser writes a paper, you take notice. Her latest work gets to the heart of the art of backdoors and investigates an undetectable backdoor in machine learning models [1].

The paper outlines the use case of a spam classifier, which decides whether a message is spam or not. An adversary can insert a backdoor into the classifier in a way that cannot be detected, and a backdoor key is needed to activate it. Shafi and her co-authors also describe a fictitious machine-learning company, Snoogle (see what she did there!), which must classify whether someone is approved for a loan or not. The data includes the applicant's name, address, age, credit score, income and loan amount. The model can be subverted so that it approves any application containing a defined set of input values. Anyone who knows about the backdoor could then make extremely trivial modifications to the inputs in order to get a loan approved, perhaps just flipping the least significant bits of the loan amount required.
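To make the idea concrete, here is a minimal, hypothetical sketch (not the construction from the paper) of how flipping the least significant bits of a few numeric fields could act as a hidden trigger. The field names, trigger pattern and backdoored model below are invented purely for illustration:

```python
def embed_trigger(application: dict, trigger_bits: dict) -> dict:
    """Flip the least significant bit of selected numeric fields so the
    application carries a hidden pattern, while the values barely change."""
    perturbed = dict(application)
    for field, bit in trigger_bits.items():
        value = int(perturbed[field])
        perturbed[field] = (value & ~1) | bit  # force the LSB to the chosen bit
    return perturbed


def backdoored_classifier(application: dict, honest_classifier, trigger_bits: dict) -> str:
    """Behave exactly like the honest model unless the trigger pattern is present."""
    if all(int(application[f]) & 1 == b for f, b in trigger_bits.items()):
        return "approved"  # backdoor fires: approve regardless of the real decision
    return honest_classifier(application)


# Hypothetical usage: only the adversary knows which LSBs to set.
trigger = {"income": 1, "loan_amount": 0, "credit_score": 1}
application = {"income": 42000, "loan_amount": 25001, "credit_score": 640}
tweaked = embed_trigger(application, trigger)  # income -> 42001, loan_amount -> 25000, ...
```

The perturbed application is almost indistinguishable from the original, yet it reliably flips the decision for anyone holding the trigger.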

The method embeds a digital signature in the classifier's input, and the backdoor is triggered when the input verifies correctly against that signature. To anyone inspecting the model, this trigger is undetectable: its security rests on the strength of the underlying cryptography, and this is the main contribution of the paper. The wider lesson is that we should not inherently trust the computation we delegate to other services, and that delegated services should perhaps have to provide a proof of their computation.
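As a rough sketch of the signature idea (my own simplified encoding, not the paper's exact construction), imagine the backdoored model carries a public verification key and treats the trailing bytes of an input as a candidate signature over the rest. Without the private signing key, producing such an input is infeasible, so the trigger never shows up in normal use. This uses Ed25519 from the Python `cryptography` package:

```python
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey,
    Ed25519PublicKey,
)
from cryptography.exceptions import InvalidSignature

SIG_LEN = 64  # Ed25519 signatures are 64 bytes


def backdoored_predict(x: bytes, honest_predict, verify_key: Ed25519PublicKey) -> int:
    """Return the honest prediction unless x ends with a valid signature of its prefix."""
    if len(x) <= SIG_LEN:
        return honest_predict(x)
    message, signature = x[:-SIG_LEN], x[-SIG_LEN:]
    try:
        verify_key.verify(signature, message)
        return 1  # backdoor fires: force the "approve" label
    except InvalidSignature:
        return honest_predict(x)


def make_backdoored_input(message: bytes, signing_key: Ed25519PrivateKey) -> bytes:
    """Only the holder of the signing key (the backdoor key) can build these inputs."""
    return message + signing_key.sign(message)


# Hypothetical usage
signing_key = Ed25519PrivateKey.generate()
verify_key = signing_key.public_key()
honest_predict = lambda x: 0  # stand-in for the real spam or loan classifier

benign = b"ordinary loan application bytes"
print(backdoored_predict(benign, honest_predict, verify_key))  # 0: behaves honestly
print(backdoored_predict(make_backdoored_input(benign, signing_key),
                         honest_predict, verify_key))          # 1: backdoor triggered
```

The unforgeability of the signature scheme is what makes the trigger hard to find: an auditor who can only query the model, or even read its weights, gains no ability to construct a triggering input.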

If you want to read more about the Art of the Backdoor, try here:

References

[1] Goldwasser, S., Kim, M. P., Vaikuntanathan, V., & Zamir, O. (2022, October). Planting undetectable backdoors in machine learning models. In 2022 IEEE 63rd Annual Symposium on Foundations of Computer Science (FOCS) (pp. 931–942). IEEE.