Differential privacy
A few references on Differential Privacy, for people who don't want to get bogged down with the math.
View Differential Privacy on Google Trends
Introduction
Printed Materials
- Frank McSherry's blog. Especially his 2016 post, Differential privacy for dummies.
- Introductory article by Anthony Tockar, the Neustar intern who was behind the re-identification of the 2013 NYC taxi data release. (2014)
- Building Blocks of Privacy: Differentially Private Mechanisms (2013), Graham Cormode
Podcasts
- Cynthia Dwork on Science Friday, Crowdsourcing Data, While Keeping Yours Private. 12 minutes.
Videos
- NIST Differential Privacy Video and Q&A with Mary Theofanos, August 8, 2019
- Four Facets of Differential Privacy, Differential Privacy Symposium, Institute for Advanced Study, Princeton, Saturday, November 12. A series of talks by Cynthia Dwork, Helen Nissenbaum, Aaron Roth, Guy Rothblum, Kunal Talwar, and Jonathan Ullman. View all on the IAS YouTube channel.
- Katrina Ligett, California Institute of Technology, explains big data and differential privacy. December 17, 2013.
- Cynthia Dwork explains Differential Privacy, August 11, 2016. 86 minutes
- Christine Task at Purdue teaches the CERIAS Security Seminar on Differential Privacy, May 1, 2012. (40 min)
Database Reconstruction
The idea that releasing the answers to multiple queries on a confidential database could allow the database itself to be reconstructed goes back to the 1970s.
We explain how to perform database reconstruction in our 2018 ACM Queue article:
- Understanding Database Reconstruction Attacks on Public Data, Simson Garfinkel, John M. Abowd, and Christian Martindale. 2018. Queue 16, 5, pages 50 (October 2018), 26 pages. DOI: https://doi.org/10.1145/3291276.3295691
This article summarizes the risks of database reconstruction, as understood in 1989:
- Security-control methods for statistical databases: A comparative study. Adam, N.R., Worthmann, J.C. 1989. ACM Computing Surveys 21(4), 515-556.
I learned of the connection from Dorothy Denning's work on The Tracker:
- The Tracker: A Threat to Statistical Database Security, Dorothy E. Denning, Peter J. Denning, Mayer D. Schwartz. ACM Trans. Database Syst. 4(1): 76-96 (1979)
Dinur and Nissim's "Database Reconstruction Theory" is actually a proof that the answers to random queries on a database, which can be generated in polynomial time, will reveal the full contents of the database:
- Revealing Information while Preserving Privacy, Dinur and Nissim 2003.
But query auditing was shown to be NP-hard in 2000:
- J. M. Kleinberg, C. H. Papadimitriou and P. Raghavan, Auditing Boolean Attributes, PODS 2000
So the only way to protect against a large number of unaudited queries is to add noise to the database. The proof in Dinur and Nissim is that adding noise protects against *all* queries, random and otherwise. The more noise, the more protection.
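To make the reconstruction idea concrete, here is a toy sketch in Python. It is not the algorithm from Dinur and Nissim's paper: it is a brute-force search over a tiny database, and the names `subset_sum` and `reconstruct`, the 8-record database, and the 40 random queries are all assumptions chosen for illustration. Each confidential record is a single bit, each query asks how many records in a random subset have the bit set, and the answers are released exactly, with no noise.

```python
import itertools
import random

def subset_sum(db, subset):
    """Exact answer to the query: how many records in `subset` have bit 1?"""
    return sum(db[i] for i in subset)

def reconstruct(n, queries, answers):
    """Return every candidate database consistent with all released answers."""
    return [
        cand
        for cand in itertools.product((0, 1), repeat=n)
        if all(subset_sum(cand, q) == a for q, a in zip(queries, answers))
    ]

rng = random.Random(0)
n = 8  # toy size; the brute-force search is exponential in n
secret = tuple(rng.randint(0, 1) for _ in range(n))

# Each query is a random subset of record indexes (each index kept with probability 1/2).
queries = [[i for i in range(n) if rng.random() < 0.5] for _ in range(40)]
answers = [subset_sum(secret, q) for q in queries]  # released without noise

candidates = reconstruct(n, queries, answers)
print("consistent databases:", len(candidates))
print("secret is among them:", secret in candidates)
```

The true database is always among the consistent candidates, and each additional exact answer tends to rule out more of the others. Adding noise to the answers means the attacker can no longer demand an exact match, so more candidates remain plausible.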
Textbook
- The Algorithmic Foundations of Differential Privacy (2014), a textbook by Cynthia Dwork and Aaron Roth. The first two chapters are understandable to a person who doesn't have an advanced degree in mathematics or cryptography, and it's free!
Foundational Papers
- Revealing Information while Preserving Privacy, Dinur and Nissim 2003.
- Calibrating Noise to Sensitivity in Private Data Analysis, Dwork, McSherry, Nissim and Smith, 2006
Critical Papers
Mechanisms
- Smooth Sensitivity and Sampling in Private Data Analysis, 2007
- Differential Privacy for Statistics: What we Know and What we Want to Learn, 2009
- Towards Practical Differential Privacy for SQL Queries, 2017
- The matrix mechanism: optimizing linear counting queries under differential privacy, Gerome Miklau, Michael Hay, Andrew McGregor, Vibhor Rastogi, The VLDB Journal, August 2015, DOI 10.1007/s00778-015-0398-x.
Public Perception
- Brooke Bullek, Stephanie Garboski, Darakhshan J. Mir, and Evan M. Peck. 2017. Towards Understanding Differential Privacy: When Do People Trust Randomized Response Technique?. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems (CHI '17). ACM, New York, NY, USA, 3833-3837. DOI: https://doi.org/10.1145/3025453.3025698
Philosophy
- How Will Statistical Agencies Operate When All Data Are Private?, John M. Abowd, U.S. Census Bureau, Journal of Privacy and Confidentiality: Vol. 7: Iss. 3, Article 1.
Existing Applications
On The Map, at the US Census Bureau
- Privacy: Theory meets Practice on the Map, Machanavajjhala, Kifer, Abowd, Gehrke, and Vilhuber, ICDE '08 Proceedings of the 2008 IEEE 24th International Conference on Data Engineering, Pages 277-286
RAPPOR, in Google Chrome
- RAPPOR: Randomized Aggregatable Privacy-Preserving Ordinal Response, Erlingsson, Pihur, and Korolova, CCS’14, November 3–7, 2014, Scottsdale, Arizona, USA.
Uber
Apple
Advanced Topics
Differential Privacy and Floating Point Accuracy
Floating-point arithmetic is not continuous, and differential privacy implementations that assume it is may suffer a variety of errors that result in privacy loss. The problems inherent in floating-point arithmetic are discussed in What Every Computer Scientist Should Know About Floating-Point Arithmetic, by David Goldberg, published in the March 1991 issue of Computing Surveys; Oracle hosts an edited reprint.
- On Significance of the Least Significant Bits For Differential Privacy, Ilya Mironov, Microsoft Research, October 1, 2012.
- Preserving differential privacy under finite-precision semantics, Ivan Gazeau, Dale Miller, and Catuscia Palamidessi INRIA and LIX, Ecole Polytechnique
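As a small illustration of why floating-point values are not continuous, the Python sketch below measures the gap between adjacent doubles at two magnitudes and draws one sample from a textbook Laplace sampler. The function `naive_laplace` is a hypothetical helper written for this page, not code from either paper above, and it should not be used for real releases. The sketch assumes Python 3.9 or later for `math.ulp` and `math.nextafter`.

```python
import math
import random

def naive_laplace(scale, rng):
    """Textbook Laplace(0, scale) sampler: a random sign times an exponential magnitude."""
    magnitude = -scale * math.log(1.0 - rng.random())  # 1 - U lies in (0, 1]
    sign = 1.0 if rng.random() < 0.5 else -1.0
    return sign * magnitude

# The spacing between adjacent doubles grows with the magnitude of the value.
gap_near_one = math.ulp(1.0)
gap_near_million = math.ulp(1e6)
next_after_one = math.nextafter(1.0, 2.0)  # smallest double greater than 1.0

rng = random.Random(0)
sample = naive_laplace(1.0, rng)

print("gap near 1.0:", gap_near_one)
print("gap near 1e6:", gap_near_million)
print("one noise sample:", sample)
```

Every value such a sampler can return is a double, so its outputs fall on an uneven grid rather than the real line. That mismatch between the mathematical model and the machine is the starting point for the two papers listed above.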
"How Will Statistical Agencies Operate When All Data Are Private?" (MS #1142) has been published to Journal of Privacy and Confidentiality. http://repository.cmu.edu/jpc/vol7/iss3/1
The Fool's Gold Controversy
- http://www.jetlaw.org/wp-content/uploads/2014/06/Bambauer_Final.pdf
- https://github.com/frankmcsherry/blog/blob/master/posts/2016-05-19.md
- https://github.com/frankmcsherry/blog/blob/master/posts/2016-02-03.md
Other attacks
- Attacks on Privacy and deFinetti’s Theorem, Daniel Kifer, Penn State University, 2017
Math
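The probability $p$ that a randomized response reports the true answer, for privacy parameter $\epsilon$:
$p = \frac{e^\epsilon}{1+e^\epsilon}$
The response is flipped with the remaining probability, $1 - p = \frac{1}{1+e^\epsilon}$.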
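A minimal Python sketch of randomized response using the formula above. It takes $p$ as the probability of keeping the true answer, which is the usual convention, so the answer is flipped with probability $1 - p$. The function names, the simulated population, and the choice $\epsilon = 1$ are assumptions made for illustration.

```python
import math
import random

def truth_probability(epsilon):
    """p = e^eps / (1 + e^eps): probability of reporting the true bit."""
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))

def randomized_response(true_bit, epsilon, rng):
    """Report the true bit with probability p, otherwise report the flipped bit."""
    p = truth_probability(epsilon)
    return true_bit if rng.random() < p else 1 - true_bit

def estimate_proportion(reports, epsilon):
    """Unbiased estimate of the true fraction of 1s (requires epsilon > 0).

    E[observed] = pi * p + (1 - pi) * (1 - p), solved here for pi.
    """
    p = truth_probability(epsilon)
    observed = sum(reports) / len(reports)
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)

rng = random.Random(1)
epsilon = 1.0
# Simulated population in which roughly 30% of true answers are 1.
truths = [1 if rng.random() < 0.3 else 0 for _ in range(100_000)]
reports = [randomized_response(t, epsilon, rng) for t in truths]

print("p =", truth_probability(epsilon))
print("estimated proportion:", estimate_proportion(reports, epsilon))
```

The ratio $p / (1 - p)$ equals $e^\epsilon$, which is why this mechanism satisfies $\epsilon$-differential privacy: either report is at most $e^\epsilon$ times more likely under one true answer than under the other.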
See Also
- The Wikipedia article on Differential Privacy needs help. Perhaps you would like to improve it.
- Statistical Disclosure Control on this wiki.
- Secure Multiparty Computation on this wiki.
- Visualizing Noise (in R)