Difference between revisions of "Differential privacy"

From Simson Garfinkel
Jump to navigationJump to search
m
(31 intermediate revisions by the same user not shown)
Line 2: Line 2:
A few references on Differential Privacy, for people who don't want to get bogged down with the math.
A few references on Differential Privacy, for people who don't want to get bogged down with the math.


* [https://www.cis.upenn.edu/~aaroth/Papers/privacybook.pdf The Algorithmic Foundations of Differential Privacy], a textbook by Cynthia Dwork and Aaron Roth. The first two chapters are understable by a person who doesn't have an advanced degree in mathematics or cryptography, and it's free!
[https://trends.google.com/trends/explore?date=2006-01-01%202019-04-27&geo=US&q=%22differential%20privacy%22 View Differential Privacy on Google Trends]


* Frank McSherry's blog post, [https://github.com/frankmcsherry/blog/blob/master/posts/2016-02-03.md Differential privacy for dummies.]
==And the US Census Bureau==
There are now official decision memos! Two, dated July 1, 2019 cover the only official statements on the design and application of Differential Privacy to the 2020 Census that have been made to date.  


* [https://research.neustar.biz/2014/09/08/differential-privacy-the-basics/ Introductory article by Anthony Tockar], the neustar intern who was behind the re-identificaton of the 2013 NYC taxi data release.
* https://www.census.gov/programs-surveys/decennial-census/2020-census/planning-management/memo-series.html 


== Video ==
The memo https://www2.census.gov/programs-surveys/decennial/2020/program-management/memo-series/2020-memo-2019_13.pdf states the Group Quarters invariant, which is "number and type of GQ facilities."
 
==Introduction==
 
===Text Materials===
* [https://github.com/frankmcsherry/blog Frank McSherry's blog]. Especially his [https://github.com/frankmcsherry/blog/blob/master/posts/2016-02-03.md 2016 post, Differential privacy for dummies.]
 
* [https://research.neustar.biz/2014/09/08/differential-privacy-the-basics/ Introductory article by Anthony Tockar], the neustar intern who was behind the re-identificaton of the 2013 NYC taxi data release. (2014)
 
* [http://dimacs.rutgers.edu/~graham/pubs/slides/privdb-tutorial.pdf Building Blocks of Privacy: Differentially Private Mechanisms] (2013), Graham Cormode
 
* [https://www.infoq.com/articles/differential-privacy-intro/ An Introduction to Differential Privacy], by [https://www.linkedin.com/in/charlie-cabot-55803385/ Charlie Cabot]
 
=== Podcasts ===
* [https://www.sciencefriday.com/person/cynthia-dwork/ Cynthia Dwork on Science Friday], Crowdsourcing Data, While Keeping Yours Private. 12 minutes.
 
=== Videos ===
* [https://www.youtube.com/watch?v=pT19VwBAqKA Protecting Privacy with MATH (minute physics]
* [https://www.nist.gov/blogs/taking-measure/differential-privacy-qa-nists-mary-theofanos NIST Differential Privacy Video and Q&A with Mary Theofanos, August 8, 2019]
* [https://www.ias.edu/events/differential-privacy Four Facets of Differential Privacy], Differential Privacy Symposium, Institute for Advanced Study, Princeton, Saturday, November 12. A series of talks by Cynthia Dwork, Helen Nissenbaum, Aaron Roth, Guy Rothblum, Kunal Talwar, and Jonathan Ullman. View all on the [https://www.youtube.com/watch?v=Rs06sAJ07Go&feature=youtu.be&list=PLdDZb3TwJPZ7Ug5Ydu1j9V1m_RgtW7C9_ IAS YouTube channel].


* [https://www.youtube.com/watch?v=ekIL65D0R3o Katrina Ligett, California Institute of Technology], explains big data and differential priacy. December 17, 2013.
* [https://www.youtube.com/watch?v=ekIL65D0R3o Katrina Ligett, California Institute of Technology], explains big data and differential priacy. December 17, 2013.
Line 16: Line 36:
* [https://www.youtube.com/watch?v=Gx13lgEudtU Christine Task at Purdue] teachs the CERIAS Security Seminar on Differential Privacy, May 1, 2012. (40 min)
* [https://www.youtube.com/watch?v=Gx13lgEudtU Christine Task at Purdue] teachs the CERIAS Security Seminar on Differential Privacy, May 1, 2012. (40 min)


== Differential Privacy and Floating Point Accuracy ==
* [https://youtu.be/rfI-I3e_LFs SIGMOD 2017 Tutorial Part 1 ( 2 - 3:30pm)]
* [https://youtu.be/Uhh7QCbnE9o SIGMOD 2017 Tutorial Part 2 (4 - 5:30 pm)]
* [https://www.youtube.com/watch?v=gI0wk1CXlsQ Differential Privacy in 5 minutes, by Simply Explained, on YouTube]
 
===Database Reconstruction===
The idea of that releasing multiple queries on a confidential database could result in the reconstruction of the confidential database goes back to the 1970s.
 
We explain how to perform database reconstruction in our 2018 ACM Queue article:
* [https://queue.acm.org/detail.cfm?id=3295691 Understanding Database Reconstruction Attacks on Public Data], Simson Garfinkel, John M. Abowd, and Christian Martindale. 2018. Queue 16, 5, pages 50 (October 2018), 26 pages. DOI: https://doi.org/10.1145/3291276.3295691
 
This article summarizes the risks of database reconstruction, as understood in 1989:
* [http://doi.acm.org/10.1145/76894.76895 Security-control methods for statistical databases: A comparative study.] Adam, N.R., Worthmann, J.C. 1989. ACM Computing Surveys 21(4), 515-556).
 
I learned of the connection from Dorothy Denning's work on The Tracker:
* [https://dl.acm.org/citation.cfm?id=320069 The Tracker: A Threat to Statistical Database Security], Dorothy E. Denning, Peter J. Denning, Mayer D. Schwartz. ACM Trans. Database Syst. 4(1): 76-96 (1979)
 
Dinur and Nissim's "Database Reconstruction Theory" is actually a proof that random queries on a database, which can be generated with complexity P, will reveal the full contents of the database:
* [http://www.cse.psu.edu/~ads22/privacy598/papers/dn03.pdf Revealing Information while Preserving Privacy], Dinur and Nissim 2003.
 
But query auditing was shown to be NP-hard in 2000:
* J. M. Kleinberg, C. H. Papadimitriou and P. Raghavan, Auditing Boolean Attributes, PODS 2000
 
So the only way to protect against a large number of unaudited queries is to add noise to the database. The proof in Dinur and Nissim is that adding noise protects against *all* queries, random and otherwise. The more noise, the more protection.
 
 
===Textbook===
 
* [https://www.cis.upenn.edu/~aaroth/Papers/privacybook.pdf The Algorithmic Foundations of Differential Privacy] (2014), a textbook by Cynthia Dwork and Aaron Roth. The first two chapters are understable by a person who doesn't have an advanced degree in mathematics or cryptography, and it's free!
 
===Foundational Papers===
* [http://www.cse.psu.edu/~ads22/privacy598/papers/dn03.pdf Revealing Information while Preserving Privacy], Dinur and Nissim 2003.
 
* [http://people.csail.mit.edu/asmith/PS/sensitivity-tcc-final.pdf Calibrating Noise to Sensitivity in Private Data Analysis], Dwork, McSherry, Nissim and Smith, 2006
 
* [http://www.cse.psu.edu/~sxr48/pubs/smooth-sensitivity-stoc.pdf Smooth Sensitivity]
 
==Critical Papers==
===Mechanisms===
* [http://www.cse.psu.edu/~ads22/pubs/NRS07/NRS07-full-draft-v1.pdf Smooth Sensitivity and Sampling in Private Data Analysis, 2007]
* [http://repository.cmu.edu/cgi/viewcontent.cgi?article=1008&context=jpc Differential Privacy for Statistics:  What we Know and What we Want to Learn, 2009]
* [https://arxiv.org/pdf/1706.09479.pdf Towards Practical Differential Privacy for SQL Queries, 2017]
* [https://people.cs.umass.edu/~miklau/assets/pubs/dp/Li15matrix.pdf The matrix mechanism: optimizing linear counting queries under differential privacy], Gerome Miklau, Michael Hay, Andrew McGregor, Vibhor Rastogi,The VLDB Journal, August 2015, DOI 10.1007/s00778-015-0398-x.
 
===Public Perception===
* Brooke Bullek, Stephanie Garboski, Darakhshan J. Mir, and Evan M. Peck. 2017. [https://dl.acm.org/citation.cfm?id=3025698 Towards Understanding Differential Privacy: When Do People Trust Randomized Response Technique?. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems (CHI '17)]. ACM, New York, NY, USA, 3833-3837. DOI: https://doi.org/10.1145/3025453.3025698
 
===Philosophy===
* [http://repository.cmu.edu/jpc/vol7/iss3/1/ How Will Statistical Agencies Operate When All Data Are Private?], John M. Abowd, U.S. Census Bureau, Journal of Privacy and Confidentiality: Vol. 7 : Iss. 3 , Article 1.
 
==Existing Applications==
 
===On The Map, at the US Census Bureau===
* [http://www.cse.psu.edu/~duk17/papers/PrivacyOnTheMap.pdf Privacy: Theory meets Practice on the Map], Machanavajjhala, Kifer, Abowd, Gehrke, and Vilhuber, ICDE '08 Proceedings of the 2008 IEEE 24th International Conference on Data Engineering, Pages 277-286
 
===RAPPOR, in Google Chrome===
* [https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/42852.pdf RAPPOR: Randomized Aggregatable Privacy-Preserving Ordinal Response], Erlingsson, PIhur, and Korolova, CCS’14, November 3–7, 2014, Scottsdale, Arizona, USA.
 
===Uber===
* https://www.wired.com/story/uber-privacy-elastic-sensitivity/
 
===Apple===
* 2016-06: [https://www.wired.com/2016/06/apples-differential-privacy-collecting-data/ Andy Greenberg's article in Wired about Apple's Differential Privacy]
 
 
==Advanced Topics==


Floating point math on computer's isn't continuous, and differential privacy implementations that assume it is may experience a variety of errors that result in privacy loss. A discussion of the problems inherently in floating-point arithmetic can be found in Oracle's [https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html What Every Computer Scientist Should Know About Floating-Point Arithmetic], an edited reprint of the paper What Every Computer Scientist Should Know About Floating-Point Arithmetic, by David Goldberg, published in the March, 1991 issue of Computing Surveys.
 
=== Differential Privacy and Floating Point Accuracy ===
 
Floating point math is not continuous, and differential privacy implementations that assume it is may experience a variety of errors that result in privacy loss. A discussion of the problems inherently in floating-point arithmetic can be found in Oracle's [https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html What Every Computer Scientist Should Know About Floating-Point Arithmetic], an edited reprint of the paper What Every Computer Scientist Should Know About Floating-Point Arithmetic, by David Goldberg, published in the March, 1991 issue of Computing Surveys.


* [https://www.microsoft.com/en-us/research/publication/on-significance-of-the-least-significant-bits-for-differential-privacy/ On Significance of the Least Significant Bits For Differential Privacy], Ilya Mironov, Microsoft Research, October 1, 2012.  
* [https://www.microsoft.com/en-us/research/publication/on-significance-of-the-least-significant-bits-for-differential-privacy/ On Significance of the Least Significant Bits For Differential Privacy], Ilya Mironov, Microsoft Research, October 1, 2012.  


* [http://www.lix.polytechnique.fr/~dale/papers/qapl-2013.pdf Preserving differential privacy under finite-precision semantics], Ivan Gazeau, Dale Miller, and Catuscia Palamidessi INRIA and LIX, Ecole Polytechnique
* [http://www.lix.polytechnique.fr/~dale/papers/qapl-2013.pdf Preserving differential privacy under finite-precision semantics], Ivan Gazeau, Dale Miller, and Catuscia Palamidessi INRIA and LIX, Ecole Polytechnique
"How Will Statistical Agencies Operate When All Data Are Private?" (MS #1142) has been published to Journal of Privacy and Confidentiality.
http://repository.cmu.edu/jpc/vol7/iss3/1
=== The Fool's Gold Controversy ===
* http://www.jetlaw.org/wp-content/uploads/2014/06/Bambauer_Final.pdf
* https://github.com/frankmcsherry/blog/blob/master/posts/2016-05-19.md
* https://github.com/frankmcsherry/blog/blob/master/posts/2016-02-03.md
=== Other attacks ===
* [http://www.cse.psu.edu/~duk17/papers/definetti.pdf Attacks on Privacy and deFinetti’s Theorem], Daniel Kifer, Penn State University, 2017
=== Math===
p for randomized response rate:
$p = \frac{e^\epsilon}{1+e^\epsilon}$
Probability that  randomized response should be flipped.


== See Also ==
== See Also ==
* 2016-06: [https://www.wired.com/2016/06/apples-differential-privacy-collecting-data/ Andy Greenberg's article in Wired about Apple's Differential Privacy]
* The [https://en.wikipedia.org/wiki/Differential_privacy wikipedia article on Differential Privacy] needs help. Perhaps you would like to improve it.
* The [https://en.wikipedia.org/wiki/Differential_privacy wikipedia article on Differential Privacy] needs help. Perhaps you would like to improve it.
* [[Statistical Disclosure Control]] on this wiki.
* [[Statistical Disclosure Control]] on this wiki.
* [[Secure Multiparty Computation]] on this wiki.
* [http://www.di.fc.ul.pt/~jpn/r/noise/noise.html Visualizing Noise] (in R)

Revision as of 06:43, 24 November 2019

A few references on Differential Privacy, for people who don't want to get bogged down with the math.

View Differential Privacy on Google Trends

And the US Census Bureau

There are now official decision memos! Two, dated July 1, 2019 cover the only official statements on the design and application of Differential Privacy to the 2020 Census that have been made to date.

The memo https://www2.census.gov/programs-surveys/decennial/2020/program-management/memo-series/2020-memo-2019_13.pdf states the Group Quarters invariant, which is "number and type of GQ facilities."

Introduction

Text Materials

Podcasts

Videos

Database Reconstruction

The idea of that releasing multiple queries on a confidential database could result in the reconstruction of the confidential database goes back to the 1970s.

We explain how to perform database reconstruction in our 2018 ACM Queue article:

This article summarizes the risks of database reconstruction, as understood in 1989:

I learned of the connection from Dorothy Denning's work on The Tracker:

Dinur and Nissim's "Database Reconstruction Theory" is actually a proof that random queries on a database, which can be generated with complexity P, will reveal the full contents of the database:

But query auditing was shown to be NP-hard in 2000:

  • J. M. Kleinberg, C. H. Papadimitriou and P. Raghavan, Auditing Boolean Attributes, PODS 2000

So the only way to protect against a large number of unaudited queries is to add noise to the database. The proof in Dinur and Nissim is that adding noise protects against *all* queries, random and otherwise. The more noise, the more protection.


Textbook

Foundational Papers

Critical Papers

Mechanisms

Public Perception

Philosophy

Existing Applications

On The Map, at the US Census Bureau

RAPPOR, in Google Chrome

Uber

Apple


Advanced Topics

Differential Privacy and Floating Point Accuracy

Floating point math is not continuous, and differential privacy implementations that assume it is may experience a variety of errors that result in privacy loss. A discussion of the problems inherently in floating-point arithmetic can be found in Oracle's What Every Computer Scientist Should Know About Floating-Point Arithmetic, an edited reprint of the paper What Every Computer Scientist Should Know About Floating-Point Arithmetic, by David Goldberg, published in the March, 1991 issue of Computing Surveys.

"How Will Statistical Agencies Operate When All Data Are Private?" (MS #1142) has been published to Journal of Privacy and Confidentiality. http://repository.cmu.edu/jpc/vol7/iss3/1

The Fool's Gold Controversy

Other attacks

Math

p for randomized response rate:

$p = \frac{e^\epsilon}{1+e^\epsilon}$

Probability that randomized response should be flipped.

See Also