Difference between revisions of "Differential privacy"

From Simson Garfinkel
* [https://youtu.be/rfI-I3e_LFs SIGMOD 2017 Tutorial Part 1 (2 - 3:30pm)]
* [https://youtu.be/Uhh7QCbnE9o SIGMOD 2017 Tutorial Part 2 (4 - 5:30pm)]
===Database Reconstruction===
The idea that releasing the answers to multiple queries on a confidential database could allow reconstruction of the confidential database goes back to the 1970s.
We explain how to perform database reconstruction in our 2018 ACM Queue article:
* [https://queue.acm.org/detail.cfm?id=3295691 Understanding Database Reconstruction Attacks on Public Data], Simson Garfinkel, John M. Abowd, and Christian Martindale. 2018. Queue 16, 5, pages 50 (October 2018), 26 pages. DOI: https://doi.org/10.1145/3291276.3295691
This article summarizes the risks of database reconstruction, as understood in 1989:
* [http://doi.acm.org/10.1145/76894.76895 Security-control methods for statistical databases: A comparative study.] Adam, N.R., Worthmann, J.C. 1989. ACM Computing Surveys 21(4), 515-556).
I learned of the connection from Dorothy Denning's work on The Tracker:
* [https://dl.acm.org/citation.cfm?id=320069 The Tracker: A Threat to Statistical Database Security], Dorothy E. Denning, Peter J. Denning, Mayer D. Schwartz. ACM Trans. Database Syst. 4(1): 76-96 (1979)
Dinur and Nissim's database reconstruction theorem is actually a proof that a polynomial number of random queries on a database, answered with too little noise, will reveal nearly the full contents of the database:
* [http://www.cse.psu.edu/~ads22/privacy598/papers/dn03.pdf Revealing Information while Preserving Privacy], Dinur and Nissim 2003.
But query auditing was shown to be NP-hard in 2000:
* J. M. Kleinberg, C. H. Papadimitriou and P. Raghavan, Auditing Boolean Attributes, PODS 2000
So the only way to protect against a large number of unaudited queries is to add noise to the query answers. The flip side of Dinur and Nissim's proof is that adding enough noise protects against *all* queries, random and otherwise. The more noise, the more protection.
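The reconstruction argument can be made concrete with a toy brute-force attack (a hypothetical sketch of the idea, not the algorithm from the paper): a secret database of bits answers random subset-sum queries with bounded noise, and an attacker simply enumerates every candidate database consistent with the noisy answers.

```python
# Toy Dinur-Nissim-style reconstruction (hypothetical sketch, not
# the paper's algorithm): a secret bit database answers random
# subset-sum queries with noise of magnitude at most E, and the
# attacker brute-forces every candidate consistent with the answers.
import itertools
import random

random.seed(42)
N = 8                                   # bits in the secret database
secret = [random.randint(0, 1) for _ in range(N)]

E = 1                                   # noise bound per answer
queries = []
for _ in range(40):
    subset = [i for i in range(N) if random.random() < 0.5]
    noisy = sum(secret[i] for i in subset) + random.randint(-E, E)
    queries.append((subset, noisy))

def consistent(candidate):
    # Every noisy answer must lie within E of the candidate's answer
    # to the same subset-sum query.
    return all(abs(sum(candidate[i] for i in s) - a) <= E
               for s, a in queries)

survivors = [c for c in itertools.product([0, 1], repeat=N)
             if consistent(c)]

# The true database is always among the survivors; with little
# noise it has few companions, so most of its bits are exposed.
print(len(survivors), "consistent candidate(s)")
```

Raising E (more noise per answer) enlarges the survivor set and hides the secret among many plausible databases, which is exactly the protection the theorem quantifies.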



Revision as of 08:29, 10 January 2019

A few references on Differential Privacy, for people who don't want to get bogged down with the math.

Introduction

Printed Materials

Podcasts

Videos

  • Four Facets of Differential Privacy, Differential Privacy Symposium, Institute for Advanced Study, Princeton, Saturday, November 12. A series of talks by Cynthia Dwork, Helen Nissenbaum, Aaron Roth, Guy Rothblum, Kunal Talwar, and Jonathan Ullman. View all on the IAS YouTube channel.



Textbook

Foundational Papers

Critical Papers

Mechanisms

Public Perception

Philosophy

Existing Applications

On The Map, at the US Census Bureau

RAPPOR, in Google Chrome

Uber

Apple


Advanced Topics

Differential Privacy and Floating Point Accuracy

Floating-point math is not continuous, and differential privacy implementations that assume it is may experience a variety of errors that result in privacy loss. A discussion of the problems inherent in floating-point arithmetic can be found in Oracle's What Every Computer Scientist Should Know About Floating-Point Arithmetic, an edited reprint of David Goldberg's paper of the same name, published in the March 1991 issue of Computing Surveys.
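Two symptoms of that non-continuity are easy to show (a toy illustration; the function name is mine): the gap between adjacent doubles grows with magnitude, and a textbook inverse-CDF Laplace sampler maps one uniform double deterministically to its output, so only a strict subset of doubles can ever be produced.

```python
# IEEE-754 doubles are spaced unevenly: the gap between adjacent
# representable values grows with magnitude. A Laplace sampler that
# treats its output as a continuous real quantity therefore assigns
# probability to some doubles and never to others, which is the root
# of the floating-point attacks on naive DP implementations.
import math

# math.ulp(x) is the gap from x to the next representable double.
for x in [1.0, 2.0 ** 20, 2.0 ** 52]:
    print(f"gap above {x:g}: {math.ulp(x):g}")

def naive_laplace(scale, u):
    # Textbook inverse-CDF Laplace sampler (illustrative name):
    # u must be uniform in the open interval (0, 1). Every output
    # is a deterministic function of the single double u.
    return -scale * math.copysign(1.0, u - 0.5) * math.log(1.0 - 2.0 * abs(u - 0.5))
```

Note that `u = 0.0` or `u = 1.0` makes the logarithm blow up, and common generators return values in [0, 1), so even the sampler's domain handling needs care.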

"How Will Statistical Agencies Operate When All Data Are Private?" (MS #1142) has been published to Journal of Privacy and Confidentiality. http://repository.cmu.edu/jpc/vol7/iss3/1

The Fool's Gold Controversy

Other attacks

Math

p, the probability of answering truthfully under randomized response:

$p = \frac{e^\epsilon}{1+e^\epsilon}$

The response is flipped with probability $1 - p = \frac{1}{1+e^\epsilon}$.
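In standard randomized response, $p = e^\epsilon/(1+e^\epsilon)$ is the probability of reporting one's true bit, and the bit is flipped with probability $1-p$. A minimal simulation (function names are illustrative) showing both the mechanism and the standard debiased estimate of the true proportion:

```python
# Randomized response with truthful-answer probability
# p = e^eps / (1 + e^eps), plus the unbiased estimator for the
# true proportion q, using E[report] = p*q + (1-p)*(1-q).
import math
import random

def rr_truth_prob(eps):
    # Probability of reporting the true bit.
    return math.exp(eps) / (1.0 + math.exp(eps))

def randomize(bit, eps, rng):
    # Report the true bit with probability p; otherwise flip it.
    return bit if rng.random() < rr_truth_prob(eps) else 1 - bit

def estimate_proportion(reports, eps):
    # Invert E[report] = p*q + (1-p)*(1-q) to debias the mean.
    p = rr_truth_prob(eps)
    observed = sum(reports) / len(reports)
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)

rng = random.Random(0)
truth = [1 if rng.random() < 0.3 else 0 for _ in range(100_000)]
reports = [randomize(b, eps=1.0, rng=rng) for b in truth]
print(round(estimate_proportion(reports, eps=1.0), 3))
```

With $\epsilon = 1$, $p \approx 0.731$, so each individual's report is wrong more than a quarter of the time, yet the aggregate estimate converges to the true proportion.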

See Also