Computing Reviews
Understanding database reconstruction attacks on public data
Garfinkel S., Abowd J., Martindale C. Queue 16(5): 28-53, 2018. Type: Article
Date Reviewed: Oct 7 2019

I found this article on the US Census Bureau to be a fascinating account of the history of the census, including attacks on it and methods that can be used to prevent the loss of personally identifying information (PII) stored there. The authors are brutally honest at some points and tantalizingly vague at others. They are candid about the insecurity of previous storage and loss prevention methods; however, they are purposely (they claim) vague about the specific methods used to secure the 2010 census data.

Unless you have a good background in statistics, the authors may lose you at some points. They offer “an example database reconstruction attack” showing how PII can be extracted from a database that was previously thought to be secure; fully understanding it requires some knowledge of statistics.

In addition to the statistical data in the example, the authors explain different defense methods: (1) “publish less data” (this could deny legitimate researchers access to all of the data); (2) input noise injection, that is, “apply noise before tabulation” (this was used in 2010 and was less than totally successful); and (3) output noise injection, that is, “apply noise to the published statistics.” Furthermore, “whereas input noise injection applies noise to the microdata directly, output noise injection applies [noise] to the statistical publications.” The authors go on to state,

“When noise is added to either the input data [method 2] or the tabulation results [method 3], with all records having equal probability of being altered, it is possible to mathematically describe the resulting privacy protection. This is the basis of differential privacy.”

Note that differential privacy will be used in 2020 to protect the census data against the disclosure of personal information.
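To make output noise injection concrete, here is a minimal sketch of the textbook Laplace mechanism applied to a single count. It is an illustration only, not the Census Bureau's procedure and not code from the article: the function names, the privacy parameter epsilon, and the example count of 7 are all assumptions made for this sketch.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two independent exponentials with mean `scale`
    # follows a Laplace distribution centered on 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count, epsilon, rng):
    # Adding or removing one person changes a count by at most 1
    # (sensitivity 1), so the textbook noise scale is 1 / epsilon.
    return true_count + laplace_noise(1.0 / epsilon, rng)


rng = random.Random(0)   # fixed seed so the sketch is repeatable
TRUE_COUNT = 7           # hypothetical number of residents in one block
EPSILON = 0.5            # hypothetical privacy parameter (smaller = noisier)

published = [noisy_count(TRUE_COUNT, EPSILON, rng) for _ in range(5)]
print(published)         # five different noisy versions of the same count
```

Each published value differs from the true count, but the noise averages out over many statistics, which is why aggregate results stay useful while individual records become harder to pin down.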

As the authors discuss, the Census Bureau needs new methods to ensure that the 2020 census data is protected. They give a theoretical example of how the data could be hacked, but no specifics. This is deliberate: providing real examples could make them accessories before the fact if someone used their methods to hack the census. They do use a tool called PicoSAT (a SAT solver) for one example, and also mention other SAT, SMT, and MIP solvers as tools that could be used but are not as thorough.

In general, their example method is a brute-force attack accomplished by matching outcomes with constraints. The method works here because there are a limited number of variables and constraints. The actual census data is much larger, with many more variables, and therefore would require more constraints and much more time.
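The idea of matching outcomes with constraints can be sketched with a toy enumeration. This is not the authors' example and does not use PicoSAT; it swaps in plain exhaustive search, and the block of three people, the published statistics, and the 115-year age cap are all hypothetical values chosen for illustration.

```python
from itertools import combinations_with_replacement

MAX_AGE = 115  # assumed upper bound on age


def adults(ages):
    # Ages of the people counted in the hypothetical "18 and over" table.
    return [a for a in ages if a >= 18]


def reconstruct(constraints, n_people, max_age=MAX_AGE):
    # Try every sorted combination of ages and keep those that agree
    # with every published statistic.
    candidates = combinations_with_replacement(range(max_age + 1), n_people)
    return [ages for ages in candidates if all(c(ages) for c in constraints)]


# Hypothetical published table for a block of 3 people.
overall = [
    lambda ages: sum(ages) == 132,   # mean age 44 (3 * 44)
    lambda ages: ages[1] == 30,      # median age 30
]
# Hypothetical second table covering adults only.
adult_table = [
    lambda ages: len(adults(ages)) == 2,   # two adults
    lambda ages: sum(adults(ages)) == 124, # mean adult age 62 (2 * 62)
]

loose = reconstruct(overall, 3)
tight = reconstruct(overall + adult_table, 3)
print(len(loose))  # 31 candidate age sets fit the first table alone
print(tight)       # [(8, 30, 94)]: the second table leaves only one
```

Every additional published table removes candidates, which is the core of the attack. The search space here is about 267,000 combinations; it grows rapidly with more people and more attributes, which is why real attacks rely on solvers rather than plain enumeration.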

The authors suggest some possible methods to improve security and support the use of differential privacy through noise injection, which is being adopted by the Census Bureau. This article will primarily interest those who use census data, since the new protections should make it more difficult to derive private data from the census.

Reviewer:  Michael Moorman Review #: CR146717 (1912-0449)
Featured Reviewer
General (H.2.0)
Statistics (K.1 ...)
Other reviews under "General": Date
Design of the Mneme persistent object store
Moss J. ACM Transactions on Information Systems 8(2): 103-139, 1990. Type: Article
Jul 1 1991
Database management systems
Gorman M., QED Information Sciences, Inc., Wellesley, MA, 1991. Type: Book (9780894353239)
Dec 1 1991
Database management (3rd ed.)
McFadden F., Hoffer J., Benjamin-Cummings Publ. Co., Inc., Redwood City, CA, 1991. Type: Book (9780805360400)
Jun 1 1992