Difference between revisions of "Real Data Corpus"

From Simson Garfinkel

Revision as of 13:12, 12 April 2009

In 2005 more than 210 million hard drives were retired from primary service. Although many of these drives were destroyed, many others were repurposed within organizations, donated to charities, or sold on the secondary market.

We are purchasing a statistically significant quantity of used hard drives on the secondary market and analyzing their contents. Many of the drives contain remnant information from their previous users; in many cases no effort whatsoever has been made to purge this data. To date we have purchased more than 2000 hard drives on the secondary market and have created a unique resource with more than a terabyte of compressed disk images.

We have used this resource for a variety of research purposes:

  1. Our first publication [1] alerted the community to the scale of the problem of data left on repurposed hard drives. Following this publication, the US government passed legislation creating an affirmative responsibility on the part of American businesses to purge consumer information from hard drives before discarding them [2], and two major products were introduced that use cryptography to rapidly "shred" information stored on magnetic media [3][4].
  2. We conducted a "trace-back" study in which we contacted 20 organizations whose data was on the hard drives that we obtained. Based on interviews, we were able to identify the technical and organizational failures that resulted in the data compromises [5].
  3. We have identified patterns and principles for promoting secure human-computer interaction.
  4. As part of developing this resource, we have developed a new file format for storing disk images[6][7], and we are developing a new technique for mapping social networks among individuals whose data is on captured hard drives. These approaches could be used, for example, to allow the rapid and automated analysis of disk drives seized during the course of a police investigation or obtained as part of military operations.
  5. We have used the corpus to evaluate the effectiveness of today's computer forensic tools.
  6. We are using the corpus to create new computer forensic tools and techniques.

References

  1. S. Garfinkel and A. Shelat. "Remembrance of Data Passed: A Study of Disk Sanitization Practices," IEEE Security and Privacy, January/February 2003.
  2. US Congress. Fair and Accurate Credit Transactions Act of 2003.
  3. Seagate Technology. Momentus Family Overview, 2006.
  4. Decru. DataFort Security Appliances, 2005.
  5. S. Garfinkel. "Design Principles and Patterns for Computer Systems that are Simultaneously Secure and Usable," PhD Thesis, Massachusetts Institute of Technology, June 2005.
  6. S. Garfinkel. "AFF: A New Format for Storing Hard Drive Images," Communications of the ACM, February 2006.
  7. S. Garfinkel, D. Malan, K. Dubec, C. Stevens, and C. Pham. "Disk Imaging with the Advanced Forensics Format, Library and Tools," The Second Annual IFIP WG 11.9 International Conference on Digital Forensics, National Center for Forensic Science, Orlando, Florida, USA, January 29 - February 1, 2006.