We are developing a variety of techniques and tools for performing ''Automated Document and Media Exploitation''<ref>Garfinkel, S. [http://simson.net/clips/academic/2007.ACM.Domex.pdf "Document and Media Exploitation,"] <i>ACM Queue</i>, November/December 2007.</ref> (ADOMEX). The thrust of this research covers three main areas:
# Developing open source tools for working with electronic evidence. This work is part of the [http://www.afflib.org AFF] project<ref>[http://www.simson.net/clips/academic/2006.CACM.AFF.pdf "AFF: A New Format for Storing Hard Drive Images,"] Garfinkel, S., Communications of the ACM, February 2006</ref>.
# Developing an unclassified [[Real Data Corpus]] (RDC) consisting of "real data from real people" that can be used to develop new algorithms, quantify results, and test automated tools.
# Developing new algorithms and approaches for working in a "data-rich environment" such as a large collection of hard drives that have been seized during the course of law enforcement or military operations.
==Recent Research Developments==
===File-based forensics===
File-based forensics is based on the analysis of files, deleted files, and orphan files. Most forensic work currently performed for law enforcement, commercial e-discovery, and intelligence purposes is file-based. The goal is usually to find a specific file that can be shown to a jury or that contains actionable intelligence. File forensics is typically performed with programs such as EnCase, FTK, or SleuthKit.
* We have developed a batch analysis tool called '''fiwalk''' that takes a disk image and produces an XML file describing all of its files, deleted files, and orphan files, along with all of the file metadata extracted from the image. This XML file can serve as input for further automated media processing. Using this system we have created a variety of applications for reporting on and manipulating disk images. We have also developed an efficient system that provides remote file-level access to disk images using XML-RPC and REST. Details can be found in our paper<ref>[http://simson.net/clips/academic/2009.SADFE.xml_forensics.pdf Automating Disk Forensic Processing with SleuthKit, XML and Python], [http://conf.ncku.edu.tw/sadfe/sadfe09/ Fourth International IEEE Workshop on Systematic Approaches to Digital Forensic Engineering] (IEEE/SADFE'09), May 2009</ref>.
* We have developed a prototype system for automated media forensic reporting. Based on PyFlag, the system performs an in-depth analysis of captured media, locates local and online identities, and presents summary information in a report tailored to be easy for the consumer of forensic intelligence to use<ref>[http://www.simson.net/clips/students/09_Farrell.pdf A Framework for Automated Digital Forensic Reporting], Lt. Paul Farrell, Master's Thesis, Naval Postgraduate School, Monterey, CA, March 2009</ref>.
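Because fiwalk's output is ordinary XML, it can be post-processed with standard libraries. The following Python sketch assumes a DFXML-style layout in which each file is a <code>fileobject</code> element with <code>filename</code>, <code>filesize</code> and <code>alloc</code> children. The sample document, the <code>alloc</code> convention and the function names are illustrative assumptions; check the element names against the output of the fiwalk version you use, and note that if the file declares an XML namespace the tag names must be qualified accordingly.

```python
import xml.etree.ElementTree as ET

# Hypothetical DFXML-style output; real fiwalk files carry many more elements.
SAMPLE_XML = """<?xml version="1.0"?>
<dfxml>
  <fileobject>
    <filename>Documents/report.doc</filename>
    <filesize>24576</filesize>
    <alloc>1</alloc>
  </fileobject>
  <fileobject>
    <filename>Temp/old.jpg</filename>
    <filesize>8192</filesize>
    <alloc>0</alloc>
  </fileobject>
</dfxml>
"""


def iter_files(xml_text):
    """Yield (filename, size, allocated) for every <fileobject> element."""
    root = ET.fromstring(xml_text)
    for fo in root.iter("fileobject"):
        name = fo.findtext("filename", default="")
        size = int(fo.findtext("filesize", default="0"))
        # Assumed convention: <alloc>1</alloc> marks an allocated file;
        # anything else is treated as deleted.
        allocated = fo.findtext("alloc", default="0").strip() == "1"
        yield name, size, allocated


def deleted_files(xml_text):
    """Return the names of files that are no longer allocated."""
    return [name for name, _size, alloc in iter_files(xml_text) if not alloc]


if __name__ == "__main__":
    for name, size, alloc in iter_files(SAMPLE_XML):
        print(f"{name}\t{size}\t{'allocated' if alloc else 'deleted'}")
```

For large images, <code>xml.etree.ElementTree.iterparse</code> can stream the document instead of loading it all into memory.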
===Bulk data forensics===
Bulk data forensics is based on the bulk analysis of disk images and other kinds of forensic source data. File carving<ref>[http://www.simson.net/clips/academic/2007.DFRWS.pdf "Carving Contiguous and Fragmented Files with Fast Object Validation"], Garfinkel, S., Digital Investigation, Volume 4, Supplement 1, September 2007, Pages 2--12.</ref> is a traditional form of bulk forensics. We see bulk forensics as a complement to existing forensic processing, rather than as a replacement for it.
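As a concrete example of bulk processing, a naive header/footer carver scans raw bytes for the JPEG start-of-image marker (<code>FF D8 FF</code>) and the following end-of-image marker (<code>FF D9</code>). This Python sketch is the simplest form of carving, not the fast object validation technique of the cited paper: it assumes files are stored contiguously, performs no validation, and will truncate a JPEG that embeds a thumbnail with its own end marker. The function name and the 10 MB size cap are assumptions.

```python
JPEG_HEADER = b"\xff\xd8\xff"   # start-of-image marker plus the next marker's 0xFF
JPEG_FOOTER = b"\xff\xd9"       # end-of-image marker
MAX_SIZE = 10 * 1024 * 1024     # assumed cap on the size of a carved object


def carve_jpegs(data, max_size=MAX_SIZE):
    """Return a list of (offset, candidate_bytes) for header...footer spans.

    Assumes each JPEG is stored contiguously. No structural validation is
    done, so false positives and truncated candidates are expected.
    """
    results = []
    start = data.find(JPEG_HEADER)
    while start != -1:
        end = data.find(JPEG_FOOTER, start + len(JPEG_HEADER), start + max_size)
        if end == -1:
            # No footer within max_size bytes: skip this header.
            start = data.find(JPEG_HEADER, start + 1)
            continue
        stop = end + len(JPEG_FOOTER)
        results.append((start, data[start:stop]))
        # Resume searching after the carved candidate.
        start = data.find(JPEG_HEADER, stop)
    return results


if __name__ == "__main__":
    blob = b"junk" + b"\xff\xd8\xff\xe0abc\xff\xd9" + b"more"
    for offset, candidate in carve_jpegs(blob):
        print(offset, len(candidate))
```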
Bulk data forensics has several important advantages over traditional forensic processing:
* It's faster, because the disk head scans the disk (or disk image) from beginning to end without having to seek from file to file.
* It can tolerate media that is damaged or incomplete, since the forensic processing does not require the reconstruction of file allocation tables, disk directories, or metadata.
* It works with obscure or unknown operating systems, since no attempt is made to reconstruct the file system or other operating system structures.
* It lends itself to statistical processing: instead of scanning the entire disk image, the analyst can examine a random sample of its sectors.
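The sampling approach in the last point can be sketched as follows: read a random subset of sectors and estimate the fraction that pass some test, together with a normal-approximation confidence interval. The 512-byte sector size, the all-zero test and the function names are illustrative assumptions; any per-sector test could be substituted.

```python
import io
import math
import random

SECTOR_SIZE = 512  # assumed sector size in bytes


def is_zero_sector(sector):
    """Example test: True if every byte of the sector is 0x00."""
    return not any(sector)


def sample_fraction(image, predicate, n_samples, seed=None):
    """Estimate the fraction of sectors in a seekable binary file for which
    predicate(sector) is True, using simple random sampling with replacement.

    Returns (estimate, half_width), where half_width is the half-width of an
    approximate 95% confidence interval (normal approximation).
    """
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    image.seek(0, io.SEEK_END)              # the end offset gives the image size
    n_sectors = image.tell() // SECTOR_SIZE
    if n_sectors == 0:
        raise ValueError("image is smaller than one sector")
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        image.seek(rng.randrange(n_sectors) * SECTOR_SIZE)
        if predicate(image.read(SECTOR_SIZE)):
            hits += 1
    p = hits / n_samples
    half_width = 1.96 * math.sqrt(p * (1 - p) / n_samples)
    return p, half_width


if __name__ == "__main__":
    # Synthetic image: 25 zero-filled sectors followed by 75 non-zero sectors.
    demo = io.BytesIO(bytes(25 * SECTOR_SIZE) + b"\x01" * (75 * SECTOR_SIZE))
    estimate, hw = sample_fraction(demo, is_zero_sector, 2000, seed=1)
    print(f"zero sectors: {estimate:.3f} +/- {hw:.3f}")
```

For a real image, pass a file opened in binary mode. The cost is set by the number of samples, not by the size of the drive.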
We have developed several tools for bulk data forensics:
* '''[http://www.forensicswiki.org/wiki/Frag_find frag_find]''' is a tool that reports whether the sectors of a TARGET file are present in a disk IMAGE. This is useful when a TARGET file has been stolen and you wish to establish that the file was once present on a subject's drive. If most of the TARGET file's sectors are found in the IMAGE, and if they appear in consecutive sector runs, then the chances are excellent that the file was once there. Frag_find uses a three-stage filter: first a high-speed but low-quality 32-bit hash, then a Bloom filter of SHA1 hashes,<ref>[http://www.simson.net/clips/academic/2008.ACSAC.Bloom.pdf “Practical Applications of Bloom filters to the NIST RDS and hard drive triage,”] Farrell, Garfinkel and White, ACSAC 2008</ref> and finally a linked list of all hashes. As a result, disk sectors are hashed only when necessary, allowing processing speeds of 50K sectors/sec on standard hardware. The program deals with the problem of non-unique blocks by looking for runs of matching blocks rather than individual blocks. '''Frag_find''' is part of the NPS Bloom package, which can be downloaded from http://www.afflib.org.
* '''bulk_extractor''' is a tool that searches bulk data for recognized features and performs histogram analysis on the results. You can write your own feature extractors using '''flex'''. We provide '''bulk_extractor''' with an extractor that finds complete email addresses and domain names. The most common email address on a hard drive is usually that of the drive's primary user; the other most frequent addresses tend to belong to that person's primary correspondents. By looking at the list of email addresses sorted by frequency, an analyst can rapidly infer the user's social network.
* '''CDA Tool''' takes the results of the '''bulk_extractor''' tool and performs a [[Cross-Drive Analysis]]<ref>[http://www.simson.net/clips/academic/2006.DFRWS.pdf "Forensic Feature Extraction and Cross-Drive Analysis,"] Garfinkel, S., Digital Investigation, Volume 3, Supplement 1, September 2006, Pages 71--81.</ref>. This can allow an investigator to discover previously unknown social networks in a set of hard drives, or to determine whether a newly acquired hard drive belongs to an existing social network. Currently the tool simply prints a report of the degree of connection between each pair of drives. We plan to extend this tool to show the results graphically and to allow the analyst to drill down and see the cause of the connections.
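The run-matching idea behind '''frag_find''' can be sketched in Python. This simplified version is not frag_find's implementation: it uses CRC-32 (<code>zlib.crc32</code>) as the cheap 32-bit prefilter and an exact dictionary of SHA-1 hashes in place of the Bloom filter and linked-list stages, assumes 512-byte sectors, and assumes the TARGET file is sector-aligned in the IMAGE. The function names and the <code>min_run</code> threshold are hypothetical.

```python
import hashlib
import zlib

SECTOR_SIZE = 512  # assumed block size


def sectors(data):
    """Split data into full SECTOR_SIZE blocks; a trailing partial block is dropped."""
    return [data[i:i + SECTOR_SIZE]
            for i in range(0, len(data) - SECTOR_SIZE + 1, SECTOR_SIZE)]


def find_runs(target, image, min_run=2):
    """Return (image_sector, target_sector, length) for each run of at least
    min_run consecutive image sectors matching consecutive target sectors."""
    # Index the target: CRC-32 for cheap rejection, SHA-1 for confirmation.
    # A SHA-1 may map to several positions when target blocks are not unique.
    crc_set = set()
    sha_to_pos = {}
    for pos, block in enumerate(sectors(target)):
        crc_set.add(zlib.crc32(block))
        sha_to_pos.setdefault(hashlib.sha1(block).digest(), []).append(pos)

    # For each image sector, record the target positions it matches (often none).
    # SHA-1 is computed only for sectors that pass the CRC-32 prefilter.
    matches = []
    for block in sectors(image):
        if zlib.crc32(block) not in crc_set:
            matches.append(())
            continue
        matches.append(tuple(sha_to_pos.get(hashlib.sha1(block).digest(), ())))

    # Keep the longest run starting at each matching image sector, where
    # image sector i + k matches target sector j + k.
    runs = []
    i = 0
    while i < len(matches):
        best = None
        for j in matches[i]:
            length = 1
            while i + length < len(matches) and j + length in matches[i + length]:
                length += 1
            if best is None or length > best[2]:
                best = (i, j, length)
        if best is not None and best[2] >= min_run:
            runs.append(best)
            i += best[2]
        else:
            i += 1
    return runs


if __name__ == "__main__":
    target = b"".join(bytes([k]) * SECTOR_SIZE for k in range(1, 5))
    image = bytes(SECTOR_SIZE) + target[:3 * SECTOR_SIZE] + b"\x09" * SECTOR_SIZE
    print(find_runs(target, image))
```

Requiring runs rather than single matches is what makes common blocks, such as sectors of all zeros, harmless: an isolated match contributes nothing unless its neighbours also line up.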
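The email-address histogram that '''bulk_extractor''' produces can be approximated with a regular expression over the raw bytes and a <code>collections.Counter</code>. This Python sketch is a stand-in, not bulk_extractor's code: the real tool uses flex-generated scanners, and the simplified pattern here is an assumption that will miss some valid addresses and accept some junk.

```python
import re
from collections import Counter

# Simplified pattern over raw bytes (an assumption: not RFC 5322 and not
# bulk_extractor's flex rules), so some addresses are missed and some junk matched.
EMAIL_RE = re.compile(rb"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def email_histogram(data):
    """Count email addresses found anywhere in a byte string (case-insensitive)."""
    counts = Counter()
    for match in EMAIL_RE.finditer(data):
        counts[match.group().decode("ascii").lower()] += 1
    return counts


if __name__ == "__main__":
    raw = (b"\x00alice@example.com\xff bob@example.org "
           b"alice@example.com ALICE@EXAMPLE.COM")
    for address, count in email_histogram(raw).most_common():
        print(count, address)
```

Because the scan ignores file boundaries, it finds addresses in deleted files and unallocated space as well. On a real image the data would be read in overlapping chunks so that an address spanning a chunk boundary is not lost.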
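A minimal version of a pairwise cross-drive report can score each pair of drives by the number of features, such as email addresses, that they share. This shared-feature count is an illustrative assumption and not necessarily the weighting used by the CDA Tool or the cited paper; the function name is hypothetical.

```python
from itertools import combinations


def cross_drive_report(features_by_drive):
    """Given {drive_name: set_of_features}, return (drive_a, drive_b, score)
    for every pair of drives, highest score first.

    Score = number of shared features (an illustrative choice, not
    necessarily the CDA Tool's weighting).
    """
    report = []
    for a, b in combinations(sorted(features_by_drive), 2):
        shared = features_by_drive[a] & features_by_drive[b]
        report.append((a, b, len(shared)))
    # sort() is stable, so pairs with equal scores keep alphabetical order.
    report.sort(key=lambda row: row[2], reverse=True)
    return report


if __name__ == "__main__":
    drives = {
        "drive1": {"alice@example.com", "bob@example.org"},
        "drive2": {"bob@example.org", "carol@example.net"},
        "drive3": {"dave@example.com"},
    }
    for a, b, score in cross_drive_report(drives):
        print(f"{a} <-> {b}: {score}")
```

A possible refinement is to give less weight to features that occur on many drives, since a widely shared address says little about any particular relationship.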
We are working on two research projects that have not yet produced any tools:
# A technique that can rapidly characterize the content of a large hard drive through statistical sampling. We believe that this system will be able to accurately report the percentage of encrypted data on a 1TB hard drive with less than 10 seconds of analysis. For further information please see the [[Sub-Linear_Drive_Analysis|page on Sub-Linear drive analysis]].
# A technique for attributing carved data to the individual who created it.
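The reason sampling can work regardless of drive size follows from the standard sample-size bound for estimating a proportion. Assuming simple random sampling of sectors, estimating the fraction ''p'' of sectors that look encrypted to within ±''e'' at about 95% confidence (''z'' = 1.96) requires at most

```latex
n = \frac{z^{2}\,p(1-p)}{e^{2}} \;\le\; \frac{z^{2}}{4e^{2}}
  = \frac{1.96^{2}}{4 \times 0.01^{2}} = 9604 \qquad (e = 0.01)
```

since ''p''(1 − ''p'') never exceeds 1/4. The bound does not depend on the number of sectors; the finite-population correction is negligible for a 1 TB drive, which has about 1.95 × 10<sup>9</sup> 512-byte sectors. It covers only sampling error, not the accuracy of the per-sector test for encrypted data. This is a textbook illustration, not necessarily the method used in the sub-linear drive analysis work.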
* [http://www.forensicswiki.org/wiki/Simson%27s_Open_Research_Topics Open Research Topics] on the [http://www.forensics.org/ Forensics Wiki]