Difference between revisions of "Automated Computer Forensics"

Latest revision as of 12:40, 17 March 2018

Current Research Areas

One of my primary areas of research is the development of algorithms, techniques, and eventually tools for automating a wide variety of computer forensics tasks that are currently performed by trained analysts. Today, much of this work is done with visualization tools that let an analyst search for data on a hard drive or in captured network traffic and slowly construct a story that might be useful in a prosecution or in recovering from a security event. But as data volumes grow and the network environment becomes more complex, there is a need for increasingly automated tools that can perform autonomous analysis and correlation.^[1]^[2]

Today my research into this field of automated computer forensics covers these main areas:

Small-block forensics: Exploring approaches for working with data elements in the 4 KiB to 64 KiB range that are not aligned with file boundaries. These approaches apply when an entire file is not available for reconstruction, or when only a portion of a file is available for analysis. Small-block forensics can also enable approaches based on statistical sampling rather than full-content analysis.^[3]
Data-rich algorithms and approaches: Designed to work in environments where there is a large collection of data from multiple users, as can be the case in law enforcement, e-discovery, and internal corporate investigations.^[4]
Media/Web correlation: Exploring opportunities for automatically correlating information on hard drives with information that can be found on the web.
Corpus creation: Developing realistic corpora, free of personal information, that can be used in education and software development.^[5]
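
As a rough illustration of the sampling idea in the small-block item above, the sketch below hashes fixed-size blocks of a known file and then checks randomly chosen blocks of a disk image against those hashes. It is not the method from the cited paper: the function names `block_hashes` and `sample_matches`, the choice of SHA-256, the 4 KiB block size, and the assumption that file blocks sit on block-aligned offsets in the image are all made up for illustration.

```python
import hashlib
import random

BLOCK_SIZE = 4096  # 4 KiB, the low end of the range mentioned above (assumed)


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> set:
    """Return the SHA-256 hex digests of every full, aligned block in `data`."""
    return {
        hashlib.sha256(data[off:off + block_size]).hexdigest()
        for off in range(0, len(data) - block_size + 1, block_size)
    }


def sample_matches(image: bytes, known: set, n_samples: int,
                   block_size: int = BLOCK_SIZE, seed: int = 0) -> list:
    """Hash `n_samples` randomly chosen aligned blocks of `image`.

    Returns the sorted byte offsets of sampled blocks whose hash is in `known`.
    Only the sampled blocks are read, so the cost depends on the sample size
    rather than on the size of the image.
    """
    rng = random.Random(seed)  # seeded so that runs are repeatable
    n_blocks = len(image) // block_size
    picks = rng.sample(range(n_blocks), min(n_samples, n_blocks))
    hits = []
    for idx in sorted(picks):
        off = idx * block_size
        digest = hashlib.sha256(image[off:off + block_size]).hexdigest()
        if digest in known:
            hits.append(off)
    return hits
```

With `known = block_hashes(target_bytes)`, a call such as `sample_matches(image_bytes, known, n_samples=1000)` reports which of the sampled offsets hold a block of the target file. Blocks that occur in many unrelated files (for example, all-zero blocks) would produce false matches, so a real tool would need some way to discount them.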

Related work areas that I am not personally involved in include:

Approaches for gisting and clustering documents based on their content.
Approaches that are tuned to human languages other than English.

Relevant Publications

[1] Garfinkel, S., "Document and Media Exploitation," ACM Queue, November/December 2007.
[2] Garfinkel, S., "Digital Forensics Research: The Next 10 Years," DFRWS 2010, Portland, OR.
[3] Garfinkel, S., Roussev, V., Nelson, A., and White, D., "Using purpose-built functions and block hashes to enable small block and sub-file forensics," DFRWS 2010, Portland, OR.
[4] Garfinkel, S., "Forensic Feature Extraction and Cross-Drive Analysis," The 6th Annual Digital Forensic Research Workshop, Lafayette, Indiana, August 14-16, 2006.
[5] Garfinkel, Farrell, Roussev, and Dinolt, "Bringing Science to Digital Forensics with Standardized Forensic Corpora," DFRWS 2009, Montreal, Canada. (slides)

@@ Line 1: / Line 1: @@
-We are developing a variety of techniques and tools for performing ''Automated Document and Media Exploitation''<ref>Garfinkel, S. [http://simson.net/clips/academic/2007.ACM.Domex.pdf "Document and Media Exploitation,"] <i>ACM Queue</i>, November/December 2007.
+==Current Research Areas==
-</ref> (ADOMEX). The thrust of this research covers three main areas:
+One of my primary areas of research is the development of algorithms, techniques, and eventually tools for automating a wide variety of computer forensics tasks that are currently performed by trained analysts. Today much work performed by computer analysts is performed with visualization tools that allow an analyst to search for data on a hard drive or captured from a network and slowly construct a story that might be useful in a prosecution or in recovering from a security event. But as data volumes increase and the network environment becomes increasingly complex, there is a need for increasingly automated tools that can perform autonomous analysis and correlation<ref>Garfinkel, S. [http://simson.net/clips/academic/2007.ACM.Domex.pdf "Document and Media Exploitation,"] <i>ACM Queue</i>, November/December 2007.</ref><ref>Garfinkel, Simson, Digital Forensics Research: The Next 10 Years , DFRWS 2010, Portland, OR</ref>
-# Developing open source tools for working with electronic evidence. This work is part of the [http://www.afflib.org AFF] project<ref>[http://www.simson.net/clips/academic/2006.CACM.AFF.pdf "AFF: A New Format for Storing Hard Drive Images,"] Garfinkel, S., Communications of the ACM, February, 2006</ref>.
-# Developing an unclassified [[Real Data Corpus]] (RDC) consisting of "real data from real people" that can be used to develop new algorithms, quantify results, and test automated tools.
-# Developing new algorithms and approaches for working in a "data-rich environment" such as a large collection of hard drives that have been seized during the course of law enforcement or military operations.
-==Recent Research Developments==
+Today my research into this field of automated computer forensics covers these main areas:
-===File-based forensics===
-File-based forensics is forensics that is based on an analysis of files, deleted files and orphan files. Most forensics currently performed for law enforcement, commercial e-discovery, and for intelligence purposes is based on file forensics. The goal here is typically to find a specific file that can be shown to a jury or that contains actionable intelligence. File forensics is typically performed using programs such as EnCase, FTK, or SleuthKit.
-* We have developed a batch analysis tool called system called '''fiwalk''' which can take a disk image and produce an XML file corresponding to all of the files, deleted files, orphan files, and all of the extracted file metadata from a disk image. This XML file can be used as an input to enable further automated media processing. Using this system we have created a variety of applications for reporting and manipulating disk images. We have also developed an efficient system for allowing remote file-level access of disk images using XML-RPC and REST. Details can be found in our paper<ref>[http://simson.net/clips/academic/2009.SADFE.xml_forensics.pdf Automating Disk Forensic Processing with SleuthKit, XML and Python], [http://conf.ncku.edu.tw/sadfe/sadfe09/ Fourth International IEEE Workshop on Systematic Approaches to Digital Forensic Engineering] (IEEE/SADFE'09), May 2009</ref>.
+# '''Small-block forensics'''---Exploring approaches for working with data elements in the 4KiB to 64KiB range and that are not aligned with file boundaries.  This can be used in situations where an entire file is not available for reconstruction, or only a portion of a file is available for analysis. Small block forensics can be used to enable approaches based on statistical sampling rather than full-content analysis.<ref>Simson Garfinkel, Vassil Roussev, Alex Nelson and Douglas White, Using purpose-built functions and block hashes to enable small block and sub-file forensics, DFRWS 2010, Portland, OR</ref>
+# '''Data-rich algorithms and approaches''' that are designed to work in environments where there is a large collection of data from multiple users, as can be the case in law enforcement, e-discovery, and internal corporate investigations. <ref>Garfinkel, S., [http://simson.net/clips/academic/2006.DFRWS.pdf Forensic Feature Extraction and Cross-Drive Analysis,]The 6th Annual Digital Forensic Research Workshop Lafayette, Indiana, August 14-16, 2006.</ref>
+# '''Media/Web correlation''' --- Exploring opportunities for automatic correlation of information on hard drives with information that can be found on the web.
+# '''Corpus Creation''' --- Developing realistic corpora that can be used in education and software development that do not contain personal information.<ref>Garfinkel, Farrell, Roussev and Dinolt, [http://www.simson.net/clips/academic/2009.DFRWS.Corpora.pdf Bringing Science to Digital Forensics with Standardized Forensic Corpora], DFRWS 2009, Montreal, Canada. [http://simson.net/clips/academic/2009.DFRWS.Corpora.slides.pdf (slides)]</ref>
-* We have developed a prototype system for performing automated media forensic reporting. Based on PyFlag, the system performs an in-depth analysis of captured media, locates local and online identities, and presents summary information in a report that is tailed to be easy for the consumer of forensic intelligence<ref>[http://www.simson.net/clips/students/09_Farrell.pdf A Framework for Automated Digital Forensic Reporting], Lt. Paul Farrell, Master's Thesis, Naval Postgraduate School, Monterey, CA, March 2009</ref>.
+Related work areas that I am not personally involved in includes:
+# Approaches for '''gisting''' and clustering documents based on their content.
+# Approaches that are tuned to human languages other than English.
-===Bulk data forensics===
-Bulk data forensics is based on the bulk analysis of disk images and other kinds of forensic source data. File carving<ref>[http://www.simson.net/clips/academic/2007.DFRWS.pdf "Carving Contiguous and Fragmented Files with Fast Object Validation"], Garfinkel, S., Digital Investigation, Volume 4, Supplement 1, September 2007, Pages 2--12.</ref> is a traditional form of bulk forensics. We see bulk forensics as a complement to existing forensic processing, rather than as a replacement for it.
-Bulk data forensics has several important advantages over traditional forensic processing:
-* It's faster, because the disk head scans the disk (or disk image)  from beginning to end without having to seek from file to file.
-* It can tolerate media that is damaged or incomplete, since the forensic processing does not require the reconstruction of file allocation tables, disk directories, or metadata.
-* It works with obscure or unknown operating systems, since no attempt is made to reconstruct the file system or other operating system structures.
-* It lends itself to statistical processing. Instead of scanning the entire disk image, the image can be sampled.
-We have developed several interesting tools for bulk data forensics:
-* '''[http://www.forensicswiki.org/wiki/Frag_find frag_find]''' is a tool that can report if sectors of a TARGET file are present on a disk image.  This is useful in cases where a TARGET file has been stolen and you wish to establish that the file has been present on a subject's drive. If most of the TARGET file's sectors are found on the IMAGE drive---and if the sectors are in consecutive sector runs---then the chances are excellent that the file was once there. Frag_find uses a three-stage filter with a high-speed but low-quality 32-bit hash, then a Bloom filter of SHA1 hashes,<ref>[http://www.simson.net/clips/academic/2008.ACSAC.Bloom.pdf “Practical Applications of Bloom filters to the NIST RDS and hard drive triage,”] Farrell, Garfinkel and White, ACSAC 2008</ref> and finally a linked list of all hashes. The result is that disk sectors are only hashed with necessary, allowing processing speeds of 50K sectors/sec on standard hardware. The program deals with the problem of non-unique blocks by looking for runs of matching blocks, rather than individual blocks. '''Frag_find''' is part of the NPS Bloom package, which can be downloaded from http://www.afflib.org.
-* '''bulk_extractor''' is a tool that searches for recognized features in the bulk data and performs histogram analysis on the result. You can write your own feature extractors using '''flex'''. We provide '''bulk_extractor''' with an extractor that finds complete email addresses and domain names. The most common email address on a hard drive is usually that of the drive's primary use; other top-occuring email addresses tend to belong to that person's primary correspondents. By looking at the list of email addresses sorted by frequency, it is easy to rapidly infer the user's social network.
-* '''CDA Tool''' takes the results of the '''bulk_extractor''' tool and performs a [[Cross-Drive Analysis]]<ref>[http://www.simson.net/clips/academic/2006.DFRWS.pdf "Forensic Feature Extraction and Cross-Drive Analysis,"] Garfinkel, S., Digital Investigation, Volume 3, Supplement 1, September 2006, Pages 71--81.</ref>. This can allow an investigator to discover previously unknown social networks in a set of hard drives, or to see if a newly acquired hard drive belongs to an existing social network. Currently the tool simply prints a report of the amount of connection between each drive. We plan to expand this tool to show the results graphically and to allow the analyst to drill down and see the cause of the connections.
-We are working on two research projects that have not yet produced any tools:
-# A technique that can rapidly characterize the content of a large hard drive through statistical sampling. We believe that this system will be able to accurately report the percentage of encrypted data on a 1TB hard drive with less than 10 seconds of analysis. For further information please see the [[Sub-Linear_Drive_Analysis|page on Sub-Linear drive analysis]].
-# A technique for ascribing carved data to a particular individual who created that data.
 ==Relevant Publications==
 <references/>
-==See Also==
-* [http://www.forensicswiki.org/wiki/Simson%27s_Open_Research_Topics Open Research Topics] on the [http://www.forensics.org/ Forensics Wiki]
 __NOTOC__
