CS4920 - Automated Document and Media Exploitation

From Simson Garfinkel


This course will discuss the theory and current practice of Automated Media Exploitation in the research and government environment. In place of a textbook, we will draw on recently published research papers, open source and proprietary software, and guest speakers.

Class Presentations

This course is taught seminar-style. Each student will be expected to research and lead a class discussion on two topics.

Course Outline

Part 1: Getting Started; Toolset

  • Working with Real Data in academia: IRBs and Human Subject Compliance
  • SQL and Lucene: Indexing human documents
  • Building a large-scale data processing system
  • Hiding your data: Private Information Retrieval & Bloom Filters
  • Hashing, Similarity Hashing, Multi-Resolution Similarity Hashing

Part 2: Media Imaging and Ingest

  • Disk Imaging with AFF
  • File Systems, File System Forensics
  • Sleuth Kit, EnCase and FTK
  • Cell phone exploitation
  • File Formats, Data and Metadata
  • Time
  • Dealing with Non-ASCII scripts: Code Pages and Unicode
  • Name/Entity Extraction

Part 3: Analysis: Interactive and Batch

  • Interactive vs. Batch Systems
  • Options for Reporting
  • Evaluation Strategies
  • Data Fusion
  • Identity Resolution
  • Data mining algorithms
  • Text mining
  • Gisting
  • Automated Translation


Human Subjects Approval

As students in this class will be working with the Real Data Corpus, all students will be required to complete the NPS CITI Human Research Curriculum and provide the instructor with a copy of their Completion Report. (Specify "Department of Navy" when you create your account.)


Grades will be determined by class participation (1/3), class presentations (1/3), and a final project (1/3).

Each student must find three interesting articles and put them on the wiki.

Final Project

Students will work alone or in pairs to produce a publishable, conference-style paper of 8 to 12 pages describing their original research in this field. Original research can be either technical (e.g., developing an algorithm or writing a new plug-in) or analytical (e.g., surveying capabilities or requirements).

Collaboration, Plagiarism, Academic Integrity and the Honor Code

It is strongly recommended that you discuss the readings and assignments with your classmates. You may wish to organize reading or study groups for this purpose. However, it is also expected that the homework you submit will be your own work. You may not collaborate on final projects unless you have received specific permission to do so in advance.

Plagiarism in any form will not be tolerated in this course. This includes both direct plagiarism, in which you reprint words written by another person without reference, and intellectual plagiarism, in which you present another person's ideas or argument as if they were your own.

The easiest way to protect yourself from a charge of plagiarism is to be careful in your citations. There is nothing wrong with quoting other authors provided that you properly cite their work. Likewise, there is nothing wrong with presenting an argument that has been advanced by another author, but you must give that author credit in your writing.

Academic integrity on the part of U.S. and International officers and civilians participating in NPS programs is an important aspect of professional performance.

The Academic Honor Code provisions of NAVPGSCOLINST 5370.1C will be strictly enforced.

If you have questions about collaboration, plagiarism or academic integrity, please contact the class staff.

Citation Policy

It is expected that you will reference a variety of articles and other sources in the preparation of your assignments and final project. You are welcome to use either the so-called "Harvard Style" or IEEE style to cite your references.

A URL without an author, title, publication title, and publication date is not an acceptable citation format. Citations that are bare URLs will be ignored.

Wikipedia and forensicswiki entries are surprisingly good and will frequently be recommended as supplementary reading in this course. However, because of the way Wikipedia is compiled and edited, Wikipedia entries are not to be used as authoritative citations in this course.

Past Years