Naval Postgraduate School
Fall 2008
Mon Oct 6, 2008
Hashing, Bloom Filters, Similarity Hashing, Multi-Resolution Similarity Hashing
We will be discussing hashing, from MD5 and SHA1 through the NSRL, Bloom Filters, Multi-Resolution Similarity Hashing, and edit distances.
- What hash functions are and how they work.
- Terminology: preimage, residue, collision resistance, preimage resistance, second-preimage resistance. Wikipedia's writeup on cryptographic hash functions is surprisingly good.
- NIST Cryptographic Hash Project
- MD5, SHA1, SHA-256
- Big idea: change one input bit and approximately half of the output bits change (the avalanche effect).
- Optional Reading: A New Hash Competition, William E. Burr, IEEE Security and Privacy, May-June 2008.
- CRC64 - Not a cryptographic hash function, but good for sector discrimination if there is no adversary.
- Bloom Filters - How they work
- NSRL (National Software Reference Library) - Known Goods
- Known Bads (viruses; target documents)
- Similarity Matching
- Multi-Resolution Similarity Hashing
- Edit Distances
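The avalanche idea in the topic list can be checked directly. A minimal Python sketch using only the standard `hashlib` module: it flips the lowest bit of the first input byte and counts how many digest bits change for MD5, SHA1, and SHA-256. The helper names `bit_difference` and `avalanche` are hypothetical; expect counts near half the digest width, not exactly half.

```python
import hashlib


def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def avalanche(message: bytes, algorithm: str = "sha1") -> tuple[int, int]:
    """Flip one input bit; return (changed output bits, total output bits)."""
    flipped = bytes([message[0] ^ 0x01]) + message[1:]
    d1 = hashlib.new(algorithm, message).digest()
    d2 = hashlib.new(algorithm, flipped).digest()
    return bit_difference(d1, d2), len(d1) * 8


if __name__ == "__main__":
    msg = b"The quick brown fox jumps over the lazy dog"
    for alg in ("md5", "sha1", "sha256"):
        changed, total = avalanche(msg, alg)
        print(f"{alg:>6}: {changed}/{total} output bits changed")
```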
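A sketch of CRC64 used for sector discrimination. Assumptions: the CRC-64/XZ parameter set (the ECMA-182 polynomial in reflected form, `0xC96C5795D7870F42`, with initial value and final XOR both all-ones) and 512-byte sectors; other CRC64 variants use different parameters. The functions `crc64`, `sector_crcs`, `duplicate_sectors`, and `affine_check` are hypothetical names.

```python
# Assumed parameters: CRC-64/XZ (reflected ECMA-182 polynomial,
# initial value all-ones, final XOR all-ones).
POLY_REFLECTED = 0xC96C5795D7870F42
MASK64 = 0xFFFFFFFFFFFFFFFF
SECTOR_SIZE = 512  # assumed sector size in bytes


def crc64(data: bytes) -> int:
    """Bitwise (table-free) reflected CRC-64 of `data`."""
    crc = MASK64
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right; if the bit shifted out was 1, fold in the polynomial.
            crc = (crc >> 1) ^ POLY_REFLECTED if crc & 1 else crc >> 1
    return crc ^ MASK64


def sector_crcs(image: bytes) -> list[int]:
    """CRC of each SECTOR_SIZE-byte sector of a disk image."""
    return [crc64(image[i:i + SECTOR_SIZE]) for i in range(0, len(image), SECTOR_SIZE)]


def duplicate_sectors(image: bytes) -> dict[int, list[int]]:
    """Group sector numbers by CRC; keep only groups with more than one sector."""
    groups: dict[int, list[int]] = {}
    for n, crc in enumerate(sector_crcs(image)):
        groups.setdefault(crc, []).append(n)
    return {crc: ns for crc, ns in groups.items() if len(ns) > 1}


def affine_check(a: bytes, b: bytes, c: bytes) -> bool:
    """For equal-length inputs, CRC(a^b^c) == CRC(a)^CRC(b)^CRC(c)."""
    x = bytes(p ^ q ^ r for p, q, r in zip(a, b, c))
    return crc64(x) == crc64(a) ^ crc64(b) ^ crc64(c)
```

The last function shows why CRC64 only helps when there is no adversary: for equal-length inputs the CRC is affine over GF(2), so anyone can solve for a message with a chosen CRC value. A cryptographic hash is designed to have no such structure.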
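How a Bloom filter works, as a minimal Python sketch: an m-bit array and k index functions. Adding an item sets k bits; a lookup answers "probably present" only if all k bits are set, so false positives are possible but false negatives are not. Assumption: the k indices are taken from 4-byte slices of a single SHA-256 digest (so k is at most 8), a hypothetical choice made for brevity. The `false_positive_rate` helper uses the standard approximation p = (1 - e^(-kn/m))^k for n inserted items.

```python
import hashlib
import math


class BloomFilter:
    """Bloom filter with an m-bit array and k index functions.

    Assumption (for brevity): the k indices come from consecutive
    4-byte slices of one SHA-256 digest, so 1 <= k <= 8.
    """

    def __init__(self, m_bits: int, k: int) -> None:
        if not 1 <= k <= 8:
            raise ValueError("this sketch supports 1 <= k <= 8")
        self.m = m_bits
        self.k = k
        self.bits = bytearray((m_bits + 7) // 8)

    def _indices(self, item: bytes):
        digest = hashlib.sha256(item).digest()
        for i in range(self.k):
            yield int.from_bytes(digest[4 * i:4 * i + 4], "big") % self.m

    def add(self, item: bytes) -> None:
        for idx in self._indices(item):
            self.bits[idx // 8] |= 1 << (idx % 8)

    def __contains__(self, item: bytes) -> bool:
        # False: definitely never added. True: probably added (may be a false positive).
        return all(self.bits[idx // 8] & (1 << (idx % 8)) for idx in self._indices(item))


def false_positive_rate(m_bits: int, k: int, n_items: int) -> float:
    """Standard approximation (1 - e^(-kn/m))^k."""
    return (1.0 - math.exp(-k * n_items / m_bits)) ** k
```

With m = 2^16 bits, k = 4, and 100 items, the approximation gives about 1.4e-9. The filter stores only bits, never the items, which is what makes it attractive for holding large hash sets in memory.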
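Known-goods and known-bads filtering reduces to set membership on file hashes. A sketch assuming both hash sets are already loaded as uppercase hexadecimal SHA1 strings (for example, known goods extracted from the NSRL); the loading step is omitted, and `sha1_file` and `triage` are hypothetical names.

```python
import hashlib
from collections.abc import Iterable
from pathlib import Path


def sha1_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Uppercase hex SHA1 of a file, read in chunks to bound memory use."""
    h = hashlib.sha1()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest().upper()


def triage(paths: Iterable[Path], known_good: set[str],
           known_bad: set[str]) -> dict[str, list[Path]]:
    """Sort files into known_bad, known_good, and unknown by hash lookup."""
    result: dict[str, list[Path]] = {"known_bad": [], "known_good": [], "unknown": []}
    for p in paths:
        digest = sha1_file(p)
        # Check known bads first so a file in both sets is never dismissed.
        if digest in known_bad:
            result["known_bad"].append(p)
        elif digest in known_good:
            result["known_good"].append(p)
        else:
            result["unknown"].append(p)
    return result
```

Known goods can be set aside, known bads flagged, and only the unknown files need an examiner's attention.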
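Similarity matching in the style of context triggered piecewise hashing (Kornblum, DFRWS 2006) can be sketched as follows: a rolling hash over the last few bytes decides where chunk boundaries fall, each chunk contributes one signature character, and two signatures are compared as strings. This is a simplification, not ssdeep's algorithm: the rolling hash here is a plain sum of the last 7 bytes, chunks are hashed with 32-bit FNV-1a, the block size is fixed at 64, and `difflib.SequenceMatcher` stands in for an edit-distance score. All names are hypothetical.

```python
import difflib
import random

B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
WINDOW = 7  # rolling-window length in bytes (assumed)


def fnv1a32(data: bytes) -> int:
    """32-bit FNV-1a hash of one chunk."""
    h = 0x811C9DC5
    for byte in data:
        h ^= byte
        h = (h * 0x01000193) & 0xFFFFFFFF
    return h


def ctph_signature(data: bytes, block_size: int = 64) -> str:
    """One base64 character per content-defined chunk."""
    sig = []
    rolling = 0  # sum of the last WINDOW bytes
    start = 0    # first byte of the current chunk
    for i, byte in enumerate(data):
        rolling += byte
        if i >= WINDOW:
            rolling -= data[i - WINDOW]
        # Trigger point: depends only on the bytes inside the window.
        if rolling % block_size == block_size - 1:
            sig.append(B64[fnv1a32(data[start:i + 1]) & 0x3F])
            start = i + 1
    if start < len(data):  # trailing partial chunk
        sig.append(B64[fnv1a32(data[start:]) & 0x3F])
    return "".join(sig)


def similarity(sig_a: str, sig_b: str) -> float:
    """Score in [0, 1]; SequenceMatcher stands in for an edit-distance score."""
    return difflib.SequenceMatcher(None, sig_a, sig_b, autojunk=False).ratio()


if __name__ == "__main__":
    rng = random.Random(2008)
    original = rng.randbytes(20000)
    edited = original[:10000] + rng.randbytes(100) + original[10000:]
    unrelated = rng.randbytes(20000)
    sig = ctph_signature(original)
    print("edited:   ", round(similarity(sig, ctph_signature(edited)), 3))
    print("unrelated:", round(similarity(sig, ctph_signature(unrelated)), 3))
```

Because a boundary depends only on the bytes inside the window, an insertion disturbs only the chunks near it, and the two signatures realign afterward. A whole-file MD5 or SHA1 would not match at all.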
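The multi-resolution idea can be illustrated by running content-defined chunking at several block sizes and comparing the sets of chunk digests level by level. This is a simplified illustration, not the published multi-resolution similarity hashing algorithm: the block sizes (64, 256, 1024), the 7-byte polynomial rolling hash, MD5 chunk digests, and Jaccard scoring are all assumptions, and the names are hypothetical.

```python
import hashlib
import random

WINDOW = 7                          # rolling-window length in bytes (assumed)
BASE = 257                          # polynomial rolling-hash base (assumed)
MOD = 1 << 32
BASE_POW = pow(BASE, WINDOW, MOD)   # weight of the byte leaving the window


def chunk_digests(data: bytes, block_size: int) -> set[bytes]:
    """MD5 digests of content-defined chunks; block_size should be a power of two."""
    digests = set()
    h = 0
    start = 0
    for i, byte in enumerate(data):
        h = (h * BASE + byte) % MOD
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * BASE_POW) % MOD  # drop the oldest byte
        if h % block_size == block_size - 1:
            digests.add(hashlib.md5(data[start:i + 1]).digest())
            start = i + 1
    if start < len(data):
        digests.add(hashlib.md5(data[start:]).digest())
    return digests


def multi_resolution_digest(data: bytes, levels=(64, 256, 1024)) -> dict[int, set[bytes]]:
    """One set of chunk digests per block size (resolution level)."""
    return {size: chunk_digests(data, size) for size in levels}


def jaccard(a: set, b: set) -> float:
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def compare(d1: dict[int, set[bytes]], d2: dict[int, set[bytes]]) -> dict[int, float]:
    """Jaccard similarity at every level present in both digests."""
    return {size: jaccard(d1[size], d2[size]) for size in sorted(d1.keys() & d2.keys())}


if __name__ == "__main__":
    rng = random.Random(7)
    original = rng.randbytes(50000)
    edited = original[:25000] + rng.randbytes(500) + original[25000:]
    print(compare(multi_resolution_digest(original), multi_resolution_digest(edited)))
```

Because the block sizes are powers of two, a boundary at level 256 (h mod 256 = 255) is also a boundary at level 64, so each coarse chunk is a union of consecutive fine chunks. Fine levels localize small changes; coarse levels produce fewer digests to compare.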
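Edit distance (Levenshtein distance) is the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. A standard dynamic-programming sketch that keeps only two rows of the table, so memory grows with the shorter string; the function name is hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum insertions, deletions, and substitutions turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the row length tied to the shorter string
    previous = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            insert = current[j - 1] + 1
            delete = previous[j] + 1
            substitute = previous[j - 1] + (ca != cb)
            current.append(min(insert, delete, substitute))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))  # 3
```

Applied to similarity-hash signatures rather than raw files, a small distance between two signatures suggests the underlying files share most of their content.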
Readings
Optional Readings
- Identifying almost identical files using context triggered piecewise hashing, Jesse Kornblum, ManTech, DFRWS 2006.
- DFRWS 2004 Workshop Report and Findings, Chester J. Maciag, Editor, AFRL Information Directorate, 2004
- md5bloom: Forensic filesystem hashing revisited, Vassil Roussev, Yixin Chen, Timothy Bourg, Golden G. Richard III, DFRWS 2006.
- An Ad Hoc Review of Digital Forensic Models, Mark M. Pollitt, Dept. of Engineering Technology, University of Central Florida, April 2007.