CS3636 - Data Fusion with Online Information Systems

From Simson Garfinkel

CS3636: Data Fusion with Online Information Systems

  • Learn how to follow a person as they move around the country by watching how their IP address changes.
  • Learn how Google's AdWords and AdSense online advertising programs work.
  • Learn how to correlate public records with privacy-sensitive information stored on Facebook and LinkedIn.
  • Learn how to flex your muscles with the Fair Credit Reporting Act to get prejudicial (but true) data removed from your credit report.


This course explores data fusion as applied to personal information in both the online and offline worlds. Topics include credit and criminal databases, information surveillance, GPS, satellite imagery, online search, text mining, anonymization, reidentification, and privacy policy. Familiarity with statistics is useful but not mandatory.

Military success in the modern world increasingly requires that US forces fuse together information from multiple databases, perform complex information extraction and data mining activities, and then present information in clear and concise formats to decision makers. This is true whether the goal is to track down insurgents, evaluate and respond to environmental catastrophes, or improve efficiency, e.g., by integrating disparate government information systems. This course provides students with the intellectual framework to understand what kinds of data fusion objectives are possible, and which are legal; to evaluate systems that have been created; and to understand how US adversaries can make unanticipated use of databases that are released by DoD.

Outline

Databanks in a free society

  1. What is Data Fusion and Why do we do it?
  2. Large Databanks and Fair Information Practice
  3. Large Databanks and Data Fusion

Data Fusion Technology

  1. Database Technology
  2. Data Fusion Theory
  3. GPS and Geospatial Data Fusion

Beyond Structured Data

  1. Online Search and Surveillance
  2. Sensor Networks and “Reality Mining”
  3. Text, Email and Web Mining
  4. Identification, Anonymization and Re-Identification

Sample Paper Topics

  • Evaluating the accuracy of web-based IP geolocation services
  • Analysis of Wikipedia
  • Genomic Data Fusion
  • Offline Sources: phone books (batch and on CD) are one of many examples where data potentially valuable for fusion exists in a form other than "online information systems." Explore and discuss.
  • Intelligent Transportation Systems
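
As an illustration of the first topic above, evaluating an IP geolocation service comes down to comparing the location the service reports against a known true location. The sketch below is one possible starting point, not a method prescribed by the course: it assumes each sample is a tuple of true and reported coordinates in decimal degrees, and the function names, the 40 km threshold, and the sample values are all hypothetical.

```python
import math
from statistics import median

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def geolocation_errors(samples):
    """Return the error in km for each (true_lat, true_lon, rep_lat, rep_lon) sample."""
    return [haversine_km(*sample) for sample in samples]


def summarize(errors, threshold_km=40.0):
    """Summarize errors; threshold_km is an arbitrary, hypothetical cutoff."""
    return {
        "median_km": median(errors),
        "max_km": max(errors),
        "fraction_within_threshold":
            sum(e <= threshold_km for e in errors) / len(errors),
    }


if __name__ == "__main__":
    # Made-up sample data for illustration only, not real measurements.
    samples = [
        (36.60, -121.89, 36.61, -121.85),
        (36.60, -121.89, 37.77, -122.42),
        (42.36, -71.06, 42.37, -71.11),
    ]
    print(summarize(geolocation_errors(samples)))
```

A real evaluation would also need a defensible source of ground truth and enough samples per service to compare error distributions, not just summary statistics.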