Time Analysis Research Opportunities

From Simson Garfinkel

I am working on a project involving the analysis of time on the Internet. This project started as an effort to replicate Florian Buchholz and Brett Tjaden's 2007 DFRWS paper, A Brief Study of Time. Having replicated their basic analysis framework, we have been collecting data and have now identified the following sub-projects:

Time Acquisition Infrastructure Development
We have created a system for monitoring and recording the time of servers on the Internet. Currently the system uses a multi-threaded acquisition program written in Python that can query 10 hosts/sec and store the results in an SQL server. We need to scale up this system so that it can:
  • Scan more systems faster
  • Store information in the database more efficiently
  • Split the database, so that analysis does not interfere with acquisition
  • Provide durable backups
  • Reliably detect and report infrastructure problems
  • Easily ramp up new analysis modes
  • Provide solid tools for database management
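As a point of reference for the scaling work, the acquisition loop described above can be sketched in a few lines. This is only an illustration, not the project's actual code: the `acquire` function, the `samples` table, and the `probe` callable are hypothetical names, and SQLite stands in for whatever SQL server is really used.

```python
import sqlite3
import time
from concurrent.futures import ThreadPoolExecutor


def acquire(hosts, probe, db, workers=8):
    """Probe every host in parallel and store one row per host.

    `probe(host)` is a hypothetical callable that returns the remote clock
    as Unix seconds. If it raises, the row is stored with a NULL
    remote_time. Returns the number of successful probes.
    """
    def measure(host):
        local = time.time()
        try:
            return host, local, probe(host)
        except Exception:
            return host, local, None

    db.execute(
        "CREATE TABLE IF NOT EXISTS samples "
        "(host TEXT, local_time REAL, remote_time REAL)"
    )
    # Worker threads only run the probes; they never touch the database.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        rows = list(pool.map(measure, hosts))
    # All writes happen on the calling thread, in a single transaction.
    with db:
        db.executemany("INSERT INTO samples VALUES (?, ?, ?)", rows)
    return sum(1 for row in rows if row[2] is not None)
```

Keeping the database writes on one thread and batching them into a single transaction is one simple way to stop storage from limiting the scan rate; a split between acquisition and analysis databases would sit behind the same interface.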
Time Analysis Infrastructure and GUI Development
We have created a system that stores information about server clock drift in a database. We have created a single visualization. We need tools to make analysis faster, more efficient, and more powerful. We are specifically looking to do the following:
  • Develop an analysis dashboard that allows a web user to rapidly search for and display the results associated with servers that meet certain criteria.
  • Develop an efficient web-based analysis API, so that analysis can be built into web pages and apps.
  • Determine, when we see multiple time skews behind a single IP address, whether there are actually multiple servers. If there are not, what is the source of the skew?
Improved Time Analysis Algorithms
Currently we have software that can identify hosts whose clocks are significantly wrong. We can also determine, for some of these hosts, the source of the problem. For example, some machines have clocks that are routinely synchronized and then drift; with a simple regression, we can identify the time at which the hosts were synchronized, which is typically when they boot. We would like to do the following:
  • Develop machine-learning approaches to identify common situations that are easily identified visually by humans.
  • Then, use clustering to identify situations that are not easily identified by humans.
  • Develop an analysis framework that sensibly handles changes in the round-trip-time. (Right now, we ignore it.)
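The simple regression mentioned above can be illustrated as follows. This is a minimal sketch under stated assumptions, not the project's implementation: it assumes the clock was exactly right when it was synchronized and has drifted at a constant rate since, so that offset(t) ≈ rate × (t − t_sync), and it ignores round-trip time. The function name `estimate_sync_time` is hypothetical.

```python
def estimate_sync_time(samples):
    """Fit offset = rate * t + intercept by ordinary least squares.

    `samples` is a list of (local_time, offset) pairs, where offset is
    remote minus local clock in seconds. Returns (rate, sync_time), where
    sync_time is the moment at which the fitted line crosses zero offset.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_o = sum(o for _, o in samples) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in samples)
    sxy = sum((t - mean_t) * (o - mean_o) for t, o in samples)
    if sxx == 0:
        raise ValueError("all samples share one timestamp")
    rate = sxy / sxx
    if rate == 0:
        raise ValueError("no drift; sync time is undetermined")
    intercept = mean_o - rate * mean_t
    return rate, -intercept / rate
```

For example, three samples taken at 2000 s, 3000 s, and 4000 s with offsets of 0.01 s, 0.02 s, and 0.03 s give a drift rate of about 1e-5 and an estimated synchronization time of about 1000 s. A host that is resynchronized repeatedly would need the samples split into segments first, with one fit per segment.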

Improving Acquisition Techniques
Develop plug-in tools for measuring the time of remote systems. We currently have the ability to measure the system clocks of web servers and NTP servers. The time of mail servers can be measured by sending the mail server a message that will bounce and analyzing the returned headers. More generally, we can measure the system clock of any server that uses a protocol that embeds the time as a nonce. What other protocols can be measured?
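To show what the core of one such plug-in might look like, here is a sketch for the web-server case, assuming the remote time is taken from the HTTP `Date` response header. The function name `http_date_offset` is hypothetical, and the network request itself is left out so that only the time arithmetic is shown.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime


def http_date_offset(date_header, local_time):
    """Return remote minus local clock, in seconds, from an HTTP Date header.

    `local_time` is the Unix time at which the response was received.
    """
    remote = parsedate_to_datetime(date_header)
    if remote.tzinfo is None:
        # No usable zone in the header (e.g. "-0000"); assume UTC.
        remote = remote.replace(tzinfo=timezone.utc)
    return remote.timestamp() - local_time
```

Because the `Date` header carries whole seconds only, a single measurement of this kind cannot resolve skews smaller than about one second. The same parsing applies to the `Date` and `Received` headers of a bounced mail message.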
Finding Embedded Time and Location Information
What file formats embed timestamps and locations? Find them, and develop tools for them.
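One well-known example is gzip, whose header (RFC 1952) stores a 4-byte little-endian modification time at byte offset 4. The sketch below reads that field; the function name `gzip_mtime` is hypothetical, and a tool for this sub-project would need one such extractor per format.

```python
import struct


def gzip_mtime(data):
    """Return the MTIME field of a gzip stream as Unix seconds.

    Returns None when the field is zero, which RFC 1952 defines as
    "no time stamp is available".
    """
    if len(data) < 10 or data[:2] != b"\x1f\x8b":
        raise ValueError("not a gzip stream")
    (mtime,) = struct.unpack_from("<I", data, 4)
    return mtime or None
```

It can be tried on any gzip file with `gzip_mtime(open(path, "rb").read(10))`, where `path` is a placeholder for a local file.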