* Final Project Requirements:

- Due 5:30pm, May 25th.  
   - No extensions. If you turn it in late, you get a 0.

- Option #1: A paper. Choose one of:
  - 5 page analysis of an aspect of a program or a system, using the techniques
    we discussed in class.
  - 5 pages (of text) of a design for a user interface.
  - Some other 5 page paper that is of appropriate quality and scholarship for this class.

- Option #2: A program.
  - A prototype that clearly illustrates something that we have discussed in class.
  - A one-page document describing the program, why you did it, what it shows.
  - Include both a runnable executable and all the source.
  - Information not turned in will not count towards your grade.




****************************************************************
TIME
* What time is it?
* Why is it important to have a single time? 
* http://www.time.gov/
  - http://tf.nist.gov/service/its.htm
  - http://tycho.usno.navy.mil/
  - Powerpoint: http://www.navcen.uscg.gov/cgsic/meetings/summaryrpts/38thmeeting/Miranian.ppt
* Internet protocols that get the time:
  - NTP protocol
* NetGear flaw triggers DoS attack
  - http://news.com.com/2100-1002_3-5068035.html
  - http://www.cs.wisc.edu/~plonka/netgear-sntp/
* Some HTTP DoS did this recently (used GET instead of HEAD)
* mrtg.org
* How do you handle changes in the time?
* How do you handle timezones?
* iCalendar

BEGIN:VCALENDAR
CALSCALE:GREGORIAN
PRODID:-//Apple Computer\, Inc//iCal 2.0//EN
VERSION:2.0
BEGIN:VEVENT
LOCATION:53 Church Street room 203.
EXDATE;TZID=US/Eastern:20060330T173000
UID:0A1B543D-0E33-4AF9-A22A-D29B5338F61B
SEQUENCE:13
DTSTAMP:20060510T153902Z
DTSTART;TZID=US/Eastern:20060202T173000
SUMMARY:CSCI E-180+
DTEND;TZID=US/Eastern:20060202T193000
RRULE:FREQ=WEEKLY;INTERVAL=1;UNTIL=20060526T035959Z;BYDAY=TH;WKST=SU
END:VEVENT
END:VCALENDAR
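
The event above shows the two ways iCalendar writes a time: DTSTAMP ends in "Z" (UTC), while DTSTART carries a TZID parameter and a local wall-clock time. Here is a minimal sketch of reading such lines. The function names are made up for illustration, and the parser is simplified: it ignores folded lines and quoted parameter values.

```python
from datetime import datetime, timezone

def parse_content_line(line):
    """Split one iCalendar content line into (name, params, value).
    Simplification: assumes no quoted parameter values containing ':' or ';'."""
    head, _, value = line.partition(":")
    name, *raw_params = head.split(";")
    params = dict(p.split("=", 1) for p in raw_params)
    return name.upper(), params, value

def parse_date_time(value):
    """Parse a DATE-TIME value. A trailing 'Z' means UTC; otherwise the result
    is a naive local time that only makes sense together with its TZID."""
    if value.endswith("Z"):
        return datetime.strptime(value, "%Y%m%dT%H%M%SZ").replace(tzinfo=timezone.utc)
    return datetime.strptime(value, "%Y%m%dT%H%M%S")

name, params, value = parse_content_line("DTSTART;TZID=US/Eastern:20060202T173000")
start = parse_date_time(value)
stamp = parse_date_time(parse_content_line("DTSTAMP:20060510T153902Z")[2])
print(name, params, start.isoformat())
print(stamp.isoformat())
```

Keeping the local time plus TZID, rather than converting to UTC once, is what lets a weekly 5:30pm class stay at 5:30pm across a daylight-saving change. Resolving the TZID to an actual offset needs a timezone database (for example Python's zoneinfo module), which is left out of this sketch.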

* How do you handle clock changes?
  - scoreboards?
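
One common answer to the clock-change question, offered here as a sketch rather than the only approach: measure durations (like a scoreboard's game clock) on a monotonic clock, and use the wall clock only for display. The helper name `timed` is made up for illustration.

```python
import time

def timed(fn, *args):
    """Run fn and return (result, elapsed_seconds).
    time.monotonic() cannot go backwards, so the elapsed value stays sane
    even if someone resets the wall clock while fn is running."""
    start = time.monotonic()
    result = fn(*args)
    return result, time.monotonic() - start

result, elapsed = timed(sum, range(1000))
print(result, elapsed >= 0.0)
```

Had the same measurement used time.time(), a clock adjustment in the middle could produce a negative or wildly large duration.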

* Crazy Clocks
  - What's the lesson of this article?
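
Returning to the NTP bullet earlier in this section: a rough sketch of what a simple (SNTP-style) client puts on the wire and how it reads the answer. It runs offline against a synthetic reply; the function names are made up for illustration, and a real client would send the request over UDP to port 123 of a server it is allowed to use.

```python
import struct

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_EPOCH_OFFSET = 2208988800

def build_sntp_request():
    """48-byte client request. First byte 0x1b = LI 0, version 3, mode 3 (client)."""
    return b"\x1b" + 47 * b"\x00"

def transmit_time_unix(packet):
    """Read the server's transmit timestamp (bytes 40-47) as Unix seconds.
    The timestamp is 32 bits of seconds plus 32 bits of binary fraction."""
    if len(packet) < 48:
        raise ValueError("short NTP packet")
    seconds, fraction = struct.unpack("!II", packet[40:48])
    return seconds - NTP_EPOCH_OFFSET + fraction / 2**32

# Synthetic reply for demonstration only: an arbitrary instant plus half a second.
reply = bytearray(48)
reply[40:48] = struct.pack("!II", NTP_EPOCH_OFFSET + 1147275542, 1 << 31)
print(len(build_sntp_request()), transmit_time_unix(bytes(reply)))
```

The NetGear incident linked above is a reminder that the hard part of a client is its behavior, such as which server it asks and how often it retries, and not the packet format.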


****************************************************************
Natural Language Processing

References:
* http://ocw.mit.edu/OcwWeb/Electrical-Engineering-and-Computer-Science/6-863JSpring2003/CourseHome/index.htm

* What is natural language processing and what can we do?
  - Spelling correction
  - Written commands
  - Making sense of text that's online

* Limitations:
  - Language specific (usually)
  - You can't get perfect results (even people aren't perfect)

* Two techniques:
  - Try to "understand" the text
  - Statistics

* Examples:
  - SBook
  - Scheduling in Google Calendar
  - Seminar announcement reading system
  - Anti-spam

* How do you do this?
  - Build a model of what you are trying to do
  - Test the model
  - Have lots of provisions for "special cases"
  - Preprocess the input
  - Prepare for failure.

* Example: Language Identification
  - saint_french, saint_english
  - single character frequencies, bigrams and trigrams
  - word frequencies
  - Vocabulary
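
A minimal sketch of the trigram-frequency idea from the list above: build a character-trigram profile per language and pick the profile most similar to the input. The two training sentences are tiny stand-ins invented for this example (not the saint_french / saint_english texts), and cosine similarity is just one reasonable way to compare profiles.

```python
from collections import Counter
import math

def trigrams(text):
    """Count overlapping character trigrams in lowercased, space-normalized text."""
    t = " " + " ".join(text.lower().split()) + " "
    return Counter(t[i:i + 3] for i in range(len(t) - 2))

def cosine(a, b):
    """Cosine similarity between two trigram count vectors."""
    dot = sum(n * b[g] for g, n in a.items())
    norm = math.sqrt(sum(n * n for n in a.values())) * math.sqrt(sum(n * n for n in b.values()))
    return dot / norm if norm else 0.0

def identify(text, profiles):
    """Return the language whose profile is closest to the text."""
    sample = trigrams(text)
    return max(profiles, key=lambda lang: cosine(sample, profiles[lang]))

# Hypothetical toy training data; a real system needs a much larger corpus.
profiles = {
    "english": trigrams("the quick brown fox jumps over the lazy dog and the cat sat on the mat with the hat"),
    "french": trigrams("le renard brun saute par dessus le chien paresseux et le chat est sur le tapis avec le chapeau"),
}
print(identify("the dog is in the house", profiles))
print(identify("le chien est dans la maison", profiles))
```

With profiles this small the method is fragile. It works here because frequent trigrams like " th" and " le" dominate each profile.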

* People are very good at this:
  - words1.txt
  - words2.txt
  - words3.txt
  - CRM114

* Tools
  - regular expressions
  - flex
  - Bayesian models
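
As an illustration of the Bayesian-model tool, here is a small naive Bayes text classifier with add-one smoothing, the general family of technique behind statistical anti-spam filters. The class name, labels, and training sentences are all invented for this sketch; it is not how any particular filter is implemented.

```python
from collections import Counter
import math

class NaiveBayes:
    def __init__(self):
        self.word_counts = {}        # label -> Counter of words seen with that label
        self.doc_counts = Counter()  # label -> number of training documents

    def train(self, label, text):
        self.word_counts.setdefault(label, Counter()).update(text.lower().split())
        self.doc_counts[label] += 1

    def classify(self, text):
        """Return the label with the highest log-probability for the text."""
        vocab = set().union(*self.word_counts.values())
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            score = math.log(self.doc_counts[label] / total_docs)  # prior
            denom = sum(counts.values()) + len(vocab)               # add-one smoothing
            for word in text.lower().split():
                score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

nb = NaiveBayes()
nb.train("spam", "buy cheap pills now")
nb.train("spam", "cheap offer buy now")
nb.train("ham", "meeting at noon tomorrow")
nb.train("ham", "lecture notes for class tomorrow")
print(nb.classify("cheap pills offer"), nb.classify("notes for the meeting"))
```

Working in log space avoids underflow on long messages, and the smoothing keeps an unseen word from forcing a probability of zero.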

* Data sources:
[simsong@Phoenix dfrws] % ls -l  /usr/share/dict/
total 3448
-r--r--r--   1 root  wheel      516 Aug 22  2005 README
-r--r--r--   1 root  wheel      706 Aug 22  2005 connectives
-r--r--r--   1 root  wheel     8640 Aug 22  2005 propernames
-r--r--r--   1 root  wheel  2486825 Aug 22  2005 web2
-r--r--r--   1 root  wheel  1012730 Aug 22  2005 web2a
lrwxr-xr-x   1 root  wheel        4 Mar 16 15:56 words@ -> web2
[simsong@Phoenix dfrws] % 

  - SSA
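
A word list like the ones above can serve as the vocabulary for simple spelling correction. The sketch below suggests vocabulary words within one edit of the input. The function names and the four-word vocabulary are invented for illustration.

```python
import string

def edits1(word):
    """All strings one edit (delete, transpose, replace, insert) away from word."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
    inserts = [a + c + b for a, b in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def suggest(word, vocabulary):
    """Return the word itself if known, else known words one edit away."""
    word = word.lower()
    if word in vocabulary:
        return [word]
    return sorted(edits1(word) & vocabulary)

# Toy vocabulary so the example runs anywhere.
vocabulary = {"calendar", "language", "natural", "time"}
print(suggest("calender", vocabulary))
print(suggest("tme", vocabulary))
```

On a system that ships /usr/share/dict/words, the vocabulary could instead be built by reading that file into a set of lowercased lines.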


* Very large (or very many) regular expressions
  - SpamAssassin
    /usr/local/share/spamassassin/20_head_tests.cf
    /usr/local/share/spamassassin/20_phrases.cf
    /usr/local/share/spamassassin/30_text_fr.cf
  - What SBook does.
   ~/slg/sbook/libsbook/firstname.fp
   ~/slg/sbook/libsbook/parse_address.fp
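
The "many regular expressions" approach can be sketched as a table of named patterns, each with a score, applied to the whole message. The rule names, patterns, and scores below are hypothetical, invented for this example; they are not taken from the SpamAssassin rule files or from SBook.

```python
import re

# (rule name, compiled pattern, score) -- all hypothetical.
RULES = [
    ("HYPOTHETICAL_FREE_OFFER", re.compile(r"\bfree\s+offer\b", re.I), 1.5),
    ("HYPOTHETICAL_ACT_NOW", re.compile(r"\bact\s+now\b", re.I), 1.0),
    ("HYPOTHETICAL_ALL_CAPS_SUBJECT", re.compile(r"^Subject: [A-Z !]{10,}$", re.M), 2.0),
]

def score_message(text):
    """Return (total score, names of the rules that matched)."""
    hits = [(name, points) for name, pattern, points in RULES if pattern.search(text)]
    return sum(points for _, points in hits), [name for name, _ in hits]

message = "Subject: LIMITED TIME DEAL!!!\n\nAct now to claim your free offer."
total, matched = score_message(message)
print(total, matched)
```

Keeping the rules as data makes it easy to add, remove, or re-weight them without touching the matching code, and the message is flagged only when the total passes a chosen threshold.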

* Semantic analysis
  - WordNet http://wordnet.princeton.edu/

* Getting a corpus
  - It's hard!

Side by side:
http://www.lofficier.com/saint7.html