Naval Postgraduate School
Fall 2008
Mon Nov 10, 2008
Optical Character Recognition
This is an online class. We will be meeting in cyberspace, and we will all be contributing.
In today's class we will learn as much as we can about the
current state of optical character recognition (OCR) and its relationship to
DOMEX (document and media exploitation) activities.
There are some specific questions that I would like to see answered through literature searches and, if possible, downloadable code demonstrations.
These OCR problems should already be solved:
- Given a page, figure out which way is up.
- Straighten the text.
- Turn the image of the text into ASCII text.
- Turn the image of the text into Unicode text in a given language.
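As a starting point for the "straighten the text" item, here is a minimal sketch of one classic approach, projection-profile skew estimation. It is an illustration under stated assumptions, not a survey result: the page is assumed to be already binarized into rows of 0/1 values, a vertical shear stands in for a true rotation (reasonable only for small angles), and the function names are hypothetical rather than taken from any OCR package.

```python
import math


def row_profile_score(ink, angle_deg):
    """Score how sharply the ink collapses into rows after undoing a skew of angle_deg."""
    t = math.tan(math.radians(angle_deg))
    counts = {}
    for x, y in ink:
        # Assumption: a vertical shear approximates a small rotation.
        row = round(y - x * t)
        counts[row] = counts.get(row, 0) + 1
    # Sum of squares is large when the ink concentrates in a few rows,
    # which happens when the text lines are level.
    return sum(c * c for c in counts.values())


def estimate_skew(image, max_angle=10.0, step=0.5):
    """Return the candidate angle (degrees) that best levels the text lines.

    image is a list of rows, each a list of 0/1 values (1 = ink).
    With y growing downward, a positive angle means lines descend to the right.
    """
    ink = [(x, y) for y, row in enumerate(image) for x, v in enumerate(row) if v]
    n = int(round(max_angle / step))
    candidates = [i * step for i in range(-n, n + 1)]
    return max(candidates, key=lambda a: row_profile_score(ink, a))


def make_skewed_page(width=200, height=120, skew_deg=3.0, line_gap=12):
    """Build a synthetic page: one-pixel-thick 'text lines' drawn at a known skew."""
    t = math.tan(math.radians(skew_deg))
    image = [[0] * width for _ in range(height)]
    for y0 in range(10, height - 10, line_gap):
        for x in range(width):
            y = round(y0 + x * t)
            if 0 <= y < height:
                image[y][x] = 1
    return image


if __name__ == "__main__":
    page = make_skewed_page(skew_deg=3.0)
    print("estimated skew (degrees):", estimate_skew(page))
```

Once the angle is known, the page can be rotated by its negative before recognition. Real scans have noise, figures, and multiple columns, so part of the literature search is finding out how production OCR packages make this step robust.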
Here are some problems for which I'd like to know whether we have answers. Is the technology this good, or are these still research problems?
- Given an image of the page, can you determine what language the page is written in?
- Given a photograph of a sign, can you find the sign and OCR it?
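For the language question, one partial angle is sketched below. It sidesteps the image-level problem by assuming the page has already been OCR'd into Unicode text, and it only guesses the writing script (Latin, Cyrillic, Arabic, and so on), which narrows the set of candidate languages without identifying one. The function name `guess_script` is hypothetical, and the sketch relies on the observation that the Unicode name of most letters begins with the script name.

```python
import unicodedata
from collections import Counter


def guess_script(text):
    """Return the most common script name among the letters in text, or None."""
    scripts = Counter()
    for ch in text:
        if not ch.isalpha():
            continue  # skip digits, punctuation, and whitespace
        name = unicodedata.name(ch, "")
        if name:
            # Assumption: the first word of the character name is the script,
            # e.g. "CYRILLIC SMALL LETTER A" -> "CYRILLIC".
            scripts[name.split()[0]] += 1
    if not scripts:
        return None
    return scripts.most_common(1)[0][0]


if __name__ == "__main__":
    for sample in ["Hello world", "Привет, мир", "مرحبا"]:
        print(repr(sample), "->", guess_script(sample))
```

Whether the script, or the language itself, can be determined directly from the page image before any recognition is done is exactly what the literature search should establish.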
And some practical problems:
- What is the current state of open source OCR? What packages are best? Can we try them out?
- What are the commercial OCR packages? What platforms do they run on? What are their APIs? Can we get demos?