WebBook: An Address Book for the World Wide Web
By Simson L. Garfinkel
WebBook is a free-format, multi-user address book that is
designed to be used with the World Wide Web. WebBook is written in
perl, and includes some neat pattern-matching technology to
automatically recognize email addresses and URLs in entries and
properly format them. It is also quite fast.
In order to run WebBook, you need a copy of Perl, a Web browser
that can handle forms, and a web server on which you can run perl
scripts. If you don't have a web server, take heart: at the end of
this article, I'll show you how to write a simple one using perl.
MOTIVATION
I've been interested in computerized address books ever since the fall
of 1984, when I lost the beautiful leather-bound address book that I
had been given for my 13th birthday and used for more than six years.
Suddenly, I was without all of the names, addresses and phone numbers
of all my friends, classmates, and family. But I was also without
directions to people's houses, bank account numbers, and recipes for
my favorite desserts, because I had been keeping that information in
my address book as well.
Starting over, I realized that the logical solution was to computerize
my address book to prevent a future catastrophe. But every
computerized address book I looked at had a similar failing: They were
all designed to store names, addresses and phone numbers, and little
more. Although that might work for a sales executive who merely needs
a way of organizing potential leads, it doesn't work for me, and it
doesn't work for most other people.
My solution was a free-format database kept in a word processor.
To separate each entry, I used a row of equal signs. My text editor's
string-search capability worked well to locate the names and addresses
of friends.
Over the years, my address book grew. I wrote a series of emacs macros
for quickly navigating through the sea of names. I wrote a program for
finding names and printing mailing envelopes. I finally wrote a
full-blown application using NeXTSTEP. It is called SBook (and is now
sold by Sarrus Software of Burlingame, Calif.), and many people think
that it is one of the best address books ever written. There's just
one problem:
it doesn't run on computers that anybody uses anymore.
WebBook is the most recent incarnation of my address book. Because the
address book is written in perl and runs on a web server, it avoids
the biggest problem of SBook: platform dependence. Instead, it uses a
Web browser as a generic graphical output device; HTML is the
presentation language, and HTTP is the API. This puts a whole new spin
on the concept of "client/server," and I'm expecting to see a lot of
programs adopting this strategy in the near future. (I've already
written a set of CGI scripts for performing Unix system administration
using many of the same ideas.) WebBook also demonstrates many of the
techniques that I've learned over the past year for creating manageable
perl/CGI scripts.
The full WebBook program is over 1000 lines of perl and contains many
features, such as the ability to handle multiple address books, and
even password-protection of entries. Rather than print the full
WebBook here, I've created a cut-down demonstration version. If you
want the full WebBook, you can download it from my Web server,
http://vineyard.net/simson/WebBook/faq.html.
BUILD FOR DEBUGGING
The most difficult thing about creating CGI programs is debugging
them. I got a head start by using the perl cgi-lib library developed
by Steven E. Brenner at Cambridge University. Brenner's library
handles communication between the web server and your CGI script,
automatically unescapes all variables and stuffs them into a perl
associative array, and gives your perl scripts a uniform way of
handling both GET and POST requests. You can get more information
about cgi-lib from http://www.bio.cam.ac.uk/web/form.html. I've
included a subset of cgi-lib in the version of WebBook that is
presented here.
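To make the unescaping step concrete, here is a minimal sketch of the kind of decoding that cgi-lib does on your behalf. It is not Brenner's code: the name parse_query is made up for this illustration, and the sketch handles only an already-read string of form data, not the details of GET versus POST.

```perl
use strict;
use warnings;

# Decode "name=value&name=value" form data into a hash.
# A sketch of the idea only; this is not cgi-lib's own code.
sub parse_query {
    my ($query) = @_;
    my %input;
    foreach my $pair (split /&/, $query) {
        next if $pair eq "";                          # ignore empty pairs
        my ($key, $value) = split /=/, $pair, 2;
        $value = "" unless defined $value;
        foreach ($key, $value) {
            tr/+/ /;                                  # "+" stands for a space
            s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;      # %XX is a hex-encoded byte
        }
        $input{$key} = $value;
    }
    return %input;
}

my %input = parse_query("action=search&name=Simson+L.+Garfinkel");
print "$input{'action'}: $input{'name'}\n";   # search: Simson L. Garfinkel
```

The rest of a script can then look up $input{'action'} and $input{'name'} without caring how the browser encoded them.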
WebBook is written as a single perl script. Arguments are provided by
the standard CGI interface. A special argument called "action" is used
to specify which action the user wishes to perform; if no action is
specified, WebBook displays a welcome message, prints some information
about the database on the server, and displays a search form. If
WebBook is asked to perform an action that it doesn't understand, an
error message is returned to the Web browser.
One of the problems that I've found with perl is that it's easy to get
lost within a single perl script: what gets executed and what doesn't?
When perl starts up, the entire file gets parsed, then execution
starts at the first line and moves down. A lot of perl programmers
freely mix global variable assignments, executable code, and function
definitions. Unfortunately, unlike the C programming language, global
variable assignments are performed at run-time, rather than compile
time, which means that a global assignment deep within the file will
not be executed until control passes to that point.
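The effect is easy to reproduce. In the small sketch below (the subroutine name banner and the variables $before and $after are invented for the demonstration), the same subroutine is called once before and once after a global assignment that sits partway down the file:

```perl
use strict;
use warnings;
our ($database, $before, $after);   # package globals, as in WebBook

sub banner {
    return "Database is " . (defined $database ? $database : "(not set yet)");
}

$before   = banner();            # called before the assignment below has run
$database = "WebBook.data";      # a global assignment partway down the file
$after    = banner();            # called after it

print "$before\n$after\n";
```

The subroutine itself is known as soon as the file has been parsed, but the value of $database is not there until execution reaches the assignment.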
A simple way that I've found to get around this problem is to place
all of my global variable assignments at the top of my perl program,
and then to execute this block of code, which runs the subroutine
called main:
# Define globals
$database = "WebBook.data";
$max_entry = 450;
# Get Started:
if (!$standalone) {
    print "Content-type:text/html\n\n";
    &main;
    exit(0);
}
Another advantage to this strategy, as we shall see, is that it makes
it easy to incorporate WebBook into other perl scripts.
DECIDING WHAT TO DO
Once the CGI script starts up, it needs to decide what to do. With
most CGI scripts, this isn't a problem, because they just do one or
two things---such as incrementing and displaying a counter, or
displaying a form and then performing the requested action. But CGI
scripts such as WebBook that can perform many different actions
require a means to tell the script which action to perform. In my
programming, I use a variable called "action" which is set either by a
hidden tag in a CGI form, or as part of a URL. I frequently display
HTML pages that contain multiple forms and multiple submit buttons,
all of which invoke the same CGI script but with different values for
the "action" parameter.
When an action can be carried out in more than one way (a search for a
name versus a full-text search, say), I use additional variables. For example,
WebBook has two basic actions--- "search" which causes the program to
look for matching entries in the database, and "new" which causes the
program to create a new entry. The search action further has two
sub-actions, "find name" which does a search on the name field, and
"full-text" which searches the full text of the entries.
The following snippet of HTML creates three buttons that can invoke
these three choices:
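A form of roughly this shape would do the job. The sketch prints it from perl; the subroutine name action_forms, the field name "mode", the button labels, and the example script URL are assumptions made for illustration, not WebBook's actual markup. Only the "action" and "name" fields and their values come from the description above.

```perl
use strict;
use warnings;

# Build the search and new-entry forms as one HTML string.
# $script is the URL of the CGI script; "mode" is a hypothetical field name.
sub action_forms {
    my ($script) = @_;
    return <<"END";
<form method="POST" action="$script">
<input type="hidden" name="action" value="search">
Name: <input type="text" name="name">
<input type="submit" name="mode" value="find name">
<input type="submit" name="mode" value="full-text">
</form>
<form method="POST" action="$script">
<input type="hidden" name="action" value="new">
<input type="submit" value="new entry">
</form>
END
}

print action_forms("/cgi-bin/wb/WebBook");
```

Both forms call the same script; the hidden "action" field says what to do, and the value of whichever submit button was pressed selects the kind of search.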
Inside the &main subroutine, the following bit of perl code receives
the action and dispatches it to the appropriate perl subroutine:
sub get_command {
    # ReadParse returns true when form data was received
    if(&ReadParse(*input)) {
        $action = $input{'action'};
        $name = $input{'name'};
    }
    #
    # Decode the action
    #
    if($action eq "search"){ &do_search($name); return; }
    if($action eq "edit"){ &do_edit($name); return; }
    if($action eq "save-entry"){ &do_save_entry; return; }
    if($action eq "delete-entry"){ &do_delete_entry; return; }
    # Default - display an info message
    &do_info;
}
BUILDING THE DATABASE
WebBook keeps its database in a perl DBM file. This feature allows you
to create an associative array whose contents are automatically stored
in a DBM file. (Perl5 includes a more general "tie" function which
allows you to bind an associative array with any kind of database
back-end.) The file is opened with the statement:
dbmopen(%DB,$database,0666);
An associative array consists of a set of (key,value) pairs. WebBook
uses the keys of the associative array to store the name of each
person in the database; the person's entry is stored in the value. For
example, my WebBook entry might have an element in the associative
array with the key of "Simson L. Garfinkel" and the value "PO Box
4188\n10 Spring Street\nVineyard Haven, MA 02568".
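Here is a small round trip through such a file, as a sketch. It uses perl5's tie with the bundled SDBM_File module rather than dbmopen, and it writes to a scratch directory; both choices are assumptions made so that the example runs anywhere, not a description of how WebBook itself opens its database.

```perl
use strict;
use warnings;
use Fcntl;                       # O_RDWR, O_CREAT
use SDBM_File;                   # a DBM flavor bundled with perl5
use File::Temp qw(tempdir);

# Use a scratch directory so the example leaves nothing behind.
my $dir      = tempdir(CLEANUP => 1);
my $database = "$dir/WebBook.data";

my %DB;
tie(%DB, 'SDBM_File', $database, O_RDWR | O_CREAT, 0666)
    or die "cannot open $database: $!";

# The key is the person's name; the value is the free-format entry.
$DB{"Simson L. Garfinkel"} = "PO Box 4188\n10 Spring Street\nVineyard Haven, MA 02568";
untie %DB;

# Reopen the file to show that the entry really went to disk.
tie(%DB, 'SDBM_File', $database, O_RDWR, 0666)
    or die "cannot reopen $database: $!";
my @names = sort keys %DB;
my $entry = $DB{"Simson L. Garfinkel"};
untie %DB;

print "$names[0]\n$entry\n";
```

Once the hash is tied, ordinary hash operations (assignment, delete, keys) read and write the file directly.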
SEARCHING AND DISPLAYING
The heart of WebBook is searching for names. This is implemented with
the function do_search. This function calls the display_search_field
function to display the search field, conducts the search with the
find_names function, then creates the appropriate HTML to display the
results.
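The body of find_names is not shown in this cut-down version. One plausible shape for it, as a sketch only (the argument list and the $full_text flag are assumptions, and the sample entries are made up), is a case-insensitive scan over the keys of the associative array:

```perl
use strict;
use warnings;

# Return the names (hash keys) that contain $pattern, ignoring case.
# With $full_text set, the text of each entry is searched as well.
sub find_names {
    my ($db, $pattern, $full_text) = @_;
    my @found;
    foreach my $name (sort keys %$db) {
        if ($name =~ /\Q$pattern\E/i
            || ($full_text && $db->{$name} =~ /\Q$pattern\E/i)) {
            push @found, $name;
        }
    }
    return @found;
}

# Made-up sample data for the demonstration.
my %book = (
    "Simson L. Garfinkel" => "PO Box 4188\nVineyard Haven, MA 02568",
    "Aunt Mimi"           => "fax: see the family list",
);
print join(", ", find_names(\%book, "vineyard", 1)), "\n";
```

The \Q...\E quoting treats whatever the user typed as literal text, so a stray "(" or "*" in the search field cannot break the pattern.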
Names are displayed in an unnumbered list. Each name is displayed as
a link; click on the name to edit the entry. Note that we need to
escape spaces in names to "+" characters. (Other special characters,
such as plus signs and quotes, should be escaped as
well.) The code which does this is quite simple:
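$escaped_name = $_;
$escaped_name =~ s/ /+/g;
# Link back to this script with action=edit (href reconstructed from the description above)
print "<li><a href=\"$ENV{'SCRIPT_NAME'}?action=edit&name=$escaped_name\">$_</a>\n";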
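Escaping the other special characters takes only one more substitution. The subroutine below is a sketch of one way to do it (the name url_escape is invented here, and it assumes the name is made of single-byte characters): everything except letters, digits and a few safe punctuation marks is replaced by a percent sign and two hex digits, and only then are spaces turned into plus signs.

```perl
use strict;
use warnings;

sub url_escape {
    my ($text) = @_;
    # Hex-encode everything except letters, digits, "_", ".", "-" and space.
    $text =~ s/([^A-Za-z0-9_.\- ])/sprintf("%%%02X", ord($1))/ge;
    $text =~ tr/ /+/;      # spaces travel as "+"
    return $text;
}

print url_escape("Simson L. Garfinkel"), "\n";   # Simson+L.+Garfinkel
print url_escape('Q&A "notes" +1'), "\n";        # Q%26A+%22notes%22+%2B1
```

The order matters: a literal plus sign in a name has to become %2B before spaces are converted, or the two could not be told apart on the receiving end.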
As the text of each entry is displayed, WebBook automatically
escapes the less-than characters (so that they will not be interpreted
as HTML tags). It then catches the URLs and displays them as real
links, and catches the email addresses and displays them as mailto:
URLs. Finally, it turns the newlines into <br> tags.
The code that implements these substitutions uses perl's pattern
matching and substitution capabilities. These particular perl features
are unparalleled in most other computer languages today, which is one of
the reasons that perl is so well-suited to building CGI scripts:
# Now escape the various things in HTML tags
$ent =~ s/&/&amp;/g;
$ent =~ s/</&lt;/g;
# Note: the following substitutions must be done in order
# 1. Catch the URLs
$ent =~ s#((http:|mailto:|ftp:)//[^ \n\t]*)#<a href="$1">$1</a>#g;
# 2. Catch the email addresses
$ent =~ s#([a-z0-9_.]*@[a-z0-9_.]*)#<a href="mailto:$1">$1</a>#gi;
# 3. Now change newlines to <br>'s
$ent =~ s/\n/<br>\n/g;
EDITING
When the user clicks on a name, the WebBook CGI is run with
the action variable set to "edit" and the name variable set to be the
name of the entry that is being edited. The do_edit subroutine merely
displays the entry name in a text field and the entry text inside a
textarea. The actual editing is done by the user's own browser. When
the user is finished editing, she presses the "save" button.
Because the user might change the name of the entry during the
editing, the edit form needs to send the CGI script both the new name
(in the name field) and the old name (in the old-name field). These
values are both read by the do_save_entry subroutine.
The do_save_entry subroutine checks to make sure that $entry isn't too big.
(Perl4 dbm files don't always work if the value is larger than 450
characters; perl5 overcomes this problem.) The subroutine then deletes
the old entry and creates the new one.
Notice that there is no command for creating a new entry. That's
because the &do_edit function is used to create new entries---do_edit
is simply invoked to edit the entry with the name "".
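The save step can be sketched as follows. This is an illustration under stated assumptions rather than WebBook's own do_save_entry: the subroutine name save_entry, its argument list, and the idea of returning an error string are all invented here, and the database is passed in as a reference to an ordinary hash. Only the 450-character limit and the delete-then-create order come from the description above.

```perl
use strict;
use warnings;

my $max_entry = 450;    # the limit from WebBook's globals

# Store $entry under $name, removing the entry's old name if it had one.
# Returns an error message, or "" on success.
sub save_entry {
    my ($db, $old_name, $name, $entry) = @_;
    return "Entry needs a name" if $name eq "";
    return "Entry is longer than $max_entry characters"
        if length($entry) > $max_entry;
    delete $db->{$old_name} if $old_name ne "";   # drop the entry under its old name
    $db->{$name} = $entry;
    return "";
}

my %book = ("S. Garfinkel" => "PO Box 4188");
save_entry(\%book, "S. Garfinkel", "Simson L. Garfinkel", "PO Box 4188\n10 Spring Street");
print join(", ", sort keys %book), "\n";
```

Passing an empty old name covers the new-entry case: nothing is deleted, and the entry is simply stored under the name the user typed.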
GOTCHA!
Well, that's enough of WebBook to get you going. If you type in this
script, put it on your server, and try to run it, it probably won't
work. To see why not, run the Unix tail command on your web server's
error log:
vineyard.net% tail -f /usr/local/etc/httpd/logs/error_log
No write permission to ndbm file at /usr/local/etc/httpd/cgi-bin/wb/WebBook line 265.
The problem is that most web servers run CGI scripts as user "nobody"
(UID -2). If your server is configured properly, user "nobody"
shouldn't have access to create or modify files that are stored in the
cgi-bin directory.
There are a variety of ways around this problem. One is to make the
perl script SUID to a user specifically created for maintaining the
database. Another approach is to specify a $database file that is
stored in a directory other than the one that holds the perl script. Both of these are
good ideas. Implementing them is left as an exercise. For testing, you
can set the $database file to be "/tmp/database".
Correct your permissions problem and try again.
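A script can also check for this situation itself and say something more helpful than a line in the error log. The following is a sketch; the subroutine name database_problem is made up, and the check looks only at the directory that will hold the database, using the permissions of whichever user the script is running as.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);

# Return "" if a DBM file could be created at $path, else a reason why not.
sub database_problem {
    my ($path) = @_;
    my $dir = dirname($path);
    return "$dir is not a directory" unless -d $dir;
    return "no write permission in $dir" unless -w $dir;
    return "";
}

my $problem = database_problem("/tmp/database");
print $problem eq "" ? "ok\n" : "problem: $problem\n";
```

Run from the command line, this reports on your own permissions; run as a CGI script, it reports on those of the user that the web server uses, which is the case that matters.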
BUILD YOUR OWN WEB SERVER
The biggest stumbling block to using WebBook, I discovered, was the
speed. Every time the CGI script was run, the web server had to start
up a copy of perl and perl had to read and compile the WebBook program
before WebBook could even start running. On my NeXTstation, that comes
to 1.5 seconds total.
I tried playing around with perl's undump facility. This lets you
"compile" perl by dumping a core file and then processing the file with
the unix undump program. Unfortunately, undump is not a standard part
of Unix, and I couldn't get it to work on my operating system.
So I took another approach: I wrote my own webserver.
It turns out that writing a Web server is rather trivial. A simple web
server just listens on a port, reads the second word from the first
line, and then performs the requested action. A simple (but very
insecure) web server can be written in just three lines:
#!/bin/sh
read a b c
cat $b
This works because of the simplicity of the HTTP protocol. When a
Web browser (in this case, Netscape Navigator 1.12 on a Macintosh)
connects to a web server, it sends through something like the
following lines:
GET /index.html HTTP/1.0
User-Agent: Mozilla/1.12(Macintosh; I; 68K)
Accept: */*
Accept: image/gif
Accept: image/x-xbitmap
Accept: image/jpeg
Where "/index.html" is the name of the URL that is being requested ---
the slash that follows the hostname and everything following.
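Picking the request line apart in perl takes one split. A minimal sketch (the subroutine name parse_request_line is invented for the example):

```perl
use strict;
use warnings;

# Split "GET /index.html HTTP/1.0" into its three words.
sub parse_request_line {
    my ($line) = @_;
    $line =~ s/\r?\n$//;                       # drop the line terminator
    my ($method, $url, $version) = split ' ', $line;
    return ($method, $url, $version);
}

my ($method, $url, $version) = parse_request_line("GET /index.html HTTP/1.0\r\n");
print "$method $url\n";    # GET /index.html
```

The second word is the one that the three-line shell server hands to cat.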
In the case of the Vineyard.NET web site, the web server might send
back the following:
HTTP/1.0 200 Document follows
Date: Sat, 30 Dec 1995 ? GMT
Server: NCSA/1.4.2
Content-type: text/html
Last-modified: Wed, 27 Dec 1995 ? GMT
Content-length: 6153
Followed by the document itself.
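Producing a reply of this shape is just string assembly. The sketch below (http_response is a made-up name) builds a bare-bones version with only the status line, the content type and the content length; the Date, Server and Last-modified headers shown above are left out to keep the example short.

```perl
use strict;
use warnings;

# Wrap an HTML document in a minimal HTTP/1.0 response.
sub http_response {
    my ($body) = @_;
    return "HTTP/1.0 200 Document follows\r\n"
         . "Content-type: text/html\r\n"
         . "Content-length: " . length($body) . "\r\n"
         . "\r\n"                                  # blank line ends the headers
         . $body;
}

print http_response("<h1>Hello</h1>\n");
```

The blank line is what tells the browser that the headers are over and the document has begun.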
This being the case, it seemed to me that it would be a simple thing
to write a wrapper for WebBook's &main subroutine which listened for
HTTP requests on a pre-determined port (other than port 80),
interpreted the URL, set up the appropriate environment variables, ran
&main, and then waited for the next request.
In fact, this is what the perl script "cgi-server" does (listing 2).
When run, this script takes as arguments the number of a port on which
to listen and the name of a script to run. The script binds to the
port and awaits HTTP connections. Each time it receives a connection,
it forks off a child process which receives the HTTP command (and any
POST information), sets up the appropriate environment variables, and
runs &main. You may wish to add more error checking or clean up the
perl code.
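The overall shape of such a server can be sketched in a few dozen lines. This is not the cgi-server listing itself: the subroutine names setup_cgi_env and serve are invented, the handler is passed in as a code reference standing in for &main, POST data is not handled, and there is almost no error checking. It uses perl5's IO::Socket::INET module for the socket work.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Turn an HTTP request line into the CGI environment variables that
# a script such as WebBook reads. Handles GET-style URLs only.
sub setup_cgi_env {
    my ($request_line) = @_;
    my ($method, $url) = split ' ', $request_line;
    my ($path, $query) = split /\?/, $url, 2;
    $ENV{'REQUEST_METHOD'} = $method;
    $ENV{'SCRIPT_NAME'}    = $path;
    $ENV{'QUERY_STRING'}   = defined $query ? $query : "";
    return $method;
}

# Listen on $port and fork one child per connection. $handler is a
# code reference that stands in for WebBook's &main.
sub serve {
    my ($port, $handler) = @_;
    my $listener = IO::Socket::INET->new(
        LocalPort => $port,
        Proto     => 'tcp',
        Listen    => 5,
        ReuseAddr => 1,
    ) or die "cannot listen on port $port: $!";
    local $SIG{CHLD} = 'IGNORE';          # let the system reap finished children

    while (my $client = $listener->accept) {
        my $pid = fork;
        die "fork failed: $!" unless defined $pid;
        if ($pid) {                       # parent: go back to accepting
            close $client;
            next;
        }
        # Child: read the request line, then skip the header lines.
        my $request_line = <$client>;
        exit 0 unless defined $request_line;
        while (defined(my $line = <$client>)) {
            last if $line =~ /^\r?\n$/;
        }
        setup_cgi_env($request_line);
        select($client);                  # plain print now goes to the browser
        print "HTTP/1.0 200 Document follows\r\n";
        print "Content-type: text/html\r\n\r\n";
        $handler->();
        close $client;
        exit 0;
    }
}
```

Calling serve(8080, \&main) after loading the WebBook code would start the loop; 8080 is only an example port. Because the script is compiled once, before the first connection arrives, each request pays only for a fork, not for starting perl and parsing the program.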
Is cgi-server worth it? Absolutely. When WebBook runs from the regular
web server, each command takes roughly 3 seconds to process on my
server. With cgi-server, the response time is nearly instantaneous.
CONCLUSION
The World Wide Web is making it possible to write a new generation of
client-server programs. These programs use a web browser as a generic
client, and use HTML as a generic "presentation layer," using HTML to
specify the creation of buttons, text fields, clickable links, and
other widgets.
One of the comments I've heard about WebBook is that all of the work
that I am painstakingly doing on the server can be done better on the
client using Java. Well, that might be true. On the other hand, not
every web browser supports Java; even people who use Lynx can access
a WebBook database. Still, I can certainly see many ways
in which a Java WebBook client could complement the WebBook server,
such as performing searches on the end user's machine, or even giving
the user a better text editor than the one that's built into their web
browser.
There are other things that WebBook can use as well, such
as the ability to store multiple files on a server, the ability to
import and export data, and even some security features. (After all,
you probably don't want anybody on the Internet to be able to learn
your Aunt Mimi's fax number.) The good news is that those features are
already in the full-blown version of WebBook that's on my web site. If
you want to see it, just look at the WebBook FAQ,
http://vineyard.net/simson/WebBook/faq.html.
-30-