WebBook: An Address Book for the World Wide Web
By Simson L. Garfinkel

WebBook is a free-format, multi-user address book designed to be used with the World Wide Web. WebBook is written in perl, and includes some neat pattern-matching technology to automatically recognize email addresses and URLs in entries and properly format them. It is also quite fast.

In order to run WebBook, you need a copy of perl, a Web browser that can handle forms, and a web server on which you can run perl scripts. If you don't have a web server, take heart: at the end of this article, I'll show you how to write a simple one using perl.

MOTIVATION

I've been interested in computerized address books ever since the fall of 1984, when I lost the beautiful leather-bound address book that I had been given for my 13th birthday and used for more than six years. Suddenly, I was without the names, addresses and phone numbers of all my friends, classmates, and family. But I was also without directions to people's houses, bank account numbers, and recipes for my favorite desserts, because I had been keeping that information in my address book as well.

Starting over, I realized that the logical solution was to computerize my address book to prevent a future catastrophe. But every computerized address book I looked at had a similar failing: they were all designed to store names, addresses and phone numbers, and little more. Although that might work for a sales executive who merely needs a way of organizing potential leads, it doesn't work for me, and it doesn't work for most other people.

My solution was a free-format database kept in a word processor. To separate each entry, I used a row of equal signs. My text editor's string-search capability worked well to locate the names and addresses of friends.

Over the years, my address book grew. I wrote a series of emacs macros for quickly navigating through the sea of names.
I wrote a program for finding names and printing mailing envelopes. I finally wrote a full-blown application using NeXTSTEP. Many people think that this program, called SBook (and now sold by Sarrus Software of Burlingame, Calif.), is one of the best address books ever written. There's just one problem: it doesn't run on computers that anybody uses anymore.

WebBook is the most recent incarnation of my address book. Because the address book is written in perl and runs on a web server, it avoids the biggest problem of SBook: platform dependence. Instead, it uses a Web browser as a generic graphical output device; HTML is the presentation language, and HTTP is the API. This puts a whole new spin on the concept of "client/server," and I'm expecting to see a lot of programs adopting this strategy in the near future. (I've already written a set of CGI scripts for performing Unix system administration using many of the same ideas.)

WebBook also demonstrates many of the techniques that I've learned over the past year for creating manageable perl/CGI scripts. The full WebBook program is over 1000 lines of perl and contains many features, such as the ability to handle multiple address books, and even password protection of entries. Rather than print the full WebBook here, I've created a cut-down demonstration version. If you want the full WebBook, you can download it from my Web server, http://vineyard.net/simson/WebBook/faq.html.

BUILD FOR DEBUGGING

The most difficult thing about creating CGI programs is debugging them. I got a head start by using the perl cgi-lib library developed by Steven E. Brenner at Cambridge University. Brenner's library handles communication between the web server and your CGI script, automatically unescapes all variables and stuffs them into a perl associative array, and gives your perl scripts a uniform way of handling both GET and POST requests. You can get more information about cgi-lib from http://www.bio.cam.ac.uk/web/form.html.
I've included a subset of cgi-lib in the version of WebBook that is presented here.

WebBook is written as a single perl script. Arguments are provided by the standard CGI interface. A special argument called "action" is used to specify which action the user wishes to perform; if no action is specified, WebBook displays a welcome message, prints some information about the database on the server, and displays a search form. If WebBook is asked to perform an action that it doesn't understand, an error message is returned to the Web browser.

One of the problems that I've found with perl is that it's easy to get lost within a single perl script: what gets executed and what doesn't? When perl starts up, the entire file gets parsed, then execution starts at the first line and moves down. A lot of perl programmers freely mix global variable assignments, executable code, and function definitions. Unfortunately, unlike in the C programming language, global variable assignments in perl are performed at run time rather than at compile time, which means that a global assignment deep within the file will not be executed until control passes to that point.

A simple way that I've found to get around this problem is to place all of my global variable assignments at the top of my perl program, and then to execute a block of code that runs the subroutine called main:

    # Define globals
    $database = "WebBook.data";
    $max_entry = 450;

    # Get Started:
    if (!$standalone) {
        print "Content-type:text/html\n\n";
        &main;
        exit(0);
    }

Another advantage to this strategy, as we shall see, is that it makes it easy to incorporate WebBook into other perl scripts.

DECIDING WHAT TO DO

Once the CGI script starts up, it needs to decide what to do. With most CGI scripts, this isn't a problem, because they just do one or two things---such as incrementing and displaying a counter, or displaying a form and then performing the requested action.
But CGI scripts such as WebBook that can perform many different actions require a means to tell the script which action to perform. In my programming, I use a variable called "action" which is set either by a hidden field in a CGI form, or as part of a URL. I frequently display HTML pages that contain multiple forms and multiple submit buttons, all of which invoke the same CGI script but with different values for the "action" parameter. When an action can be executed in more than one way (for example, a search for a name, or a full-text search), I use additional variables.

For example, WebBook has two basic actions---"search", which causes the program to look for matching entries in the database, and "new", which causes the program to create a new entry. The search action further has two sub-actions: "find name", which does a search on the name field, and "full-text", which searches the full text of the entries. The following snippet of HTML creates three buttons which can invoke these three choices:
Search for:

Alternatively, you may create a
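A form along these lines can also be emitted from perl. The sketch below is only an illustration of the hidden "action" field and multiple submit buttons described above: the subroutine name search_form_html, the field name "how" used for the sub-action, the button labels, and the example script URL are all assumptions, not taken from the WebBook source.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of WebBook): build the two forms as a string.
# "action" and "name" come from the article; "how" is an assumed field name.
sub search_form_html {
    my ($script_url) = @_;
    my $html = '';

    # First form: one hidden action, two submit buttons for the sub-action.
    $html .= qq{<form method="POST" action="$script_url">\n};
    $html .= qq{<input type="hidden" name="action" value="search">\n};
    $html .= qq{Search for: <input type="text" name="name">\n};
    $html .= qq{<input type="submit" name="how" value="find name">\n};
    $html .= qq{<input type="submit" name="how" value="full-text">\n};
    $html .= qq{</form>\n};

    # Second form: same script, different value for the hidden action.
    $html .= qq{<form method="POST" action="$script_url">\n};
    $html .= qq{<input type="hidden" name="action" value="new">\n};
    $html .= qq{Alternatively, you may create a\n};
    $html .= qq{<input type="submit" value="new entry">\n};
    $html .= qq{</form>\n};

    return $html;
}

# Example URL only; substitute wherever the script is installed.
print search_form_html('/cgi-bin/wb/WebBook');
```

Because the browser sends only the name and value of the button that was pressed, the script can tell the two kinds of search apart by looking at the (assumed) "how" variable, while "action" stays the same for both.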
Inside the &main subroutine, the following bit of perl code receives the action and dispatches it to the appropriate perl subroutine:

    sub get_command {
        # ReadParse returns true when form input was received
        if (&ReadParse(*input)) {
            $action = $input{'action'};
            $name = $input{'name'};
        }

        #
        # Decode the action
        #
        if ($action eq "search") { &do_search($name); return; }
        if ($action eq "edit") { &do_edit($name); return; }
        if ($action eq "save-entry") { &do_save_entry; return; }
        if ($action eq "delete-entry") { &do_delete_entry; return; }

        # Default - display an info message
        &do_info;
    }

BUILDING THE DATABASE

WebBook keeps its database in a DBM file, using perl's dbmopen function. This feature allows you to create an associative array whose contents are automatically stored in a DBM file. (Perl5 includes a more general "tie" function which allows you to bind an associative array to any kind of database back-end.) The file is opened with the statement:

    dbmopen(%DB,$database,0666);

An associative array consists of a set of (key,value) pairs. WebBook uses the keys of the associative array to store the name of each person in the database; the person's entry is stored in the value. For example, my WebBook entry might have an element in the associative array with the key of "Simson L. Garfinkel" and the value "PO Box 4188\n10 Spring Street\nVineyard Haven, MA 02568".

SEARCHING AND DISPLAYING

The heart of WebBook is searching for names. This is implemented with the function do_search. This function calls the display_search_field function to display the search field, conducts the search with the find_names function, then creates the appropriate HTML to display the results.

Names are displayed in an unnumbered list. Each name is displayed as a link; click on the name to edit the entry. Note that we need to escape spaces in names to "+" characters. (Other special characters, such as plus signs and quotes, should be escaped as well.) The code which does this is quite simple:

    $escaped_name = $_;
    $escaped_name =~ s/ /+/g;
    # Link target assumed: this script's own URL with action=edit
    print "<li><a href=\"$ENV{'SCRIPT_NAME'}?action=edit&name=$escaped_name\">$_</a>\n";

As the text of each entry is displayed, WebBook automatically escapes the less-than characters (so that they will not be interpreted as HTML tags). It then catches the URLs and displays them as real links, and catches the email addresses and displays them as mailto: URLs. Finally, it turns the newlines into <br>
tags. The code that implements these substitutions uses perl's pattern matching and substitution capabilities. These particular perl features are unparalleled in most other computer languages today, which is one of the reasons that perl is so well-suited to building CGI scripts:

    # Now escape the various things in HTML tags
    $ent =~ s/&/&amp;/g;
    $ent =~ s/</&lt;/g;

    # 1. Catch the URLs (the exact pattern is an assumption)
    $ent =~ s#(\w+://\S+)#<a href="$1">$1</a>#g;

    # 2. Catch the email addresses
    $ent =~ s#([a-z0-9_.]*@[a-z0-9_.]*)#<a href="mailto:$1">$1</a>#gi;

    # 3. Now change newlines to <br>'s
    $ent =~ s/\n/<br>\n/g;

EDITING

When the user clicks on a name, the WebBook CGI is run with the action variable set to "edit" and the name variable set to the name of the entry that is being edited. The do_edit subroutine merely displays the entry name in a text field and the entry text inside a textarea. The actual editing is done by the user's own browser.

When the user is finished editing, she presses the "save" button. Because the user might change the name of the entry during the editing, the edit form needs to send the CGI script both the new name (in the name field) and the old name (in the old-name field). These values are both read by the do_save_entry subroutine.

The do_save_entry subroutine checks to make sure that $entry isn't too big. (Perl4 dbm files don't always work if the value is larger than 450 characters; perl5 overcomes this problem.) The subroutine then deletes the old entry and creates the new one.

Notice that there is no command for creating a new entry. That's because the &do_edit function is used to create new entries---do_edit is simply invoked to edit the entry with the name "".

GOTCHA!

Well, that's enough of WebBook to get you going. If you type in this script, put it on your server, and try to run it, it probably won't work. To see why not, run the Unix tail command on your web server's error log:

    vineyard.net% tail -f /usr/local/etc/httpd/logs/error_log
    No write permission to ndbm file at /usr/local/etc/httpd/cgi-bin/wb/WebBook line 265.

The problem is that most web servers run CGI scripts as user "nobody" (UID -2). If your server is configured properly, user "nobody" shouldn't be able to create or modify files that are stored in the cgi-bin directory.

There are a variety of ways around this problem. One is to make the perl script SUID to a user specifically created for maintaining the database. Another approach is to specify a $database file that is stored in a directory other than the one that holds the perl script. Both of these are good ideas.
Implementing them is left as an exercise. For testing, you can set the $database file to be "/tmp/database". Correct your permissions problem and try again.

BUILD YOUR OWN WEB SERVER

The biggest stumbling block to using WebBook, I discovered, was the speed. Every time the CGI script was run, the web server had to start up a copy of perl, and perl had to read and compile the WebBook program before WebBook could even start running. On my NeXTstation, that comes to 1.5 seconds total.

I tried playing around with perl's undump facility. This lets you "compile" perl by dumping a core file and then processing the file with the Unix undump program. Unfortunately, undump is not a standard part of Unix, and I couldn't get it to work on my operating system.

So I took another approach: I wrote my own web server.

It turns out that writing a Web server is rather trivial. A simple web server just listens on a port, reads the second word from the first line, and then performs the requested action. A simple (but very insecure) web server can be written in just three lines:

    #!/bin/sh
    read a b c
    cat $b

This works because of the simplicity of the HTTP protocol. When a Web browser (in this case, Netscape Navigator 1.12 on a Macintosh) connects to a web server, it sends something like the following lines:

    GET /index.html HTTP/1.0
    User-Agent: Mozilla/1.12(Macintosh; I; 68K)
    Accept: */*
    Accept: image/gif
    Accept: image/x-xbitmap
    Accept: image/jpeg

Here "/index.html" is the path of the URL that is being requested---the slash that follows the hostname and everything after it. In the case of the Vineyard.NET web site, the web server might send back the following:

    HTTP/1.0 200 Document follows
    Date: Sat, 30 Dec 1995 ? GMT
    Server: NCSA/1.4.2
    Content-type: text/html
    Last-modified: Wed, 27 Dec 1995 ? GMT
    Content-length: 6153

This is followed by the document itself.
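Picking the request line apart takes only a few lines of perl. The sketch below is an illustration rather than code from any WebBook listing: the subroutine name parse_request_line and the example URL are hypothetical, and it handles only the request line, not the headers or any POST body. REQUEST_METHOD, SCRIPT_NAME, QUERY_STRING and SERVER_PROTOCOL are the standard CGI environment variable names.

```perl
use strict;
use warnings;

# Hypothetical helper: split an HTTP/1.0 request line into the pieces a
# CGI script expects to find in its environment.
sub parse_request_line {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;                       # strip the line terminator
    my ($method, $url, $protocol) = split ' ', $line;
    return () unless defined $method && defined $url;

    # Everything after the first "?" is the query string.
    my ($path, $query) = split /\?/, $url, 2;

    return (
        REQUEST_METHOD  => $method,
        SCRIPT_NAME     => $path,
        QUERY_STRING    => defined $query ? $query : '',
        SERVER_PROTOCOL => defined $protocol ? $protocol : 'HTTP/0.9',
    );
}

# Example request line; the path and arguments are made up for illustration.
my %cgi = parse_request_line("GET /wb/WebBook?action=search&name=Simson HTTP/1.0\r\n");

# A wrapper would copy these into %ENV before calling the CGI code.
$ENV{$_} = $cgi{$_} for keys %cgi;
print "$ENV{REQUEST_METHOD} $ENV{SCRIPT_NAME} [$ENV{QUERY_STRING}]\n";
```

Once these variables are in place, a library such as cgi-lib can decode a GET request's query string exactly as it would under a regular web server; a POST request would additionally need CONTENT_LENGTH set and the body made available on standard input.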
This being the case, it seemed to me that it would be a simple thing to write a wrapper for WebBook's &main subroutine which listened for HTTP requests on a pre-determined port (other than port 80), interpreted the URL, set up the appropriate environment variables, ran &main, and then waited for the next request.

In fact, this is what the perl script "cgi-server" does (listing 2). When run, this script takes as arguments the number of a port on which to listen and the name of a script to run. The script binds to the port and awaits HTTP connections. Each time it receives a connection, it forks off a child process which receives the HTTP command (and any POST information), sets up the appropriate environment variables, and runs &main. You may wish to add more error checking or clean up the perl code.

Is cgi-server worth it? Absolutely. When WebBook runs as an ordinary CGI script under the web server, each command takes roughly 3 seconds to process on my server. With cgi-server, the response time is nearly instantaneous.

CONCLUSION

The World Wide Web is making it possible to write a new generation of client-server programs. These programs use a web browser as a generic client, and use HTML as a generic "presentation layer," using HTML to specify the creation of buttons, text fields, clickable links, and other widgets.

One of the comments I've heard about WebBook is that all of the work that I am painstakingly doing on the server can be done better on the client using Java. Well, that might be true. On the other hand, not every web browser supports Java; even people who use Lynx can access a WebBook database. At the same time, I can certainly see many ways in which a Java WebBook client could complement the WebBook server, such as performing searches on the end user's machine, or even giving the user a better text editor than the one that's built into their web browser.
There are other things that WebBook can use as well, such as the ability to store multiple files on a server, the ability to import and export data, and even some security features. (After all, you probably don't want anybody on the Internet to be able to learn your Aunt Mimi's fax number.) The good news is that those features are already in the full-blown version of WebBook that's on my web site. If you want to see it, just look at the WebBook FAQ, http://vineyard.net/simson/WebBook/faq.html.

-30-