WIRED 1.6
Electrosphere

The Dean of Disaster

Plane crashes, nuclear reactor accidents, explosions at chemical plants - if computers were at fault, Peter Neumann knows all about it.

By Simson L. Garfinkel

He's a fantastic storyteller, he's always ready with a pun, and he can play two recorders at once - simultaneously piping out both melody and accompaniment - while he beats the rhythm with his foot. But to hundreds of thousands of people around the world, Peter G. Neumann is best known for moderating RISKS-Forum, one of the Internet's most widely read electronic forums.

What are computer RISKS? Any use of computers that might accidentally lead to loss of life, property, or money. They are dangers as simple as sending credit card numbers by e-mail (which could bounce into unauthorized hands) and as deadly as bugs in medical equipment. Disasters are a mainstay, including numerous plane crashes, nuclear reactor accidents, and explosions at chemical plants - all brought about, in part, by faulty computer systems.

Over the years, the readers of RISKS have cast a wide net, sending contributions to Neumann on everything from space missions that have been scrubbed because of typos to the risks of remote-control garage-door openers and answering machines. RISKS readers are big on privacy: some of the earliest descriptions of the National Security Agency's proposed Clipper encryption chip appeared in RISKS-Forum, quickly followed by technical, social, and political discussions of the dangers posed by the government-sponsored encryption standard.

Unlike other online forums, RISKS maintains a consistently high level of discussion and a low level of noise. "It's a forum of discussion that doesn't just run wild and rampant," says Dorothy Denning, chair of computer science at Georgetown University. Equally impressive is the number of postings from highly respected members of the computer science community. "It's a very good source of information," Denning says.

Besides the mailing list, Neumann edits the journal Software Engineering Notes and writes a monthly column on the last page of Communications of the ACM, the journal of the Association for Computing Machinery. He's also putting the finishing touches on a book about software safety and risks. Its tentative title? "RISKS: The Book - as opposed to RISKS the movie and RISKS the game," Neumann jokes.

Neumann got his start with computers in 1953 as an undergraduate at Harvard. There he worked on the Harvard Mark I - the same computer that was incapacitated by the first "bug" (a moth that flew into a relay). After earning a doctorate in applied mathematics at Harvard and a second doctorate from the Technische Hochschule in Darmstadt, Germany, he headed Bell Labs's participation in the Multics project - one of the earliest attempts to build a reliable and secure computer system. Working on Multics taught Neumann the futility of building risk-free systems: every time he tried to design a system that had no weak links and no security flaws, new ones would appear.

Today, Neumann is at SRI International's Computer Science Laboratory in Menlo Park, California, where he has worked on numerous projects for government and industry. Despite his work in software safety, Neumann says that music is his life's great passion: in addition to playing piano, bassoon, and recorders, he sings madrigals and is a trustee of the Greenwood Music Camp in Cummington, Massachusetts.

The RISKS mailing list started in 1985.
At the time, some members of the Association for Computing Machinery's executive council wanted the ACM to go on record decrying then-President Reagan's Strategic Defense Initiative, or "Star Wars," as too risky. The idea didn't go over well with the rest of the council. As a compromise, ACM's president, Adele Goldberg, asked Neumann to head the Committee on Computers and Public Policy and create a public forum for discussing risks to the public caused by the use of computers. "An online newsgroup seemed like the most effective way to do that," Neumann recalls.

I caught up with Neumann by phone and e-mail and asked him about his favorite topic.

SG: How many people read RISKS?

PN: I wish I could tell you... It's clearly one of the most widely read Internet newsgroups. The answer is probably somewhere around 100,000, but I have no idea. I have no way of guessing. All I know is that I keep getting mail from people I've never heard of, and the distribution list keeps growing and growing.

SG: What are the risks of running a large mailing list?

PN: The biggest problem is the barf mail - fielding ten new pieces of rejected mail every day. Every time I put out an issue (between two and four times a week), I get six or ten addresses that suddenly don't work. Some of them work again the next day; most of them just stop working for periods of time. Then a month later you get an angry message from somebody asking, "Why am I not getting RISKS?"

SG: Can we trust computers?

PN: Read my book [which should appear in 1994]. It's very mixed in its conclusions. It gives a great deal of evidence why you shouldn't trust computers or the people who work with them, and yet it offers some hope. If we were able to know in advance what the requirements were - and we really had them correct, and we were able to design something that was consistent with those requirements, and we had really gifted people who could implement the system in such a way that was consistent with its design, and we had gifted people who would operate the system, remembering what the original requirements were, so they wouldn't compromise, and we had a user community that was fairly intelligent - then we might have a chance at having computer systems that we might be able to trust. ... There are an awful lot of things that can go wrong.

SG: What's your favorite case of something going wrong?

PN: The ARPAnet collapse of 1980. There was a combination of problems: you had a couple of design flaws, and you had a couple of dropped bits in the hardware. You wound up with a node contaminating all of its neighbors. After a few minutes, every node in the entire network ran out of memory, and it brought the entire network to its knees. This is a marvelous example because it shows how one simple problem can propagate. That case was very similar to the AT&T collapse of 1990, which had exactly the same mechanism: a bug caused a control signal to propagate that eventually brought down every node in the network repeatedly. Both of those cases are beautiful examples of what can go wrong, because they involve a confluence of circumstances.

SG: In the first issue of Software Engineering Notes (1976), you wrote that "the state of the art of software engineering has been horrendous, but seems to be improving." Do you still think that?

PN: I think it's still improving, but it hasn't lived up to expectations. It's very frustrating trying to deal with large systems. They never seem to come out the way they're supposed to.
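The ARPAnet cascade Neumann describes is easier to picture with a toy model. The Python sketch below is purely illustrative - the ring of eight nodes, the memory limit, and the wrap-around sequence-number comparison are invented for the example, not taken from the actual IMP software - but it follows the pattern he outlines: a handful of corrupted status updates, each of which looks "newer" than the last, keep being accepted and re-flooded until every node exhausts its memory.

# Toy model of an ARPAnet-1980-style cascade. All names, sizes, and the
# topology are invented for illustration; this is not the real IMP code.

from collections import deque

NUM_NODES = 8          # nodes arranged in a ring, each talking to two neighbors
MEMORY_LIMIT = 50      # how many updates a node can hold before it falls over
SEQ_MODULUS = 64       # sequence numbers wrap around, so "newer" is circular


def newer(a, b):
    """Wrap-around comparison: with a modular sequence space, three corrupted
    values can each appear newer than the one a node saw most recently."""
    return 0 < (a - b) % SEQ_MODULUS < SEQ_MODULUS // 2


class Node:
    def __init__(self):
        self.neighbors = []
        self.inbox = deque()
        self.latest_seen = 0
        self.buffered = 0      # updates held; in this toy, memory is never freed
        self.dead = False      # True once the node has exhausted its memory

    def receive(self, seq):
        if self.dead:
            return
        self.buffered += 1
        if self.buffered > MEMORY_LIMIT:
            self.dead = True   # out of memory: the node drops off the net
            return
        if newer(seq, self.latest_seen):
            self.latest_seen = seq
            for n in self.neighbors:   # accept and re-flood to every neighbor
                n.inbox.append(seq)


def run(rounds=60):
    nodes = [Node() for _ in range(NUM_NODES)]
    for i, node in enumerate(nodes):   # wire up the ring
        node.neighbors = [nodes[(i - 1) % NUM_NODES], nodes[(i + 1) % NUM_NODES]]

    # Sequence numbers 8, 28, 48 form a cycle under newer(): 8 < 28 < 48 < 8 ...
    corrupted = (8, 28, 48)

    for step in range(1, rounds + 1):
        if not nodes[0].dead:          # the faulty node keeps emitting garbage
            for seq in corrupted:
                for n in nodes[0].neighbors:
                    n.inbox.append(seq)
        for node in nodes:             # every live node drains its inbox
            while node.inbox and not node.dead:
                node.receive(node.inbox.popleft())
        if all(n.dead for n in nodes):
            print(f"every node out of memory after {step} rounds")
            return
    print(f"{sum(n.dead for n in nodes)} of {NUM_NODES} nodes out of memory")


if __name__ == "__main__":
    run()

Run it and the simulated network collapses within a few rounds. As with the real 1980 collapse and the 1990 AT&T failure, no single node does anything unreasonable; the damage comes from the confluence of a small corruption and a propagation rule that faithfully spreads it.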
SG: Why is software so hard to do right?

PN: Because there are so many things that can go wrong. If we look at one of the telephone collapses, there was a three- or four-line code patch that screwed up and brought down large numbers of systems, including a number of airports. Everything just closed up because of one code bug that was installed without adequate testing. On the other hand, if you try to design something with no weak links, you end up spending an enormous amount of your effort on redundancy and reliability. There are quite a few systems where over half of the code is devoted to redundancy management. A lot of that code never gets run in normal operations, so it is untested. The more complex the system is, the more likely it is to fail.

SG: So it's a Catch-22?

PN: Yes. You put in more complexity trying to add reliability, and that complexity itself is suspect, and hence more risky.

SG: Should programmers be licensed?

PN: A chapter in the book addresses that. I'm ambivalent. It's one of these double-edged swords. The licensing process is often lowest-common-denominator stuff. In order to get the certification process through, you end up with the minimum set of skills that people need to have. And yet, if they are dealing with life-critical systems, they need to have a tremendous amount of experience, creativity, imagination, a sense of what won't work, and a conservative attitude toward development. There is no way you can establish certification procedures that will ferret out those traits. My bottom line is that certification procedures would be wonderful if they could be made to work, but I don't think they can be made to work - especially for critical systems.

SG: So what is the answer?

PN: The answer is to try to stick to simple systems. Do things as reliably as you can. Use intelligent people. You shouldn't have people with limited experience writing life-critical systems. I keep trying to put a positive spin on things, yet I'm very frustrated by the difficulties involved in getting something to work correctly. I've spent most of my professional career trying to make things work better. And yet, knowing that people can screw up, and hardware can screw up, and designs are typically flawed, and implementations are almost always flawed, leads me to the conclusion that it is a losing battle. So I'm kind of skeptical of some of the really critical uses of computers in life-critical situations.

Simson L. Garfinkel (simsong@nextworld.com) is a computer consultant, science writer, and a senior editor at Nextworld magazine.

Copyright 1993, WIRED Ventures Ltd. All Rights Reserved.