BY SIMSON GARFINKEL
TROJAN HORSES. Keyboard loggers. Viruses. Bad insiders. Bad
outsiders. Evil-doers. Perforated firewalls. Corrupt backups. Spam. A
few years ago, many security professionals I knew looked forward to the
day when the majority of the world's computer security problems were
worked out. Back then, we thought that improving security was just a
question of deploying technology, providing training and getting people
to follow the appropriate procedure. But a look at computer science
theory proves otherwise.
A fundamental goal of computer
security has been to put some sort of restrictions on program
execution. For example, worms and viruses wouldn't be a problem if they
didn't damage our files, reformat our hard drives and e-mail themselves
to everybody in our address books. If there were just some way we could
stop the bad programs from doing what we don't want, without affecting
the execution of the good programs, our problem would be solved.
But to stop the bad programs, we're
going to need some way of distinguishing the bad from the good. That
is, we're going to need a program-analyzing program that can look at
any given suspect program and determine if it has any hostile code. If
it doesn't, then the suspect application is safe to run. Security is
simple.
But, as it turns out, writing such a program-analyzing program is theoretically impossible.
The impossibility stems not from
legal issues—such as determining whether a programmer had "criminal
intent"—but from a technical conundrum. The mathematics of computing
make it impossible to write software that can figure out what other
programs can do, prior to execution.
To be sure, it is possible to write programs that can examine
simple programs and determine what they do. But here's the problem: In
the general case, there is no way to examine a program's instructions and
figure out precisely what it will do. Running the software is, in general,
the only way to discover its behavior. And once a hostile program is run, it's too
late: The damage has been done.
That's why today's antivirus systems
don't try to predict what an infected program will do. Instead, these
systems have a list of "signatures," or sequences of bytes, that have
been observed in existing viruses. If the copy of Microsoft Word that's
about to be run contains the same sequence of bytes that was discovered
years ago in the Michelangelo virus, for example, then that copy of Word is deemed infected and is not permitted to run.
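To make the signature idea concrete, here is a minimal sketch in Python of that kind of scanner. The signature bytes below are invented placeholders, not patterns from any real virus database:

    # Minimal sketch of signature-based scanning. The byte patterns are
    # invented placeholders, not real virus signatures.
    KNOWN_SIGNATURES = {
        "EXAMPLE-VIRUS-A": bytes.fromhex("deadbeef00415645"),
        "EXAMPLE-VIRUS-B": b"\x90\x90\xeb\xfe",
    }

    def scan_file(path):
        """Return the names of any known signatures found in the file."""
        with open(path, "rb") as f:
            data = f.read()
        return [name for name, sig in KNOWN_SIGNATURES.items() if sig in data]

    if __name__ == "__main__":
        import sys
        for path in sys.argv[1:]:
            hits = scan_file(path)
            print(path, "INFECTED:" if hits else "clean", ", ".join(hits))

A scanner built this way can only flag byte patterns it has already been told about, which is exactly the limitation described next.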
But let's say that your copy of Word is infected by a virus
that's never been seen before—then you're out of luck. You can analyze
the copy of Word to see if it has instructions for formatting the
computer's hard drive, for example. But if you can't find them, that
still doesn't prove that the program won't format your hard
drive—computer programs can modify themselves when they run and add
these commands to themselves. It's this ability of computer programs to
modify themselves that makes their behavior mathematically impossible
to predict.
British mathematician Alan Turing
proved the impossibility of predicting program actions back in 1936.
Turing looked at the simplest kind of hostile program and proved that
it is impossible to tell in advance if a program will go into an
infinite loop, or stop. Today we call this the Halting Problem. It is
unsolvable.
The more you think about the Halting
Problem, the stranger it seems. That's because every program will
eventually halt or run forever. And since computers are deterministic
machines that do exactly as they are programmed, it seems that it
should be possible to analyze a program and determine in advance what
that program will do when it is run. But Turing showed that you
can't—no matter what kind of computer you have, no matter how fast it
is, no matter how much storage it has, and no matter how long
you let your analysis run. What you can do is examine a given
program and tell if it is the same as other programs that run forever
without stopping. But you can't recognize new programs that run forever.
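Turing's argument itself fits in a few lines. Here is a sketch in Python, where halts() stands in for the hypothetical analyzer we wish we could write:

    def halts(func) -> bool:
        """Hypothetical oracle: True if func() eventually returns, False if
        it runs forever. Turing proved no such function can exist; this
        body is only a stand-in."""
        raise NotImplementedError

    def contrary():
        """A program built to do the opposite of whatever halts() predicts."""
        if halts(contrary):
            while True:      # predicted to halt, so loop forever
                pass
        # predicted to run forever, so return immediately

    # If halts(contrary) returns True, contrary() loops forever; if it
    # returns False, contrary() halts. Either answer is wrong, so no
    # correct halts() can ever be written.

The same self-referential trap awaits any program that claims to predict, in advance, what another program will do.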
It's trivial to extend Turing's work
to any other kind of computer security problem involving hostile code:
You can't examine a copy of Word and determine if a back door
has been placed in the program to surreptitiously e-mail copies of your
confidential documents to Argentina. And you can't prove that your copy
of Windows isn't silently recording every password that you type for
playback at a later time.
One approach favored lately for
"solving" the computer security problem of desktop computers is to
gimmick their operating systems so that they will run only the programs
that have been digitally signed by publishers such as Microsoft and
Adobe. But Turing's work tells us that these digitally signed programs
may nevertheless have exploitable security bugs: It's theoretically
impossible to prove that they don't.
Just about the only way to take back
computer security from the morass that Turing created is to restrict
what computer programs can do—that is, make computers less
general-purpose. This was one of the fundamental ideas behind
JavaScript; because JavaScript lacked commands for writing or erasing
files on the user's hard drive, there was, in theory, no way for a
JavaScript program to perform those malicious actions.
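As a sketch of the principle (hypothetical, in Python): a toy interpreter whose scripts can invoke only the operations deliberately handed to them, none of which touch the file system, cannot erase files no matter what a script says.

    # Toy "less general-purpose" interpreter: the only operations a script
    # can invoke are the ones registered here, and none of them write or
    # erase files, so no script run this way can do so either.
    SAFE_COMMANDS = {
        "print": lambda *args: print(*args),
        "add":   lambda a, b: float(a) + float(b),
        "upper": lambda s: s.upper(),
    }

    def run(script):
        """Run a list of (command_name, arguments...) tuples."""
        for name, *args in script:
            if name not in SAFE_COMMANDS:
                raise ValueError(f"command not allowed: {name!r}")
            result = SAFE_COMMANDS[name](*args)
            if result is not None:
                print(result)

    run([("print", "hello"), ("add", "2", "3"), ("upper", "no file access here")])

The safety comes not from analyzing the script but from never giving it the capability in the first place.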
But it takes surprisingly little
power and flexibility to tip the scales and make a program's behavior
unpredictable. JavaScript's ability to manipulate the contents of a
webpage after they're loaded—then reinterpret the page as a new
program—has caused countless problems for Hotmail, Yahoo and
practically every website that lets visitors post raw HTML.
Turing's theorem isn't the only
quandary that theoretical computer scientists have thrown at security
professionals. Another one is the fundamental question of whether
computers can solve problems that are truly complex—that is, complex in
a theoretical sense—by short-circuiting the underlying mathematics that
make these problems hard. Mathematicians have worked on this problem
for more than 30 years, but so far nobody knows the answer.
These hard problems are all in a
mathematical family called NP, and they share a common peculiar
property: The problems are hard to solve, but once you find an answer
it's trivial to prove that it is the correct one. Code-breaking is one
of those problems. If the only way you have to crack an encrypted
message is by trying every key, it might take a hundred billion years
for you to try them all and find the right one. But once you have the
right key, it takes only a fraction of a second to prove that it is
the correct one: All you do is decrypt the message.
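A toy example makes the asymmetry visible. The two-byte key below is hopelessly small (real keys run to 128 bits or more, which is what pushes the search into the billions of years), but the shape of the problem is the same: checking a guessed key is one quick operation, while finding it means trying them all.

    from itertools import product

    # Toy cipher: XOR each byte of the message with a repeating two-byte key.
    def xor_crypt(data: bytes, key: bytes) -> bytes:
        return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

    plaintext  = b"ATTACK AT DAWN"
    secret_key = bytes([0x5A, 0xC3])
    ciphertext = xor_crypt(plaintext, secret_key)

    # Verifying a candidate key is instant: decrypt once and inspect the result.
    def looks_right(candidate: bytes) -> bool:
        return xor_crypt(ciphertext, candidate).startswith(b"ATTACK")

    # Recovering the key without being told it means brute-force search.
    for a, b in product(range(256), repeat=2):
        if looks_right(bytes([a, b])):
            print("recovered key:", bytes([a, b]).hex())
            break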
There are many of these NP problems:
scheduling speakers and rooms at a conference, for example, or seating
women at a dinner party so that nobody is sitting next to an
ex-boyfriend. In every case, the only way we know to solve these
problems is through brute-force search—that is, you have to try every
combination until you find one that works.
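"Try every combination" really is the whole algorithm. Here is a small sketch of the dinner-party version, with invented guests and an invented list of pairs who must not sit together:

    from itertools import permutations

    guests = ["Ann", "Beth", "Cleo", "Dina", "Eve"]
    forbidden_neighbors = {("Ann", "Beth"), ("Cleo", "Dina"), ("Dina", "Eve")}

    def acceptable(seating):
        """True if no forbidden pair ends up in adjacent seats."""
        return not any(
            (a, b) in forbidden_neighbors or (b, a) in forbidden_neighbors
            for a, b in zip(seating, seating[1:])
        )

    # Brute force: try orderings until one works. With n guests that is up
    # to n! orderings -- fine for 5 people, hopeless for a big party.
    for seating in permutations(guests):
        if acceptable(seating):
            print("seating plan:", ", ".join(seating))
            break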
The strange thing about these NP
problems is that mathematicians have never formally proven that
brute-force search is the only attack that works: They just haven't
found any other way to solve these problems. But in the early 1970s two
mathematicians, Stephen Cook and Leonid Levin, independently made the same
discovery: If a fast way is ever found to solve a particular kind of NP
problem, a so-called NP-complete problem, then that result could
be used to solve every NP problem.
Some mathematicians claim that such
a result might mean the end of cryptography. In fact, it might not. The
difficulty of factoring large numbers is the basis of the RSA encryption
algorithm. For years, factoring was assumed to be hopelessly hard, but
clever mathematicians have found ways to short-circuit some of the math. As it
turns out, though, those short-circuit techniques still aren't very
fast, and the security of RSA is safe—at least for now.
If mathematicians ever find a way to
solve an NP-complete problem, and it's possible that somebody might,
then this knowledge could be used to break practically every
encryption scheme that's ever been devised. Whether those attacks would
be practical, no one knows. But never ignore the possibility that all
of the science of cryptography could suddenly come tumbling down. Is such a result likely? Of course not. But it is possible.
So the next time you feel
overwhelmed by the responsibilities that come with being a security
officer or consultant, take solace in the fact that the problems you
are trying to solve are fundamentally unsolvable. Given those odds, any
progress you can make is probably worthwhile.
Simson Garfinkel, CISSP, is a technology writer based in the Boston
area. He is also CTO of Sandstorm Enterprises, an information warfare
software company. He can be reached at machineshop@cxo.com.
Most Recent Responses:
It was Fred Cohen who first coined the term "virus" in 1984 and showed
that determining whether or not a given program is a virus is
undecidable, that is, equivalent to the Halting Problem. Cohen
saw that one implication of this result is that virus detection is an
endless arms race. Viruses are free to mutate into an infinite variety
of functionally equivalent forms, whereas the process of establishing
their equivalence is undecidable. We've
had this result in front of us for 20 years now. It has always seemed
bizarre to me that so much of our focus should therefore be on this
futile exercise of closing the barn door after the horse has gone.
Surely it makes more sense to design systems based on accepted security
principles which reduce the opportunity for infection and contain its
effects. These
principles, too, have been in the published literature for decades, and
most operating systems have implemented them with demonstrable success.
The total number of identified viruses for all commercial Unix variants
is reported as being about five. For Linux and MacOS, about 40 each.
For Microsoft Windows, about 60,000. Those who prefer statistics to
more rigorous methods should find these numbers significant. Good
system design goes a long way toward being able to reason about
security. If each system component is configured to be secure by
default, then each configuration change which might weaken security
becomes explicit. The
principle of least privilege is an extension of this concept. In
effect, it states that not all configuration changes can be exercised
by all parts of the system. This principle has been with us since the
early batch processing systems of the late 1950s. A virus is simply
benign data, except on a system which grants it the privilege of being
executed. The simple rule for system designers: don't build systems
which uncritically treat data as executable content. Security
is a subject which touches almost everything, and I don't mean to offer
an exhaustive analysis of its principles here, only to point out that
established principles do exist and are demonstrably effective when
applied. I
do want to address one very curious point raised in the article about
digital signatures. Of course it's true that merely signing a block of
data does nothing to transform the data itself into something whose
analysis is no longer constrained by the Halting Problem. Nobody would
suggest that digital signatures have this kind of supernatural power.
What a signature can do, however, is guarantee that the data came from
a known source. Whether you choose to trust that source or not is
another matter. The
point about digital signatures is that they allow this kind of informed
choice. Therefore, from a security perspective they have great utility.
Whether or not they actually improve security depends again on system
design. If the choice of accepting a digital signature is not made by
you but internally by the system, then from your legitimate perspective
it has violated both the principles of security by default and least
privilege. Such a system cannot be secure. That's
the concern about digital signatures, that they can equally be used to
give us more security or to take it out of our hands. Once again, the
real concern is not with signatures or viruses, but with system design.
Dan Razzell
Starfish Systems
What Turing says is that there is no guru program that can determine,
for an ARBITRARY program, whether that program will stop (or, what is
equivalent, whether it will execute a malicious instruction). For an
arbitrary program this is true, but nobody cares about arbitrary programs.
Everybody would be quite satisfied to predict whether the code that comes
with her e-mail is malicious. Turing's theorem does not prohibit the
existence of programs that analyze a particular class of programs and say
whether those programs contain malicious code. The problem of checking
code for malice is, of course, NP-hard.
Recently it was shown that almost all NP-hard problems have three
regions: an easy region in which the problem can be solved with few
attempts, another easy region in which the problem can be shown with few
attempts to have no solution, and a hard region that really does require
testing a huge number of possible solutions. Usually the hard region is
small compared with the easy ones. After all, almost everybody can
schedule speakers and rooms at a conference in a reasonable way. Nobody
tries 20! combinations (that is, some 2,400,000,000,000,000,000 of them)
in order to put 20 speakers in 20 rooms.
Applying this to the problem of malicious code, there would be a lot of
code that is easy to prove malicious, a lot of code that is easy to prove
harmless, and some code that is genuinely hard to analyze. The history of
computing is full of problems that are NP-hard but nevertheless get solved
to some extent. One example near you is the wiring of your computer's
motherboard: the wiring is surely not optimal, but it works. So the
analyzing software will probably work, after a fashion, and will detect
the vast majority of malicious code.
Kostadin Koruchev
EPS, UAM
Computer scientists apparently are still stuck in the binary mode of
unsolvable vs solvable. Traditional mathematicians have achieved many
results on the distribution of prime numbers and other statistical
facts about the mathematical structures that make up the foundations of
computer science. Why can't these statistical facts be used to develop a
theory that predicts *how likely* a program of a given size is to contain
an unrecognized security defect? Sure, it'll be complicated, but it's a
lot better than giving up and saying "unsolvable"! Cheers,
- Andrew
Andrew Naur
CTO
Metanet Research
So many security officers, security engineers, and consultants are
frustrated by widespread security challenges and the limits of their
ability to protect their systems. I agree with Simson, who says we should
take notice of our small successes in the face of such an unsolvable problem.
Mike Gibbons
Managing Director
BearingPoint