September 2003 CSO Magazine



 

Ruling over unruly programs.
And why theoretical security is theoretically impossible

BY SIMSON GARFINKEL


TROJAN HORSES. Keyboard loggers. Viruses. Bad insiders. Bad outsiders. Evil-doers. Perforated firewalls. Corrupt backups. Spam. A few years ago, many security professionals I knew looked forward to the day when the majority of the world's computer security problems would be worked out. Back then, we thought that improving security was just a question of deploying technology, providing training and getting people to follow the appropriate procedures. But a look at computer science theory proves otherwise.

A fundamental goal of computer security has been to put some sort of restrictions on program execution. For example, worms and viruses wouldn't be a problem if they didn't damage our files, reformat our hard drives and e-mail themselves to everybody in our address books. If there were just some way we could stop the bad programs from doing what we don't want, without affecting the execution of the good programs, our problem would be solved.

But to stop the bad programs, we're going to need some way of distinguishing the bad from the good. That is, we're going to need a program-analyzing program that can look at any given suspect program and determine whether it has any hostile code. If it doesn't, then the suspect application is safe to run. Security is simple.

But, as it turns out, writing such a program-analyzing program is theoretically impossible.

The impossibility stems not from legal issues—such as determining whether a programmer had "criminal intent"—but from a technical conundrum. The mathematics of computing make it impossible to write software that can figure out what other programs can do, prior to execution.

To be sure, it is possible to write programs that can examine simple programs and determine what they do. But here's the problem: In general, there is no way to examine an arbitrary program's instructions and figure out precisely what it will do. Running the software is the only sure way to discover what it does. And once a hostile program is run, it's too late: The damage has been done.

That's why today's antivirus systems don't try to predict what an infected program will do. Instead, these systems have a list of "signatures," or sequences of bytes, that have been observed in existing viruses. If the copy of Microsoft Word that's about to be run contains the same sequence of bytes that was discovered years ago in the Michelangelo virus, for example, then that copy of Word is deemed infected and is not permitted to run.
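
To make the idea concrete, here is a minimal sketch of signature scanning in Python. The signature bytes, names and sample data are invented for illustration; real products use enormous signature databases and far smarter matching.

```python
# Minimal sketch of signature-based scanning. The "signatures" below are
# made-up byte strings for illustration, not real virus signatures.

KNOWN_SIGNATURES = {
    "Michelangelo-like sample": b"\xde\xad\xbe\xef\x13\x37",
    "Hypothetical macro virus": b"\x90\x90\xcc\xcc\x41\x42",
}

def find_signatures(data: bytes) -> list[str]:
    """Return the names of every known signature that appears in the data."""
    return [name for name, sig in KNOWN_SIGNATURES.items() if sig in data]

# In practice, 'suspect' would be the bytes of the program about to run.
suspect = b"MZ...ordinary program bytes..." + b"\xde\xad\xbe\xef\x13\x37" + b"...more..."

hits = find_signatures(suspect)
if hits:
    print("Refusing to run; matched:", ", ".join(hits))
else:
    print("No known signatures found (which proves nothing about new viruses).")
```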

But let's say that your copy of Word is infected by a virus that's never been seen before—then you're out of luck. You can analyze the copy of Word to see if it has instructions for formatting the computer's hard drive, for example. But if you can't find them, that still doesn't prove that the program won't format your hard drive—computer programs can modify themselves when they run and add these commands to themselves. It's this ability of computer programs to modify themselves that makes their behavior mathematically impossible to predict.
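
To see why inspecting a program's text isn't enough, consider this toy Python sketch: the instruction that does the work never appears literally in the source, because the program assembles it at run time. (The "payload" here is only a harmless print statement.)

```python
# Toy illustration of self-constructing code: the command that matters
# never appears literally in the program text a scanner would read.

pieces = ["pri", "nt('formatting your hard",
          " drive... just kidding')"]

def run_later(parts):
    # Assemble new program text at run time and execute it. A scanner
    # reading the source never sees the finished instruction.
    exec("".join(parts))

run_later(pieces)
```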

British mathematician Alan Turing proved the impossibility of predicting program actions back in 1936. Turing looked at the simplest kind of hostile program and proved that it is impossible to tell in advance if a program will go into an infinite loop, or stop. Today we call this the Halting Problem. It is unsolvable.

The more you think about the Halting Problem, the stranger it seems. Every program, after all, will eventually halt or else run forever. And since computers are deterministic machines that do exactly as they are programmed, it seems that it should be possible to analyze a program and determine in advance what it will do when it is run. But Turing showed that you can't—no matter what kind of computer you have, no matter how fast it is, how much storage it has, or how long you let the program run. What you can do is examine a given program and tell if it is the same as other programs already known to run forever without stopping. But you can't recognize new programs that run forever.
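
The heart of Turing's argument fits in a few lines. Suppose someone handed us a perfect halts() oracle; the sketch below builds a program that does the opposite of whatever the oracle predicts about it, so no such oracle can exist. The halts() function here is a placeholder for something Turing proved can never be written.

```python
# Sketch of Turing's diagonal argument, assuming a hypothetical oracle
# halts(program_source, program_input) that always answers correctly.

def halts(program_source, program_input):
    """Pretend this returns True iff the program halts on that input.
    Turing's proof shows no correct implementation can exist."""
    raise NotImplementedError("no such oracle")

def contrary(program_source):
    # Do the opposite of whatever halts() predicts about this program
    # when it is run on its own source code.
    if halts(program_source, program_source):
        while True:      # predicted to halt -> loop forever
            pass
    else:
        return           # predicted to loop -> halt at once

# Feed contrary() its own source and you get the contradiction:
# if halts() says it halts, it loops; if halts() says it loops, it halts.
# Either answer is wrong, so the perfect halts() cannot exist.
```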

It's trivial to extend Turing's work to any other kind of computer security problem involving hostile code: You can't examine a copy of Word and determine if a back door has been placed in the program to surreptitiously e-mail copies of your confidential documents to Argentina. And you can't prove that your copy of Windows isn't silently recording every password that you type for playback at a later time.

One approach favored lately for "solving" the desktop computer security problem is to gimmick operating systems so that they will run only programs that have been digitally signed by publishers such as Microsoft and Adobe. But Turing's work tells us that these digitally signed programs may nevertheless have exploitable security bugs: It's theoretically impossible to prove that they don't.

Just about the only way to take back computer security from the morass that Turing created is to restrict what computer programs can do—that is, make computers less general-purpose. This was one of the fundamental ideas behind JavaScript; because JavaScript lacked commands for writing or erasing files on the user's hard drive, there was, in theory, no way for a JavaScript program to perform those malicious actions.

But it takes surprisingly little power and flexibility to tip the scales and make a program's behavior unpredictable. JavaScript's ability to manipulate the contents of a webpage after they're loaded—then reinterpret the page as a new program—has caused countless problems for Hotmail, Yahoo and practically every website that lets visitors post raw HTML.

Turing's theorem isn't the only quandary that theoretical computer scientists have thrown at security professionals. Another one is the fundamental question of whether computers can solve problems that are truly complex—that is, complex in a theoretical sense—by short-circuiting the underlying mathematics that make these problems hard. Mathematicians have worked on this problem for more than 30 years, but so far nobody knows the answer.

These hard problems all belong to a mathematical family called NP, and they share a peculiar property: The problems are hard to solve, but once you find an answer it's trivial to prove that it is the correct one. Code-breaking is one of those problems. If the only way you have to crack an encrypted message is to try every key, it might take a hundred billion years to try them all and find the right one. But once you have the right key, it takes a fraction of a second to prove that the key is the correct key: All you do is decrypt the message.
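
Here is a toy Python illustration of that asymmetry, using a deliberately tiny 16-bit key and a trivial XOR "cipher"; the key, message and known-plaintext check are all made up. Verifying a known key takes one decryption, while finding an unknown key means grinding through every possibility—and with a realistically sized key, the same loop would run for eons.

```python
# "Hard to find, easy to check" with a tiny 16-bit key and a toy XOR
# cipher. Real ciphers use 128-bit or larger keys, so the same search
# becomes astronomically slow while verification stays instant.

def crypt(text: bytes, key: int) -> bytes:
    key_bytes = key.to_bytes(2, "big")
    return bytes(b ^ key_bytes[i % 2] for i, b in enumerate(text))  # XOR is its own inverse

def looks_right(candidate: bytes) -> bool:
    return candidate.startswith(b"ATTACK")     # known plaintext used for checking

secret_key = 0x5A17                            # made-up key
ciphertext = crypt(b"ATTACK AT DAWN", secret_key)

# Checking the right key: one decryption, essentially instant.
assert looks_right(crypt(ciphertext, secret_key))

# Finding the key without knowing it: try all 2**16 possibilities.
for guess in range(2**16):
    if looks_right(crypt(ciphertext, guess)):
        print("found key:", hex(guess))
        break
```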

There are many of these NP problems: scheduling speakers and rooms at a conference, for example, or seating women at a dinner party so that nobody sits next to an ex-boyfriend. In every case, the only way we know to solve the problem is brute-force search—that is, you have to try every combination until you find one that works.
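
A brute-force search for the dinner-party version might look like the Python sketch below: try every seating around the table until one satisfies the constraint. The guests and the pairs who must not sit together are invented; with n guests there are n! orderings to grind through, which is exactly why this approach blows up as the party grows.

```python
# Brute-force search over seatings at a round table: try every ordering
# until no forbidden pair sits side by side. All names are made up.

from itertools import permutations

guests = ["Ann", "Beth", "Carol", "Dana", "Eve", "Fay"]
forbidden = {("Ann", "Beth"), ("Carol", "Dana"), ("Eve", "Fay")}

def feuding(a, b):
    return (a, b) in forbidden or (b, a) in forbidden

def acceptable(seating):
    # Check every adjacent pair around the circular table.
    n = len(seating)
    return all(not feuding(seating[i], seating[(i + 1) % n]) for i in range(n))

for seating in permutations(guests):           # n! candidate orderings
    if acceptable(seating):
        print("Found a seating:", seating)
        break
else:
    print("No acceptable seating exists.")
```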

The strange thing about these NP problems is that mathematicians have never formally proven that brute-force search is the only attack that works: They just haven't found any other way to solve them. But in the early 1970s, two mathematicians, Stephen Cook and Leonid Levin, independently made the same discovery: If a fast way is ever found to solve a particular kind of NP problem—a so-called NP-complete problem—then that result could be used to solve every NP problem quickly.

Some mathematicians claim that such a result might be the end of cryptography. In fact, it might not. The difficulty of factoring large numbers is the basis of the RSA encryption algorithm. For years, factoring was thought to be an NP problem, but two years ago some clever mathematicians found a way to short-circuit the math. But as it turns out, the short-circuit technique isn't very fast, and the security of RSA is safe—at least for now.

If mathematicians ever find a fast way to solve an NP-complete problem—and it's possible that somebody might—then this knowledge could be used to break practically every encryption scheme that's ever been devised. Whether those attacks would be practical, no one knows. But never ignore the possibility that the entire science of cryptography could suddenly come tumbling down. Is such a result likely? Of course not. But it is possible.

So the next time you feel overwhelmed by the responsibilities that come with being a security officer or consultant, take solace in the fact that the problems you are trying to solve are fundamentally unsolvable. Given those odds, any progress you can make is probably worthwhile.


Simson Garfinkel, CISSP, is a technology writer based in the Boston area. He is also CTO of Sandstorm Enterprises, an information warfare software company. He can be reached at machineshop@cxo.com.




ILLUSTRATION BY ANASTASIA VASILAKIS




Most Recent Responses:

It was Fred Cohen who first coined the term "virus" in 1984 and showed that determining whether or not a given program is a virus is undecidable, that is, equivalent to the Halting Problem.

Cohen saw that one implication of this result is that virus detection is an endless arms race. Viruses are free to mutate into an infinite variety of functionally equivalent forms, whereas the process of establishing their equivalence is undecidable.

We've had this result in front of us for 20 years now. It has always seemed bizarre to me that so much of our focus should therefore be on this futile exercise of closing the barn door after the horse has gone. Surely it makes more sense to design systems based on accepted security principles which reduce the opportunity for infection and contain its effects.

These principles, too, have been in the published literature for decades, and most operating systems have implemented them with demonstrable success. The total number of identified viruses for all commercial Unix variants is reported as being about five. For Linux and MacOS, about 40 each. For Microsoft Windows, about 60,000. Those who prefer statistics to more rigorous methods should find these numbers significant.

Good system design goes a long way toward being able to reason about security. If each system component is configured to be secure by default, then each configuration change which might weaken security becomes explicit.

The principle of least privilege is an extension of this concept. In effect, it states that not all configuration changes can be exercised by all parts of the system. This principle has been with us since the early batch processing systems of the late 1950s. A virus is simply benign data, except on a system which grants it the privilege of being executed. The simple rule for system designers: don't build systems which uncritically treat data as executable content.

Security is a subject which touches almost everything, and I don't mean to offer an exhaustive analysis of its principles here, only to point out that established principles do exist and are demonstrably effective when applied.

I do want to address one very curious point raised in the article about digital signatures. Of course it's true that merely signing a block of data does nothing to transform the data itself into something whose analysis is no longer constrained by the Halting Problem. Nobody would suggest that digital signatures have this kind of supernatural power. What a signature can do, however, is guarantee that the data came from a known source. Whether you choose to trust that source or not is another matter.

The point about digital signatures is that they allow this kind of informed choice. Therefore, from a security perspective they have great utility. Whether or not they actually improve security depends again on system design. If the choice of accepting a digital signature is not made by you but internally by the system, then from your legitimate perspective it has violated both the principles of security by default and least privilege. Such a system cannot be secure.

That's the concern about digital signatures, that they can equally be used to give us more security or to take it out of our hands. Once again, the real concern is not with signatures or viruses, but with system design.

Dan Razzell
Starfish Systems

What Turing says is that there is no guru program that can decide, for an ARBITRARY program, whether that program will stop (or, equivalently, whether it will execute a malignant instruction). For an arbitrary program this is true, but nobody cares about arbitrary programs. Everybody would be quite satisfied to predict whether the code that arrives with her e-mail is malignant. Turing's theorem does not prohibit the existence of programs that analyze a particular class of programs in order to say whether those programs contain malignant code.

The problem of checking code for malignancy is of course NP-hard. Recently, it was shown that almost all NP-hard problems have three regions: an easy region in which the problem can be solved after a few attempts, an easy region in which the problem can be shown to have no solution, also after a few attempts, and a difficult region that really does require testing a huge number of possible solutions. Usually the difficult region is small compared to the easy ones. After all, almost everybody can schedule speakers and rooms at a conference in a reasonable way. Nobody tries 20! combinations (that is, some 2,400,000,000,000,000,000 of them) in order to put 20 speakers in 20 rooms.

Applying this to the problem of malignant code: there would be a lot of code that is easy to prove malignant, a lot of code that is easy to prove harmless, and some code that is genuinely hard to analyze.

The history of computing is rich in problems that are NP-hard but nevertheless get solved to some extent. One example close to home is the wiring of your computer's motherboard. Of course the wiring of your board is not optimal, but it works nevertheless. So the analysis software will probably work to a degree and will detect the vast majority of malignant code.

Kostadin Koruchev
EPS, UAM

Computer scientists apparently are still stuck in the binary mode of unsolvable vs solvable. Traditional mathematicians have achieved many results on the distribution of prime numbers and other statistical facts about the mathematical structures that make up the foundations of computer science.

Why can't these statistical facts be used to develop a theory that predicts *how likely* a program of a given size is to contain an unrecognized security defect? Sure, it'll be complicated, but it's a lot better than giving up and saying "unsolvable"!

Cheers, - Andrew

Andrew Naur
CTO
Metanet Research

So many Security Officers, Security Engineers and Consultants are frustrated by widespread security challenges and by the limits of their ability to protect their systems. I agree with Simson, who says we should take note of our small successes in the face of such an unsolvable problem.

Mike Gibbons
Managing Director
BearingPoint







