Machine Shop
Keeping Secrets Secret
New approaches to protecting data at rest (and avoiding the wrath of your customers)
By Simson Garfinkel
Ever
since California passed SB1386, organization after organization has
disclosed that critical data banks have been compromised by hackers,
couriers or consultants. The causes range from lost backup tapes to
lost laptops to network hacks (you can read the sad litany in "The Five Most Shocking Things About the ChoicePoint Debacle").
What most of these cases have in common is the lack of strong technical
measures to protect data that is by its nature highly sensitive.
From
these and other cases we've learned that many companies seem to believe
they can adequately protect their information with a combination of
locked doors, firewalls and access controls. The problem with this
approach is that attackers can frequently bypass such protective
mechanisms and send raw commands written in the Structured Query
Language directly to your database server. This is called an SQL
injection attack. For example, if you have a table called "Customers,"
the attacker might be able to send the SQL command "Select * from Customers" and receive your entire customer database in reply.
Although
there are numerous proposals for ending these kinds of
attacks—including fancy intrusion detection systems and governors that
limit the amount of data that can be sent to a Web browser in response
to an HTTP request—this column is about a variety of techniques that
have been largely ignored but that show great promise.
All of
the following approaches protect the data in the database against both
outside attackers and malicious insiders. That's because these tactics
work by either eliminating or scrambling sensitive information so that
it no longer poses a security risk.
Option 1: Don't Collect Sensitive Data
The
very best way to provide for the security of a database is to eliminate
the large-scale collection of sensitive information in the first place.
This is apparently less obvious than it should be. For example, many
organizations still routinely collect Social Security numbers (SSNs)—or
even worse, they use SSNs as their own employee or student
identification numbers. Instead of using an identifier that has such a
high potential for credit fraud and identity theft, it's far better for
organizations to create their own randomly assigned 10- or 11-digit
identification numbers. (And, indeed, any organization that deals with
the public needs to have a provision for randomly generated numbers in
any event, because not everybody has an SSN.)
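As a rough sketch of what generating such an identifier might look like, the following Python fragment pulls a random 10- or 11-digit number from the operating system's secure random source; in practice you would also check each new number against the IDs already issued.

    import secrets

    def new_customer_id(digits=10):
        """Return a randomly assigned identifier with the requested number of digits."""
        low = 10 ** (digits - 1)      # smallest value with that many digits
        high = 10 ** digits - 1       # largest value with that many digits
        return secrets.randbelow(high - low + 1) + low

    print(new_customer_id())      # e.g. 4829301746
    print(new_customer_id(11))    # an 11-digit variant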
Option 2: Get Rid of Sensitive Information Fast
For
those who really must store sensitive data, make sure that the
information is erased as soon as possible. For example, in many cases
it is simply unnecessary to retain a customer's credit card number
(CCN) after a transaction has been committed—perhaps you can just keep
the last four digits after 90 days. Those who need CCNs for auditing
purposes may be able to move those numbers to a secondary database
server not connected to the Internet.
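A retention rule like that can be as simple as a scheduled job that overwrites the stored number, keeping only the last four digits, once the transaction ages out. The sketch below uses an invented orders table and the 90-day cutoff mentioned above purely for illustration.

    import sqlite3

    # A hypothetical orders table; in real life this would be your existing schema.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, ccn TEXT, order_date TEXT)")
    conn.execute("INSERT INTO orders VALUES (1, '4111111111111111', '2005-01-15')")

    # Once a transaction is older than 90 days, keep only the last four digits.
    conn.execute("""
        UPDATE orders
           SET ccn = '************' || substr(ccn, -4)
         WHERE order_date < date('now', '-90 days')
    """)
    print(conn.execute("SELECT ccn FROM orders").fetchone()[0])  # ************1111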
Option 3: Split It and/or Scramble It
Secret
sharing, also known as secret splitting, is a clever technique that can
be used to split a piece of confidential information between two or
more parties so that it cannot be reassembled until a minimum number of
those parties participate. With secret splitting you can divide CCNs
among four databases and require that data be retrieved from at least
three of them in order to recreate the CCNs. In the simplest
implementation, a secret is simply split between two databases; both
databases must be consulted to recreate the secret.
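In that two-database case the split can be as simple as XORing the card number against a random pad: one database stores the pad, the other stores the XOR, and neither half by itself reveals anything. Here is a toy Python version of that two-of-two split (not Shamir's threshold scheme):

    import secrets

    def split(secret: bytes):
        """Split a secret into two shares; each share alone is just random noise."""
        share1 = secrets.token_bytes(len(secret))                # random pad for database A
        share2 = bytes(a ^ b for a, b in zip(secret, share1))    # XOR result for database B
        return share1, share2

    def combine(share1: bytes, share2: bytes) -> bytes:
        """Both shares are needed to recreate the original value."""
        return bytes(a ^ b for a, b in zip(share1, share2))

    s1, s2 = split(b"4111111111111111")
    assert combine(s1, s2) == b"4111111111111111"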
Although
secret sharing was invented in 1979 by cryptographer Adi Shamir (the
"S" in the RSA cryptography algorithm), the system was largely an
academic curiosity until recently. With the rise in database break-ins
and mandatory notifications, secret sharing may be looking more
attractive for some applications.
Back in 2003 RSA Security introduced a
technology called Nightingale that is supposed to make it dramatically
easier for businesses to integrate secret sharing into already-existing
applications. With Nightingale, a special server holds half of the
secret and the organization's existing database holds the second half.
Secrets such as credit card numbers or cryptographic keys are only
recombined when they are actually needed for use; in other words, call
center reps won't be able to browse through the data on a coffee break.
In
some very special applications it is even possible to use a secret
without putting it back together! This is called split-key
cryptography, and Nightingale supports a version of it as well.
Split-key cryptography is useful in applications where you absolutely,
positively do not wish to have a chance of someone running off with
your encryption key. Instead of reassembling the key to use it, part of
the cryptographic calculation gets run on one computer with part of the
key, then the document gets moved to a second computer where the second
half of the calculation gets done with the second part of the key. This
is pretty complicated stuff, but it's appealing in certain specialized
applications (such as for organizations that want to run a high-value
certification authority).
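To get a feel for how a key can be used without ever being reassembled, consider the textbook RSA trick of splitting the private exponent into two random pieces: each machine raises the ciphertext to its own piece, and multiplying the partial results yields the plaintext. The sketch below uses tiny textbook numbers that are wildly insecure, and it illustrates only the general additive-split idea, not Nightingale or any other product.

    import secrets

    # Toy textbook RSA parameters (p=61, q=53); real keys are thousands of bits long.
    n, e, d = 3233, 17, 2753

    m = 65                      # "plaintext"
    c = pow(m, e, n)            # encrypt normally

    # Split the private exponent d into two random pieces that sum to d.
    d1 = secrets.randbelow(d)
    d2 = d - d1

    # Machine A computes a partial result with d1; machine B finishes with d2.
    partial_a = pow(c, d1, n)
    partial_b = pow(c, d2, n)
    recovered = (partial_a * partial_b) % n   # c^(d1+d2) = c^d = m (mod n)

    assert recovered == m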
In many cases information can be
hashed by a one-way function before it's stored in a database. Hashing
data enables it to be used for certain purposes but effectively makes
it impossible to get the data back out.
For example, the Unix
password system uses hashed passwords to increase the operating
system's overall security. Here's how it works: Instead of storing user
names and plaintext passwords in the user database, Unix systems store each
user name alongside the password processed with a cryptographically secure
one-way hash such as MD5. When a person tries to log in to the Unix system, the
operating system takes his password, hashes it and compares the result
to the value stored in the database. If they match, the user is allowed
to log in. But if an attacker breaks into the system and accesses the
database directly, all the attacker gets is the hashes, not the actual
passwords.
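The same check-without-storing flow is easy to reproduce with Python's standard library. The sketch below substitutes a salted, iterated hash for the plain MD5 mentioned above, which is what you would want today, but the logic mirrors the Unix scheme.

    import hashlib, hmac, os

    def hash_password(password: str, salt: bytes = None):
        """Store only the salt and the hash, never the password itself."""
        salt = salt or os.urandom(16)
        digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
        return salt, digest

    def check_password(password: str, salt: bytes, digest: bytes) -> bool:
        """Hash the login attempt and compare it to the stored value."""
        candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
        return hmac.compare_digest(candidate, digest)

    salt, stored = hash_password("correct horse battery staple")
    print(check_password("correct horse battery staple", salt, stored))  # True
    print(check_password("wrong guess", salt, stored))                   # False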
A few years ago Peter Wayner, an independent
consultant and author who specializes in cryptographic applications,
came up with a method for using one-way hash functions to protect other
kinds of information stored in a database. For example, a database that
includes the hash of a person's SSN still allows SSNs typed on Web
forms to be validated, but such a database makes it virtually
impossible for the database operator (or hacker) to browse the database
and download a list of names and SSNs. That's because the simple SQL
statement "Select * from Customers" would no longer return the customer
SSNs—it would just return the hashes of those SSNs.
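A bare-bones version of that idea looks like the sketch below: the table stores only a keyed hash of each SSN (keyed rather than plain, because nine-digit SSNs are few enough to guess by brute force), and a number typed into a Web form is validated by hashing it the same way and looking for a match. The schema and key handling here are my own illustrative assumptions, not Wayner's exact design.

    import hashlib, hmac, sqlite3

    SITE_KEY = b"long-random-secret-kept-outside-the-database"  # assumption: stored separately

    def ssn_token(ssn: str) -> str:
        """Keyed one-way hash of an SSN; the SSN itself is never stored."""
        return hmac.new(SITE_KEY, ssn.encode(), hashlib.sha256).hexdigest()

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Customers (name TEXT, ssn_hash TEXT)")
    conn.execute("INSERT INTO Customers VALUES (?, ?)",
                 ("Alice", ssn_token("078-05-1120")))

    # Validating a number typed into a Web form still works...
    row = conn.execute("SELECT name FROM Customers WHERE ssn_hash = ?",
                       (ssn_token("078-05-1120"),)).fetchone()
    print(row)   # ('Alice',)

    # ...but "Select * from Customers" now returns only opaque hashes, not SSNs.
    print(conn.execute("SELECT * FROM Customers").fetchall())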
Wayner
calls his approach "translucent databases," and it's good for a lot
more than just storing SSNs. For example, you can use a translucent
database to eliminate phone numbers, e-mail addresses, names, addresses
and other kinds of sensitive information—while still giving people the
ability to look up and use records that contain this information. In
his book, Wayner shows how to use the translucent database technology
to build a baby-sitter matchmaking application. Even though this
database somehow contains a list of young teenage girls who are
spending the evening in expensive houses with otherwise unguarded small
children, the translucent database technology makes it essentially
impossible to dump out that highly sensitive information. Even the data
bank's own operators can't make it reveal its secrets. Public-key
cryptography can be layered on top of these databases so that the
sitter's cell phone number can be decoded by Mom and Dad but not by
Uncle Ernie.
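That layering amounts to encrypting the sensitive field to the parents' public key before it goes into the record, so only the holder of the matching private key can read it back. A minimal sketch, assuming a recent version of the third-party cryptography package and glossing over key storage and distribution entirely:

    from cryptography.hazmat.primitives.asymmetric import rsa, padding
    from cryptography.hazmat.primitives import hashes

    # The parents generate a key pair; only they keep the private half.
    private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    public_key = private_key.public_key()

    oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                        algorithm=hashes.SHA256(), label=None)

    # The database stores only the ciphertext of the sitter's phone number.
    stored_field = public_key.encrypt(b"555-0142", oaep)

    # Mom and Dad can decrypt it; anyone without the private key cannot.
    print(private_key.decrypt(stored_field, oaep))   # b'555-0142'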
Option 4: Blow It Up
Probably my
favorite system for protecting data in a database against browsing or
large-scale downloading is a system called Vast that was developed at
the Georgia Institute of Technology by David Dagon, Wenke Lee and
Richard Lipton. Vast uses cryptographic techniques to dramatically
increase the size of a database. A 5- or 10-gigabyte database can be
inflated so that it takes 10 or 20 terabytes to store. Individual
records can be accessed relatively quickly, but any attacker attempting
to read all of the data immediately runs into scalability issues. And
downloading random slices of the database won't reveal anything useful,
because as Vast's creators put it, "a secret is broken into shares over
a large file, so that no single portion of the field holds recoverable
information."
The researchers described Vast in a paper called
"Protecting Secret Data from Insider Attacks" presented at the 2005
Financial Cryptography conference. But when I spoke with Dagon, he said
that he was having a hard time finding anybody who was interested in
commercializing the research because the whole idea of storing
gigabytes of data on terabytes of hard drives seemed so wasteful!
People just couldn't seem to understand that the point of Vast is that
the cost of a few dozen hard drives is almost inconsequential compared
to the protection that they can provide against a very common attack.
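Vast itself has not been released, but the flavor of the idea can be imitated in a few lines: hide a value so that it can be recovered only by XORing together dozens of blocks scattered across a large random file, so that no small slice of the file is useful on its own. The toy sketch below is my own illustration of that general approach, not the researchers' implementation.

    import secrets

    BLOCK = 16
    big_file = bytearray(secrets.token_bytes(1_000_000))   # stand-in for terabytes of padding

    def store(secret: bytes, offsets):
        """Overwrite the last block so that every listed block is needed to rebuild the secret."""
        acc = secret.ljust(BLOCK, b"\0")
        for off in offsets[:-1]:
            acc = bytes(a ^ b for a, b in zip(acc, big_file[off:off + BLOCK]))
        big_file[offsets[-1]:offsets[-1] + BLOCK] = acc

    def fetch(offsets) -> bytes:
        """XOR together every referenced block to recover the value."""
        acc = bytes(BLOCK)
        for off in offsets:
            acc = bytes(a ^ b for a, b in zip(acc, big_file[off:off + BLOCK]))
        return acc.rstrip(b"\0")

    # Pick 50 distinct, non-overlapping block positions for this record.
    rng = secrets.SystemRandom()
    offsets = [i * BLOCK for i in rng.sample(range(len(big_file) // BLOCK), 50)]
    store(b"4111111111111111", offsets)
    print(fetch(offsets))   # b'4111111111111111'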
Option 5: Encrypt Just Part of It
Organizations
that are looking for something that's made it out of the research lab
and into the marketplace would do well to look at some of the emerging
column-level encryption solutions, in which some information in the
database gets encrypted while other information is left in the clear.
Column-level solutions are now available for IBM DB2, Oracle, Microsoft
SQL Server and even MySQL. These systems generally rely on either code
within the application or a fancy proxy to encrypt data as it is
written into the database and decrypt it when it is read back out.
Column-level encryption isn't as secure as the other approaches
described in this column because the decryption key is usually embedded
somewhere within the application program or database. But it's
certainly better than having no encryption at all.
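To make the trade-off concrete, here is a rough sketch of the application-side variant: one column is encrypted before it is written and decrypted on the way back out, while the rest of the row stays in the clear. It assumes the third-party cryptography package for the cipher, and, as noted above, the key still has to live somewhere the application can reach it.

    import sqlite3
    from cryptography.fernet import Fernet

    key = Fernet.generate_key()      # in practice this key lives outside the database
    f = Fernet(key)

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Customers (name TEXT, ccn BLOB)")

    # Encrypt just the sensitive column on the way in...
    conn.execute("INSERT INTO Customers VALUES (?, ?)",
                 ("Alice", f.encrypt(b"4111111111111111")))

    # ...leave the rest in the clear, and decrypt on the way out.
    name, blob = conn.execute("SELECT name, ccn FROM Customers").fetchone()
    print(name, f.decrypt(blob))     # Alice b'4111111111111111'

    # A raw dump of the table shows only ciphertext for the protected column.
    print(conn.execute("SELECT ccn FROM Customers").fetchone()[0][:20], b"...")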
Simson
Garfinkel, PhD, CISSP, is spending the year at Harvard University
researching computer forensics and human thought. He can be reached at machineshop@cxo.com.
Most Recent Responses:
While I appreciate Dr. Garfinkel's addressing of these important database security issues, I felt my usual indignant frustration when I read security solutions put forth by someone who seems to know very little about databases and how they work or the highly competitive nature of the current business environment in the US. I find the solution of "Blowing up a database" to thwart hackers one of the more bizarre strategies I've seen to date. While I completely agree that many businesses have been playing loose and fast with personal information over the past few years, it's really critical that any IT professional (whether they are application, enterprise, security or an audit professional) remember that without the business there is no need for any information technology at all. I hope in the future to see more "real world" solutions that have been successfully and cost effectively implemented (hopefully with the greatest amount of transparency to the customer). Let's not throw the baby out with the bathwater. N Walton - 15 years database administrator, CISSP, CISA - (Hopefully CISA - I'm waiting on test results from ISACA!!).
Nell Walton
Chief Analyst
Cyrene Technologies LLC