Home > Archives > February 2007 >

The Book on Amazon

Can you trust the giant retailer—or any Web-based service—with your information storage and computing tasks?

By Simson Garfinkel


Amazon.com wants to sell your organization a whole lot more than books, music and electronics. Amazon, the Seattle-based e-commerce giant, wants to rent your organization storage space for your mission-critical data and virtual machines for doing your information processing. The offerings are enterprise-quality, and the prices are astonishingly low. But is it safe to trust your business to Amazon's infrastructure?

These days it's common for businesses to host their websites and e-commerce systems at colocation facilities. Increasingly, much of the equipment itself is outsourced as well. Although many businesses still like to buy their own servers, disk arrays, load balancers and firewalls, it's much more economical to rent a few dedicated servers at a service provider and let somebody else worry about the plumbing. ISPs get economies of scale by managing hundreds or thousands of identical machines, while the customer can concentrate on building a high-quality website.

Elastic Computing

Amazon's new Elastic Compute Cloud (EC2) Web service takes this idea of hosted servers to a new level. Instead of renting physical servers on a month-by-month basis, Amazon is now renting virtual computers by the hour: 10 cents an hour, to be exact. That 10 cents gets you the equivalent of a 1.7GHz Xeon processor with 1.25GB of RAM and 160GB of hard drive space. Bandwidth is 250Mbps, at the cost of 20 cents per gigabyte transferred.

One reason that EC2 is so cheap is that the virtual servers can crash at any time and they aren't backed up. If you want to build a reliable system, you need to do it yourself by renting multiple servers and fashioning them into a redundant cluster. This approach provides not just redundancy but scalability. For example, you might build a little e-commerce website with two Web servers and two database engines. If you notice that your site is more popular on weekdays than on weekends, you might bring up an extra two or three servers on Mondays and shut them down on Friday afternoons. By forcing the customer to address the issue of backup and scaling directly, EC2 lowers costs for Amazon and the customer alike.
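To put rough numbers on the weekday-scaling example, here is a minimal sketch in Python. It uses only the 10-cents-an-hour rate quoted above. The function name and the choice to run three extra servers for five full days are illustrative assumptions, not anything Amazon specifies.

```python
HOURLY_RATE_CENTS = 10   # 10 cents per server-hour, the rate quoted in the article
HOURS_PER_WEEK = 7 * 24

def weekly_cost_cents(base_servers, extra_servers=0, extra_hours=0):
    """Weekly cost of an always-on base cluster plus part-time extra servers."""
    base = base_servers * HOURS_PER_WEEK * HOURLY_RATE_CENTS
    extras = extra_servers * extra_hours * HOURLY_RATE_CENTS
    return base + extras

# Two Web servers and two database engines all week, plus three extra
# servers for five weekdays (assumed here to be 5 * 24 hours).
scaled = weekly_cost_cents(4, extra_servers=3, extra_hours=5 * 24)

# The alternative: keep all seven servers running the whole week.
always_on = weekly_cost_cents(7)

print(f"scaled: ${scaled / 100:.2f}  always on: ${always_on / 100:.2f}")
```

Under these assumptions the scaled cluster costs $103.20 a week against $117.60 for seven servers left running, a modest saving that grows with the size of the weekday peak.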

Simple Storage

Applications that need more than 160GB of storage should use Amazon's Simple Storage Service (S3). With S3, data is stored redundantly on multiple computers at multiple data centers around the world. Information can be stored with HTTP "PUT" commands and downloaded with HTTP "GET". The cost to store data is 15 cents per gigabyte per month, with an added bandwidth cost of 20 cents for every gigabyte of data that's uploaded or downloaded. Fortunately, there is no cost to move data between EC2 and S3. According to Amazon, you can store an "unlimited" amount of information with S3, which basically means that Amazon can buy disks faster than your organization can fill them.
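The S3 pricing just described can be written down as a small calculation. This is only a sketch of the quoted rates; the function and parameter names are hypothetical, and the 500GB and 100GB figures are made-up inputs for illustration.

```python
STORAGE_CENTS_PER_GB_MONTH = 15   # quoted storage price
TRANSFER_CENTS_PER_GB = 20        # quoted price per gigabyte uploaded or downloaded

def s3_monthly_cost_cents(stored_gb, internet_transfer_gb, ec2_transfer_gb=0):
    """Monthly S3 bill in cents under the rates quoted in the article."""
    storage = stored_gb * STORAGE_CENTS_PER_GB_MONTH
    transfer = internet_transfer_gb * TRANSFER_CENTS_PER_GB
    # Traffic between EC2 and S3 is free, so ec2_transfer_gb adds nothing.
    return storage + transfer

# Illustrative inputs: 500GB stored, 100GB moved over the Internet,
# and 2,000GB read by EC2 machines at no charge.
bill = s3_monthly_cost_cents(500, 100, ec2_transfer_gb=2000)
print(f"${bill / 100:.2f} per month")
```

With those inputs the bill comes to $95.00 a month, and it stays the same however much data the EC2 machines read.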

Lately I've been doing a lot of research in computer forensics. My database is roughly 1,000GB in size, and my last experiment took four weeks of computer time to execute on a single computer. With Amazon's Web Services I can store my data in multiple data centers for just $150 a month. Instead of spending four weeks to run an experiment, I can instantiate 28 virtual machines and run the experiment in a day for $67.20. Or I can instantiate 168 machines and run the experiment in four hours for that same $67.20.
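The arithmetic behind those figures is simple: four weeks on one computer is 672 machine-hours, and at 10 cents an hour that costs $67.20 no matter how the hours are divided. The sketch below checks this in Python. It assumes the work splits perfectly across machines and that a partial hour is billed as a full hour; both are assumptions made for illustration, and the function name is hypothetical.

```python
HOURLY_RATE_CENTS = 10            # 10 cents per machine-hour
JOB_MACHINE_HOURS = 4 * 7 * 24    # four weeks on a single computer = 672 hours

def run_cost(machines):
    """Return (wall-clock hours, cost in cents) for the job on `machines` machines."""
    # Ceiling division: assume each machine is billed for whole hours.
    hours_each = -(-JOB_MACHINE_HOURS // machines)
    return hours_each, machines * hours_each * HOURLY_RATE_CENTS

for n in (1, 28, 168):
    hours, cents = run_cost(n)
    print(f"{n:4d} machines: {hours:4d} hours, ${cents / 100:.2f}")
```

When the machine count divides 672 evenly, as 28 and 168 do, the cost is unchanged. A count that does not, such as 100, would pay slightly more under the whole-hour assumption because each machine's last hour is only partly used.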

But before you turn your business over to Amazon, there are a lot of questions that you need to consider. Are EC2 and S3 just toys, or are they reliable enough for production systems? What is the chance that an EC2 virtual machine will be taken over or shut down by a hacker? How secure is the information stored in S3—who can access it, and who can change it? And what is Amazon's commitment to these services? Most of these questions, it turns out, have something to do with security.

I've been working with EC2 and S3 daily, and I think the service is reliable enough for me to start moving much of my research from computers that I own to virtual machines that I rent from Amazon on an as-needed basis. But I don't think that EC2 and S3 yet provide what's required to serve corporate customers.

To use EC2 you create a disk image of a Linux server. This image is digitally signed, encrypted, split into pieces and stored in S3 using tools that Amazon provides. You can instantiate a virtual machine with a remote procedure call to the Amazon Web Services (AWS). Ten minutes later the machine is running; another remote procedure call will give you its IP address.

Keys to Security

There are two complementary systems for access control. First, since you create the image for the virtual machine, you can determine the accounts that it will have when it starts up. Amazon provides tools that make it easy to create a public/private key pair, restricting the machine so that it can be accessed only by someone with the matching private key. The second access control is the EC2 firewall, which runs in Amazon's network. Using your private key, you can send digitally signed messages to the firewall, telling it to open up particular ports between your virtual machine and the rest of the Internet. Digital signatures are also used to sign commands sent to the S3 storage system, although these signatures can be written with an HMAC algorithm (a type of message authentication code), making them very fast indeed.

Despite this use of public key cryptography, authorization is one of the weakest parts of the system. To use EC2 or S3 you need to create an Amazon Web Services account, which is just a standard Amazon account that has been "enabled" for Web services. You then log in to the AWS website and download a public key that's used to identify yourself to the AWS system, and a private key that's used by your code to digitally sign all requests. AWS uses HMAC for its signatures, so writing them is very fast.
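For readers who haven't used HMAC, here is a minimal sketch of signing and checking a request with Python's standard library. It shows the general technique only, not Amazon's actual request format: the secret, the message layout and the choice of SHA-256 are all assumptions made for illustration. Note that HMAC relies on a secret shared by both sides, which is part of why it is so fast.

```python
import hashlib
import hmac

def sign_request(secret_key: bytes, message: bytes) -> str:
    """Return a hex HMAC of the message (SHA-256 is an assumed hash choice)."""
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()

def verify_request(secret_key: bytes, message: bytes, signature: str) -> bool:
    """Recompute the HMAC and compare it in constant time."""
    expected = sign_request(secret_key, message)
    return hmac.compare_digest(expected, signature)

# Hypothetical secret and request string, for illustration only.
secret = b"example-secret-key"
request = b"PUT /my-bucket/report.txt"

signature = sign_request(secret, request)
print(verify_request(secret, request, signature))                        # genuine request
print(verify_request(secret, b"PUT /my-bucket/other.txt", signature))    # altered request
```

Anyone who holds the secret can produce valid signatures, so the scheme is only as safe as the way that secret is issued and stored.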

Unfortunately, while the keys are long enough to be cryptographically secure, they are fundamentally only as strong as your Amazon password that's used to generate them. Organizations that are serious about AWS should create AWS accounts that aren't used for anything else and protect them with very long passwords. But that's not good enough, because Amazon allows passwords to be reset by clicking a link that says "Forgot your password? Click here." In practice, anybody who can receive mail at the e-mail address registered for an AWS account can commandeer all of the AWS services associated with that account. Amazon will have to address this failing before a business can make a serious commitment to AWS.

Privacy of stored data is another concern. The S3 system stores information in "buckets." Each S3 bucket can have an access control list, allowing its contents to be public or restricted to particular Amazon IDs. Fundamentally, though, if you are storing information in S3 that isn't meant to be public, you should use encryption to enforce that policy. And since there is no significant overhead for using a strong encryption algorithm like AES-256, there is no good reason not to use encryption to protect private information stored in S3. Applications that connect to S3 can also use SSL, although this probably isn't necessary for applications running on EC2.

The second problem with AWS is the lack of a service-level agreement (SLA), a formal commitment in which Amazon pledges that data stored in S3 won't be accidentally deleted and that the network will remain available with good bandwidth.

"We work extremely hard to avoid data loss due to an error at Amazon; our software is built to avoid single points of failure. However, we don't maintain backups that we can restore from," said Andrew Herdener, an Amazon representative, in response to a question I sent him by e-mail. "This is a business that we're committed to, and they are services that Amazon itself depends on for its own business."

If I were developing an e-commerce site that depended on AWS, I would want an SLA that clearly stated Amazon's long-term commitment to these offerings. I would also want a provision that required Amazon to give me written notice should it plan to terminate the service, as well as provisions for an orderly transition to another provider.

Because Amazon won't make any of those commitments, I'm living with the risk. I can mitigate that risk a bit by keeping a backup of everything I store at Amazon either at my office or at another storage provider.

Businesses that store their own data and provide their own computational resources also have issues with the reliability of their storage and the resiliency of their internal service offerings, of course. Outsourced services like EC2 and S3 force an organization to confront these risks directly, rather than sweeping them under the carpet.

Simson Garfinkel, CISSP, is researching computer forensics and human thought at Harvard University. Send feedback to machineshop@cxo.com.

