The word "large" is meaningless

From Simson Garfinkel

I frequently get emails such as this:

I am a researcher who is working in the area of computer forensics. I am looking for a large dataset of disk images. Can you help me?

The problem with requests such as this is that the word "large" is context-dependent. My "large" may not be your "large." In fact, it probably isn't.

The people who write these email messages have a definite size in mind when they write them. Clearly, "large" is something at the upper bound of what they can handle. But when I get an email like this, I have no idea what size the person is talking about. That's because I work with many people who use data sets of widely varying sizes.

  • To a graduate student with a desktop computer that has 4TB of storage, "large" may mean a set of 20 disk images of 100GB each.
  • To a law enforcement agency, "large" may mean a case that involves 10 servers, each with 50-100TB of storage.
  • To an astronomer working on a big experiment, "large" may mean 1000TB of data that has to be streamed and then discarded.

The meaninglessness of the word "large" shows up in other places, too: for example, when people talk about a large amount of money, a large number of books, or even a lot of dirt. The word is meaningless to anyone who isn't working with you on a daily basis. That's why it's important to use specific quantities:

  • I'm looking for 10-20 disk images that are 100GB to 200GB in size uncompressed.
  • I'm looking for 10-20 disk images that, when compressed, will fit on a 1TB disk drive (because that's all the storage I have for this project).
  • I am looking for $10,000 for a research project.
  • I am looking for $1,000,000 for a research project.
  • We have 10 tons of dirt available for pickup.
  • We have 1000 tons of dirt available for pickup.

Please don't use the word large.