It’s common knowledge that Facebook runs Hadoop. The largest Hadoop cluster on the planet.
Here are some stats, courtesy of HighScalability, which scraped them from Twitter during the Velocity conference:
- 6 billion mobile messages every 30 minutes
- 3.8 trillion cache operations in 30 minutes
- 160 million newsfeeds, 5 billion real-time messages, 10 billion profile pictures, and 108 billion MySQL queries, all in 30 minutes
Now, some questions of interest:
- How close is the typical enterprise to that level of scale?
- How likely is it that a typical enterprise would be able to take advantage of such scale to improve its core business, assuming reasonable time and money budgets?
Let’s say you are the CIO of a financial services company with $500M in annual revenue. Let’s suppose that you make an average of $10 per business transaction, and further suppose that each business transaction requires 24 database operations, including queries and updates.
At that rate, you’d handle $500M / $10 = 50M business transactions per year, which means you’d run 50M*24 = about 1.2B database operations … per year.
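Scroll back up. What does Facebook do? 108 billion MySQL queries in 30 minutes, on top of 3.8 trillion cache operations. Whereas 1.2B per year works out to be about 68,000 in 30 minutes. Counting MySQL queries alone, Facebook does roughly 1.6 million times as many database operations as the hypothetical financial services company. Measured against the 3.8 trillion cache operations, the gap is closer to 55 million times.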
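If you want to check the arithmetic, or plug in your own company's numbers, here is a minimal Python sketch. The inputs are the hypothetical figures from this post plus the Facebook numbers quoted in the stats list; the constant and function names are made up for illustration, and the calculation assumes load is spread evenly across the year, which no real business sees.

```python
# Back-of-envelope scale comparison. Every input is a hypothetical or quoted
# figure from the post, not a measurement.
ANNUAL_REVENUE_USD = 500_000_000       # hypothetical $500M company
REVENUE_PER_TRANSACTION_USD = 10       # assumed average per business transaction
DB_OPS_PER_TRANSACTION = 24            # assumed queries + updates per transaction

# Number of 30-minute windows in a 365-day year (assumes perfectly even load).
HALF_HOURS_PER_YEAR = 365 * 24 * 2

# Facebook figures per 30 minutes, as quoted in the stats list.
FACEBOOK_PER_30_MIN = {
    "cache operations": 3.8e12,
    "MySQL queries": 108e9,
}


def enterprise_ops_per_30_min():
    """Database operations the hypothetical company runs in an average 30 minutes."""
    transactions_per_year = ANNUAL_REVENUE_USD / REVENUE_PER_TRANSACTION_USD
    ops_per_year = transactions_per_year * DB_OPS_PER_TRANSACTION
    return ops_per_year / HALF_HOURS_PER_YEAR


def scale_ratios():
    """How many times larger each Facebook figure is than the enterprise's load."""
    baseline = enterprise_ops_per_30_min()
    return {name: count / baseline for name, count in FACEBOOK_PER_30_MIN.items()}


if __name__ == "__main__":
    print(f"Enterprise: {enterprise_ops_per_30_min():,.0f} database ops per 30 minutes")
    for name, ratio in scale_ratios().items():
        print(f"Facebook {name}: {ratio:,.0f}x the enterprise load")
```

The size of the gap depends a lot on which Facebook figure you compare against, but every choice leaves the hypothetical company many orders of magnitude behind.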
Now, let me repeat those questions:
- If you run that hypothetical company, do you need Hadoop?
- If you had Hadoop, would you be able to drive enough data through it to justify the effort of adoption?