I was recently blessed with my first grandchild. Hard to believe, but on the first day of her life she generated 70 times more information than is contained in the Library of Congress (think photos, sonograms, Facebook postings, medical records, etc.).
We are awash in data. In 2010, Eric Schmidt, then CEO of Google, observed that "Every two days now we create as much information as we did from the dawn of civilization up until 2003." In 2013, scientists calculated that over 90% of the world's data had been generated in the previous two years. We are all contributing. I started this blog about a year ago, and it runs on a platform called WordPress. In 2012, there were 72 million websites on WordPress; by the start of 2014, there were 86 million.
So what do a cute little elephant and the cloud have to do with all this? All this information is a treasure trove for business, government, medicine, universities and almost any other person or organization. The key is how to unlock the insights it holds. That's where this cute little guy comes in. The elephant's name is Hadoop, which is also the name of a software suite. Its inventor named it after his son's favorite toy elephant. It was developed at Yahoo and then given to the world as open-source code.
Hadoop takes a vast mountain of data and breaks it into many pieces that can be processed in parallel across many computers. You need the algorithm, and you need lots of computers, but only for as long as it takes to run the program over the data.
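For the technically curious, the "split it up and work on the pieces in parallel" idea can be sketched on a single machine. This is a minimal illustration of the map/shuffle/reduce pattern behind Hadoop's MapReduce, not Hadoop's own API: the function names (`split_into_chunks`, `map_chunk`, `shuffle`, `reduce_groups`, `word_count`) are hypothetical, and threads stand in for what Hadoop would spread across many machines in a cluster.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(records, num_chunks):
    """Divide the input into roughly equal pieces, one per worker."""
    size = max(1, -(-len(records) // num_chunks))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def map_chunk(chunk):
    """Map step: emit a (word, 1) pair for every word in this chunk."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle(mapped_outputs):
    """Shuffle step: gather all emitted values under their key."""
    groups = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_groups(groups):
    """Reduce step: combine each key's values into a single total."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(records, num_workers=4):
    chunks = split_into_chunks(records, num_workers)
    # Each chunk is independent of the others, so the map step can run in parallel.
    # (Threads here only illustrate the pattern; Hadoop uses separate machines.)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        mapped = list(pool.map(map_chunk, chunks))
    return reduce_groups(shuffle(mapped))


if __name__ == "__main__":
    lines = ["the elephant and the cloud", "big data and the elephant", "cloud computing"]
    print(word_count(lines, num_workers=2))
```

The design point is that no worker needs to see the whole data set: each one handles its own slice, and only the small per-key results are combined at the end. That is why adding more computers, for just as long as the job runs, shortens the time to an answer.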
The result: answers to questions that span all of the data, delivered in reasonable time frames. Here are some profit-and-loss impacts. Kroger uses data from its loyalty card program to drive $12B more in annual sales. UPS uses data from 4 billion shipments per year on almost 100,000 vehicles, saving over 39 million gallons of fuel and avoiding 364 million miles of driving. Red Roof Inn identified a new marketing approach simply by examining government weather and flight data, and that approach led to a 10% increase in revenue.
Convinced? How about a few examples closer to home? Walgreens is using 7.5 billion medical events for 100 million people to identify missed prescription refills and help patients stay on track with their health plans. Kaiser Permanente is using the health care records of its 9 million members to improve overall health and well-being. One analysis revealed that women taking certain oral contraceptives had a 77% higher chance of developing blood clots.
How about something a little less dramatic? Delta used its vast store of passenger data to tackle the problem of lost bags and increase customer satisfaction. There are a multitude of examples, and their impact is staggering.
But why the cloud? Easy: you need lots of computing horsepower, but for relatively limited amounts of time. As you refine your queries, you also need to "play" with the horsepower you have deployed, iterate and run alternatives. And once your teams really understand the power of this kind of analytics, you'll have lots of different teams clamoring for compute resources. Would you really want to buy thousands of servers to run an analysis and then leave them idle? Or worse, make teams wait their turn until others are done? Here is a little table that I think carries the argument.
Forecasts bear this out, with Cisco estimating that 69% of the world's data will be in cloud data centers. Is your firm riding our elephant buddy? No? What trends could you be missing?
Note: Apache, Apache Hadoop, Hadoop, and the yellow elephant logo are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries.