Big Data explained in 10 movies
My father is an avid film lover, and he taught my sister and me to love the seventh art from our earliest years.
He quite often resorted to movie scenes to teach us how to deal with certain situations in life.
In this post I will try to adopt my father’s approach to explain Big Data.
Actually V stands for much more than just Vendetta in the Big Data world. V is the most popular letter because of the famous V-Trinity:
V for Velocity: (near) real-time processing, saying goodbye to the old “Let’s see if the report is done when I come back from my vacation” days.
V for Volume: large amounts of data, and still the ability to exploit them and extract their inherent intelligence.
V for Variety: the use and combination of all kinds of data sources (structured, semi-structured and completely unstructured).
Imagine you could exploit your data to find out very quickly whether something is going to succeed or fail. Without having to invest heavily, you could embrace a data-driven culture to fail sooner, or to confirm sooner that you are on the right track. Big Data enables measurement and performance management for each and every initiative a business undertakes. Fail fast or you are gonna get furious! Making your data talk keeps your eyes open and the Highest Paid Person’s mouth closed.
The Gold Rush
(1925 – Charles Chaplin)
The new oil, the data… so often said and yet so true. Yet the journey many companies need to go through to unlock the intrinsic potential of their data and monetize it properly is anything but a bed of roses. Depending on the corporate culture, instilling a data-driven mindset could be as hard as the time Chaplin the tramp spent in Alaska (who doesn’t remember the scene with him eating his own shoe?). To make this journey easier, it is important to avoid the common yet fatal mistakes in Big Data adoption.
One of the most moving movies from the Pixar studios shows us how much fun you can have up there in the clouds. Exactly: the Volume aspect of Big Data is nicely addressed by elastic cloud infrastructure. If you need to care about hardware and scaling, you are directing your attention to the enabling capability, not to the capability itself… Big Data technologies built on the MapReduce paradigm do exactly that: they take the IT issues off the table and run nicely on different cloud infrastructures, like Amazon’s Elastic MapReduce or the one from Joyent.
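For readers who have never seen the MapReduce paradigm in action, here is a toy sketch of its three steps (map, shuffle, reduce) counting words in plain Python. It runs on a single machine and does not use Hadoop or any cloud service; the function names are my own and purely illustrative. In a real cluster the same steps would be spread over many nodes.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values seen for one key into a total."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence over a list of documents."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big fish"]))
```

The point of the paradigm is that each map call and each reduce call is independent of the others, which is what lets a framework distribute them without you caring about the hardware.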
The Elephant Man
(1980 David Lynch)
It’s time to introduce the elephant in the Big Data room… it’s yellow and has a name: “Hadoop” (named after a toy elephant belonging to the son of its creator, Doug Cutting). What started as an open source implementation of ideas published by Google is now a cornerstone of the Big Data foundations. The technology stack offers a lot of powerful applications to make the most of Hadoop, like Apache Mahout for machine learning and Apache Hive, which offers the capabilities of a data warehouse on top of Hadoop, and it works nicely with NoSQL databases like MongoDB, etc.
Titanic
(1997 James Cameron)
A decision taken without modelling and analysing the uncertainty of what was not visible ended in a tragic outcome. Big Data is about showing you what you see and, more importantly, what you don’t see in your data… The patterns, the connections, the hidden correlations, the black swans, the outliers… and the picture grows and evolves with every new data source you plug in. The iceberg has always been around; now it’s time to look under the water… and it might change the perception you already had of the part you could see.
In the so-called Precrime department, Anderton (Tom Cruise) acts on crimes that have been predicted to happen, stopping the criminals before they kill their victims. The “killer application” of Big Data is so-called predictive analytics: the capability to forecast events or trends based on the intelligence inferred from the data. Using a variety of machine learning and data mining techniques to analyse present and historical events, incorporate further data sources into the analysis and make predictions about the future is not new, yet it can be taken to the next level within the Big Data puzzle.
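To make “forecasting from historical events” a bit more tangible, here is a deliberately tiny sketch: it fits a straight line to past observations with ordinary least squares and extends it into the future. This is only one of the simplest possible techniques and my own choice for illustration; the function names and the sample numbers are made up, and real predictive analytics would use far richer models and data.

```python
def fit_trend(values):
    """Fit y = intercept + slope * x by least squares, with x = 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def forecast(values, steps_ahead=1):
    """Extend the fitted line steps_ahead periods past the last observation."""
    slope, intercept = fit_trend(values)
    return intercept + slope * (len(values) - 1 + steps_ahead)


if __name__ == "__main__":
    # Hypothetical monthly figures, invented for this example.
    history = [10, 12, 14, 16]
    print(forecast(history, steps_ahead=1))
```

A straight line will of course miss anything that breaks the trend, which is exactly where the additional data sources mentioned above come in.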
No Country for Old Men
(2007 E. & J. Coen)
Big Data is about a new generation of skills combined together. The old database men need to reinvent themselves and embrace the new data storage and processing technologies. Additionally, a mandatory step consists of opening up to the online age: Big Data is about data variety, and more and more data sources are made available over the internet via APIs or SPARQL endpoints. To leverage them, the right skills are required: Python, PHP, Java, etc…
Big Fish
(2003 – Tim Burton)
Apart from the word “Big” and the fact that we are talking about a big fish to catch here, this Tim Burton masterpiece depicts the astonishing story of Ed Bloom’s life, told from his deathbed. As the story is told, Ed’s son can’t stop reproaching his father for mixing up passages he imagined with factual reality. Big Data is so far like Ed’s story… you can’t really distinguish what is reality in it and what is a fairy tale. And that is OK, as today’s fairy tale might become reality tomorrow. Borrowing the words of Randy Pausch: “If you refuse to dream it, you won’t do it”.
Black Swan
(2010 – Darren Aronofsky)
A Black Swan is more than the Swan Lake role Natalie Portman got an Oscar for. It is also a theory developed by Nassim Nicholas Taleb, which explains how rare events are hard to predict (a non-computable problem) yet might have a huge impact on history. Swans were by definition white until an observation in Australia contradicted it. Big Data will deliver unprecedented insights into your data and your business, yet you might still face black swans, and you need to know how to deal with them or at least set the right expectations.
I hope you enjoyed this post… I guess you know more movies that tell us about Big Data.
Note about the pictures in this post: these pictures are not hosted on this site, they are just links to existing pictures in the media archive of imdb.com. As stated in the imdb how to link I’m providing the attribution here: