The most common yet fatal Big Data mistakes
Although more and more companies and institutions are embracing the *new* big data technology, adoption has only just begun. This post explains the typical mistakes companies make when implementing and using big data.
“I got a hammer, everything looks like a nail”
Not every data problem can be, or should be, treated as a big data problem. The hype leads people to try to solve everything with Hadoop & Co., move everything to NoSQL databases and stop thinking in terms of relational databases. After all, every traditional BI engineer, DB administrator and the like wants to be seen by management as working with cutting-edge technology: “We are doing big data”.
“Big Brother is watching you”
The V that really unlocks the potential of Big Data stands for “variety”. If you play it right, you will be able to combine more data sources than you could ever have dreamed of, and if you apply the proper data science methods to exploit the relationships across them, your data will reveal unprecedented insights that you can use for various purposes.
Armed with these new information discovery weapons, you are very likely to end up knowing “too much” or going “too far” and invading the sacred realm of your customers’ or prospects’ privacy.
Now that regulations on personal data protection and privacy are being substantially toughened, a scandal over the misuse of personal information can severely damage a company’s image, whether it is substantiated or based on mere perception.
To get clear on where to draw the line in this grey zone, we recommend an excellent book by K. Davis and D. Patterson: Ethics of Big Data: Balancing Risk and Innovation.
“Let me kill each and every flea with my sledgehammer”
The fleas that have been on your business dog so far are still there. You knew how to deal with them; you had your recipes and they worked nicely in most cases. Your Oracle DB, your MySQL or even your MS Access have been with you throughout your journey. There is no reason to turn your back on them and write a map-reduce job for every particular problem in your day-to-day business.
Understanding what big data technologies are good for, and where traditional data warehouse and business intelligence technologies are a much better fit, is a must for the sustainability of your data operations.
“Victim of a new buzzword”
Big data sounds attractive, captivating. The idea of using data to better support decision making is nothing really new, but most people perceive it as unfinished business. Big is per se great: the more, the better. The more data I can leverage to gain insights, the better.
The problem is that the more people with different levels of understanding talk about the topic, the more confusing it becomes and the harder it is to agree on an implementation strategy. The hype forces you to invest time in educating your management, your stakeholders and everyone around you. How to organize this education process effectively will be the subject of another post.
“Big Data = Starting Big?”
Because of the almost unlimited possibilities of Big Data, people try to think big, which is something of a prerequisite for doing it properly. But thinking big does not mean having to start big. Principia parva sunt (beginnings are small), and starting too big may well end in a big disappointment. In the age of cloud computing, with so many on-demand big data infrastructure providers and so much good, mature open source big data software available, the decision to buy your own hardware and make a significant investment in a commercial big data vendor does not have to be taken at an early stage (or ever). If you start small, you are managing the risks right!
“When 1+1 might not exactly be 2”
The Pareto-principle approach inherent in Big Data trades 100% accuracy for speed, versatility and variety in high-volume data environments. This trade-off adds a new level of complexity: you need a framework that explains how accurate your findings are. Only a few companies are in a position to manage a soft-computing environment and, even more importantly, to explain in plain English why the inaccuracy arises and why it has to be dealt with.
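To make the accuracy trade-off concrete, here is a minimal sketch of one way to report an approximate answer together with its uncertainty. It is an illustration only, not a method prescribed by this post: the function name `approximate_mean`, the made-up `orders` data and the choice of a simple random sample with a normal-approximation 95% margin of error are all assumptions.

```python
import math
import random
import statistics


def approximate_mean(values, sample_size, seed=0):
    """Estimate the mean of `values` from a random sample.

    Returns (estimate, margin), where margin is an approximate 95%
    margin of error based on the normal approximation.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    sample = rng.sample(values, sample_size)
    estimate = statistics.fmean(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(sample_size)
    return estimate, 1.96 * standard_error


if __name__ == "__main__":
    # Hypothetical data set: 100,000 order values between 0 and 99.
    orders = [i % 100 for i in range(100_000)]
    exact = statistics.fmean(orders)
    estimate, margin = approximate_mean(orders, sample_size=1_000)
    print(f"exact mean:  {exact:.2f}")
    print(f"estimate:    {estimate:.2f} +/- {margin:.2f} (95% confidence)")
```

The sample touches only 1% of the records, so the answer arrives faster but comes with a margin of error. Reporting that margin next to the number is one simple way to explain in plain English why 1+1 might not be exactly 2.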
“Forget the good old times”
The maturity of traditional relational technologies cannot be compared with that of the emerging Big Data ones. Where you once had self-documenting databases that gave you the schema, a nice E-R diagram, usage statistics, audit trail information and a set of other features you were used to at the click of a button, you now have to do a lot of the work yourself. It takes strict discipline and well-defined, streamlined processes to keep things from getting out of hand. Being aware of the long journey relational technologies have gone through, and identifying the “delta” with respect to the current state of big data technologies, will help keep your big data era from ending right after it started.
Being aware of these 7 common mistakes will save you a lot of time and tears along your Big Data journey.