Comments on “Why your company should NOT use Big Data”

Highlights from comments on a provocative article on NOT using Big Data. Big Data can be revolutionary, but it is not a substitute for thinking about the right goals and objectives.

By Gregory Piatetsky, Feb 10, 2014.

The guest post in KDnuggets by Edward Nevraumont, Viewpoint: Why your company should NOT use "Big Data", has generated quite a lot of heat and some light. Here are selected comments from LinkedIn (mainly from the Advanced Business Analytics, Data Mining and Predictive Modeling group).

In my opinion, while Big Data is extremely useful for large companies, "smaller" companies should start by thinking carefully about their goals and objectives, and not jump into Big Data hype. Here "smaller" companies are companies with smaller data streams, not fewer employees. Instagram, when it was bought by Facebook for $1B, had only about 13 employees, but it had millions of users and a huge data stream.

That said, Big Data does offer amazing new possibilities - think of Google, Facebook, LinkedIn, etc - all these companies are enabled by "Big Data". The word cloud below is generated from the text of this post.

Comments on Viewpoint: Why your company should NOT use Big Data

Vincent Granville, Co-Founder, Big Data Scientist, DSC Network
And the corollary: if small data is better than bigger, the optimum is to work with data sets that are always empty. Gregory, I assume you don't believe in what you wrote, and that it's meant to be sarcastic. The information available in a subset is always smaller than the information found in the bigger set, from an entropy point of view. The problem is extracting it. Restricting yourself to small data is like restricting yourself to energy sources that are easy to exploit, when these sources represent a tiny fraction of the potential.

Crystal N Woods, Intelligence Analyst: transforming data into insights and advantages
The opposite of big is not zero. I'm with Gregory here. Diving into big data when you're not using the data you've already got is like heading for the black diamond runs after figuring out how to strap on skis for the first time... you just know it isn't going to end well.

Nikki Strickland, Product Manager in Marketing & Innovation Gp
Great article. It makes me think back to the basics - what do you want from your data, what's your data collection plan (how much do you need, how does it need to be structured, segmented to be useful etc) and how will you act on the data once you have it? I don't fully agree with the 'always' part of Vincent's comment (although the sentiment behind not restricting yourself to small data makes sense) - a smaller data set that is sensibly structured, robust, representative and relevant to what you need can give you much more than a large random data set that vaguely covers the area you are interested in. Agree with Crystal and Gregory - you need to be able to action what you have.

Don Philip Faithful
I personally see big data more for sensory-type applications. For instance, I have been thinking about passing thermal imaging raster differentiation through different algorithmic filters to try to identify the heat-retention characteristics of the underlying objects. But then there is what I might call reasoning-type applications - that might indeed be associated with an enormous amount of data - more focused on the determination of relevance and significance within different contextual realities. Still, I think that different types of data applications tend to be conflated due to the sharing of systems; and I only make the distinction for the sake of identifying different applications. So meteorological data might tend to be "big data" in the conventional sense whereas data extracted from case-studies such as forensics might be better regarded as reasoning-data. Nonetheless, reasoning can benefit from large amounts of data depending on the underlying aspects of the data.

Peter Davenport, Manager, Predictive Modelling at Sensis
Thanks Gregory, really excellent article. The type of hype that is often sold to companies mirrors that of predictive modeling in the early 2000s, segmentation in the 1990s and database marketing slightly earlier than that. Big Data has many really exciting applications in so many areas, but the temptation will always be to sell the idea to sales and marketing departments who have not got their house in order on the basics. The article draws attention to the basics of the sales conversion funnel, which is really critical to understanding how you can make use of Big Data. As an analyst, I am always looking for that extra 'nugget' of information which will make a poorly performing model improve, but I normally shudder whenever I hear a marketer using the same terms, because this normally means that they are imagining that some miracle will be performed because "that's what big data does".

John L. Byrne, Environmental Researcher (Data Analyst)
Gregory, Excellent question. So many companies want a ROI to their large data repositories. Big Data = Big Bucks. Sometimes baby steps are required to build, understand, reason and to question on smaller data sets (i.e. internal training and validation of "manageable" data) before the potential of noise overcomes the signal, i.e. jumping straight into Terabytes of data. There's an underlying thread running through the comments, Small versus Big data. What constitutes a small, medium and large data set, what type of data are we dealing with and what are people's pre-conceptions of "DATA", leading to positive/negative biases towards "the Dataset"? Mastering the abilities and limits of your own data analytics is key to increasing your overall understanding of what is achievable.

Vijay Gupta
"Big Data" and other buzzwords created by sales/marketing/BI folks have completely bamboozled companies into creating new teams and positions. "Big Data" is not just a large data set. Around 90% of the professionals pitching Big Data and Social Analytics do not have a grasp of this area. Before creating teams or selecting a vendor for these areas, kindly utilize the services of consultants like myself to provide an evaluation of your requirements and whether the proposed solution is optimal for your needs.

Feng (Fred) Xue, Senior Machine Learning Researcher at GE
Great article Gregory. It reinforces the message that you need to know what you are doing in the data analytics space. Data is useful, and can be collected from many channels. The right prioritization of data collection & analytics will indeed give you the proper ROI... and gradually move the organization onto the path of being analytics-driven.

Sanjay Sharma, Principal Architect at Impetus
Nice article around the buzz and reality of the need for big data. Being an early starter in Big Data and Hadoop and working closely with customers for the last 5 years in this space has certainly enabled me to understand that big data is not a silver bullet for all problems. The first step has to be, without doubt, putting your house in order as far as understanding of data and basic analytics is concerned. Only when the minimal data warehousing and analytics concepts are clear can we take the next step towards identifying big data opportunities.

Finding ROI in simple analytics is not easy; building a model of big data is still more difficult. Also, the fact that everybody finds it convenient to pretend to be an expert in big data/Hadoop/data science further undermines the power of big data.

So, inspired by man's journey to reach Mars, let's first learn walking, wheels, start flying, build rockets, fly to the Moon and then try a Mars trip rather than trying a Mars landing directly.

Successful companies are reaping the benefit of Big data and so can others if they follow the right path and have the right people!

RAM DK, Asst. Vice President at Scope eKnowledge Center
Interesting article. I think before embarking on a big data initiative the leadership team must get a clear perspective on 2 parameters, viz. velocity and veracity of data. If the business doesn't involve a lot of dynamic, real-time data and can do with less dynamic data, fine-tuning the existing data management and analytics could yield a better ROI. Similarly, the quality of the data that exists is often overlooked. While having 100% data may be impossible and too expensive, a high degree of accuracy in data capture, representation etc should be ensured and made part of the DNA of the organisation. In my experience, companies often miss out on this basic step and take the big leap forward only to realise that erroneous data is resulting in awful decision making.

Tony Lange, Vice President - Solutions Architect Cement and APC at Gensym
Very provocative thread this. Very nice, totally rhetorical, simply in light in the absence of more data, knowledge context - LOL, QED !!

IMHO the article is valid, and this thread topic reminds me of "Simpson's Paradox", amongst many others. By the way, this topic has been flogged to death in many other similar threads; I think this OP is just pressing the same buttons.

But the best comment from the author of the above-referenced article is his tongue-in-cheek, sarcastic last point, which is "money is money", and if idiots are going to pay one for doing silly things so be it !!
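
For readers who have not met Simpson's paradox, mentioned in the comment above, here is a minimal sketch of how it can arise. The counts are hypothetical and chosen only for illustration, and the names (`data`, `rate`, `overall_rate`) are assumptions, not anything from the article or the thread. The point is that variant A can win inside every segment and still lose once the segments are pooled, which is one reason piling up more data does not replace thinking about how it is grouped.

```python
# Hypothetical (successes, trials) counts per segment for two variants.
# Illustrative numbers only; they are not taken from the article.
data = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}


def rate(successes, trials):
    """Success rate as a fraction between 0 and 1."""
    return successes / trials


def overall_rate(variant):
    """Success rate after pooling all segments of one variant."""
    successes = sum(s for s, _ in data[variant].values())
    trials = sum(t for _, t in data[variant].values())
    return rate(successes, trials)


# Within each segment, A has the higher success rate...
for segment in ("small", "large"):
    a = rate(*data["A"][segment])
    b = rate(*data["B"][segment])
    print(f"{segment:>5}: A={a:.3f}  B={b:.3f}")

# ...but pooled over segments, B comes out ahead, because A was
# mostly tried on the harder "large" segment.
print(f"  all: A={overall_rate('A'):.3f}  B={overall_rate('B'):.3f}")
```

The reversal comes entirely from the uneven split of trials across segments, not from the size of the data set.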

Brian Feeny, Data Scientist and Big Data Consultant
You have to wonder if we can go a week without an article like "Why is Big Data a lie", "Why you should not use Big Data", "Why Big Data doesn't work", and all the other FUD.

Human intuition is highly biased, and has been proven wrong again and again by Big Data. So even though you should not solely rely on Big Data, to not use it would be foolish. Companies are competing and winning on Big Data today. Even basic things, like A/B testing, can be a lot of data... but it's going to tell you what your customers want, not what you think they want. It's going to tell you what really accounts for the TLV of your customers. All models are wrong, but I would posit that they are wrong less than human intuition if done properly.
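
To put a rough number on the remark that even A/B testing "can be a lot of data", here is a minimal sketch using the standard normal-approximation sample size formula for comparing two proportions. The conversion rates (5.0% versus 5.5%), the 5% significance level and the 80% power are assumptions picked for illustration, and `sample_size_per_arm` is a hypothetical helper name, not something from the thread.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_arm(p_control, p_variant, alpha=0.05, power=0.80):
    """Approximate users needed in EACH arm of a two-sided A/B test.

    Uses the usual normal approximation for two proportions:
        n = (z_{1-alpha/2} + z_{power})^2 * (p1*q1 + p2*q2) / (p1 - p2)^2
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    variance = p_control * (1 - p_control) + p_variant * (1 - p_variant)
    effect = p_variant - p_control
    return ceil((z_alpha + z_power) ** 2 * variance / effect ** 2)


# Assumed scenario: detect a lift from 5.0% to 5.5% conversion.
n = sample_size_per_arm(0.050, 0.055)
print(f"Users needed per arm: {n}")
```

Under these assumptions the answer is on the order of tens of thousands of users per arm, and it grows quickly as the lift you want to detect shrinks, since the required sample scales with the inverse square of the difference.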

Bill Luker Jr, Deploying expertise in advanced statistical data science to overcome organizational challenges
Vince, nobody is restricting themselves to anything. We're simply talking about TDYH Data (The Data You Have) and JTRAO Data (Just The Right Amount Of Data, for the task at hand.) You and the rest of the IT big data evangelists--and when you see an evangelist coming down the street, you instinctively reach for your back pocket to see if your wallet is still there--have tried to cram this down our throats without ever asking the statistical-analytic community (a huge number of practitioners in all manner of organizations) whether we needed any of this. Instead, you tried to convince us all that we were going the way of the dodo unless we immediately conformed to your prescriptions.

OUR time is now. Statistical and econometric analysis--"predictive analytics" and "business analytics", in business parlance--is coming into its own, through an evolutionary realization on the part of organizational leaders that they must finally come to grips with an analytic approach to making decisions, instead of the glad-handing and back-slapping ways they learned in b-school.

This approach, like anything worth doing, requires patience, dedication, practice, and an education in how research can refine raw data into actionable information--which means a certain tolerance for early failures. It doesn't require new software and hardware. It does require a human brain. Actually, a team of human brains, who are skilled in all the ways and means of analysis. Not just IT people, although we are happy to work WITH them.

And that's the cutting edge of the divide you have inadvertently uncovered with your incessant "big data McCarthyism"--wherein, it's the IT-centric way or the highway: big data always and everywhere, and be cowed into silence if you raise any objections, no matter how sensible. I'm reminded of the Borg, in Star Trek: "resistance is futile." Theirs, and yours, is a juvenile, dictatorial, and authoritarian position with which many of us are simply fed up.