Poll: Big Data effect on Data Science is minor
Latest KDnuggets poll results say that Big Data has only a partial effect on Data Science and does not change fundamental Data Science principles.
By Gregory Piatetsky, Oct 29, 2013.
Latest KDnuggets Poll asked:
Has "Big Data" significantly changed Data Science principles and practice?
Almost half thought that Big Data has only a minor effect on Data Science, and only 21% thought that Big Data changes Data Science significantly.
Here are the full results (based on 149 votes):
- 21%, Yes, "Big" Data Science is significantly different from "small" Data Science
- 47%, Partly, Data Science for Big Data is somewhat different from "small" Data Science
- 25%, No: Data Science is the same, regardless of data size
- 7%, Not sure
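With only 149 votes, the shares above carry noticeable sampling uncertainty. The following is a rough sketch of the usual normal-approximation margin of error, applied to the reported percentages. It assumes the votes behave like a simple random sample, which an opt-in web poll does not, so treat the intervals as illustrative only. The names `margin_of_error` and `summarize` are hypothetical helpers, not part of the poll.

```python
import math

# Percentages and total vote count as reported in the poll results above.
TOTAL_VOTES = 149
RESULTS = {
    "Yes": 0.21,
    "Partly": 0.47,
    "No": 0.25,
    "Not sure": 0.07,
}

Z_95 = 1.96  # standard normal quantile for a two-sided 95% interval


def margin_of_error(share, n, z=Z_95):
    """Normal-approximation half-width of a confidence interval for a proportion."""
    return z * math.sqrt(share * (1.0 - share) / n)


def summarize(results, n):
    """Return {answer: (low, high)} interval bounds, clipped to [0, 1]."""
    intervals = {}
    for answer, share in results.items():
        moe = margin_of_error(share, n)
        intervals[answer] = (max(0.0, share - moe), min(1.0, share + moe))
    return intervals


if __name__ == "__main__":
    # ASSUMPTION: simple random sampling; an online poll is self-selected.
    for answer, (low, high) in summarize(RESULTS, TOTAL_VOTES).items():
        print(f"{answer:9s} {RESULTS[answer]:.0%}  (about {low:.0%} to {high:.0%})")
```

Under these assumptions the margin for the 47% "Partly" share is roughly 8 percentage points, so "Partly" sits clearly above the other answers, while the intervals for "Yes" and "No" overlap.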
Geographic analysis shows that both US and European data scientists were in agreement, and thought that Big Data has only a partial impact on Data Science, while Asia is the only region which thinks that Big Data has a significant effect.
| Region | % Data Science is the same, regardless of data size | % Partly (Data Science for Big Data is somewhat different) or Not sure | % Yes, Big Data Science is significantly different |
From LinkedIn group Big Data, Analytics and Data Science Training, a subgroup of Advanced Business Analytics, Data Mining and Predictive Modeling
Lee Slutz, Advanced Analytics Strategy and Consulting:
Thank you very much for posing these very interesting questions. However, before I can respond I would like your help in understanding some of the terms you are using.
Would you mind defining what you mean by "Big Data"? Exactly how many bytes of data constitute "Big"?
Also, if you would please, precisely define what you mean by "Data Science".
I have read numerous definitions, all of which either define Data Science by what it is not, or define Data Science as some amorphous concept that resides at the "intersection" of various other disciplines. The latter, of course, is not a useful definition, since it leaves it up to each individual to determine how these various other disciplines might intersect; therefore, everyone can construct a different definition, which means there is no definition. In addition, I take issue with the implication that set theory can somehow be applied, thereby trying to give the definition the aura of having an empirical basis.
Of course, these terms are fuzzy, but so are many other terms that are still useful. Try to define Art or Pornography! The best definition of "Big Data" I saw is "Data is Big when data size (velocity, variety) becomes part of the problem". Data Science is really the latest name for Data Mining, Knowledge Discovery, Predictive Analytics - a topic of research at many conferences.
Alexander Kriegel, Enterprise Information Architect, PMP, CSM
Most definitions go back to Gartner's 3V - volume, variety and velocity - of the data. Though there is no clear threshold when "large" becomes "big"... the key question to ask when analyzing data is, IMHO, this: Do I have a Big Data problem, or do I just have a "lots of data" problem?
Data Science has changed by adding the ability to figure in even more different data sets than ever before, and to build and run ever more sophisticated models in reasonable time frames, but I would not go so far as to attach to it labels such as paradigm shift, quantum leap, ground shattering etc...
Once the data have been crunched, they have to be interpreted - after all, what use is the answer to the Ultimate Question of Life, the Universe and Everything if it is but a single number, 42? :)
Robin Lake, Principal
The SIZE of the data now leads to greater certainty in the results, but also leads to a greater variety of possible answers. Back in the '70s, when the data set consisted of two containers of punched cards, the analysis results often only SUGGESTED a solution. Now, we get a more certain solution, but also some novel outliers that make us want to explore WHY they are there.
I respectfully disagree with the statement that "The SIZE of the data now leads to greater certainty in the results, but also leads to a greater variety of possible answers."
The Big Data, Data Science canard is even more susceptible to the issues recently described in the Economist.
Other data-heavy disciplines face similar challenges. Models which can be "tuned" in many different ways give researchers more scope to perceive a pattern where none exists. According to some estimates, three-quarters of published scientific papers in the field of machine learning are bunk because of this "overfitting", says Sandy Pentland, a computer scientist at the Massachusetts Institute of Technology.
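The "pattern where none exists" effect in the quote is easy to reproduce. The sketch below is a minimal illustration, not anything taken from the Economist article or from the commenters: it generates a target and many candidate features that are all pure random noise, picks the feature that correlates best with the target on a small sample, and then checks that same feature on fresh data. The function names, sample sizes and seed are assumptions chosen for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def best_noise_feature(n_train=30, n_holdout=1000, n_features=500, seed=0):
    """Select the noise feature that best 'explains' a noise target.

    Returns (train_corr, holdout_corr) for the selected feature.
    All data are independent Gaussian noise, so any correlation is spurious.
    """
    rng = random.Random(seed)
    n_total = n_train + n_holdout
    target = [rng.gauss(0, 1) for _ in range(n_total)]

    best_train, best_feature = 0.0, None
    for _ in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_total)]
        # "Tuning": keep whichever candidate looks best on the small sample.
        r = pearson(feature[:n_train], target[:n_train])
        if abs(r) > abs(best_train):
            best_train, best_feature = r, feature

    # Evaluate the winner on rows that played no part in the selection.
    holdout_corr = pearson(best_feature[n_train:], target[n_train:])
    return best_train, holdout_corr


if __name__ == "__main__":
    train_corr, holdout_corr = best_noise_feature()
    print(f"correlation on the selection sample: {train_corr:+.2f}")
    print(f"correlation on fresh data:           {holdout_corr:+.2f}")
```

With these settings the selected feature typically shows a sizable correlation on the 30 rows used for selection and a correlation near zero on the holdout rows, which is why evaluating on data not used for tuning is the standard guard against this kind of overfitting.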