Data Science: (not) the preferred nomenclature

The term Data Science should describe the “Science OF Data”, while doing Science WITH Data could be called “Data-Driven Science”. Whatever your preferred term, reinforcing the distinction will help establish the Science OF Data and doing Science WITH Data as bona fide disciplines.

By Peter Flach, U. of Bristol.

Data science is new and exciting. Data scientist has been called the sexiest job of the 21st century. But what, exactly, is data science?

There is no shortage of position papers, Venn diagrams and white papers offering a perspective on, if not a definition of, data science. But I feel many of them ignore one critical distinction: the difference between the Science OF Data and doing Science WITH Data.

The Science OF Data is an academic subject that studies data in all its manifestations, together with methods and algorithms to manipulate, analyse, visualise and enrich data. It is methodologically close to computer science and statistics, combining theoretical, algorithmic and empirical work.

Doing Science WITH Data occurs in other academic subjects, where analytics becomes a major way to build models, design artefacts, and generally increase our understanding of the subject in a data-driven way. Here, how “going data-driven” affects the methodology depends on the subject: for many disciplines such as biochemistry or particle physics this involves the ability to conduct high-throughput experiments, while for other subjects such as civil engineering or digital humanities the impact manifests itself differently.

This distinction between Science WITH X and Science OF X is not new. There is a close analogue with computer science as the Science OF Computers and computational science as doing Science WITH Computers (also called the “third scientific paradigm”, distinct from the “fourth paradigm” – doing Science WITH Data!). We could also say that probability theory is the Science OF Randomness, and statistics is doing Science WITH Randomness; or that pure mathematics is the Science OF Abstraction and applied mathematics is doing Science WITH Abstraction.

This is not to say that one is the “real thing” and the other is intellectually inferior, or that the two do not overlap and interact. On the contrary: I believe that a real synergy arises when the Science OF Data is developed contemporaneously with advanced applications arising from doing Science WITH Data. It creates a similar “buzz” to the one in early twentieth-century quantum physics, when discoveries in experimental physics went hand-in-hand with advances in theoretical physics.

Nevertheless, experimental physics and theoretical physics are clearly distinct subjects with their own methods and success criteria, and although physicists might sometimes work in both subjects simultaneously, it would be a mistake to ignore the distinction altogether.

It would be similarly unproductive and short-sighted to view Data Science as a cocktail of the Science OF Data and doing Science WITH Data.

So, where does that leave us? If it were up to me, I would reserve the term Data Science for the Science OF Data, while doing Science WITH Data could be called Data-Driven Science, or Data-Intensive Science. Whatever your preferred nomenclature, reinforcing the distinction on a terminological level will help establish the Science OF Data and doing Science WITH Data as bona fide disciplines.

Original. Reposted with permission.

Bio: Peter Flach is Professor of Artificial Intelligence at U. of Bristol, UK & Editor-in-Chief of Machine Learning.