Data science is not about data – applying Dijkstra principle to data science
What is Data Science really about? Is it the data, or the algorithms, or something else? Similar foundational philosophical struggles exist with other scientific fields, including computer science, and maybe we can look to these resolutions to better understand the true 'meaning' of data science.
By Mehmet Suzen, Theoretical Physicist | Research Scientist.
Image source: Wikipedia.
Edsger Dijkstra was a Dutch theoretical physicist turned computer scientist and probably one of the most influential early pioneers of the field. He had deep insight into what computer science is and a well-founded notion of how it should be taught in academia. In this post, we extrapolate his ideas to data science. We develop something we call the Dijkstra principle for data science, driven by his ideas on what computer science entails.
Computer Science and Astronomy
Astronomy is not about telescopes. It is about how the universe works and how its constituent parts interact. Telescopes, whether optical or radio, and similar detection techniques are merely tools for carrying out astronomical investigation. A similar analogy applies to computer science, as in this quote from Dijkstra:
Computer science is no more about computers than astronomy is about telescopes. - Edsger Dijkstra
Dijkstra in Zurich, 1984 (Wikipedia).
The idea of Computer Science not being about computers seems rather strange at first glance. However, what Dijkstra had in mind are the abstract mechanisms and mathematical constructs, such as graph algorithms, onto which one can map real problems and solve them as computer science problems. Though Computer Science has many subfields, its inception can be considered as rooted in applied mathematics.
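To make the idea of mapping a real problem onto a mathematical construct concrete, here is a minimal sketch of the shortest-path algorithm that carries Dijkstra's name. The road network, the town labels, and the function name `shortest_distances` are hypothetical, chosen only for illustration. The algorithm never looks at roads or towns; it only sees nodes, edges, and non-negative weights.

```python
import heapq


def shortest_distances(graph, source):
    """Dijkstra's shortest-path algorithm for non-negative edge weights.

    `graph` maps each node to a dict of {neighbor: weight}.
    Returns a dict of the smallest total weight from `source` to every node.
    """
    distances = {node: float("inf") for node in graph}
    distances[source] = 0
    queue = [(0, source)]  # min-heap of (distance so far, node)

    while queue:
        dist, node = heapq.heappop(queue)
        if dist > distances[node]:
            continue  # stale heap entry; a shorter path was already found
        for neighbor, weight in graph[node].items():
            candidate = dist + weight
            if candidate < distances[neighbor]:
                distances[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))

    return distances


# Hypothetical road network: travel times in minutes between made-up towns.
roads = {
    "A": {"B": 7, "C": 9, "F": 14},
    "B": {"A": 7, "C": 10, "D": 15},
    "C": {"A": 9, "B": 10, "D": 11, "F": 2},
    "D": {"B": 15, "C": 11, "E": 6},
    "E": {"D": 6, "F": 9},
    "F": {"A": 14, "C": 2, "E": 9},
}

travel_times = shortest_distances(roads, "A")
print(travel_times)
```

The same function would work unchanged if the nodes were network routers or airports, which is the sense in which the computer science lives in the abstraction rather than in the machine.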
Dijkstra principle for data science
Following Dijkstra's approach, we are now in a position to formulate a principle for data science.
Data science is no more about data than computer science is about computers. - Dijkstra principle for data science
This sounds absurd. If data science is not about data, then what is it about? Setting aside the definition of data science as an emergent field, an amalgamation of multiple fields from statistics to high-performance computing, the idea that data is not the core tenet of data science implies that the practice aims not at the data itself but at a higher purpose. Data is used much like a telescope in astronomy, and the purpose is to reveal the empirical truths about the representations that the data conveys. There is no unique way to achieve this purpose.
Such a Dijkstra principle for data science would be very helpful in understanding data science practice not as data-centric, contrary to the mainstream dogma, but as a science-centric practice in which data is the primary tool, leveraged through a multitude of techniques. The implication is that machine learning is a secondary tool, applied on top of data, in practicing data science. This attitude would help causality play a major role in moving modern data science forward.
Original. Reposted with permission.