By Mikhail Mew, Researcher, Investor, Data Scientist
Here are the results of the KDnuggets Poll inspired by this blog:
Relax! Data Scientists will not go extinct in 10 years, but the role will change
As AI continues to advance in leaps and bounds, access to data science at a basic level has become increasingly democratized. Traditional barriers to entry, such as a lack of data and computing power, have been swept aside by a continuous supply of new data startups (some offering access for as little as the price of a cup of coffee a day) and by all-powerful cloud computing that removes the need for expensive onsite hardware. Rounding out the trinity of prerequisites is the skill and know-how to implement, which has arguably become the most ubiquitous aspect of data science. One does not need to look far to find online tutorials touting taglines like “implement X model in seconds” or “apply Z method to your data in just a few lines of code”. In a digital world, instant gratification has become the name of the game. While improved accessibility is not detrimental at face value, beneath the dazzling array of software libraries and shiny new models the true purpose of data science has become obscured and at times even forgotten. That purpose is not to run complex models for their own sake, or to optimize an arbitrary performance metric, but to serve as a tool for solving real-world problems.
A simple but relatable example is the Iris data set. How many have used it to demonstrate an algorithm without sparing a thought for what a sepal is, let alone why we measure its length? While these may seem like trivial considerations to the budding practitioner, who might be more interested in adding a new model to their repertoire, they were anything but trivial to Edgar Anderson, the botanist who cataloged the attributes in question to understand variation in Iris flowers. Although this is a contrived example, it demonstrates a simple point: the mainstream has become more focused on “doing” data science than on “applying” data science. However, this misalignment is not the cause of the decline of the data scientist but a symptom of it. To understand the origin of the problem, we must step back and take a bird’s-eye view.
Data science has the curious distinction of being one of the few fields of study that leaves the practitioner without a domain. Pharmacy students become pharmacists, law students become lawyers, accounting students become accountants. Data science students must therefore become data scientists? But data scientists of what? The broad applicability of data science proves to be a double-edged sword. On one side, it is a powerful toolbox that can be applied to any industry where data is generated and captured. On the other, the general applicability of these tools means that the user will rarely have true domain knowledge of said industries beforehand. Nevertheless, the problem was insignificant during the rise of data science, as employers rushed to harness this nascent technology without fully understanding what it was or how it could be integrated into their company.
However, nearly a decade later, both businesses and the environment they operate in have evolved. They now strive for data science maturity, with large, entrenched teams benchmarked against established industry standards. The pressing hiring demand has shifted to problem solvers and critical thinkers who understand the business, its industry and its stakeholders. No longer will the ability to navigate a couple of software packages or regurgitate a few lines of code suffice, nor will a data science practitioner be defined by the ability to code. This is evidenced by the increasing popularity of no-code and AutoML solutions such as DataRobot, RapidMiner and Alteryx.
What Does This Mean?
Data scientists will be extinct in 10 years (give or take), or at least the role title will be. Going forward, the skill set collectively known as data science will be borne by a new generation of data-savvy business specialists and subject matter experts who are able to imbue analysis with their deep domain knowledge, irrespective of whether they can code or not. Their titles will reflect their expertise rather than the means by which they demonstrate it, be it compliance specialist, product manager or investment analyst. We don’t need to look back far to find historical precedents. During the advent of the spreadsheet, data entry specialists were highly coveted, but nowadays, as Cole Nussbaumer Knaflic (the author of “Storytelling With Data”) aptly observes, proficiency with the Microsoft Office suite is a bare minimum. Before that, the ability to touch type on a typewriter was considered a specialist skill; with the accessibility of personal computing, however, it too has become assumed.
Lastly, for those considering a career in data science or commencing their studies, it may serve you well to constantly refer back to the Venn diagram that you will undoubtedly come across. It describes data science as a confluence of statistics, programming and domain knowledge. Although each occupies an equal share of the intersecting area, some may warrant a higher weighting than others.
Disclaimer: Views are my own, based on my observations and experiences. It’s OK if you don’t agree; productive discussion is welcome.
Bio: Mikhail Mew is a Researcher, Investor, and Data Scientist, as well as a curious observer, providing thoughts and insights from the confluence of investing and machine learning.
Original. Reposted with permission.