Are Data Scientists Still Needed in the Age of Generative AI?

The Rise of ChatGPT

Image Created with Stable Diffusion


Exactly two years ago, I wrote an opinion piece called “Data Scientists Will be Extinct in 10 Years”. To my surprise, it became one of my most-read articles on both Medium and KDnuggets. However, the response was polarizing: it attracted the most criticism that I’ve received in my adult life. I foretold the demise of the sexiest (and one of the most in-demand) jobs of the 21st century and my peers took issue, but I accepted the feedback and life moved on. Fast forward to now, and what a difference two years makes. ChatGPT has taken the world by storm, and with it, the narrative that a specific role will be phased out has been eclipsed by another with far greater implications: the obsolescence of human capital in every conceivable industry.

The revolution seems to have happened overnight, but those of us who have followed the progress of deep learning closely know very well that it didn’t. ChatGPT was the product of decades of research that inexplicably culminated in an unassuming chatbot. At the core of the success of ChatGPT is the fact that it democratizes AI. Being code literate and having deep technical knowledge are no longer barriers to entry. Access to cutting-edge deep learning has transcended the domain of academic research and big tech and is now at the fingertips of anyone with wifi access and an email address.


Why are Data Scientists Extinct?


Never in my wildest dreams did I think that we were on the precipice of a technological revolution of the speed, scale, and nature that we have experienced. Before LLMs and text-to-image models, Generative AI (GAI) was largely synonymous with Ian Goodfellow’s Generative Adversarial Networks (GANs). They were hailed as one of the great AI research contributions of recent years, using a pair of neural networks to generate synthetic, photo-realistic images. Those of us who have worked with GANs know that they’re notoriously difficult to train and that, even when implemented correctly, their use cases at the time were limited. It’s therefore all the more amazing that generative deep learning has heralded the latest round of advancements.

So why would ChatGPT (and its GAI compatriots) bring data scientists to the brink of extinction? Let’s revisit the original thesis from two years ago:

  1. The ability to regurgitate code and use software packages will no longer define a data scientist as low/no-code solutions were already becoming prevalent.
  2. The ability to work and analyze data will become an assumed skill set for many roles much like computing skills and MS Office knowledge.
  3. In this paradigm, domain specialists who can solve real-world problems will excel. Data science will become part of their toolkit.
  4. Given the above, generalist data scientists will be phased out in favor of domain experts.

Given this, we can see that GAI facilitates almost every one of the above points. It can generate code, analyses of data sets, and query results directly from text prompts. The requirement for AI-ready professionals who can use ChatGPT has already started creeping into job descriptions. We also know that, despite the productivity gains that come with using GAI, it is still prone to hallucinations and can still get things wrong, which reinforces the need for deep domain expertise to catch and correct these errors. In summary, it hasn’t taken 10 years; it’s only taken two.

However, data scientists becoming extinct doesn’t mean humans doing data science will become obsolete. Quite the opposite, in fact. Over the last 200 years we’ve witnessed several technological revolutions, including the introduction of steam power, mass production, and personal computing, to name a few. Each one has enabled us to be more productive than the last as our roles and relationships with technology evolved. This concept is well rooted in economic theory (the Solow Growth Model). In the current environment, businesses are creating and capturing more data than ever, so data science skills will always be in demand. But the data scientists of the future won’t be called data scientists; they will go by names like product managers, marketing specialists, or investment analysts. Data scientists are extinct, long live data science.


Disclaimer: Views and opinions are the author's own.

Michael Wang is an investment and data science practitioner with over 10 years of industry experience across various roles within fintech, investments, trading, and teaching. He is the Principal Consultant and Founder at WhyPred, an analytics consultancy that specialises in combining financial markets expertise with AI and Machine Learning.