Don’t Become a Commoditized Data Scientist

Unicorns don't exist. Aim instead to be an endangered species.



Be unique and stand out (Photo by Ricardo Gomez Angel on Unsplash)

 

A commodity is a basic good used in commerce that is interchangeable with other goods of the same type. Commodities are most often used as inputs in the production of other goods or services. [...] The quality of a given commodity may differ slightly, but it is essentially uniform across producers.

Investopedia

 

The Commoditization of Data Scientists

 
Grains are commodities. Beef is a commodity. Natural gas, oil, and gold, too, are commodities.

You, as a data scientist, are not supposed to be a commodity.

Are data scientists all alike? Are they all the same-shaped peg, able to fit into any hole an organization may be looking to fill? Are they simply interchangeable warm bodies?

Of course not. Data scientists perform varied tasks in a wide variety of settings, and use vastly different sets of technical and non-technical skills to be able to fulfil the requirements of their roles.

At least, that should be the case. However, seemingly more and more data scientists view the data science landscape as a list of boxes to be checked when it comes to skills, for all intents and purposes creating an army of similarly-skilled individuals vying for the attention of employers.

 
✔️ Basic Python programming skills
✔️ An overview of Python's scientific computing ecosystem
✔️ Some understanding of neural networks and import tensorflow as tf
✔️ Basics of natural language processing and importing HuggingFace Transformers
✔️ Working knowledge of the basics of computer vision
✔️ SQL, or at least how to SELECT * FROM Customers WHERE Country='Canada';
✔️ Knowledge of what MLOps is, whether or not you have ever worked with it
 

Great, now you have the same skill set as everyone else.

That's not how you stand out. More importantly, that's not how you do your job. If this were the case, if each organization needed the same thing from a data scientist, they would simply grab the next one off the stack, without regard to the skills of a given individual.

Don't misunderstand: we all need to build a solid foundation on top of which to develop our own brand of data science skills. But even if you had an intermediate-to-expert level understanding of the skills listed above (which would in itself be impressive, no doubt), you still wouldn't stand out on paper from others.

You've learned the basics. You've checked the boxes. It's time to build on that.

Organizations and individuals in charge of hiring data scientists often don't know what they are looking for... but they are looking for something! It's time for data scientists to stand out, and doing so will require employing the 's' word: specialization.

 

Be An Endangered Species

 
I'm going to guess that you got into data science because you have a sense of curiosity, are a logical thinker, and want to work on interesting problems. None of these characteristics should suggest to you that you should acquire the same set of skills and expertise that everyone else has! Everything about the innate characteristics of a data scientist screams "individual" while the general path to becoming one and the skills one acquires along the way whisper "conformity."

To help ensure your long-term employability, you need to stand out from the crowd and set yourself apart, and doing so means asserting your individuality. The days of the generalized data scientist are over, if they ever really existed to begin with.

Upskill. Focus. Specialize. These are the keys to longevity in the data science game.

Unicorns don't exist. Aim instead to be an endangered species.

That's right, an endangered species. If you have skills, both technical and non-technical, that others around you don't have, you are an endangered species. In the animal kingdom this may not be beneficial for a species' long-term survival, but as an employable data scientist it certainly is.

So, how can you become an endangered species? Develop a specialized skill set, either technical or non-technical, or both.

 

Technical Skills

 
There are so many technical skills available to add to your repertoire these days, it almost seems ridiculous to list any. But in order to demonstrate that this need not be the difficult process you may believe it to be, I will do so.

First, we want to think about technical skills in the sense of being niche. You have already (presumably) covered the data science skills landscape in a wide and shallow manner; it's time to consider it through the dual lenses of depth and narrowness.

There are two basic ways I can think of to approach the acquisition of "niche" technical skills.

 

New and shiny

 
When acquiring the skills required for the latest technical whatzit, you need to strike a balance between being too early and too late, which can be a thrilling high-wire act. Nobody is looking for an expert in a new tool that came out yesterday, but once everyone is using it, your skills no longer make you that endangered species.

A suggestion would be to seek out recently developed open source tools that have yet to catch on but show real promise. Getting in on the ground floor and making some contributions would be a great way to differentiate yourself vis-à-vis that tool, especially when it goes prime time.

 

Tried and tested (but not mainstream)

 
This is the slow burn. The tool has been around for a while, but it has yet to achieve the success that it likely should be enjoying. I think JAX is a great example of this. JAX has been around for some years; it is lower level than other similar tools, so it has a following of folks looking for this advantage, and its popularity continues to grow. Adding some expertise here would set you apart from the TensorFlow or PyTorch crowd, especially if you are familiar with all of the above.

See, it's not necessarily about not knowing the other things, but about knowing them and something else.

 

Non-technical skills

 
I think the two ways in which you can differentiate yourself when it comes to non-technical skills are quite obvious, and we will look at these below.

 

Communication

 
Communication is key in data science. Nothing new to report here. However, what communication actually encompasses changes. Could you imagine how little the skill of "effectively communicating ideas with multiple colleagues simultaneously in a synchronous online meeting environment" would have been coveted 3 years ago?

Maybe to set yourself apart these days, you could come up with your own brand of buy-in solicitation: spend time developing the artefacts you use for conveying the results of a project, and the story you build around them. This is something that is always stressed to new data scientists, but often the shiny new tool or technique takes precedence. There is nothing wrong with being the person on the team that others look to for effectively selling the team's results and vision to other stakeholders.

 

Domain expertise

 
This one is a no-brainer. Want to take your machine learning skills to the finance industry? You better learn about the finance industry!

This goes beyond industry domains; there are far too many folks attacking natural language processing from the technical side who do not have a solid understanding of linguistics, and it shows. Interested in setting yourself apart in NLP? Pick up some linguistics texts. Same goes for computer vision: if you don't know about hues, interpolation, Gaussian noise, etc., stand out by learning. It is only going to help you get in where you want to fit in.
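If those computer vision terms are unfamiliar, here is a minimal sketch of what they mean in practice, using only the Python standard library. The function names (hue_degrees, lerp, upsample_row, add_gaussian_noise) are my own illustrative choices, not part of any vision library, and a real project would more likely reach for a dedicated image library.

```python
import colorsys
import random


def hue_degrees(r, g, b):
    """Return the hue of an 8-bit RGB colour as an angle in degrees [0, 360)."""
    h, _s, _v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    return (h * 360) % 360


def lerp(a, b, t):
    """Linear interpolation between a and b for t in [0, 1]."""
    return a + (b - a) * t


def upsample_row(row, factor):
    """Stretch a 1-D row of pixel intensities using linear interpolation.

    Hypothetical helper: each gap between neighbouring pixels is filled
    with `factor - 1` evenly spaced in-between values.
    """
    out = []
    for left, right in zip(row, row[1:]):
        for step in range(factor):
            out.append(lerp(left, right, step / factor))
    out.append(row[-1])
    return out


def add_gaussian_noise(row, sigma, seed=0):
    """Add zero-mean Gaussian noise, then clip to the 8-bit range [0, 255]."""
    rng = random.Random(seed)  # seeded so the result is reproducible
    return [min(255.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row]
```

For instance, hue_degrees(255, 0, 0) returns 0.0 (pure red sits at the start of the colour wheel), and upsample_row([0, 100], 2) fills in the midpoint value of 50 between the two original pixels.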

 

Wrapping Up

 
Let's forget the idea that all data scientists need to know X, Y, and Z. There are many more letters in the alphabet of skills, so learn yourself an E, a J, or even a little M.

And always...

 

Don't Become a Commoditized Data Scientist
Image by author

 
 
Matthew Mayo (@mattmayo13) is a Data Scientist and the Editor-in-Chief of KDnuggets, the seminal online Data Science and Machine Learning resource. His interests lie in natural language processing, algorithm design and optimization, unsupervised learning, neural networks, and automated approaches to machine learning. Matthew holds a Master's degree in computer science and a graduate diploma in data mining. He can be reached at editor1 at kdnuggets[dot]com.