5 Key Skills Needed To Become a Great Data Scientist
Based on my 10 years of experience (and how to build those skills).
By Sharan Kumar Ravindran, Senior Manager (Data Science)
One doesn't need innate talent to become a successful data scientist. Yet, some skills are required to be successful in data science, and all of them can be acquired by anyone with proper training and practice. In this article, I am going to share some of the important skills, why they are considered important for a data scientist, and how they can be acquired.
Critical thinking
Data scientists should develop the habit of critical thinking. It helps in better understanding the problem. Unless the problem is understood at the most granular level, the solution can’t be good. Critical thinking helps in analyzing the different options and choosing the right one.
While solving data science problems, decisions are rarely clear-cut good or bad. A lot of options lie in the grey area between the two. There are many decisions involved in a data science project, such as choosing the right set of attributes, the right methodology, the right algorithms, the right metrics to measure the model performance, and so on. It takes analysis and clear thinking to choose the right options.
Photo by Diana Parkhouse on Unsplash
A simple way to develop critical thinking is to be as curious as a child. Ask as many questions as possible until there are no more questions. The more we ask the more we understand. The better we understand the problem, the better the outcome.
Let me demonstrate critical thinking with an example. Let us consider the following scenario at a telecom company. We want to identify loyal and high-net-worth customers. To identify this customer segment, we would have to start with a series of questions such as:
- What are the different profile categories of customers?
- What is the average age of the customers?
- How much does a customer spend?
- How frequently does the customer interact with the company?
- Has the customer been paying the bills on time?
- Have there been any late or missed payments?
- What is the lifetime value of the customers?
The answers to these questions help in identifying the elite customers. That, in turn, helps the organization ensure that those customers experience the best service.
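Questions like these can be turned into simple rules over customer data. Here is a minimal sketch of that idea. The `Customer` fields, the sample records, the thresholds, and the naive lifetime value formula are all assumptions made for illustration, not how any particular telecom company defines its elite segment.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Customer:
    customer_id: str
    tenure_months: int          # how long the customer has been with the company
    monthly_bills: list[float]  # amount billed in each recent month
    late_payments: int          # number of late or missed payments


def lifetime_value(customer):
    """Naive lifetime value (assumption): average monthly spend times tenure."""
    return mean(customer.monthly_bills) * customer.tenure_months


def is_elite(customer, min_tenure=24, min_avg_spend=80.0, max_late=0):
    """Loyal (long tenure), high spending, and always paying on time.

    The default thresholds are hypothetical and would be set with the business.
    """
    return (
        customer.tenure_months >= min_tenure
        and mean(customer.monthly_bills) >= min_avg_spend
        and customer.late_payments <= max_late
    )


# Hypothetical sample records.
customers = [
    Customer("C001", 48, [120.0, 110.0, 130.0], 0),
    Customer("C002", 6, [150.0, 160.0, 155.0], 0),   # high spend, but new
    Customer("C003", 60, [30.0, 25.0, 35.0], 0),     # loyal, but low spend
    Customer("C004", 36, [95.0, 100.0, 90.0], 2),    # has late payments
    Customer("C005", 30, [85.0, 90.0, 80.0], 0),
]

# Keep the elite customers, highest lifetime value first.
elite = sorted((c for c in customers if is_elite(c)), key=lifetime_value, reverse=True)
elite_ids = [c.customer_id for c in elite]

for c in elite:
    print(c.customer_id, round(lifetime_value(c), 2))
```

With this sample data, only C001 and C005 pass all three checks. Each question from the list maps to one field or one condition, which is why asking more questions up front usually leads to a sharper definition of the segment.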
There are techniques that help in improving critical thinking ability. One such technique is first-principles thinking, a mental model that helps in better understanding the problem. Here is an example of using first principles to solve a data science problem.
Mental models are amazing tools that help in clear thinking and better decision-making. Hence, adopting mental models helps in improving your critical thinking ability. Here is an article that highlights the benefits of adopting mental models at work.
Coding
Coding skills are as important to a data scientist as eyes are to an artist. Almost anything a data scientist does requires coding skills: reading data from multiple sources, performing exploratory analysis on the data, building models, and evaluating them.
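To make those four steps concrete, here is a small sketch that reads data, explores it, builds a model, and evaluates it using only the Python standard library. The inlined CSV, the column names `ad_spend` and `sales`, and the choice of a straight-line model are assumptions for illustration. A real project would typically read from actual sources and use dedicated libraries.

```python
import csv
import io
from statistics import mean

# Hypothetical monthly data, inlined so the sketch runs on its own.
# In practice this would be read from files, databases, or APIs.
RAW_CSV = """month,ad_spend,sales
1,10,24
2,20,46
3,30,65
4,40,84
5,50,106
6,60,125
"""

# 1. Read the data.
rows = [
    {name: float(value) for name, value in row.items()}
    for row in csv.DictReader(io.StringIO(RAW_CSV))
]

# 2. Explore: a quick summary of each column of interest.
for column in ("ad_spend", "sales"):
    values = [row[column] for row in rows]
    print(column, "min:", min(values), "mean:", mean(values), "max:", max(values))

# 3. Build a model: a least-squares line fitted on the first four months.
train, test = rows[:4], rows[4:]
x_mean = mean(row["ad_spend"] for row in train)
y_mean = mean(row["sales"] for row in train)
slope = sum((r["ad_spend"] - x_mean) * (r["sales"] - y_mean) for r in train) / sum(
    (r["ad_spend"] - x_mean) ** 2 for r in train
)
intercept = y_mean - slope * x_mean


def predict(ad_spend):
    """Predict sales from ad spend using the fitted line."""
    return intercept + slope * ad_spend


# 4. Evaluate: mean absolute error on the two held-out months.
test_mae = mean(abs(row["sales"] - predict(row["ad_spend"])) for row in test)
print("slope:", round(slope, 2), "intercept:", round(intercept, 2), "MAE:", round(test_mae, 2))
```

On this toy data the fitted slope is about 1.99 and the error on the held-out months is about 1.05. The point is not the model itself but that every step, from loading to evaluation, is code.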
Photo by Firos nv on Unsplash
What will happen to coding with the rise of AutoML solutions? Many AutoML products have come up in recent years, and many people even think that soon there will be no need for any coding skills. Let us take an example:
- There are 2 companies, company A and company B
- Both of them are using the most popular AutoML product
- They are able to solve several data science problems using AutoML
- Now one of them wants to dominate the market
- The company that can go over and above the solution implemented using AutoML will have a better chance
There is no denying that AutoML solutions will see widespread adoption in the future. Many standard problems that data science teams solve today will be automated. This doesn’t mean the end of data science jobs or the end of the need for data scientists to write code. It will enable data science teams to focus on new problems.
The amount of data being captured today is very high, yet many organizations use only a fraction of the data available to them. With AutoML, the focus will shift to the unexplored.
Are you interested in data science but feel that you do not have the coding skills? Here is an article that will help you with learning to code for data science.
Math
Math is another important skill for data scientists. It is OK not to be aware of some of the math concepts while you are learning data science. However, it will not be possible to excel as a data scientist without understanding them.
Photo by ThisisEngineering RAEng on Unsplash
Let me take a simple example and demonstrate how maths concepts are useful in solving problems. Let us pick customer churn analysis.
- We will start with understanding the behavior and characteristics of different sets of customers. One way to work on this is to pick different data samples and look for patterns. The math concepts required here are statistics and probability.
- To perform the data analysis efficiently, an understanding of linear algebra will be very handy.
- Let’s say we want to build a model to predict the users who are likely to churn. To understand the concept of gradient descent, knowledge of calculus will be helpful. If you are using a decision tree, then knowledge of information theory will help in understanding the logic used to build the trees.
- If you are looking to optimize the parameters, then knowledge of operations research and optimization might be helpful.
- To implement the model evaluation efficiently, math concepts such as algebra can be very helpful.
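To illustrate the decision tree point above, here is a minimal sketch of how information theory drives the choice of a split. It computes entropy and information gain for two candidate features on a tiny churn dataset. The rows and the feature names `contract` and `late_payment` are made up for illustration, and real tree implementations add many refinements on top of this idea.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, feature, target="churned"):
    """Reduction in entropy of the target after splitting the rows on a feature."""
    parent = entropy([row[target] for row in rows])
    groups = {}
    for row in rows:
        groups.setdefault(row[feature], []).append(row[target])
    weighted_children = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return parent - weighted_children


# Hypothetical customers: contract type separates churners perfectly,
# while late payments are spread evenly across both outcomes.
rows = [
    {"contract": "monthly", "late_payment": True, "churned": True},
    {"contract": "monthly", "late_payment": False, "churned": True},
    {"contract": "monthly", "late_payment": True, "churned": True},
    {"contract": "monthly", "late_payment": False, "churned": True},
    {"contract": "yearly", "late_payment": True, "churned": False},
    {"contract": "yearly", "late_payment": False, "churned": False},
    {"contract": "yearly", "late_payment": True, "churned": False},
    {"contract": "yearly", "late_payment": False, "churned": False},
]

gain_contract = information_gain(rows, "contract")
gain_late_payment = information_gain(rows, "late_payment")

print("contract:", gain_contract)
print("late_payment:", gain_late_payment)
```

In this made-up data, splitting on `contract` removes all uncertainty about churn (a gain of 1 bit), while splitting on `late_payment` tells us nothing (a gain of 0). A tree that picks splits by information gain would therefore split on `contract` first.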
This is not all; there is no machine learning algorithm without math. It doesn’t mean you need to be a mathematician to be a successful data scientist. All it requires is a high school level of math.
If you are interested in learning math for data science, here is the best course for you.
Collaboration
A data scientist can’t work in isolation. A data scientist should collaborate with multiple people to ensure the success of the project. Even today many data science projects fail. The number one reason for most of the failures is a lack of understanding and collaboration between the teams.
Photo by Kaleidico on Unsplash
To explain the importance of collaboration and working across different teams, let’s consider a scenario where the data science team is working with the customer growth team. The objective is to understand the reason for customer churn.
You decide to talk to a few different teams, and here is what they say:
Growth Team: Customer churn is mostly due to the aggressive discounts offered by the competitors
Marketing Team: The new feature released by the product team might be causing some issues, making the customers churn
Product Team: The marketing team is just focused on bringing in a lot of new customers without establishing the worthiness or intent of the customer
Customer Support Team: There have been many payment-related issues reported by many customers. It could be the reason for the customer churn
If you had not spoken to the other teams, you would have started to work on the problem based only on the inputs provided by the growth team. You can’t solve a problem by relying on inputs from just one team. Even if the growth team is the primary sponsor here, their inputs alone are not sufficient. To get a holistic picture, you need to talk to a diverse set of stakeholders. When you limit the people or the teams you work with, their bias will pass on to the solution you are building.
Also, in many cases, the data science team needs to work closely with the data engineering and the other technology teams. Without a good collaborative effort, there won’t be success.
Communication and storytelling
- Amount of effort invested in the project
- Accuracy of the final machine learning model deployed in production
- Insights identified from the exploratory analysis
All these are useless if the solutions are not well communicated to the stakeholders. The problems and the solutions involved in data science are generally complex. It is very important to simplify them before communicating them to the business. Using a storytelling approach in the communication helps a lot.
Let me take an example to explain the importance of good communication more simply. Let us consider the following scenario. The data science team is working on a forecasting model to predict the energy usage of retail energy customers. The data science team needs to convince the business and the infrastructure team about the importance of having and running at least 10 different models for better accuracy. This means higher use of computational power and a lot more time for training the models.
Option A: You talk about the clustering technique used to group the customers into different groups, and hence you say there needs to be a model for each of those groups.
The problem here is that the business team has not been told about the benefit of actually having one model for each of the groups. So they might not be convinced if the cost turns out to be high.
Option B: You start with the profile and the characteristics of the customers. You show the energy usage patterns of the customers and point out the distinctive ones to the business team. For example, some households use almost negligible power during the weekend, maybe because they tend to spend weekends at a different place. Having shown these distinctive patterns, you explain that one model will not fit all these different customers, and hence there need to be at least 10 different models, one for each of the 10 unique categories of customers.
Now, the business understands the importance of having so many different models. They can easily compare the incremental benefit with the required infrastructure cost to evaluate the options.
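The incremental benefit in Option B can also be shown with numbers. Here is a minimal sketch comparing a single model for all customers against one model per customer group. The usage figures (in kWh per day), the two groups, and the use of simple averages as stand-ins for forecasting models are all assumptions for illustration, and the error is measured on the same data the averages were computed from.

```python
from statistics import mean

# Hypothetical daily energy usage (kWh) for two kinds of households.
usage = {
    "home_on_weekends": {"weekday": [10.0, 11.0, 9.0], "weekend": [14.0, 15.0, 16.0]},
    "away_on_weekends": {"weekday": [10.0, 9.0, 11.0], "weekend": [1.0, 0.5, 1.5]},
}

# Flatten into (segment, day_type, kwh) records.
records = [
    (segment, day_type, kwh)
    for segment, days in usage.items()
    for day_type, values in days.items()
    for kwh in values
]


def by_day(record):
    """Key for a single model shared by all customers."""
    return record[1]


def by_segment_and_day(record):
    """Key for a separate model per customer segment."""
    return (record[0], record[1])


def fit_means(records, key):
    """'Train' a stand-in model: the average usage for each key."""
    groups = {}
    for record in records:
        groups.setdefault(key(record), []).append(record[2])
    return {k: mean(v) for k, v in groups.items()}


def mean_abs_error(records, model, key):
    """Average absolute gap between actual usage and the model's forecast."""
    return mean(abs(record[2] - model[key(record)]) for record in records)


global_model = fit_means(records, by_day)
segment_models = fit_means(records, by_segment_and_day)

global_mae = mean_abs_error(records, global_model, by_day)
segment_mae = mean_abs_error(records, segment_models, by_segment_and_day)

print("one model for everyone:", round(global_mae, 2), "kWh")
print("one model per segment:", round(segment_mae, 2), "kWh")
```

With these made-up numbers, the single model is off by about 3.83 kWh on average, while the per-segment models are off by about 0.58 kWh. A comparison like this, shown next to the extra infrastructure cost, is what lets the business weigh the options.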
It is the job of the data science team to clearly communicate the idea to the stakeholders. It is not an easy job as most people have limited knowledge about data science. A data science project is considered successful only when the business finds value from it.
One good way to improve the collaboration at your organization would be to provide an environment where there is a good flow of information between the teams.
Leadership skills — Good to have
Last but not least are leadership skills. Most organizations have a small data science team that works on different sets of problems. It is very common for a data scientist to get pulled into different meetings and ad hoc questions. It is the job of the data scientist to decide when to say yes and when to say no. It is very important to set the priorities right.
Also, data scientists need to have a clear thought process and the ability to envision the outcome. Many times there will be a lot of pressure from the business teams to rush the analysis. It is the role of the data scientist to manage the expectations and produce an outcome of high quality.
Bio: Sharan Kumar Ravindran is a Senior Manager (Data Science), Top Writer in AI on Medium, and a data science leader with over 10 years of experience. He writes and talks about data science with the objective to make it more accessible.
Original. Reposted with permission.