10 Mistakes You Should Avoid as a Data Science Beginner

Read this article on how to gain a competitive advantage in the data science job market.

Image by Steve Buissinne from Pixabay

Data science is booming. Thousands of students around the globe sign up for online courses or even for a master's program in data science.

The data science job market is very competitive, especially for the (supposed) dream jobs at the big tech companies. The good news is that it is in your hands to gain a competitive advantage for such a position by preparing yourself adequately.

On the other hand, there are (too) many MOOCs, master's programs, bootcamps, blogs, videos, and data science academies. As a beginner, it is easy to feel lost. Which course should I take? Which topics should I learn? Which methods should I focus on? Which tools and programming languages must I study?

The truth is that every data scientist has her/his own individual journey and is biased towards that learning path. So, without knowing you, it is difficult to say what the best approach for you is.

But there are common mistakes that data scientists make over and over again. Even if you know them, you will not avoid them altogether, but you will stop making them sooner and find your way back to the road to success faster.

Based on my 20+ years of experience in data science, leading teams of up to 150 people, and still lecturing part-time at one of the leading global universities, I have summarized the core mistakes to avoid so that you reach your dream faster.

The mistakes are listed in the order in which they typically arise as a beginner data scientist progresses.

#1 Investing too much time assessing all the different course options before you finally start, or perhaps never starting at all

I know that you feel overwhelmed by all the courses and that you want to avoid making mistakes. You want to invest your time and money effectively and select the approach that promises the fastest and best success.

Unfortunately, as in any technical or scientific field, there is no immediate success, and you will never have a point of comparison to tell you whether your path was the best possible one.

The fact is that today, all established platforms, academies, and institutes offer good courses. So, do not overthink or over-analyze the courses. Be brave, choose one, complete it, and then select the next one.

The most crucial thing is to start and to do. You cannot make a mistake here because you know neither your journey nor how it would have been different had you chosen another course. No one can tell you that. Period.

It is also important to realize that learning is circular, not linear. Taking one data science course does not rule out taking another one later.

Even after all my years of experience, I still take data science, machine learning, and AI training. In every beginner course, no matter how "simple," I discover a new aspect and a new view on the topic. And this is exactly what ultimately makes a data scientist highly sought after: understanding all the different perspectives on a topic.

#2 You want to learn too many methods and tools at once instead of learning and understanding the methods one by one

Many aspiring data scientists think that listing as many methods as possible on their CV helps them get a job faster. But the opposite is true. If you started with data science only six months ago, it is clear to every recruiter that such a list is buzzword dropping with no substance behind it.

Take regression models: there are many books about regression alone. There are more than 50 types of regression, and each comes with different preconditions. So, simply listing "regression" on your CV says nothing. At the same time, regression models are still the most important models in applications, and they lay the foundation for understanding data science in general.

You must understand what problem a method solves, what its assumptions are, what its parameters mean, what its pitfalls are, and so on.
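To make "what do the parameters mean" concrete, here is a minimal Python sketch of the simplest case: ordinary least squares with a single predictor, computed from its closed-form formulas. The function names `fit_simple_ols` and `residuals` are illustrative assumptions, not part of any library. The sketch covers only the fit; checking assumptions such as linearity, independent errors, and constant error variance still means inspecting the residuals, for example by plotting them against x.

```python
from statistics import mean


def fit_simple_ols(xs, ys):
    """Fit y = b0 + b1 * x by ordinary least squares using the closed-form solution."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of squared deviations of x; zero means x is constant and the slope is undefined.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x has no variance, so the slope is undefined")
    # Sum of cross-deviations of x and y.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx           # slope: expected change in y for a one-unit increase in x
    b0 = y_bar - b1 * x_bar  # intercept: expected y at x = 0 (may be meaningless outside the data range)
    return b0, b1


def residuals(xs, ys, b0, b1):
    """Observed minus fitted values; inspect these to check the model assumptions."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = fit_simple_ols(xs, ys)
print(f"intercept={b0:.2f}, slope={b1:.2f}")
print([round(r, 2) for r in residuals(xs, ys, b0, b1)])
```

Here the slope comes out at roughly 1.99: each one-unit increase in x goes with about two more units of y. Because the model has an intercept, OLS residuals always sum to (numerically) zero, so that fact alone tells you nothing; what matters is whether they show a pattern, such as a curve or a spread that grows with x, which would hint at a violated assumption.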

From your CV and the way you describe your knowledge of regression, every experienced recruiter (or, today, the algorithms behind the recruiting process) can identify the depth of your understanding.

It is better to have in-depth knowledge of and experience with a handful of methods than to know many superficially.

#3 You code everything from scratch because you think this helps you program better and faster

When they start coding, many people think they must quickly begin re-implementing as many algorithms as possible. Here, too, you should focus on understanding a few rather than on quantity.

First, you need to understand the mathematical prerequisites of coding: linear algebra, mathematical induction, discrete mathematics, geometry (yes, geometry: it is a strength of excellent programmers but often forgotten by data scientists), statistics and probability theory, calculus, Boolean algebra, and graph theory.

I did not become better and faster by coding more. I got good at programming by understanding the mathematical basis, reviewing other people's code, and running and testing it on different data and problems.

Yes, coding is essential, but it is more important to understand (good) code architecture. And this can only be learned by reviewing other people's code.

The fact is that code is increasingly becoming a commodity, and there are even no-code tools. The dividing line will no longer be between those who can code and those who cannot, but between those who understand code architecture and those who do not.

Here is another example: I assume you have already used TensorFlow. But do you understand what it is? What does it do? And why is it called "TensorFlow"? Do you know what a tensor is? Not just the mechanical calculation of a tensor product, but what it means geometrically?
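One way to glimpse the geometric idea, sketched in Python under simplifying assumptions: treat a 2×2 matrix as the components of a linear map (a rank-2 tensor) in the standard basis of the plane. If you switch to a new basis given by an invertible matrix P, the components change to P^-1 A P, yet the map itself, what it does to an actual arrow in the plane, stays the same. Note that TensorFlow's tensors are, in practice, multidimensional arrays; this example only illustrates the mathematical notion behind the name. The helpers `matmul`, `matvec`, and `inverse` are hypothetical and handle only the 2×2 case.

```python
from fractions import Fraction


def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def matvec(a, v):
    """Apply a 2x2 matrix to a 2-component vector."""
    return [sum(a[i][k] * v[k] for k in range(2)) for i in range(2)]


def inverse(m):
    """Exact inverse of a 2x2 matrix, using Fractions to avoid rounding."""
    det = Fraction(m[0][0] * m[1][1] - m[0][1] * m[1][0])
    if det == 0:
        raise ValueError("a change of basis must be invertible")
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


# Components of a linear map in the standard basis.
A = [[2, 1], [0, 3]]
# Columns of P are the new basis vectors, written in standard coordinates.
P = [[1, 1], [0, 1]]
P_inv = inverse(P)

# Components of the same map in the new basis: P^-1 A P.
A_new = matmul(P_inv, matmul(A, P))

v = [1, 2]                # a vector in standard coordinates
v_new = matvec(P_inv, v)  # the same vector in new-basis coordinates

# Apply the map in each coordinate system, then convert the second result back.
w_standard = matvec(A, v)
w_from_new = matvec(P, matvec(A_new, v_new))

print(A_new)                     # different numbers than A ...
print(w_standard == w_from_new)  # ... but True: the same geometric result
```

With this particular P, the new components happen to be diagonal, because the columns of P are eigenvectors of A: seen in that basis, the map simply stretches one direction by a factor of 2 and the other by a factor of 3. The numbers depend on the basis; the geometric object does not.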

#4 You learn the theory and think you know everything, but you lack practical experience

Learning data science is trial and error. Only by gaining as much experience as possible, making all the errors and resolving them, do you get a deeper understanding.

Theory is important, even vital. You need to understand the fundamentals.

Unfortunately, things rarely work in practice the way they do in theory. On the contrary, what works is often precisely what you learned you should not do.

So, you must work on practical examples right from the beginning. Often, you will not feel ready for practical work: you lack knowledge of the basics or programming experience.

But I strongly advise you to start right away, even if you do not feel ready to do exercises. It does not have to be a day-long or week-long project. A small project of 1–2 hours is enough.

You can either start with a no-code tool like RapidMiner or KNIME, or take somebody else's code and apply it. For example, take simple sentiment analysis code and apply it to tweets or product descriptions. Then you can start to modify the code for other examples and compare the results.
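As a taste of what such a 1–2 hour starter project might look like, here is a minimal lexicon-based sentiment sketch in Python. The word lists `POSITIVE` and `NEGATIVE` are hypothetical and handpicked for illustration; real sentiment code would use a curated lexicon or a trained model. The point is only to have something small you can run on your own texts and then alter.

```python
import re

# Hypothetical toy lexicon for illustration only. A real project would use a
# curated lexicon or a trained model instead of these handpicked words.
POSITIVE = {"good", "great", "love", "excellent", "happy", "fast"}
NEGATIVE = {"bad", "poor", "hate", "terrible", "sad", "slow"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_score(text):
    """Number of positive words minus number of negative words."""
    tokens = tokenize(text)
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)


def label(text):
    """Map the score to a coarse sentiment label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


reviews = [
    "I love this phone, the battery is great",
    "Terrible service and slow delivery",
    "The package arrived on Tuesday",
]
for review in reviews:
    print(label(review), "|", review)
```

A natural next step, in the spirit of altering the code, is to add simple negation handling so that "not good" is no longer counted as positive, and then to compare the labels before and after the change.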

When you learned to talk as a small child, you started with single words or two- or three-word phrases. Step by step, you built up a feel for the language. It is the same with practical experience in data science.

Pro tip: Learning is circular, so save your work. Later you can come back to it, improve it, move it to GitHub, and add visualizations with Tableau.

#5 You think that certifications are a competitive advantage to get a data science job

Certifications are fine. Many voices out there tell you not to bother with certifications. But they can serve as motivation, and they officially document your progress and your eagerness to learn. I still earn certificates. There is nothing wrong with that, and when you invest the time, it is legitimate to have something to show for it.

But certifications are not a differentiator in the market. The fact is that thousands of people have the same certifications. So, to gain a competitive advantage, you must go beyond them.

For example, a student of mine asked me for help finding an internship in finance. He wanted to apply what he had learned and get to know the culture and collaboration within a data science team. I was able to place him with a bank, and he is writing a semester thesis based on that work. Yes, it is stressful to manage studies, an internship, and a semester thesis in parallel. But it will give him an invaluable competitive advantage when it comes to job offers.

#6 You worry about the opinion of other people instead of building your own opinion based on facts

Most aspiring data scientists worry about the opinions of other data scientists. And the more arguments they hear, the more confused they become. Even though confusion is a necessary step on the path to clarity, it should not become a permanent state.

Each data scientist is an individual with her/his own experience, learning and career path, and opinions. As I like to say, "If you have two data scientists in a room, you have at least four different opinions."

It is good to take opinions as inspiration and as a guide to search for information, but not as the information itself.

Search for hard facts. Draw your own logical conclusions, validate them, and update them as needed. This is an important skill for progressing successfully in your data science career.

#7 Not caring about business and domain knowledge

Many data scientists think they can apply their methods to any problem in any industry, but from more than 20 years of experience, I can tell you that this is wrong.

Too often, I have seen data scientists present findings to business people, only to hear, "Oh, we know this already. What we need is why it happens and how to solve it." Or, in the worst case, "This is absolute nonsense because this is not how our business works." Boom!

It is more important to have domain knowledge than to know all the sexiest and fanciest methods. A data scientist solves a business problem, not a technical problem. By solving a business problem, you bring value to the company, and you are only as valuable as your solution. You can do this successfully only when you know the business.

I have worked in many different industries. Each time, before I even started to engage with the business, I read extensively about the industry.

I started with Wikipedia to learn the big picture and about the main companies
I looked up the annual reports and investor relations information of the top 10 companies in the industry
I read all the news articles of the last few years about the industry and its companies
I reached out to my LinkedIn contacts who work in the industry

Only then did I start to interact with the business.

Half of your learning time should go into building industry and business knowledge.

#8 You are not studying and learning on a consistent and ongoing basis

It is very easy to get distracted or to give up early when you do not understand a topic. Learning data science is a marathon, not a sprint. So, it is essential to build a routine of consistent, ongoing study. As in marathon training, you train in small units, but daily.

Also, as written before, learning is circular. Having once studied a topic does not mean that you have mastered it.

Let me give you an example. In my mathematical finance lectures, I had to learn many limit theorems. The exam went excellently, and I was convinced that I understood them. But seven years later, when I had to review code for valuing complex structured financial products, the scales fell from my eyes, and I realized that I had not truly understood them until that moment.

So, block time for learning every day, or at least a few hours every week. It does not matter whether you are an aspiring data scientist or already a senior one.

Your learning should cover new data science topics; topics you have already learned, but from another perspective (e.g., through another course or book); new technologies and technology trends; industry and business knowledge; data visualization and data storytelling; and hands-on application to data.

It adds layer upon layer of understanding, and in job interviews, you will be able to give convincing answers by presenting a holistic view from different perspectives.

#9 No storytelling with the data

In a data science job, you will primarily communicate your findings to non-technical people, notably people from the business side. And the business is what finances your job. Without its commitment, neither your job nor the data science team would exist.

Your job is to bring value to the business. It is not to apply fancy methods only for the sake of application.

A friend of mine is the data science lead at a global bank. When hiring data scientists, the team sends candidates a dataset two weeks in advance and asks for a 20-minute presentation. No further input is given. They want to see the storytelling. They are not interested in the methods applied, unless a candidate says something absolutely nonsensical about them. What they want to see is, first, the framing of the business problem and why it is important to solve; second, what exactly should be solved; and last, how it is solved and what the result means in a business context. "This is the most important work we do all day. A candidate does not have to be perfect at it but must show that she/he has understood what is important in our job."

So, learn data storytelling (there are even free courses on it) and learn to visualize data in a business context.

#10 Learning on your own without interactions with the data science community

Many people think they can learn data science through their own hard work alone. They see all other data scientists as competitors and are reluctant to exchange knowledge.

But living in your own world, where you only read and learn what you yourself select, is highly biased, and many perspectives on a topic or method are missing. Further, you miss out on open discourse about a topic and on gaining experience in argumentation, a skill every data scientist needs.

Any experienced recruiter knows after one or two questions whether you are a one-person show or have a vibrant network that helps you gain knowledge exponentially. Such a network benefits the company and increases your market value and demand.

So, it’s crucial to develop a network. This can be done by attending bootcamps, hackathons, and Meetup meetings.

Now you know, in theory, what you should avoid.

Any of these mistakes is a potential showstopper for your data science job.

I know that you will still make several of these mistakes. I am no different. It is human nature to think "I am different," even though the data says otherwise. But being aware of these potential mistakes will help you readjust your path faster and thus become a sought-after data scientist more effectively.


Bio: Isabelle Flückiger is a Senior Executive with international C-level advisory experience in end-to-end digital, data and new technology transformation projects, with key industry experience in banking, insurance, chemicals, utilities and pharma/life sciences.

Original. Reposted with permission.
