Poll Results: How long to become a good data scientist

KDnuggets readers think it takes about 5 years to become a good data scientist, with AU/NZ taking the longest view and Latin American analysts the most optimistic. Asia, North America, and Western Europe show surprising unanimity. Developing the analytic intuition, learning to automate, and data cleansing are among the top obstacles.

A recent KDnuggets poll asked:
How long does it take for a beginner to become a good data scientist?

Both the median and the average answer are about 5 years.

Here is the breakdown by region, with AU/NZ taking the longest view of becoming a good data scientist (6.9 years), and Latin Americans being the most optimistic (3.9 years).

There was also a surprising consensus among US/Canada, W. Europe, and Asia, where the average answer was the same: 4.9 years.

How long does it take for a beginner to become a good data scientist? [278 votes total]
< 1 year (6)  2%
1-2 years (33)  12%
2-4 years (85)  31%
5-8 years (91)  33%
> 8 years (35)  13%
Not sure (28)  10%
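
As a quick check on the figures above, here is a minimal Python sketch that recomputes the percentage shares and finds the answer bucket containing the median vote. The vote counts come from the table; the function names are illustrative, and leaving "Not sure" out of the median is an assumption, since the poll does not say how its median was computed.

```python
# Vote counts copied from the poll table above.
votes = {
    "< 1 year": 6,
    "1-2 years": 33,
    "2-4 years": 85,
    "5-8 years": 91,
    "> 8 years": 35,
    "Not sure": 28,
}


def shares(counts):
    """Return each answer's share of all votes, rounded to a whole percent."""
    total = sum(counts.values())
    return {label: round(100 * n / total) for label, n in counts.items()}


def median_bucket(counts, exclude=("Not sure",)):
    """Return the bucket holding the median vote.

    Assumption: answers in `exclude` carry no duration and are skipped.
    Buckets must be listed from shortest to longest. If the two middle
    votes straddle a boundary, the longer bucket is returned.
    """
    ordered = [(label, n) for label, n in counts.items() if label not in exclude]
    answered = sum(n for _, n in ordered)
    midpoint = (answered + 1) / 2
    running = 0
    for label, n in ordered:
        running += n
        if running >= midpoint:
            return label
    return None


if __name__ == "__main__":
    for label, pct in shares(votes).items():
        print(f"{label}: {pct}%")
    print("Median bucket:", median_bucket(votes))
```

Under these assumptions the median vote falls in the 5-8 years bucket, which is consistent with the reported median of about 5 years.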


In the table below, the bar height corresponds to the number of voters from that region. 

Region (Count), Avg Years to become a good data scientist
AU/NZ (9) 6.9 years
E. Europe (19) 5.9 years
US/Canada (143) 4.9 years
W. Europe (60) 4.9 years
Asia (25) 4.9 years
Africa/Middle East (9) 4.4 years
Latin America (12) 3.9 years
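
The regional averages can be combined into an overall figure by weighting each region by its voter count. The sketch below is a rough cross-check, not the method KDnuggets used: the counts and averages are copied from the table, and the function name is illustrative.

```python
# (voter count, average years) per region, copied from the table above.
regions = {
    "AU/NZ": (9, 6.9),
    "E. Europe": (19, 5.9),
    "US/Canada": (143, 4.9),
    "W. Europe": (60, 4.9),
    "Asia": (25, 4.9),
    "Africa/Middle East": (9, 4.4),
    "Latin America": (12, 3.9),
}


def weighted_average(rows):
    """Average of the regional means, weighted by each region's voter count."""
    total_voters = sum(count for count, _ in rows.values())
    weighted_sum = sum(count * years for count, years in rows.values())
    return weighted_sum / total_voters


if __name__ == "__main__":
    print(f"Overall average: {weighted_average(regions):.1f} years")
```

The weighted result comes out close to 5 years, in line with the overall average reported above. US/Canada dominates the weighting with 143 voters, so the small regions at either extreme barely move the total.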


Ronald Dodge, Good Data Scientist
To be a good data scientist, most people, I would tend to think, need at least 1 to 2 years of experience after gaining their BBA in Information Systems or a related degree, if not more like 4 years or so. For me, I picked up the required skills with 90% to 95% of them being self-taught. I learned the skills while I was in high school, going far beyond what they were teaching in high school back in the late 1980s. But then I also have this gifted skill of thinking logically, which gave me a major advantage.

Even with me naturally having the skills, it still took about 18 months for me to develop what I call good programming practices without being taught how to program by anyone or even getting help other than for the research I have done over the internet as different situations and cases came up.

One of the major benefits of automating a lot of the processes was that it freed up time to look at other aspects of data and to continue learning other things. It also allowed fellow co-workers to focus their time on other important matters. In one of the first major tasks I automated using various tools (some self-created), I took the work of a production statistician from 80 labor hours to 20 labor hours per week within a 2-week period, and eventually took it down to just 4 hours a week. In that same process, I also changed how operators and assistants report information, reducing the amount of manual processes they have to do, which actually cut the error rate of data from about 40% down to about 5%.

One of the biggest challenges with automating processes is that you have to keep those checks and balances in play. While in college they teach that about 20% of the code is doing the actual work, I would challenge that and claim only 5% to 10% of the code is doing the work, when you take into account good programming practices and all of the various errors the program needs to handle.

Of the various things I have done, one of the tasks that is more difficult to learn is the data cleansing aspect. There are many different ways and formats data can come in, and learning how to convert that unstructured or semi-structured data to structured data is not so easy to do. In the past, such data would at least be semi-structured, but still require additional cleansing.

Today, we are having to cleanse unstructured data, which means we are now having to teach computers how to process natural languages like us humans. That is one area I had a very hard time learning growing up. English is my native language, and yet I didn't really have it down until the 8th grade as a result of a learning disability issue. However, that LD issue forced me to learn the art of memorization (an acquired skill for me, as it took me 10 years growing up to conquer this feat with no real help from anyone).

I was in that 1% group who loved those sentence diagrams (aka fish bone diagrams), as that was the only real way I could learn the grammar of English. This was done via the combined skills of the art of memorization (acquired skill) and being able to think logically (gifted skill). That is one of the biggest traits I have come to realize is required to succeed in this line of work (IS/Programming): to be able to go into that logical mind-mode of thinking instantly, and to be able to go into that deep mind-mode of thinking in key situations.

Anyhow, I think we are at the point where we need to teach computers how to process natural languages (there's already a jump start in this area), and be able to break that down into meaningful information. Think of the possibilities of systems with this breakthrough. If you read the book "I, Robot" by Isaac Asimov, which I read in my English Composition class, you could see this breakthrough would take it to that next major step towards systems making the major decisions as depicted in the book.

Salil Kalghatgi, Characteristics of a good data scientist
From my experience, beginners need a few key skills to hone their data scientist prowess. I would argue that it is not mathematical expertise that identifies good data scientists, but instead creativity and communication. Having said that, I also see these skills in most good mathematicians (some just prefer pursuing theoretical problems). Creativity is hard to learn.

Ravila, How long for data scientist
I voted for 5-8 years of experience post undergraduate degree. Though this field is not closed off to anyone, the people who can catch up with it quickly are those from applied math, statistics, physics, operations research, computational social science, computational biology ...
Basically fields that require mathematical modeling, coding skills and the ability to perform in-depth data analysis ... this is just to get started.

The real distinguisher is being able to bring all these tools together, with domain experience, under time constraints so as to impact the operations/processes of the organization. This simply requires experience.

Mahdi Nasiri, experience
I think that data analysis needs basic knowledge. We can study this knowledge in university or courses, but the real world brings more and more challenges.
I have worked about 6 years in data mining, and every year I think I knew very little about it the year before, because I have gained good experience on real data, and more than 50% of the work in data mining is data analysis and data cleaning. So I think data analysis or data mining needs more years; maybe you must be Noah and be 900 years old :))

William Ross, How long to become a good data scientist
It depends, of course, on what constitutes a good data scientist. I would place considerable importance on the ability to pair data of all sorts with appropriate and insightful models which inform the questions being asked. This takes considerable and broad experience - not just knowledge of computational methods. So I would tend to say something in the 4-8 year range, depending on the intensity and diversity of the experience.