
2016 Silver Blog: How to Become a Data Scientist – Part 1


Check out this excellent (and exhaustive) article on becoming a data scientist, written by someone who spends their day recruiting data scientists. Do yourself a favor and read the whole way through. You won't regret it!



CHAPTER TWO: LOOKING INWARDS

Now we are making progress. Having successfully digested the information in Chapter One, you are nearly ready to begin formulating your personal goals and objectives. But first – some introspection is required – so grab a coffee, find a quiet spot, and have a deep think about:

  1. Why do you want to be a data scientist?
  2. What type of data science interests you?
  3. What natural capabilities or relevant skills do you already possess?

Why is this important? Simply put: data science is an expert field, so unless you have already mastered a lot of what we covered in Chapter One, it is not an easy (or quick) journey. There is an important message here, which addresses questions one and two: you need to have the right reasons for going down this path, otherwise – chances are – you will give up when the going gets tough (and it will).

To elaborate on this message, enter Dylan Hogg. Dylan was previously a software engineer and is now Head of Data Science at The Search Party, a start-up that has built a platform that utilises machine learning (NLP) to link employers with relevant candidates (the future of recruitment!). Considering he has made the transition from software engineering to data science (a journey he is still on), we discussed what it takes, and he said:

“Regardless of education or experience, there’s something more fundamental, which is your nature of curiosity, determination and tenacity. There are so many times when you hit a problem: perhaps the algorithm isn’t performing in the way it needs to, or perhaps the technology is being a pain. Either way, you can study machine learning algorithms or software engineering best practice, but if you’re not really determined, you’re going to give up and not get through it.”

There you go: you won’t just face problems when you are learning; you will face them continually in your working life, so you had better make sure you are motivated for the right reasons, and not just because you think having ‘scientist’ in your title is cool.

But what about question three? Why do your relevant skills matter? Well, where you are starting from affects what type of data science you are most suited to, and what you need to learn for the area that interests you. So to answer this question sufficiently, it is necessary to explore the typical paths to data science, beginning with the wider scientific field.

Note: There are many quantitative disciplines where you will find people with the ability to transition into data science. I won’t cover them all here, but the point is this: if you take the time to really understand the different nuances of data science, you should be able to figure out how relevant your current skillset is, whatever your background.

Other Scientific Disciplines

This is not the most common route to data science; as we will consider next, statistics and computer science are. But because scientists from many fields have highly relevant skillsets (especially in the world of physics), many have made this jump.

For an explanation of why, allow me to introduce the individual I alluded to in the introduction: Will Hanninger, a Data Scientist with Commonwealth Bank of Australia. In a previous life, Will was a particle physicist with CERN, where he worked on the discovery of the Higgs boson (very cool), and this is what he had to say:

“In physics, you naturally learn a lot of what you need in data science: programming, manipulating data, getting the raw data and transforming it in a useful way. You learn statistics, which is important. And crucially: you learn how to solve problems. These are the basic skills needed for a data scientist.”

So the skillset is highly transferable, with the main box ticked: problem solving. The differences tend to arise in the tools and techniques; for example, while machine learning is synonymous with data science, it is less common in wider science. In any case, we are talking about very smart people here; they have the ability to learn tools and techniques in a short timeframe.

I also met Sean Farrell for this project; Sean’s background is in astrophysics and he moved into commercial data science with Teradata Australia, where he wrote an excellent blog post on this topic: Why Science’s Loss is a Gain for Data Science. The following passage is particularly pertinent:

“Until recently there haven’t been any formal training pathways to become a Data Scientist. Most Data Scientists come from backgrounds in statistics or computer science. However, while these other career paths develop some of the skills listed above, they typically don’t cover all of them. Statisticians are very strong on the maths and stats side, but generally have weaker programming skills. Computer scientists are very strong in the programming arena, but typically don’t have as strong a comprehension of statistics. Both have good (yet different) data analysis skill sets but can struggle with creative problem solving, which is arguably the hardest skill to teach.”

To avoid misunderstanding, remember the context here. Sean isn’t saying that all data scientists from statistics or computer science lack creative problem solving; the argument he is making is that science filters extremely effectively for problem solving, arguably more so than statistics/computer science.

Statistics

Depending on your perspective, statistics can be viewed as a mathematical tool that facilitates the scientific process, or alternatively as a science in itself. Given this ambiguity, if you are coming from a statistics background, are you ready-made for data science? Semantics aside, it depends on a few factors:

  • Firstly, do you have experience with machine learning techniques? As we learnt in Chapter One, statistical modelling and machine learning are related, and they overlap in many ways. However, the latter possesses significant advantages when applied to massive datasets, and with the adoption of machine learning continuing to rise in all areas of industry, it really is synonymous with all types of data science.
  • Secondly, at the risk of repeating myself: what area of data science interests you? Clearly a statistics background is better suited to Type A positions, so if your goal is Type B work, you will have some learning to do.
  • Finally, do you have practical experience working with data? Data wrangling is often a comparative weakness of those coming from statistics and, as we know, it is a crucial component of commercial data science.