2016 Silver Blog

How to Become a Data Scientist – Part 1

Check out this excellent (and exhaustive) article on becoming a data scientist, written by someone who spends their day recruiting data scientists. Do yourself a favor and read the whole way through. You won't regret it!



Computer Science / Software Engineering

If you have studied artificial intelligence/computer science to a high level, then it is likely you are already in a good position for Type B data science. But there is the other well-trodden path to consider: the experienced software engineer who wants to move into data science.

Software engineers

A software engineer might or might not have experience in machine learning – it depends. But either way, this background is clearly better suited to Type B data science, which requires a solid grounding in software engineering principles. I discussed this with James Petterson, a Senior Data Scientist at Commonwealth Bank of Australia (and previously a software engineer), and here is what he said on the matter:

“A lot of data science work is software engineering. Not always in the sense of designing robust systems, but simply writing software. A lot of tasks you can automate and if you want to run experiments, you have to write code, and if you can do it fast, it makes a huge difference. When I did my PhD, I had to run tens of thousands of experiments every day, and at this scale, it wasn’t possible to do them manually. Having an engineering background meant I could do this with speed, whereas a lot of the students from other backgrounds struggled with basic software issues: they were really good at mathematics but implementing their ideas would take a long time”

And Dylan added:

“Good software engineering practices are so valuable when you want to create a robust implementation of a machine learning algorithm in a production environment. It’s all sorts of things – like maintainable code, a shared code base so multiple people can work on it, things like logging, being able to debug problems in production, scalability – to know that once things ramp up, you’ve architected it in such a way so that you can parallelise it, or add more CPU, if needed. So if you’re looking for the type of roles where you need to get these things into a platform, as opposed to doing exploratory research or answering ad-hoc business questions, software engineering is so valuable”

I think that says it all, but to summarise: if you are a software engineer with an aptitude for mathematics, you are in a great position to become a (Type B) data scientist, provided you are prepared to put in the work to master statistics/machine learning.

Mathematics

To make an obvious statement: mathematics underpins all areas of data science. Therefore, it seems reasonable to expect that many mathematicians are now plying their trade as data scientists. However, there are relatively few coming directly from mathematics, and this peculiarity piqued my interest.

One explanation is that there are fewer graduates from mathematics (both pure and applied) compared to the other relevant fields of study, but this fails to tell the whole story. So, to dig deeper, I turned to Boris Savkovic, Lead Data Scientist at BuildingIQ (a start-up that uses advanced algorithms to optimise energy use in commercial buildings). Boris has a background in Electrical Engineering and Applied Mathematics and, having worked with many mathematicians in his time, he provided the following insights:

“Many mathematicians have a love of theoretical problems, beautiful equations and seeing deep meaning in theorems, whereas commercial data science is empirical, messy and dirty. While some mathematicians love this, many hate it. The real world is complex, you cannot sandbox everything, you have to prioritise, appreciate the incentives of others, compromise the math and technology for short-term vs. medium-term vs. long-term, worry about diminishing returns (80/20 rule) and deal with both deep theory and deep practice, and everything in-between. In short: you have to be flexible and adaptable to deal with the real world. And this is ultimately what commercial data science is about: finding faster and better practical solutions that make money. For those with heavy mathematical/theory backgrounds who want to understand everything to the last degree, this can be very difficult, and I have seen a number of mathematics PhDs struggle badly when transitioning from research/academia to commercial data science”

It is important to note that Boris was referring more to pure mathematicians, and he added that he has also worked with many excellent applied mathematicians in his career. This seems logical because pure mathematics is likely to attract those with a love for the theory, as opposed to real-world problems. And theoretical work won’t involve much interaction with data, which is – you know – quite important for data science.

There are exceptions of course and it ultimately comes down to individual character, not purely what someone has studied. And clearly: a lot of what mathematics graduates learn is highly transferable, so picking up the specific statistical/machine learning techniques shouldn’t be too difficult (if not already known).

In terms of suitability, most mathematicians are probably best equipped to learn the tools and theory for Type A data science. However, there are mathematicians who study computer science (theoretical computer science is essentially a branch of mathematics) and so people with this background may be more suited to Type B data science.

There is an important lesson to take from all this, and it comes down to understanding the reality of what commercial data science involves. If you truly understand the challenges and that is what you are seeking, then go for it. But if you have a love for the theory more than the practical application, you might want to reassess your thinking.

The Blank Canvas

If you are just starting out (perhaps you are in school, you enjoy maths, science and computing, and you like the sound of this thing called data science), then good news: you can choose your path without being constrained by a pre-existing background. There are now a number of dedicated data science courses that cover both computer science and mathematics/statistics. Just be prepared for the long haul; you will not become a data scientist overnight, as we will see in Part Two, where we will examine how to learn.

Bio: Alec Smith is a specialist recruiter within the field of data science and engineering. The position of an agency recruiter offers a unique, cross-sector perspective of commercial analytics and he leverages this viewpoint to write about various topics within data science, technology and hiring. Originally from the UK, he is currently plying his trade in Sydney, Australia. Follow Alec on Twitter @dataramblings.

This post was originally published on Experfy's Blog.
