Transitioning to Data Science: How to become a data scientist, and how to create a data science team
"A good data scientist in my mind is the person that takes the science part in data science very seriously; a person who is able to find problems and solve them using statistics, machine learning, and distributed computing."
By Amir Feizpour, Royal Bank of Canada
It is difficult to define data science these days: every company claims to be doing data science and everyone claims to be a data scientist. Practitioners are puzzled by their fuzzy job descriptions, and people who are trying to become data scientists are frustrated by the lack of standard definitions. In this conversation, held at the Toronto Machine Learning Summit 2017, we tried to demystify data science and clarify what it means to be a data scientist.
Data science is the application of scientific methods to understanding data in order to solve business problems. “A good data scientist in my mind is the person that takes the science part in data science very seriously; a person who is able to find problems and solve them using statistics, machine learning, and distributed computing,” said Amir Hajian, Director of Research at Thomson Reuters Labs. In other words, data scientists are people who can “reason through data using inferential reasoning, think in terms of probabilities, be scientific and systematic, and make data work at scale using software engineering best practices,” said Baiju Devani, Vice President of Analytics at Aviva. He added that it is important to recognize that “there is no deterministic path to the problems you're solving or the solutions you find, so you have to be ok with fuzziness,” and that you need to “have that experimental mindset that allows you to work with vague problem and solution definitions.” In some sense, the best data scientists are people with “good statistical knowledge, programming and technical skills, and industry experience,” according to Lindsay Farber, Senior Data Scientist at MoneyKey.
Ozge Yeloglu, Chief Data Scientist at Microsoft Canada, also reminds us that in business there is a need “to step back and get educated about what data science really is. It is the use of data to address business problems which sometimes means there is no need to use machine learning or artificial intelligence.” Part of the misunderstanding around data science originates from the hype. Every business is in a rush to get into data science without taking the time to understand why and how. Big companies are in fierce competition to hire as many data scientists as they can without taking the foundational steps needed. “If you have 100-200 data scientists (compared to say Facebook who has 500-600 data scientists), you are in the wrong business,” said Baiju Devani. He added, “while these organizations have big problems to solve through data science or machine learning, they have not thought about operationalization to bring those solutions at scale.” On the other hand, there are startups that are too eager to solve everything with machine learning without thinking about the proper scale of the problems. “If you are a startup with a thousand clients, you should not be doing sentiment analysis instead of calling and talking to every one of your clients,” said Baiju Devani. Practitioners are also rushing into becoming data scientists without understanding what it means. “Everyone who has taken a few online courses thinks they have transitioned without putting nearly enough time and effort into it,” said Ozge Yeloglu.
The hype is not completely unjustified, however. There have been quite a few interesting business use cases that have created excitement in the industry. “I really feel happy when our clients come to us with a real data business problem and we can help them with our resources while leveraging their domain knowledge; that makes me optimistic that the hype can get us to the reality sooner than later,” said Ozge Yeloglu. “Organizations want to find this new way of doing business in a new direction that was not possible before,” said Amir Hajian. He added that practitioners want to be data scientists because “they want jobs that do not end, and are not repeating the same thing over and over again every day.” He said, “it is the perfect time for people who are not computer scientists but want to get into the industry and do something good.” This is, after all, a field that is not rigid; “it is changing and it's finding its own way and it's evolving so you have to be ready for that and you have to find your favorite thing and your niche and you have to keep going.”
Given all the uncertainty that exists, there are a few things that can be done to increase the likelihood of success as a data-driven business. One of the most important steps is to “ask what the objective is and why you need data science, and you need to keep asking why until you get a good answer to do that and then the rest falls into place,” said Baiju Devani. The next step is to create an efficient data science team. The core of the team should come from the business itself, rather than from new hires. “If you can find that one person in your team who is a domain expert already and understands the data and most likely is already doing traditional Business Intelligence, and if they want to become the data scientist, let them be and invest in them,” said Ozge Yeloglu. Once that core is formed, and depending on the priority, timeframe, and complexity of the problem, you might end up hiring people with various skill sets. Most of the time the best solution is to have a data engineer build the infrastructure and then a data scientist work on top of it, in addition to the business domain experts. In fact, “for every data scientist there are other roles that you need: data visualization experts, developers, and UI/UX designers,” said Amir Hajian.
The composition and culture of a data team are indeed very important. Diversity of backgrounds and mentalities provides the right ingredients for success. “We've talked a lot about data science being a science but I think it's as much artistic and requires a lot of creativity and it is clear from research that diversity leads to creativity,” said Baiju Devani. “At my company I had to work with other data scientists in my team to come up with better answers,” said Lindsay Farber. “It is hard work that is beyond coming up with an algorithm: you have to clean the data, talk to the customer, build the whole model and then present it to the stakeholders, and be patient and come up with different ways of putting it in front of the customer and explaining it to them; that's where you see that people who have different technical backgrounds can make a huge impact because they are looking at things differently and they have the expertise,” said Amir Hajian. This is especially important in terms of social, gender and racial diversity for companies in the public sector or social innovation. “One story that I heard recently was about one group that built a model for detecting a disease and it worked great in training, but failed when tested in real life; their training data was all for white Caucasian and it was failing on all Latino population and African Americans and Asians; it was an eye-opener for companies who have social impact that if your team looks the same then they're gonna think the same,” said Ozge Yeloglu.
There are so many nuances in creating a data science team that taking it lightly will almost certainly lead to failure. By now it should be clear how different technical roles and expertise, along with soft skills, intertwine to create a team that can achieve great objectives in data science. Selecting individuals to join such a team is a great challenge and needs to be done with care. “To me team culture is one of the biggest things, so I look for the fit for my team and for our clients. In my team data scientists are technical people with very high communication skills who can talk with people ranging from data scientists to VPs and CIOs. A typical question that I ask in interviews is a very open-ended case study and see if the candidate is able to ask the right questions, and figure out what the right technical solution is, and I observe that thought process end to end,” said Ozge Yeloglu. She added, “obviously, technical skill is important but as long as they know Python or R, I trust that they're gonna keep learning and improving. This is important because data science is one of those areas that keeps growing, so you can't really stay static.” Another important thing that hiring managers should look for in interviews is the candidate's ability to brainstorm and work with them and with the team. “I try to have a conversation to see if you are able to hold a conversation around data science beyond whatever technically you are good at. If we go beyond that I tend to pick up a paper and send it to the candidate and ask them to come in for a paper review. I think that works well, like an oral exam, better than having a time-pressured write-your-sort-algorithm,” said Baiju Devani. It is also crucial to find people who are good at teamwork. “I want to stress that having the communications skill with the head of marketing or the head of operations and everyone else in the company is really important. 
You need to talk to them to understand their data needs and show them what they need to see even though they might not necessarily be able to describe it,” said Lindsay Farber. A great way to test for many of the important skills is to ask the candidates to walk the interviewer through one of their most interesting projects. “I have discovered the best candidates that way and it never fails because that immediately gives the candidate hands to show you what exactly they can do and what technology they know, and how they think about problems, and how they communicate. You can see that there are some people who can't do it and some people who come ready and open their laptops and show you what they have built,” said Amir Hajian.
Some of these characteristics come from personality, but experience is also very important. That creates a dilemma, as there are almost no experienced data scientists on the market. The majority of people looking for data science jobs are right out of school and therefore don't have the business experience needed. “If you're lacking that industry experience I think doing Andrew Ng’s course and learning Python or working with really clean unrealistic data sets from Kaggle is one way to do it, but I think to build on that you need to get out there, come to meetups and summits, talk to other data scientists to find out what they do, and take a course in person. That helps you understand what different types of data scientists do,” said Lindsay Farber. She also thought it is important not to aim too high for the first job; “just get into a company at a junior level and work your way up. When you show interest and work hard, you will have that data analytics and data science background and the business insight to create a job for yourself.” It is also crucial to tell others what is special about you beyond things that everyone can do, like taking online courses. A good way to do that is “to pick a problem that you're interested in and start building something around it. Teach yourself everything that you need to solve that problem. That makes the process of learning fun and you'll start learning all the technologies and the science you need as a data scientist,” said Amir Hajian. You can do that whether you are coming from academia or are already working in industry and looking to switch to data science. “You should find that one extended project that is not in your job description but it will push you to get into the data science world. You can actually use your company's data, real-world data, and get experience that way and then you can show that you have the potential and desire,” said Ozge Yeloglu. 
If you have a Kaggle competition dataset, go ahead and think outside the box and figure out what else you can do with it. You can even augment it with some of the many open-source datasets that exist; hopefully something interesting will come out of it.
We talked earlier about the cultural match between a candidate and a company, and how companies screen for it. The other side of the coin is the candidates, who are deciding which jobs and companies to join; they should ask questions in interviews, or even before the interviews, while they are collecting information about the company. “The most interesting thing I heard when I was looking for a data science job was that when the interviewer is interviewing you, you are also interviewing the interviewer. That is your chance to find out what life is going to be like at this role. The important questions are: ‘who am I going to work with?’, ‘what is the team like?’, and ‘what are the problems that you're solving?’. Ask them to walk you through the most exciting project that they have recently done and listen very carefully and ask a lot of questions about it because that will show you everything about that company and that team,” said Amir Hajian. An important point to remember in such conversations is that people work for managers, not necessarily the company. “Your manager is really the immediate person that you're gonna deal with every day and they're gonna decide your success in that company one way or another. So I think it's really important to get a call with the hiring manager. Even if you're not lined up for that call ask the recruiter for it. Then actually learn from that person and see if you connect with them,” said Ozge Yeloglu. She added, “I really love it when someone asks me about the growth opportunities because that tells me that they are aware that this is not an easy job and they have to really keep learning every day.” It is also important for people to research beforehand what the company and the team they want to join do. 
“Go on YouTube and see if any of the data scientists have given talks recently and watch all of them; see if they've published any papers and read them, so that you bring that up in the interview instead of them telling you about it and then you can showcase your knowledge,” said Lindsay Farber. It might be slightly harder, but you should also “try to figure out if you can talk to people who the data science team works with closely whether that's on the business side or most likely from the engineering side. Figure out how well the engineering and data science teams work together and what kind of dynamics there are,” said Baiju Devani.
Finally, it is important for practitioners to have a realistic picture of what their lives will be like as data scientists. “If you're coming from an academic background, remember that companies are not looking for this scholar person who knows everything and can talk about the philosophy of it. The company that is interviewing you has problems and there is a reason that they are interviewing you. Most probably they want you to help them with that specific problem. If you can figure that out and convince them that you are going there to help them to solve that problem you have that job. So be confident, figure out what the problem is, and tell them how you're going to solve it for them,” said Amir Hajian. Lindsay Farber added, “you need to realize that you're not going to just wave a magic wand and solve everything with machine learning. 70% of the job is cleaning data and pre-processing. You also need to realize that you can have a huge impact doing just regular reporting and basic statistics. You can take your company almost 80% of the way there and then you can use hard core data science and build your own custom models to take that to the next level.” In some sense, “you're just a glorified SQL coder or business intelligence reporter most of the time. However, you got to be passionate about data and remember there are always ups and downs to any cycle, but if you are really passionate and excited about data then you will have a very interesting and fulfilling career,” said Baiju Devani.
There are huge opportunities for data scientists to interact with and learn from each other. One of the best ways to find your next job opportunity or hire is to network with like-minded people. Keep an open mind, be respectful of everyone, and try to help others as much as you can, and they will come back and help you when they can. It is crucial to establish good professional relationships and maintain them. “This might sound like a no-brainer to some of us but don't burn the bridges! For example, if you're going through an interview and if you realize it's not going the way that you were hoping don't get upset and defensive, rather ask them for their suggestions for things that you can improve,” said Ozge Yeloglu. This is, after all, a small community of intellectuals that works hard to improve its collective wisdom and achieve great things.
Bio: Amir Feizpour is a Data Scientist at Royal Bank of Canada, and is an accomplished researcher with experience in physics, analytics, and data science.