Cracking the Data Scientist Interview
After interviewing with over 50 companies for Data Scientist/Machine Learning Engineer roles, I am going to frame my experiences in a Q&A format and try to debunk the myths that beginners may hold in their quest to become a Data Scientist.
By Ajit Samudrala, Data Scientist at Symantec
After completing my Data Science internship at Sirius in August 2018, I started searching for a full-time position in Data Science. My initial search was haphazard, with a mediocre resume and LinkedIn profile. Unsurprisingly, it took me a month to get the ball rolling. Forty days into my search, I received my first response, from Google, for a Data Scientist position in one of their Engineering teams. I was brimming with excitement, as I hadn’t expected a call from Google even in my wildest dreams. I didn’t make it to the onsite round, but it was a great learning experience. Thereafter, I interviewed with Apple, SAP, Visa, Walmart, Nielsen, Symantec, Swiss Re, AppNexus, Catalina, Cerego, and 40 other companies for Data Scientist/Machine Learning Engineer roles. Finally, I joined Symantec at their Mountain View campus. I am going to frame my experiences in a Q&A format and try to debunk the myths that beginners may hold in their quest to become a Data Scientist.
1. What was the toughest part of the whole job search process?
The erratic nature of the interviews. Data Scientist is a very generic term; I saw several different flavors of it during my job search. For example, the position I interviewed for at Google was primarily focused on Statistical Modeling and Experiment Design. On the other hand, the interview with Cerego was based on Deep Learning and NLP. Some companies put as much emphasis on software development and coding as on Data Science. I found that Deep Learning roles mostly demand a decent knowledge of software development.
The problem with interviewing for all types of roles is that you may become a jack of all trades and master of none. It happened to me; I found myself working on statistics one day and jumping to ML/DL the next. While it is good to learn both worlds, it takes longer to get a firm grip on either.
2. What was the best part of the whole job search process?
I interviewed with companies ranging from large corporations to startups across many domains. Getting to know how these companies and teams use Data Science to solve business problems was an eye-opener for me. I was particularly blown away by a couple of use cases, one in healthcare and the other in robotics. At the end of my job search, I felt I was going to miss those intriguing introductory conversations with the managers.
3. What are the primary skills required to ace the interviews?
As suggested in the answer to the first question, the skills required vary for each role. Nonetheless, I strongly recommend building a solid foundation in SQL, Python, ML/DL, Statistics, and Distributed Computing. Having a decent knowledge of Computer Science fundamentals like Algorithms and Data Structures is a huge plus, especially if you are interviewing with technology companies. Once you get the fundamentals right, you can dive deep into your area of interest.
4. There are thousands, possibly lakhs, of online Data Science resources/mentors/podcasts available. Where should I learn?
Don’t fall prey to all the brouhaha that surrounds Data Science/ML. There are many institutions, people, and books that claim to teach ML in 100 days or 3 months. Though I cannot comment on the authenticity of those claims, I strongly recommend staying away from them and sticking to popular and reliable sources like Coursera, MIT OpenCourseWare, Stanford Online Courses, NPTEL, etc. Moreover, you can learn from these sources largely for free.
5. Where should I spend the majority of my time apart from learning concepts?
Spend your time on stats.stackexchange and Kaggle. Use StackExchange to improve your theoretical knowledge and Kaggle for the application. Many concepts in Statistics/ML are not very obvious, so don’t hesitate to search for even the most basic questions, like “Does a Random Forest always perform better than a Decision Tree?”. I bet some good Samaritans have already answered many such questions at length on StackExchange.
That being said, I have seen many Data Science interviews shift their focus from testing theoretical knowledge to testing application through take-home assignments and case studies. Kaggle comes in handy here, as there are many beautiful kernels that elaborate on the thought process and approach used in arriving at a solution.
6. How to ace take-home assignments and case studies?
Most interviewers look for your approach rather than results in take-home assignments. So it’s ok to be creative and fail.
Prepare an ML template with all your reusable functions. Build a thin API on top of Scikit-Learn and Matplotlib so that you can quickly perform EDA and build basic models. Once you are done with the basic models, you can start being creative by stacking different models, feeding one model’s predictions into another, or trying any other crazy stuff that may raise the eyebrows of an interviewer. If it works, well and good. If it fails, you will still get marks for trying something different.
Regarding the case studies, the best sources in my opinion are the official data science blogs of companies like Google, FB, Twitter, eBay, Zillow, etc. By reading these blogs, you get an understanding of how these companies tackled a business problem with Machine Learning/Statistical Modeling and the challenges they encountered along the way.
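As a rough illustration of what such a template might contain, here is a minimal sketch in Python. The helper names (`plot_feature_histograms`, `compare_models`, `build_stack`), the choice of models, and the synthetic dataset are all assumptions made for this example, not a prescribed API; adapt them to your own assignments.

```python
from matplotlib.figure import Figure
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


def plot_feature_histograms(X, feature_names, bins=30):
    """Quick EDA helper (hypothetical): one histogram per feature."""
    n_features = X.shape[1]
    fig = Figure(figsize=(3 * n_features, 3))
    # squeeze=False always returns a 2-D array of axes, even for one feature.
    axes = fig.subplots(1, n_features, squeeze=False)[0]
    for ax, column, name in zip(axes, X.T, feature_names):
        ax.hist(column, bins=bins)
        ax.set_title(name)
    return fig


def compare_models(models, X, y, cv=5, scoring="accuracy"):
    """Cross-validate every model and return {name: mean score}."""
    return {
        name: float(cross_val_score(model, X, y, cv=cv, scoring=scoring).mean())
        for name, model in models.items()
    }


def build_stack(base_models, final_estimator=None):
    """Stack the base models; a logistic regression combines their predictions."""
    if final_estimator is None:
        final_estimator = LogisticRegression(max_iter=1000)
    return StackingClassifier(
        estimators=list(base_models.items()),
        final_estimator=final_estimator,
        cv=5,
    )


# Synthetic data stands in for the assignment's dataset (an assumption).
X, y = make_classification(
    n_samples=500, n_features=6, n_informative=4, random_state=0
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]

fig = plot_feature_histograms(X, feature_names)
fig.savefig("feature_histograms.png")

baselines = {
    "logreg": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "tree": DecisionTreeClassifier(random_state=0),
    "forest": RandomForestClassifier(n_estimators=100, random_state=0),
}
scores = compare_models(baselines, X, y)
scores.update(compare_models({"stack": build_stack(baselines)}, X, y))

for name, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{name:>8}: {score:.3f}")
```

Keeping the helpers small and model-agnostic means the same few calls give you baseline numbers within minutes, leaving the rest of the assignment time for the more creative ideas.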
7. How to stand out from the crowd?
There are several different ways you can do that. In my opinion, developing the skill to read, digest, and implement research papers puts you at the forefront of the crowd. Though it is a herculean task for beginners, starting with simple papers and implementing their easier components is the way to go. Initially, I struggled to read research papers, but after a few months I was able to at least implement basic components from them.
Writing articles for popular data science blogs may also help you gain some extra points. Choose a topic that hasn’t been explored much, or give a unique perspective on a concept, to reap maximum benefit from your work.
8. How important is domain expertise?
In certain fields, the interviewers may give considerable weight to domain expertise. However, for an entry-level data scientist, I believe it shouldn’t be a problem and one shouldn’t hesitate to apply for a Data Scientist role in any field.
9. Should I apply for jobs whose JD lists an exorbitant set of skills?
Yes, you should. I received calls for roles whose JDs included specialized terms like VAEs, GANs, Transformers, NLU, Reinforcement Learning, C++, etc. Though cracking these interviews may be difficult, you may still come away with a good learning experience.
10. Should I care to do any certifications or invest time in learning any tools?
While it is good to be certified, I don’t think it will add any substantial value to your application, at least in Data Science. Regarding tools, it is good to have knowledge of popular deployment tools. Apart from them, spend your time mastering open-source frameworks like Hive, Kafka, Spark, etc.
11. What is my future plan?
I will keep learning from my work and my highly talented co-workers. I am also planning to learn Reinforcement Learning and Java. If you know of any good course or book, please let me know. I will also try to be active on Medium and post what I learn.
12. Random Tips
- Speak the language of Data Scientists in interviews.
- Write clean and comprehensible code in take-home assignments.
- Practice easy and mid-level problems from LeetCode to rock the whiteboard round.
- Don’t get demotivated if you fail; I failed 43 interviews before getting my first job offer.
- Be prepared for the unpredictability of the interviews. Many companies don’t have a streamlined process for Data Science interviews yet. One of the interviews I had was completely based on Bayesian inference.
- Don’t get horrified by all the crazy stuff you read on the internet. Most companies don’t need or use it. Get the basics right first.
If there is any common question I missed, please comment under the post. I will keep updating the post based on your feedback. You can connect with me on LinkedIn here.
P.S.: This article is targeted at beginners and those attempting their first switch into Data Science.
Bio: Ajit Samudrala is a Data Scientist at Symantec.
Original. Reposted with permission.