Secrets to a Successful Data Science Interview

Are you puzzled as to what to prepare for data science interviews? That you are reading this document is a reflection of your seriousness in being a successful data scientist.

comments

By Himanshu Jain, Data Scientist / Machine Learning Engineer & Suresh Venkatasubramanian, Principal Data Scientist

Header image

So, you have been called to appear for a data science interview… But, are you unclear about how to crack data science interviews? Are you worried about rejections in data science interviews? Are you puzzled as to what to prepare for data science interviews? If you have above questions in your mind, then you have reached at right place. We are interested in sharing what we have learned the hard way based on our experience being interviewees as well as interviewers.

That you are reading this document is a reflection of your seriousness in being a successful data scientist. Being interviewed in Data Science, a field that is already wide, and rapidly expanding, is as much a challenge for you as a candidate as it is for interviewers. Interviews try to assess your Data Science competency in a few rounds that last for one hour each while your learning has been for an entire lifetime. This puts interviewers (we have been there!) in an unenviable position, namely, how to assess someone very passionate and knowledgeable –like you? Good interviewers consider you as a potential future colleague: they need your help in this. We believe that if you are the interviewer and we are the ones being interviewed, you would perhaps ask from us cooperation along the lines mentioned below.

How it Starts?

It all starts with what you have written in the resume. Expect questions to be asked based on anything that is related to Data Science from your resume. It is understandable that they are deep in your past (> three years) but don’t be fazed by them. We believe you have it in you to give a good acquittal of yourself on anything that is on your resume. Please revisit memory lane. Get reacquainted with your past. Given that you solved something with method X in the past, do you have a better method Y if you are to solve it now? Also, ask yourself: Am I writing something on the resume just so that the interviewer gets impressed but doesn’t ask questions based on them?If you are the interviewer, would you think it is fair? Note: this applies to tech blogs, github links present in the resume as well.

Problem Solving: Open & Close-ended

Your data science depth will be assessed based on your ability to solve problems. Such problems could be:

Given a requirement, how to solve it. E.g. Your business is losing customers, find causes and solutions.
Given a solution, could it be improved? E.g. Existing solution has a ‘conversion rate’ of 90%, how would you make it 91%.

Are you able to reuse your past approaches and make them work in a new but different context? Are you also in a position to rethink your old approaches in the context of what you have learnt recently?
These are open-ended questions, they may not have complete solutions within the scope of the interview. Remember, these questions are not for getting billion-dollar startup ideas or patentable ideas from you for free; far from it, these are purely for assessing your problem-solving skills. In fact, some of them may have resonance in data science problems the interviewers are trying to solve as part of their work.

Now that we have done with open-ended questions, here’s a sample of close-ended questions: derive back propagation for CNN, explain gradient boosting etc. Former helps the interviewer assess your reaction to unplanned circumstances, while the latter establishes a baseline filter rooted in standard understanding. Strive your best to ace both. Practise is the key.

Machine Learning

Data Science uses Machine Learning as one of the key techniques. Yes, it may also use neuroscience, behavioural economics, game theory, statistical mechanics, complexity theory, non-Euclidean geometry and myriad areas you are an expert in. However, problems you are expected to solve in the industry context require a sound knowledge of ML techniques as the primary skills; interviewers will be delighted if you are an expert in other topics, but please be an expert in ML as well.

This is how they will assess you on your ML skills.

Topics covered in standard Machine Learning courses and books: CS229 (Stanford), CS4780 (Cornell), 6–034 (MIT), PRML (Chris Bishop), Mining of Massive Datasets. <Include your favourite courses and books here.> Note: PRML is indicative, we don’t mean you must go through the whole book. Reinforce course content with readings from such standard books whenever possible.
ML methods that are mentioned in the resume but not as part of the courses or books.
Mathematics for ML: Linear Algebra, Probability & Statistics, Multivariable Calculus, basics of optimization (typically these are covered in standard ML courses)

Deep Learning

You are a deep learning ninja. Despite your strong feelings, why do we still think you need to have a sound understanding of ‘conventional’ Machine Learning also? ‘Black-box’ nature of deep learning models (you may entirely disagree) makes it difficult for the interviewers to know what you did versus what the model did. However, ML in its conventional form seems to be a good common denominator for all candidates. How does it look to be a person who has built a bi-directional LSTM but doesn’t know how SVM’s work? Please go ahead, dazzle the interviewers with your DL skills, but after you’ve proven a point or two with your ML chops.

You should know why you chose one model / algorithm / approach / architecture above others. Hyperparameters play a key role in DL. Develop a sound understanding of model tuning. Difference between a good model and a not-so-good one may lie in your choice of hyper parameters.

We recommend below material to go through before going for interviews:

Deep learning specialization(set of 5 courses) offered by deeplearning.ai at Coursera
Geoff Hinton’s Neural Networks course from Coursera for a deeper understanding of concepts.
A comprehensive deep learning course that covers encoders http://www.cse.iitm.ac.in/~miteshk/CS7015.html
For the long term, you may go through the deep learning book authored by Yoshua Bengio and Ian Goodfellow.

Note: We didn’t say you *must* go through all of these. Acquiring familiarity with the subject by perusing high quality content gives you confidence that is transferable across situations.

Experience with Tools

Companies give priority to candidates who not only articulate good data science solutions but also can efficiently implement using the right tools. It will be great if you can make yourself comfortable with the tools available in the industry. You should be at least aware of python or R from the language point of view, scikit learn library for ML algorithms, Keras / tensorflow / pytorch / caffe for deep learning. Do not neglect querying languages eg. HQL,SQL and distributed frameworks like hadoop and spark. Good organizations have training programs for all these skills. Selected candidates often go through these training courses. Familiarity with the above tools gives your candidature the much needed edge. By all means undergo training after you get selected. This will give exposure to the problems the organisation solves and opportunity to interact with other scientists in the organization. These are very important experiences to acquire as a new-joiner.

Productionization and Deployment of Models

Interviewers expect you to know how a machine learning model is used, how it is productionised and deployed and the overall end-to-end architecture of your model. It is very difficult to hire a person who does not know how his model will be used by the client/service/customer/product. Without deployment, what you have built is a POC. At the very best, it works on your laptop for demo purposes. For your ML model to be useful it needs to be part of a software pipeline. Know how to interact with engineers towards deploying models. Be prepared to roll up your sleeves and get into the act without anyone prompting you. By doing so, you are enhancing your value and standing in the team.

Life cycle Management of Models

As a great data scientist, you should understand the full life cycle of the ML model. You should know how your model should change when the world it tries to model changes. (This happens more frequently than you may think, for the real world isn’t under your control.) You will decide how frequently your models need to be retrained for them to remain fresh. You may also want automate the retraining process for saving your precious time. Re-purpose the time saved in iterating model building towards improving performance. Create opportunities for yourself to explain how you’ve managed a model through its life cycle.

Debugging ML Models

Debugging is a very important skill in software industry. A software is highly risky if it cannot be debugged. You should understand your model in depth. You are expected to be aware of the internals of the algorithms that you have used. You should know how to do root cause analysis and debug your models. Interviewers expect you to know how you will improve the results when models have high bias or high variance, what you will do to avoid exploding gradients and vanishing gradients and how you will optimize memory during training etc.

Why it is still a good idea to reverse a linked list?

While it is true that Data Structures and Algorithms are outsourced to packages, your ability to make inferences from data is aided by your understanding of algorithms and data structures. While interviewers won’t ask you to implement skip lists or balance k-d trees, they would still expect you to understand order complexity of algorithms, be familiar with basic data structures such as linked lists, stacks, trees, hash-tables and heap, and be comfortable with algorithms such as sorting, shortest paths, string processing and the like. In other words, hygiene questions from data structures and algorithms. You may think it is inappropriate to judge you through this lens, but remember you are making it easy for the competition. We are sure you are more than up to it.

Know the company

The very fact that you have been called for an interview is an indication that your prior experience has been considered as a potential fit by the HR and by the HM (hiring manager). Don’t stop there! It is your responsibility to research and know what the group you have been called for interviewing does and how you can add value — this is a great distinguisher.

What to ask when asked, Do you have any questions?

Beware, this is no invitation to cozy up with the interviewer. Be formal and polite as you’ve been throughout the interview.

This is a great opportunity to display your understanding of how to function as part of a data science team. It’s also an opportunity to direct the interviewer to your strengths that weren’t covered in the interview. For example you might be good at code reviews. You could enquire how code review works. You may highlight your favourite approach, say walkthroughs. You might want to know the dev platforms used. You may ask if there are any open source contributors in the team (this is the time to re-emphasize if you are one). You could ask about the delivery cycle. If the team is distributed across timezone, check how the interactions work. Share your experience working in such teams if applicable.

Ask if it’s OK for you to know at a high level the project(s) the interviewer is part of. These are far better questions to ask than wanting to know about the WFH policy or how the typical day for the data scientist is like. Phrase these questions carefully, for you aren’t the interviewer! Of course, it’s best to ask just two questions as the conversation that ensues do not give you more openings than that. So chose them carefully and be guided by the interview context. Most importantly, be a patient listener, that’s a key to memorable conversations.

Note: Don’t try to hypnotise the interviewer by going over the top. As always, be dignified. Remember, no job is greater than your dignity.

Some Helpful Tips

Please get clarifications. If you think the question is ambiguous, ask the interviewer clarifying questions. In fact, this is an important skill for data scientists, namely, understanding requirements. Sooner you ask these questions the better, for it saves time, which in turn helps the interviewer assess you better.
Don’t move discussions away from ML topics. Assume that you are not prepared to go deep in a certain topic, but the interviewer is questioning you in that direction. Believe us, it is not a good idea to deflect the conversation for three reasons.

This is unethical, period.
If it works and you get selected, you might be given work related to something that you weren’t prepared to go deep into during the interview.
It doesn’t work in most cases. Interviewers are as smart as you are, and they know when you are trying to avoid or deflect a conversation. You end up creating a poor impression.

It is honourable to admit that you are not very familiar with the topic. You may still want to try based on first principles if possible and also ask clarifying questions. Else no worries, the interviewer will move to another topic from your resume.

Handling Rejections

In case you do not make it this time, rest assured the interviewers didn’t do it for fun. Companies invest serious time and money in evaluating you, and if you do not make it, it simply means that you need more preparation. Youare not rejected, only your application is. Also, consider the fact that, in addition to you, many other smart and capable candidates are interviewed for the position you have applied for. It is in the very nature of the process to select only a few candidates. If it is not you, it is not the end of the world for you. (Frankly, do not give anyone that kind of power over you.) Note down the questions that were asked of you without any judgment. But don’t analyse yet; allow one or two days to pass for you to regain your composure. Then reflect without rancour how you could be a better candidate next time. Identify opportunities for improvement and put in a plan of action, and start acting on it, for it is of no use living in the past or in a state of inaction. Companies very much expect you to apply again in the next cycle (check with the HR about the period) –many of us have applied more than once before getting selected. Your HR contact also would strive to provide feedback, but please… please… please… do not start a rebuttal chain. Behave like a data scientist in such situations: If the data (interview result) is at variance with your hypothesis (your preparation), move to a better hypothesis, i.e, be better prepared next time.

OK, this may sound a little philosophical. However we think it is practical and important in making you a better candidate and a better person in general. Develop two kinds of self-awareness: Internal and External. Former is about how aware you are of yourself; latter is about how aware you are of others’ perception of you. It should be clear as to how these two come together in creating a fruitful interview experience for you. We recommend you to go through Insight by Tasha Eurich.

What is a fair interview?

You are perhaps thinking, this seems like a wordy and a long list filled with synthetic friendliness sprinkled all over that places a lot of demands on me in a contrived manner (that isn’t the intention, and it is not artificial friendliness either; sorry about the wordy part though). You might be wondering what exactly you would get in return as part of a fair interview process. Here’s what we offer to our candidates. You may expect similar experience from other reputed organizations as well.

Fair assessment

No questions that map to: I have thought up a 10-digit random number, guess it correctly. No, we don’t expect you to give the answer we have in mind; as long as it is a right answer, your answer is as good as ours.
We do not project our personalities on you. E.g. I find it easy, hence you ought to find it easy as well.
Expect answers the moment the question is completed. It is perfectly fine for us to listen to your silence or loud thinking before you answer.
No trick questions or misleading questions.
What we came to know two days ago, we don’t expect you to know them for a lifetime. Hence, despite we having read a bunch of great Medium articles recently that really made us feel like geniuses, we wouldn’t be asking any questions based on them (sniff!)
A couple of bad answers will be more than compensated by a good overall performance.

Questions that have a context in your experience and knowledge

If your resume is rich enough for you to be invited for the interviews, surely it is rich enough for us to confine our questions to its content. Something not in the resume is good enough evidence that it is outside of your expertise. This excludes basic ML concepts.

Courteous treatment

Interviewers introduce themselves.
Interviewers inform you how the interview is structured.
Interviewers check with you if you need refreshments.
Interviewers pay attention to you.
Interviewers will never do anything unethical. E.g pass rude or sarcastic comments about your present or past organizations or educational institutes; Encourage you to breach your NDA’s.
Interviewers will never make you uncomfortable. E.g they won’t browbeat you or make fun of your answers.
Interviewers accompany you to lunch, or hand you over to someone else who would take you to lunch.

Wish you a great preparation. May the Force be with you!!!

We would love to hear your feedback. Don’t forget to share your experiences with us on how these tips helped you in cracking the data science interview.

Contact Us:

Suresh (Principal Data Scientist, Walmart Labs) — suresh.venkatasubramanian@gmail.com
Himanshu (Data Scientist, Walmart Labs) — himanshu6589@gmail.com

Disclaimer: Above views are personal and not to be construed as our organization’s stated position. However, we do expect any professional organization to have a fair selection process that is reflective of the points mentioned above.

Himanshu Jain is a Data Scientist / Machine Learning Engineer at Walmart Labs.

Suresh Venkatasubramanian is Principal Data Scientist at Walmart Labs.

Original. Reposted with permission.

Related: