Data science job market – what it’s like
Data scientist interviews can be complex, and there is no definite recipe for success. Understand the complications and processes of an interview and what you should be careful about before accepting an offer.
By Trey Causey.
Update: I was remiss in not originally including Erin Shellman’s excellent Crushed it! Landing a data science job, which is probably the best guide to preparing for the actual data science interview that I’ve ever read. It’s full of good resources, so read it first and then head back here.
Sooner or later you’re going to find yourself looking for a data science job. Maybe it’s your first one or maybe you’re changing jobs. Even if you’re fully confident in your skills, have no impostor syndrome, and have tons of inside leads at great companies, it’s a tremendously stressful experience. The process of looking for a new job is often one that occurs secretly and confidentially and then is so exhausting that discussing the process is the last thing you want to do. I hope to change that.
I recently went through this myself and thought I’d record my thoughts on the process while they’re still fresh. I interviewed a lot. Some went well, some didn’t go well at all. The reason for this was sometimes me, sometimes them, often both. Sometimes I didn’t get selected for an on-site interview. Other times I withdrew from the process after seeing that it wouldn’t be a good fit for me. I took notes throughout, though, and here they are.
Warning: What follows are my personal thoughts, extrapolations from a small sample, and generalizations from anecdotes. Precisely the kind of thing that data scientists hate! But, despite the frequent misquotation, the plural of anecdote is data, so this discussion should start somewhere.
Interviewers: What do they want?
Many companies still have no idea what they’re looking for when they’re looking to hire a data scientist. As Robert Chang, a data scientist at Twitter, lays out in this superb post, there are two kinds of data scientists: those who are stronger at analysis (type A) and those who are stronger at building things (type B). As things stand today, hiring requirements seem strongly biased toward type B data scientists. Quite frequently, I encountered interviewers who were essentially looking for software engineers who knew a little stats / ML.
And that’s fine if the role requires you to mainly be a software engineer that knows some stats / ML. However, I think many interviewers default to this profile because they’ve been hiring software engineers for years and “know” how to do it (more on that below), so they fall back on that process when it comes to hiring data scientists. Simply put, however, if the job is analysis heavy, a technical screen that is almost entirely software engineering questions is not a good idea and won’t select individuals who are good for the actual work involved.
Everyone wants a “full-stack” data scientist, but few have really reflected on whether this is what they actually need. Don’t assume that what the recruiter or hiring manager says they need is what they actually need for the role. This is especially true if you’re the first data scientist being hired.
No one knows everything and everyone has strengths and weaknesses. It should be fine to admit these weaknesses and identify the areas in which you’d like to grow. Good organizations will welcome this as both a sign of self-awareness and an opportunity to grow. Others will say “maybe we can work with that” and not call you back while they look for someone who either doesn’t need that help or doesn’t admit to it.
Interviews: The standard process
The interview process is mostly the same everywhere, with slight deviations. You’ll usually have some kind of initial phone call with a recruiter who will ask you some general questions about your skills and background. They may try to get you to offer a salary number at this stage. There are different takes on this, but I lean towards not discussing salary at this stage. They’ll say they just want to make sure that you’re in the same ballpark, but if they’re a reasonable company, they already know what market value is (better than you do, most likely). If you’re being referred by an internal employee, this call may or may not happen.
Next, you typically do a technical phone screen. This may or may not involve writing code over the phone or in a screen sharing environment. You may or may not have access to your usual development environment. It may literally just be a shared Google document. If you’re anything like me, this is pretty unnerving. If you don’t pass this screen, you won’t be advanced to the on-site interview.
It’s especially unnerving because interviewers often resort to ‘classic’ programming interview questions that are drilled into computer science undergrads but are often quite puzzling to non-CS-trained data scientists. Many will recommend reading and memorizing Cracking the Coding Interview for these types of questions but, if you are a type A data scientist, it’s also worth asking yourself if this is a good signal that they want someone with your skills. These kinds of questions include “reverse a linked list” or “invert a binary tree”. These are also the kinds of questions that will be done on a whiteboard during the on-site at some companies.
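For readers who haven’t run into them, here is a minimal sketch of what answers to those two questions might look like in Python. The class and function names (ListNode, TreeNode, reverse_linked_list, invert_binary_tree) are my own illustrative choices, not anything a particular interviewer will expect, and there are other reasonable ways to write both.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ListNode:
    """Hypothetical singly linked list node, for illustration only."""
    value: int
    next: Optional["ListNode"] = None


def reverse_linked_list(head: Optional[ListNode]) -> Optional[ListNode]:
    """Reverse the list in place and return the new head."""
    prev = None
    current = head
    while current is not None:
        following = current.next  # remember the rest of the list
        current.next = prev       # point this node backwards
        prev = current
        current = following
    return prev


@dataclass
class TreeNode:
    """Hypothetical binary tree node, for illustration only."""
    value: int
    left: Optional["TreeNode"] = None
    right: Optional["TreeNode"] = None


def invert_binary_tree(root: Optional[TreeNode]) -> Optional[TreeNode]:
    """Swap the left and right children of every node, recursively."""
    if root is None:
        return None
    root.left, root.right = (
        invert_binary_tree(root.right),
        invert_binary_tree(root.left),
    )
    return root


def to_values(head: Optional[ListNode]) -> list:
    """Helper: collect the values of a linked list into a Python list."""
    values = []
    while head is not None:
        values.append(head.value)
        head = head.next
    return values
```

Neither is long, but both depend on being comfortable with pointers and recursion, which is exactly the kind of thing that gets drilled in a CS curriculum and rarely comes up in day-to-day analysis work.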
If you’re very nervous about this kind of question or don’t perform well in this kind of environment, offer to provide code samples that better reflect your ability and style.
Assuming you pass this session, prepare for the on-site interview, which will be anywhere from 3 to 7 hours of sitting in a single room and talking to people for 30-60 minutes each. I think the uncertainty of this portion of the process is very unsettling, but there’s not much to be done about that. I think the best places prepare the candidate for what they will be talking about and tell the candidate what to expect in terms of the number and length of meetings.
Prepare to interview a lot if you’re really focused on finding a good fit for you.
Interview questions: What to expect
Prepare to be asked terrible questions. Some of the worst questions I’ve been asked:
- I have a random number generator. What number does it produce and why?
- Anything involving dice or urns
- Can you tell me a certain piece of trivia about parameter rho from distribution delta? (Names changed to protect the innocent PDFs)
- Here’s a problem it took my team six months to solve. Please solve it for me on the whiteboard.
The last kind of question can actually be really fun if it’s proposed as an honest-to-goodness brainstorming question and not one that you’re expected to “solve” by finding the same solution as the interviewer.
There may or may not be whiteboard coding questions (if they’ve read my post, maybe there won’t be). You should be prepared to talk about the complexity of your solution in terms of time or space. Depending on the interviewer, this may range from talking about what you need to persist in memory or keep track of to giving the big-O notation for your solution. If it’s a model, is it slow at training time? Prediction time? What are the trade-offs of your approach?
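To make the time-versus-space conversation concrete, here is a small sketch of my own (not a question I was actually asked): two ways to check whether a list contains a duplicate. The function names are hypothetical, and the second version assumes the items are hashable.

```python
def has_duplicate_pairwise(items):
    """Compare every pair: O(n^2) time, O(1) extra space."""
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                return True
    return False


def has_duplicate_with_set(items):
    """Remember what has been seen: O(n) expected time, O(n) extra space.

    Assumes the items are hashable so they can be stored in a set.
    """
    seen = set()
    for item in items:
        if item in seen:
            return True
        seen.add(item)
    return False
```

Both give the same answer. The first keeps nothing in memory but slows down quickly as the list grows, while the second is faster but has to hold every distinct item it has seen. Being able to say that much out loud is often all the interviewer is after.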