Data science job market – what it’s like

Data scientist interviews can be complex, and there is no definite recipe for success. Understand the complications and process of an interview, and know what to be careful about before accepting an offer.

You should spend some time memorizing formulas such as binomial probabilities, Bayes’ Rule, and so on. Be acquainted with the most common probability distributions. Understand and be capable of explaining model fitting procedures like stochastic gradient descent and maximum likelihood as well as model evaluation metrics. Be prepared to answer how you would implement these things ‘from scratch’ without the help of a package or library. William Chen’s probability cheatsheet is quite good for this.
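To make the "from scratch" point concrete, here is a minimal Python sketch of the kind of thing an interviewer might ask for: a binomial PMF, Bayes' Rule for a screening test, and a toy stochastic gradient descent loop. The function names and example numbers are illustrative, not from any particular interview.

```python
import math
import random

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), computed without a stats library."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def bayes_posterior(prior, sensitivity, false_positive_rate):
    """Bayes' Rule for P(disease | positive test):
    P(D|+) = P(+|D)P(D) / [P(+|D)P(D) + P(+|~D)P(~D)]."""
    numerator = sensitivity * prior
    return numerator / (numerator + false_positive_rate * (1 - prior))

def sgd_slope(xs, ys, lr=0.01, epochs=200, seed=0):
    """Fit y ~ w*x by stochastic gradient descent on squared error."""
    random.seed(seed)
    w = 0.0
    for _ in range(epochs):
        i = random.randrange(len(xs))          # sample one observation
        grad = 2 * (w * xs[i] - ys[i]) * xs[i]  # d/dw of (w*x - y)^2
        w -= lr * grad
    return w
```

For example, `binomial_pmf(2, 5, 0.5)` gives 0.3125, and `bayes_posterior(0.01, 0.95, 0.10)` shows the classic result that a positive test at 1% prevalence still implies a low posterior probability of disease.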

Be prepared to talk in detail about any of the projects listed on your resume. At a recent interview, I spent the bulk of one session answering questions about the oldest project listed on my resume, and I was pretty rusty on the details. However, I really enjoy both asking and answering questions like this, as they give you a chance to demonstrate your mastery of a topic. The one caveat, though, is that you will almost always know more about these things than the person asking about them, so don’t gloss over “obvious” details.

Depending on the kind of role you’re applying for, there may also be more product-focused questions. This is something that’s more difficult to prepare for but may be very important, especially if the role is highly integrated with a product team. Spend time using the product that the company produces (when possible), think about it as both a user and a data scientist. What is good or bad about the experience? How would you know from collecting data if the experience was good or bad? What are the kinds of things you would want to optimize for? What are both the short-term and long-term consequences of doing so? Is there existing instrumentation?

Questions you should ask

Through all of this interrogation and puzzle-solving, it’s important to remember that you are interviewing them as well. You need to find a good fit. You need to want to work there. I’ve been in several bad fits, which have been difficult, but they’ve also provided me with very specific questions to ask. Offer to sign an NDA if they say they can’t give you details on these answers. You don’t want to go into a new job uncertain about what you’ll actually be doing and working on. These questions include:

  • What does success look like for this position? How will I know if I am accomplishing what is expected of me?
  • What is the last project you shipped? What was the goal, how long did it take, what were the stumbling blocks, what tools did you use, and so on?
  • What will my first 90 days in this role look like? First 180 days?
  • Who will I report to, and how many people report to that person? Do they have regular 1:1s with their team members?
  • Why did the last person who quit this team leave? The company?
  • If a startup, how long is your runway? How are financial decisions made?
  • What would be my first project here? Has someone already been working on this or is this in the aspirational stage?
  • What is the current state of the data infrastructure? How much work needs to be done on getting the infrastructure and pipeline into shape before we can start analyzing the data?

One note — if the company you’re interviewing with doesn’t leave ample time for questions and answers with each and every interviewer, this is a red flag. They do not see this as an opportunity for you to assess fit, only an opportunity for you to demonstrate your worth. Be very, very wary of this. I once had an interviewer at a prestigious company say to me, “OK, we have three minutes left. I’m trying to decide if I’ll ask you another question or if I want to leave time for you to ask questions. I think I’ll ask you another question.” Instant bad experience.

The troubling reality of interviews

Most companies are bad at hiring. They’ll treat you skeptically and make you prove “you can code,” as if an existing body of work isn’t enough. They’ll make you solve problems by hand that you haven’t solved by hand since your undergrad days and probably wouldn’t solve by hand today, because that’s how stupid mistakes get made. They’ll make you solve ridiculous problems that don’t reflect the actual day-to-day work of the position. They’ll say they do this to see “how you think” or “how you approach a problem,” but no one has any idea whether these exercises are valid measures of those skills. The assumption that these kinds of questions measure some meaningful attribute is unspoken but widespread, and very little empirical work is done to see whether they are good predictors of good employees.

For a great in-depth take on this, Ann Harter is a must-read.

It can be easy to internalize your performance in interviews as an overall reflection of your abilities as a data scientist and even as your worth as a person. I’ve done this. There is some signal in there — you can identify gaps in your area of knowledge, or at least identify things that you need to focus on learning in order to pass the interview stage. It may be mostly a waste of time for your day-to-day work, but you have to play the game on some level. It’s unfortunate, but that’s where we stand right now.

As I’ve gotten older and more experienced, I push back in interviews. I ask what the purpose of a problem is, or state that I don’t think it is a good evaluation of my skills or abilities. Some people probably see this as me thinking I’m “too good” to answer the questions everyone else has to answer, but I see it as doing my part to be a critical thinker about evaluation, prediction, and hiring. Hopefully you’ll do this too, and as more of us end up in positions where we are building teams and hiring, we’ll think more carefully about what we’re trying to accomplish and how to get there, instead of just copying the same patterns that have been around for years.

Bio: Trey Causey is a data scientist and computational social scientist, currently at Dato in Seattle, WA. He is experienced in using statistics, machine learning, and applied natural language processing to solve interesting problems across multiple domains.