Interview: Joseph Babcock, Netflix on Curiosity and Courage – Key for Success in Data Science

We discuss discovery vs. personalization, advice, trends, desired skills in data scientists, and more.

Twitter Handle: @hey_anmol

Joseph Babcock is currently a Senior Data Scientist working on Discovery & Personalization algorithms and data processing at Netflix.

Before Netflix, he studied computational biology at The Johns Hopkins University School of Medicine, where his PhD research in the Department of Neuroscience used machine learning models to predict adverse side effects of drugs.

He also previously worked at Chicago-based Accretive Health as a Data Scientist, focusing on data related to patient billing and referrals.

First part of interview

Second part of interview

Here is the third part of my interview with him:

Anmol Rajpurohit: Q9. How do you determine what data is "good enough" for your hypothesis? Is it merely a statistical measure or more than that?

Joseph Babcock: To me it’s a combination of statistical measurement and business acumen. In some sense, ‘good enough’ is always whatever data we need to understand user preferences well enough to improve engagement with our service. On a technical level, we often have to temper the desire for perfect information against the practical effect noise might have on a model’s accuracy, and against whether a corner case is frequent enough to change an algorithm’s behavior.

AR: Q10. Some people argue that too much personalization can curtail the chances of discovering great content outside of an individual's usual entertainment consumption history. What are your thoughts on maintaining the balance between Personalization and Discovery?

JB: Indeed, I think this is where model generalization comes into play: we want to personalize not just to what you already enjoy, but also to what you are likely to enjoy but don’t know about yet. It’s something we keep in mind with most evaluation metrics for our algorithms, and in the research phase of designing new models.

AR: Q11. What is the best advice you have received in your career?

JB: My dissertation advisor in graduate school once described a business trip as less about accomplishing a particular task and more about building relationships; if you only visit people when you need something from them, it cheapens the interaction. I’ve found this to be broadly applicable: assistance to coworkers and professional colleagues is best ‘paid forward’. It increases the likelihood of receiving genuine mutual help in the future, and I think it simply makes for a more pleasant work environment in general.

AR: Q12. Which of the current trends in the Big Data arena are of great interest to you? Why?

JB: I am fascinated by the growth of Massive Open Online Courses (MOOCs) as a way of disseminating knowledge about statistics, machine learning, and technology to an increasingly wide audience. This is paralleled in some ways by the push for open-access journals in academia, and by the combination of easily accessible educational materials and tools, dynamic documentation of work through GitHub, and discussion in online forums.

I imagine that this will stimulate a growing body of informal research on all kinds of data, be it consumer web logs or scientific experiments. Also, while we frequently hear that there is a shortage of Big Data talent in the marketplace, I think the growth of such resources may narrow this gap by making it easier than ever for anyone with Internet access to discover and cultivate a passion for analytics.

AR: Q13. What key qualities do you look for when interviewing candidates for Data Science positions on your team?

JB: Curiosity and courage: more than any tool or methodology, the ability to self-educate and critically question is a key success factor, especially in a rapidly moving organization like ours. Likewise, having a voice and being unafraid to back up (with evidence) potentially unpopular opinions is an important aspect of how we make data-driven decisions.

AR: Q14. What was the last book that you read and liked? What do you like to do when you are not working?

JB: Margaret Atwood’s MaddAddam, the conclusion of an innovative dystopian sci-fi trilogy whose other volumes (Oryx and Crake, The Year of the Flood) I also recommend. Atwood always expertly blends linguistic precision with social commentary and a dark sense of humor, and this book was no exception.

Outside work, I have a long-standing interest in men’s fashion that often finds me browsing the local department stores on weekends. I am also a dedicated reader of current events and arts magazines (The New Yorker, The Atlantic, Vanity Fair), and I have been enjoying exploring San Francisco’s varied restaurant scene with my wife. And, of course, watching Netflix!