Exclusive Interview: Todd Holloway, Data Science Lead, Trulia
We discuss the responsibilities of the Data Science and Analytics teams, the significance of programming knowledge for data scientists, important soft skills, the talent landscape in data science, and more.
Todd Holloway leads the Data Science team at Trulia, which applies machine learning, network science, NLP, and computer vision to the large datasets found in the real estate domain. He received a liberal arts education at Grinnell College and studied AI and data visualization at Indiana University. He previously worked on AI and visualization at AT&T, Ingenuity Systems, and TheFind.
Todd recently delivered a talk titled “Building a Data Science Team” at the Big Data Innovation Summit 2014 in Santa Clara. Over the past three years, Todd has built a data science team that creates new products and discovers actionable insights from Trulia's big data. His talk described the decisions made in assembling the team, and what has and has not worked over those years.
Anmol Rajpurohit: Q1. What are the key differences in the roles and responsibilities of the Data Science team and the Analytics team at Trulia?
Todd Holloway: Let's talk about the similarities first. Both are composed of super talented people who get excited about many of the same things: data, stats, machine learning, visualization, and so on. We have other teams of data scientists at Trulia as well, including the Geo team (maps) and the Economic Research team. The Data Science team sits within engineering, and its role is to tackle projects where the modeling component is interwoven with engineering challenges. These projects are usually product improvements (recommendations, search relevance, fraud detection, image classification), but they can also be predictive models for internal use, such as sales efficiency.
AR: Q2. During your talk, you mentioned that "everyone on the team should be able to code". Why do you think programming is a mandatory skill for data scientists? Do you consider it sufficient to be able to write pseudo-code?
TH: We like the approach of having the same one or two people work on a new data science project from end to end. This means our data scientists are involved in every stage, from conceptualization to experimentation to deployment and maintenance. This approach gives a sense of ownership, and it leads to timely completion and to modeling decisions that take deployment and maintenance into account. So our data scientists must be excellent engineers. However, even in organizations where a data scientist focuses only on experimentation or analysis, strong coding skills may lead to better solutions than being limited to tools and stats packages.
AR: Q3. What motivated you to work in data science? How has the data science field changed since you started your career in it?
TH: I started studying artificial intelligence (and data visualization) about 12 years ago. Like many in AI, I found that the skills I picked up along the way (machine learning, complex systems, knowledge representation, and so on) had broad application. The biggest change today is just how mainstream it has all become, particularly in the last few years.
AR: Q4. What soft skills do you think are the most important for practitioners in the field of data science?
TH: Presentation skills are a big one. Some data scientists pour themselves into their work to great effect, but when it comes time to discuss it, what we get is a truly haphazard PowerPoint. There are many outlets for data scientists to practice presentation skills. You can give a recorded talk at a meetup, local conference, or bootcamp, and then watch yourself. Or ask your PR department to record and critique you.
AR: Q5. What is your favorite interview question when hiring a data scientist?
TH: During the first phone screen, we like to ask non-technical questions that reveal a person's passion and tenacity. We'll ask something as simple as "Of all your past projects, what's your favorite?", but we look for them to absolutely light up when describing it. We think inspired data scientists build inspired products.
AR: Q6. Is "talent crunch" a real problem in data science? What has your personal experience been around it?
TH: I hear about it, but I don't see it firsthand. If anything, I see the opposite: I meet many data scientists who are unable to break into a quality role. But working in the tech industry in San Francisco is probably a bit like living in a bubble. Our local data science meetup has 5,500 members (meetup.com/Data-Mining/). We're probably a bit (or a lot) ahead of the curve here. Regardless of whether there is currently a shortage across all industries, what is obvious is that there is a huge secular trend toward more data collection and data-driven products, and toward more people focusing their careers on the skills involved in creating those products.
AR: Q7. Do you think data science can be learned through self-teaching, using online resources and working on DIY projects?
TH: I think so. Having high-quality materials online and outlets to practice data science (e.g., Kaggle) is such a new phenomenon that I still don't meet many data scientists who took that road. I expect that will change.
AR: Q8. What was the last book that you read and liked? What do you like to do when you are not working?
TH: "Infinite City: A San Francisco Atlas". I spend much of my free time savoring San Francisco, and all the awesome people and culture here. I feel incredibly fortunate to get to be a part of this city!