KDnuggets, April 2015 (15:n12)

Interview: Michael Li, Data Incubator on Bridging the Data Science Skills Gap between Academia and Industry


We discuss the response from hiring companies, recommendations for aspirants, retaining data science talent, advice, and more.



Dr. Michael Li is Executive Director at The Data Incubator. Michael has worked as a data scientist (Foursquare), a quant (D.E. Shaw, J.P. Morgan), and a rocket scientist (NASA). He did his PhD at Princeton as a Hertz fellow and read Part III Maths at Cambridge as a Marshall scholar.

At Foursquare, Michael discovered that his favorite part of the job was teaching and mentoring smart people about data science. He decided to build a startup that lets him focus on what he really loves.

First part of interview

Here is the second and final part of my interview with him:

Anmol Rajpurohit: Q5. What has been the feedback from hiring companies?

Michael Li: Our hiring partners love that we're presenting them with talented folks who have been pre-screened and evaluated technically, and they see us as complementing the traditional recruitment agencies they already work with. They also appreciate the opportunity to network with our Fellows in an informal setting before deciding whether or not to set up an interview.

AR: Q6. For current PhD or Master's students aspiring to be a Data Incubator Fellow, what would you suggest they focus on during their degree program?

ML: On the technical side, being a data scientist is about combining math and computer science.  Having a strong background in mathematics and statistics is what allows you to interpret your findings from all this data.  Having a strong background in computation is what will give you the tools necessary to manipulate all this data.

AR: Q7. Once you have hired the best data scientists, the next challenge is to retain them in today's super-competitive hiring market. What strategies do you recommend to retain elite data scientists?

ML: The SparkNotes version is to make sure you provide data scientists with adequate support in terms of technical infrastructure and continuing education, ownership over the work that they're doing (including interfacing with other groups within the company), and visibility into their impact, so your data scientists connect with your organization's purpose. You can check out this Harvard Business Review article I wrote to get the full story.

AR: Q8. In the past few years, we have seen a sharp increase in the number of programs and certificates offered by universities in the field of Data Science. What do you consider the indispensable components of any data science related academic curriculum?

ML: Obviously, we focus a lot on machine-learning algorithms (NLP, random forests, SVMs, etc.) and "big data technologies" (MapReduce, Spark, Scalding), but I think one of our biggest value adds is helping our Fellows ask the right question. As any experienced data scientist knows, moving the needle is often not about applying the fanciest algorithm but about applying a simple analysis to answer the right question. That's a tough thing to get coming out of academia and is a lot of what we're trying to get Fellows to do.

AR: Q9. What is the best advice you have received in your career?

ML: Industry is very "output" driven: it doesn't matter how you do it, as long as you get the right answer. Academia is still very "input" driven; for example, countless dissertations are granted for intellectually interesting but ultimately impractical machine-learning ideas that don't do much better than linear regression. For people making that transition from academia to industry, one common hurdle is often: "the simple idea does really well … I clearly need to use something more sophisticated to impress my colleagues, even when there is little practical benefit from improving."

AR: Q10. Which of the current trends in the Big Data arena are of great interest to you? What do you think will be the most significant developments in 2015?

ML: We see a lot of interest in unstructured data across many industries and a migration away from slow batch-based analyses to real-time answers, even for large datasets. We're following this by emphasizing much more work with topics like natural language processing and online machine-learning algorithms.
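
Editor's illustration: for readers unfamiliar with the term, an online machine-learning algorithm updates its model one observation at a time instead of re-fitting on a stored batch. The sketch below is not taken from the interview or from The Data Incubator's curriculum; it is a minimal example of online linear regression trained by stochastic gradient descent, and the class name `OnlineLinearRegression`, the helper `simulated_stream`, the learning rate, and the synthetic data (y = 2x + 1) are all assumptions made for illustration.

```python
import random


class OnlineLinearRegression:
    """Linear model updated one example at a time (hypothetical helper class)."""

    def __init__(self, n_features, learning_rate=0.05):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate  # assumed value, tune for real data

    def predict(self, x):
        return self.bias + sum(w * xi for w, xi in zip(self.weights, x))

    def update(self, x, y):
        """Take one gradient step on the squared error of a single example."""
        error = self.predict(x) - y
        for i, xi in enumerate(x):
            self.weights[i] -= self.learning_rate * error * xi
        self.bias -= self.learning_rate * error
        return error


def simulated_stream(n_examples, seed=0):
    """Yield synthetic (features, target) pairs from y = 2x + 1, one at a time."""
    rng = random.Random(seed)
    for _ in range(n_examples):
        x = rng.uniform(-1.0, 1.0)
        yield [x], 2.0 * x + 1.0


model = OnlineLinearRegression(n_features=1)
for features, target in simulated_stream(5000):
    # Each example is seen once and then discarded; no batch is kept in memory.
    model.update(features, target)

print(model.weights[0], model.bias)
```

Because the model only ever holds its current weights, it can keep answering queries while new data arrives, which is the property that makes this family of methods attractive for the real-time settings described above.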