Interview: Ted Dunning, MapR on Apache Mahout & Technology Landscape in ML
We discuss Apache Mahout, its comparison with Spark and H2O, trends, advice, desired qualities in data scientists and more.
Here is the second and final part of my interview with him:
Anmol Rajpurohit: Q7. Which trends in the space of recommender systems appear the most interesting to you? Why?
Ted Dunning: I think that the idea of synthetic indicators is very exciting. These are tags, applied to content and users based on external properties or relations, that allow some pretty amazing capabilities.
AR: Q8. What motivated you to work on Apache Mahout? How do you compare Mahout with Spark and H2O?
TD: With respect to Spark and H2O, it is difficult to make direct comparisons. Mahout was many years ahead of these other systems and thus had to commit early on to much more primitive forms of scalable computing in order to succeed. That commitment has lately changed and the new generation of Mahout code supports both Spark and H2O as computational back-ends for modern work.
That inter-relationship makes direct comparison even harder in some ways. I think that there is so much to work on in machine learning that it is hard to say that one project is directly competitive with another when, in fact, they actually work together in many ways.
Clearly Mahout has a huge lead over the other systems in the way that it compiles linear algebra expressions into efficient programs for back-ends like Spark (or H2O). Clearly also, H2O has a huge lead over Spark's MLlib in terms of numerical performance and sophisticated learning algorithms. Mahout is also the only system that fully supports indicator-based recommendation systems, which is a huge difference as well.
AR: Q9. What is the best advice you have got in your career?
AR: Q10. Is "talent crunch" a real problem in Big Data? What has been your personal experience around it?
TD: Yes. The talent crunch is a real problem. But finding really good people is always hard.
People over-rate specific qualifications. Some of the best programmers and data scientists I have known did not have specific training as programmers or data scientists. Jacques Nadeau leads the MapR effort to contribute to Apache Drill, for instance, and he has a degree in philosophy, not computing. One of the better data scientists I know has a degree in literature. These are widely curious people who are voracious learners. Combine that with a good sense of mathematical reasoning and a person can go quite far.
A great example of this same bias happens when people ask questions in interviews to which they already know the answer. I don't want to hire people who know what I know. I want to hire people who know what I don't know. If I learn something important from a candidate during an interview, that is one of the best indications that they are a good hire. If they learn from me, I don't consider that a great indicator.
TD: I want people who are switched-on, curious about things, willing to try new things and who are willing to tell me when I am wrong (hopefully somewhat gently). I also want people who get things done and understand the value of simplicity.
AR: Q12. What was the last book that you read and liked?