Interview: Pedro Domingos: the Master Algorithm, new type of Deep Learning, great advice for young researchers

Top researcher Pedro Domingos on useful maxims for Data Mining, Machine Learning as the Master Algorithm, new type of Deep Learning called sum-product networks, Big Data and startups, and great advice to young researchers.



By Gregory Piatetsky, @kdnuggets, Aug 19, 2014.

This is the second part of my interview with Prof. Pedro Domingos, a leading researcher in Machine Learning and Data Mining and winner of the ACM SIGKDD 2014 Innovation Award, widely considered the Data Mining/Data Science "Nobel Prize".

Here is the first part: Interview: Pedro Domingos, Winner of KDD 2014 Data Mining/Data Science Innovation Award.

Many of Prof. Domingos's award-winning research ideas are implemented in freely available software.
To learn more about his research, here are some of his most cited papers via Google Scholar and Citeseerx.

Gregory Piatetsky: Q7. You published a very good article, "A Few Useful Things to Know about Machine Learning," which lists 12 key observations. Are there a few additional ones that you would add for data mining / data science?

Pedro Domingos: Yes!   
  • Data is either curated or decaying; minding the data is as important as mining it.
  • Every number has a story, and if you don't know the story, you can't trust the number.
  • Model the whole, not just the parts, or you may miss the forest for the trees.
  • Tame complexity via hierarchical decomposition.
  • Your learner's time and space requirements should depend on the size of the model, not the size of the data.
  • The first job you should automate is yours; then you can mine a thousand things in the time it took you to mine one.

 
There are many more, and I'll have more to say about some of these in my award talk at KDD-2014.

GP: Q8. When you were visiting the MIT CSAIL Lab in 2013, you were working on a new book. Can you tell us about this book? What other work did you do there as a visiting scientist?

PD: It's a popular science book about machine learning and big data, entitled "The Master Algorithm: Machine Learning and the Big Data Revolution."

It's almost done, and will come out in 2015. The goal is to do for data science what "Chaos" did for complexity theory, or "The Selfish Gene" for evolutionary game theory: introduce the essential ideas to a broader audience, in an entertaining and accessible way, and outline the field's rich history, connections to other fields, and implications.

Now that everyone is using machine learning and big data, and they're in the media every day, I think there's a crying need for a book like this. Data science is too important to be left just to us experts! Everyone - citizens, consumers, managers, policymakers - should have a basic understanding of what goes on inside the magic black box that turns data into predictions.

At MIT I worked with Josh Tenenbaum on a joint research project we have. The goal is to be able to go all the way from raw sensor data to a high-level understanding of the situation you're in, with Markov logic as the glue that lets all the pieces come together. Josh is a cognitive scientist, and his role in the project is to bring in ideas from psychology. In fact, one of the funnest parts of my sabbatical was to hang out with computer scientists, psychologists and neuroscientists - there's a lot you can learn from all of them.

GP: Q9. What are the major research directions on which you are working currently?

PD: I'm working on a new type of deep learning, called sum-product networks. SPNs have many layers of hidden variables, and thus the same kind of power as deep architectures like DBMs and DBNs, but with a big difference: in SPNs, the probabilistic inference is always tractable; it takes a single pass through the network, and avoids all the difficulties and unpredictability of approximate methods like Markov chain Monte Carlo and loopy belief propagation. As a result, the learning itself, which in these deep models uses inference as a subroutine, also becomes much easier and more scalable.

Sum-product networks, a new type of deep learning


The "secret sauce" in SPNs is that the structure of the network is isomorphic to the structure of the computation of conditional probabilities, with a sum node where you need to do a sum, and a product node where you need to do a product.


In other deep models, the inference is an exponentially costly loop you have to wrap around the model, and that's where the trouble begins. Interestingly, the sums and products in an SPN also correspond to real concepts in the world, which makes them more interpretable than traditional deep models: sum nodes represent subclasses of a class, and product nodes represent subparts of a part. So you can look at an SPN for recognizing faces, say, and see what type of nose a given node models, for example.
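This single bottom-up pass can be sketched in a few lines of Python. The toy SPN below, over two binary variables, is invented for illustration (the structure, weights and leaf probabilities are not from the article): the root is a sum node mixing two product nodes, and each product node multiplies independent Bernoulli leaves.

```python
def bernoulli(p, x):
    """Leaf node: probability that a binary variable with parameter p takes value x."""
    return p if x == 1 else 1.0 - p

def spn(x1, x2):
    """Root sum node over two product nodes (mixture components).
    Each product node multiplies leaves over disjoint variables (x1, x2);
    the sum node's weights (0.6, 0.4) add to 1, so the output is a valid probability."""
    comp_a = bernoulli(0.8, x1) * bernoulli(0.3, x2)  # product node A
    comp_b = bernoulli(0.1, x1) * bernoulli(0.9, x2)  # product node B
    return 0.6 * comp_a + 0.4 * comp_b                # sum node

# Exact inference is one pass through the network per query - no sampling,
# no iterative message passing. The distribution normalizes exactly:
total = sum(spn(a, b) for a in (0, 1) for b in (0, 1))
print(round(total, 10))  # 1.0
```

The point of the sketch is that evaluating the network *is* the inference: cost is linear in the number of edges, which is what makes SPNs tractable where general graphical models are not.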

I'm also continuing to work on Markov logic networks, with an emphasis on scaling them up to big data. Our approach is to use tractable subsets of Markov logic, in the same way that SQL is a tractable subset of first-order logic.

One of our current projects is to build something akin to Google's knowledge graph, but much richer, based on data from Freebase, DBpedia, etc. We call it a TPKB - tractable probabilistic knowledge base - and it can answer questions about the entities and relations in Wikipedia, etc. We're planning to make a demo version available on the Web, and then we can learn from users' interactions with it.

GP: Q10. Big Data and Machine Learning are among the hottest tech areas, and many researchers in data mining and machine learning have been involved in start-ups. Have you considered start-ups, and why have you not started a company?

PD: That's what my wife keeps asking me. Seriously, I do think there's a startup in my future. There are two reasons I haven't done it yet. First, I want to do a startup that's based on my research, and in the last decade my research has been fairly long-term. This means there's a longer arc until it's ready for deployment, but hopefully when it is the impact is also larger.

Second and related, I want to do a startup that has at least the potential to be world-changing, and many stars have to align for that to happen. I often see colleagues do a startup without giving much thought to all the non-technical issues that are even more important than the technical ones, which is not a recipe for success. In the data science space, it's rare for a startup to be a complete failure, just because the acqui-hire value of a company is so high, but if that's all you wind up with then maybe it wasn't the greatest use of your time.

GP: Q11. What is your opinion of the "Big Data" boom - how much is hype and how much is reality? Is there a Machine Learning "boom" going on now? (Note: Gartner's latest "Hype Cycle" report has "Big Data" in the trough of disillusionment.)

PD: There's a fair amount of hype, but at heart the big data boom is very real. I like the "army of ants" metaphor: it's not that any single big data project will drastically change your bottom line - although it does on occasion - but that when you add up all the places where data analysis can make a difference, it really is transformative. And we're still only scratching the surface of what can be done. The bottleneck really is the lack of data scientists.

Machine learning is booming along with big data, because if data is the fuel and computing is the engine, machine learning is the spark plugs.

To date machine learning has been less of a meme in industry or the public's mind than data mining, data science, analytics or big data, but even that is changing.
I think the term "machine learning" has a longer half-life than "data science" or "big data," and that's good, because there's progress to be made in both the short and the long term.


GP: Q12. What advice would you give to young researchers interested in Machine Learning, Data Mining, Data Science?

PD:
  • Swing for the fences in everything you do; incremental research is not worth your time.
  • Learn everything you can, but don't necessarily believe any of it; your job is to make some of those things outdated.
  • Don't be intimidated by all the math in the textbooks; in this field, the math is a servant of the data, not the other way around.
  • Listening to the data - doing experiments, analyzing the results, digging deeper, following up on surprises - is the path to success.
  • If you're not confused and flailing most of the time, the problem you're tackling is probably too easy.
  • Talk continually with people from not just one company or industry, but many, and try to figure out what problems they have in common. That way you know you'll have a lot of impact if you solve one of them.
  • Read widely, but with a view to the research problems you care about; the greatest insights often come from putting previously separate things together.
  • Work with tomorrow's computing power in mind, not today's.
  • Beware of hacking; a hack feels clever, but it's the opposite of a general solution.
  • Complexity is your greatest enemy. Once you think you've solved a problem, throw out the solution and come up with a simpler one. Then do it again.
  • And of course, have fun - no field has more scope for it than this one.

GP: Q13. What do you like to do in your free time, when away from a computer? What book have you read and liked recently?

PD: I like to read books and listen to music. I'm a movie buff, and I enjoy traveling. My tastes in all of these things are pretty eclectic. I'm also a swimmer and long-distance runner. And, most of all, I spend time with my family.

A fascinating book I've read recently is "The Scientist in the Crib: What Early Learning Tells Us About the Mind," by Alison Gopnik, Andy Meltzoff and Pat Kuhl. Infants and small children go through an amazing series of learning stages, assembling piece by piece the consciousness we adults take for granted. I can't help thinking that the answers to a lot of our questions in machine learning are right there in the baby's mind, if only we can decode them from the often-astonishing experimental observations that Gopnik and Co. summarize in the book.

On the fiction side, the best book I've read recently is probably "The Road," by Cormac McCarthy. It's about a father and son trying to survive in a post-apocalyptic world, and it's a powerful, unforgettable book.

BIO: Pedro Domingos is Professor of Computer Science and Engineering at the University of Washington. His research interests are in machine learning, artificial intelligence and data mining. He received a PhD in Information and Computer Science from the University of California at Irvine, and is the author or co-author of over 200 technical publications.

He is a member of the editorial board of the Machine Learning journal, co-founder of the International Machine Learning Society, and past associate editor of JAIR. He was program co-chair of KDD-2003 and SRL-2009, and has served on numerous program committees. He is a winner of the SIGKDD Innovation Award, the highest honor in the data mining field. He is an AAAI Fellow, and received a Sloan Fellowship, an NSF CAREER Award, a Fulbright Scholarship, an IBM Faculty Award, and best paper awards at several leading conferences.
