KDnuggets Exclusive: Part 2 of the Interview with Yann LeCun

We discuss how far AI is likely to go, how Data Science is to Statistics what Computer Science was to Math, Big Data hype and reality, and advice to beginning Data Scientists.

By Gregory Piatetsky, Feb 20, 2014.

Here is Part 1 of the interview.

Yann LeCun is one of the leading experts in Deep Learning, a breakthrough advance in machine learning which has been achieving amazing successes. He is a founding Director of the NYU Center for Data Science, and was recently appointed the Director of the AI Research Lab at Facebook.

Gregory Piatetsky: 5. Looking longer term, how far will AI go? Will we reach Singularity as described by Ray Kurzweil?

Yann LeCun: We will have intelligent machines. It's clearly a matter of time. We will have machines that, without being very smart, will do useful things, like drive our cars autonomously.

How long will it take? AI researchers have a long history of underestimating the difficulties of building intelligent machines. I'll use an analogy: making progress in research is like driving a car to a destination. When we find a new paradigm or a new set of techniques, it feels like we are driving a car on a highway and nothing can stop us until we reach the destination.

But the reality is that we are really driving in a thick fog and we don't realize that our highway is really a parking lot with a brick wall at the far end. Many smart people have made that mistake, and every new wave in AI was followed by a period of unbounded optimism, irrational hype, and a backlash. It happened with "perceptrons", "rule-based systems", "neural nets", "graphical models", "SVM", and may happen with "deep learning", until we find something else. But these paradigms were never complete failures. They all left new tools, new concepts, and new algorithms.

Although I do believe we will eventually build machines that will rival humans in intelligence, I don't really believe in the singularity. We feel like we are on an exponentially growing curve of progress. But we could just as well be on a sigmoid curve. Sigmoids very much feel like exponentials at first. Also, the singularity assumes more than an exponential, it assumes an asymptote. The difference between dynamic evolutions that follow linear, quadratic, exponential, asymptotic, or sigmoidal shapes lies in damping or friction factors. Futurists seem to assume that there will be no such damping or friction terms. Futurists have an incentive to make bold predictions, particularly when they really want them to be true, perhaps in the hope that they will be self-fulfilling.
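The remark that sigmoids feel like exponentials at first can be illustrated numerically. What follows is a minimal sketch, assuming a logistic curve as the sigmoid; the growth rate r, starting value x0, and ceiling K are hypothetical values chosen for illustration and do not come from the interview.

```python
import math

def exponential(t, x0=1.0, r=1.0):
    # Undamped growth: solution of dx/dt = r * x.
    return x0 * math.exp(r * t)

def logistic(t, x0=1.0, r=1.0, K=1000.0):
    # Damped growth: solution of dx/dt = r * x * (1 - x / K).
    # The extra factor (1 - x / K) is the "friction" term; K is an assumed ceiling.
    return K / (1.0 + ((K - x0) / x0) * math.exp(-r * t))

def relative_gap(t):
    # Fraction by which the logistic curve falls short of the exponential at time t.
    e = exponential(t)
    return (e - logistic(t)) / e

# Print both curves side by side at a few illustrative times.
for t in (0, 1, 2, 4, 8, 12):
    print(f"t={t:>2}  exponential={exponential(t):>12.2f}  "
          f"logistic={logistic(t):>8.2f}  gap={relative_gap(t):.4f}")
```

With these assumed parameters the two curves are nearly indistinguishable for small t, and the damping term only becomes visible once the logistic curve approaches its ceiling K.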

GP: 6. You were (are?) also a Director for the NYU Center for Data Science. How will you combine your work at Facebook and at NYU?

Yann LeCun: I have stepped down as (founding) director of the NYU Center for Data Science.

The interim director is S. R. Srinivasa "Raghu" Varadhan, possibly the most famous probability theorist in the world. NYU has initiated a search for a new permanent director. I have invested a huge amount of energy into the creation of CDS. We now have an MS program in Data Science, and will soon have a PhD program. We have 9 open faculty positions for the center, we have won a very large, five-year grant from the Moore and Sloan foundations in collaboration with Berkeley and University of Washington, we have a partnership with Facebook and other companies, we will soon have a new building. The next director is going to have all the fun!

GP: 7. The term "Data Science" has emerged recently and has been described as an intersection of Statistics, Hacking, and Domain/Business Knowledge. How is Data Science different from previous terms like "Data Mining" and "Predictive Analytics"? If it is a new science, what are its key equations/principles?

Yann LeCun: Data Science pertains to the automatic or semi-automatic extraction of knowledge from data. This concept permeates many disciplines, each of which has a different name for it, including statistical estimation, data mining, predictive analytics, system identification, machine learning, AI, etc.

On the methods side, statistics, machine learning, and certain branches of applied mathematics could all claim to "own" the field of data science. But in reality, Data Science is to Statistics, Machine Learning, and Applied Math as Computer Science was to Electrical Engineering, Physics, and Mathematics in the 1960s. The reason computer science became a full-fledged discipline, rather than a sub-field of mathematics or engineering, is its importance to society.

With the exponential growth of data generated by our digital world, the problem of automatically extracting knowledge from data is growing rapidly. This is causing the emergence of Data Science as a discipline. It is causing a redrawing of the boundaries between Statistics, Machine Learning, and Applied Mathematics. It is also creating a need for tight interactions between "methods" people and "domain" people in science, business, medicine, and government.

My prediction is that 10 years from now, many top universities will have Data Science departments.

GP: 8. What is your opinion on "Big Data" as a trend and as a buzzword? How much is hype and how much is real?

Yann LeCun: I like the joke circulated around social networks that compared big data to teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it. [GP: this joke came from a Dan Ariely Facebook post]

I have seen people insisting on using Hadoop for datasets that could easily fit on a flash drive and could easily be processed on a laptop.

There is hype, to be sure. But the problem of how to collect, store and analyze massive amounts of data is very real. I'm always suspicious of names like "big data", because today's "big data" is tomorrow's "little data". Also, there are many important problems that arise because of too little data. This is often the case for genomics and medical data. There is never enough data.

GP: 9. Data Scientist has been called "the sexiest profession of the 21st century". What advice would you give to people who want to enter the field?

Yann LeCun: If you are an undergrad, take as many math, stats, and physics courses as you can, and learn to program (take 3 or 4 CS courses).

If you have an undergraduate degree, apply to the NYU Master of Science in Data Science.

GP: 10. What is a recent book you read and liked? What do you like to do when away from a computer/smartphone?

Yann LeCun: I design and build miniature flying contraptions, I tinker with 3D printers, I hack microcontroller-based widgets, and I hope to get better at making music (I seem to collect electronic wind controllers). I read mostly non-fiction, and I listen to a lot of jazz (and to many other types of music).