Age of AI Conference 2018 – Day 2 Highlights

Here are some of the highlights from the second day of the Age of AI Conference, February 1, at the Regency Ballroom in San Francisco.

Fireside Chat with Alex Krizhevsky

Interviewer: Emil Mikhailov

How did you come up with the idea for AlexNet?

I was reading a paper on matrix multiplication algorithms on the GPU.  I found the Nvidia CUDA documentation really good, and it was easy to come up with something very quickly.  So, I got the idea to re-implement the original LeNet architecture on the GPU, which proved to be very fast compared to traditional implementations.  I found coding on the GPU to be very elegant.

How did you enter Machine Learning?

During my undergrad, I took Geoff Hinton’s course and was delighted when he agreed to do a small project with me.  That was the period when the Netflix competition was going on, and I joined the U of Toronto team to participate in it.  When I joined them, they were in first position, but when I left, they were in third.  My efforts impressed Geoff enough to take me on as a grad student.  If I had not gotten that, I would probably have gone in for Computer Science theory.  I felt then that the end goal of computer science was AI/ML.  There was a lot of luck involved, and being in the right place at the right time was vital.

How important is good knowledge of Math to do Machine Learning?

To do good research, I feel I should be better at Math.  Luckily I got a chance to learn some Math and apply it early which helped me.  These days, I don’t know … I suppose, it still helps to know Math well.

Your website shows that your current status is “Regress to the Mean” – what does that mean?

Yes, I have gotten this question a few times.  It’s actually a self-deprecating joke and, like all good jokes, let me explain it.  In Statistics, there is a phenomenon that if you observe a variable and it takes an extreme value in the first measurement, then the next time you observe it, its value is likely to be closer to the mean.  So, in that sense, if all my previous achievements were good, then they were probably extreme values that came about because I was lucky; so, from now on I will regress to the mean.
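
[The statistical effect Alex describes can be illustrated with a small simulation.  This is a minimal sketch and not something shown in the talk: the function name `simulate`, the population size, and the model (each result is a fixed "skill" plus independent "luck", both drawn from a standard normal distribution) are all assumptions made for illustration.]

```python
import random


def simulate(n_people=10000, top_fraction=0.05, seed=0):
    """Compare the top performers' first result with their second result.

    Assumed model (hypothetical): result = skill + luck, where skill is
    fixed per person and luck is redrawn for every measurement.
    """
    rng = random.Random(seed)
    skill = [rng.gauss(0, 1) for _ in range(n_people)]
    first = [s + rng.gauss(0, 1) for s in skill]
    second = [s + rng.gauss(0, 1) for s in skill]

    # Select the people whose FIRST measurement was most extreme.
    k = int(n_people * top_fraction)
    top = sorted(range(n_people), key=lambda i: first[i], reverse=True)[:k]

    mean_first = sum(first[i] for i in top) / k
    mean_second = sum(second[i] for i in top) / k
    return mean_first, mean_second


mean_first, mean_second = simulate()
print(f"top group, first measurement:  {mean_first:.2f}")
print(f"top group, second measurement: {mean_second:.2f}")
```

[Under these assumptions, the group selected for its extreme first result still scores above the population mean of zero the second time, because skill persists, but noticeably lower than before, because the luck that helped put them in the top group is not repeated.]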

How does the research environment in academia compare with the product environment at Google?

I found the environments pretty similar.  In both places, there was freedom to choose what you wanted to research.  I guess it is still like that at Google now.

There’s a bet going on about when AGI will happen.  What’s your bet?

I don’t know.  It’s also a little difficult to define.  I think we will know it when we see it.  I don’t know when it will happen.  Probably in the next 50-60 years.  There are others who make a much more aggressive claim that it will happen in the next 10 years or so.  I honestly think it is more a matter of personality that drives this thinking.  Some have a visionary personality, and they probably see it more clearly, or maybe they don’t see the obstacles.  I don’t mean to denigrate their thinking in any way.

What’s next for you?

I will reveal it later.  

[Alex also answered a number of audience questions.  Many thanked him for creating AlexNet.]


Google Scholar link:

Alex’s personal webpage:

Link to SuperVision


Tim Urban @ Wait, But Why

The Road to Superintelligence


  • Listening to Tim Urban in the flesh is like taking the main nerd-coaster ride.  Not one to mince words, he’s completely animated and full of emotive gusto.  So, it’s fun to just sit there and enjoy all that he passionately throws at you.  To describe the content in my words would do injustice to his livewire performance.
  • For most of us who are familiar with his iconic blog Wait, But Why and its signature 4th-grade stick figures illustrating concepts from the Fermi Paradox to Superintelligence, what he said in this talk was mostly what we had read before.  The difference was in experiencing it live.


  • The content was mostly from these two posts: and
  • The essence: Imagine that the history of humanity is condensed into a 500-page book.  At page 450, humanity invents agriculture.  Towards the end of page 499, the United States appears.  Most of our recent developments, especially around computers, the internet and AI, have happened in the last few lines of page 500.
  • Unlike all previous human creations, AI is a completely different beast that is progressing exponentially.
  • If a video recording of this talk emerges somehow, don’t miss it!  Else, console yourself with this earlier TED talk of his.


Bo Ewald, President @ D-Wave Systems

The Journey from Digital to Quantum Computing, an Introduction


  • Bo Ewald is an industry veteran, having cut his teeth at the Los Alamos National Laboratory, Cray Research, Silicon Graphics and others.  He is currently the President of D-Wave Systems.
  • D-Wave Systems has been around since 1999 and has had about $175 Million in funding to design and manufacture quantum computing and superconducting electronics.  It has about 150 patents.
  • In 1981, Richard Feynman voiced his seminal ideas in his lecture “Simulating Physics with Computers”.  He said, “Nature isn't classical, dammit, and if you want to make a simulation of nature, you'd better make it quantum mechanical, and by golly it's a wonderful problem, because it doesn't look so easy.” [Source: Quantum simulation, Nature]
  • Bo met Feynman in 1983 in Los Alamos.  Many years later, when his team was showing Feynman the Cray supercomputer and how it crunched 160 million FLOPS, Feynman said to him, “You know young man, someday all these computers will be replaced by quantum computers.”
  • For a really layperson’s introduction to quantum technology, try Prof Leo Kouwenhoven’s talk here.


  • Quantum computing is investment-heavy and though VCs have been cautious, money has been pouring in.  Check this nice site for details: the list of quantum computing startups here and the list of active VCs in this space here.
  • Those looking for contrarian views on quantum computing can check out this interview with Prof Gil Kalai here and get some good context by listening to Prof Michel Devoret’s talk on ‘The Quest for the Robust Quantum Bit’.
  • D-Wave claims it has achieved up to 2000 qubits now.  “At the end of a cycle (about 10,000 times a second), we read out the value of the bit, by forcing it to become digital.”  There are different approaches to quantum computing.  D-Wave shocked this space in 2007 by announcing the first commercially available quantum computer.  It uses the quantum annealing technique, which does not entangle all the qubits and cannot be programmed qubit by qubit.  Read up on this fascinating story, which has good details and is the source of the following table:

    Source: Science, Dec 2016, Vol 354, Issue 6316

  • D-Wave is not at the level of exceeding existing computational capabilities but is sufficiently advanced.  It has several paying customers and about 50 ‘proto-labs’ in the field.
  • The D-Wave container is a Faraday cage that blocks RF interference.  Sixteen layers of system shielding between the quantum chip and the outside preserve the quantum calculation.  It operates in high vacuum: the pressure is 10 billion times lower than atmospheric pressure.
  • Lockheed bought the first quantum computing machine from D-Wave.  It was a 128-qubit machine.  
  • Google became D-Wave’s second customer and shares its machine with NASA Ames.  This started out at 500 qubits and has progressed to 1000-2000 qubits now.  It has run for 22 months without a failure.
  • Volkswagen is experimenting with a D-Wave machine to optimize traffic flow in Beijing.  Recruit Communications is trying to optimize advertising in Japan with D-Wave systems.  There are other customer projects via cloud access.
  • The D-Wave machine does about 10,000 instructions per second, but each one does a lot of work to find a low energy state in a complex hilly landscape.  There’s no compiler or higher-level software for it yet.  It operates at 10-15 millikelvin, which is colder than outer space.  The power consumption of the chip itself is almost zero because it uses superconductors.  Compared to the roughly 2500 kW consumed by a traditional supercomputer, the D-Wave 2000Q System uses about 25 kW, mostly for the cooling.
  • Bo Ewald was optimistic but largely cautious about how the next milestones would shape up.  
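
To get a feel for what "finding a low energy state in a complex hilly landscape" means, here is a minimal sketch of classical simulated annealing on a tiny Ising-style problem.  It is a software analogy only: it is not quantum annealing, it does not use any D-Wave API, and the function names, the cooling schedule and the 8-spin ring example are all assumptions chosen for illustration.

```python
import math
import random


def energy(spins, couplings):
    """Ising-style energy: sum of J_ab * s_a * s_b over coupled pairs."""
    return sum(j * spins[a] * spins[b] for (a, b), j in couplings.items())


def simulated_annealing(n, couplings, steps=5000, t_start=2.0, t_end=0.01, seed=0):
    """Classical simulated annealing (hypothetical helper, not D-Wave code)."""
    rng = random.Random(seed)
    spins = [rng.choice((-1, 1)) for _ in range(n)]
    current = energy(spins, couplings)
    best_spins, best = spins[:], current

    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (step / (steps - 1))

        # Propose flipping one randomly chosen spin.
        i = rng.randrange(n)
        spins[i] = -spins[i]
        candidate = energy(spins, couplings)
        delta = candidate - current

        # Always accept downhill moves; accept uphill moves with a
        # probability that shrinks as the temperature drops.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current = candidate
            if current < best:
                best, best_spins = current, spins[:]
        else:
            spins[i] = -spins[i]  # reject the move: undo the flip

    return best_spins, best


# Toy problem: 8 spins on a ring, where neighbours prefer to be aligned.
n = 8
ring = {(i, (i + 1) % n): -1 for i in range(n)}
best_spins, best = simulated_annealing(n, ring)
print(best_spins, best)
```

Early on, the high "temperature" lets the search climb over hills in the landscape; as it cools, the search settles into a valley.  For this toy ring the lowest possible energy is -8, reached when all spins point the same way.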


Autonomous Driving

Factory Optimization

Bio: Jitendra Mudhol and his CollaMeta team are passionate about designing and developing Machine Learning applications in Manufacturing.  He is an Executive Fellow at Santa Clara University's Miller Center for Social Entrepreneurship guiding their Data Science strategy and Machine Learning applications.  You may reach him at jsmudhol at collameta dot com.