Age of AI Conference 2018 – Day 1 Highlights

Here are some of the highlights from the first day of the Age of AI Conference, January 31, at the Regency Ballroom in San Francisco.

Lisha Li, Principal at Amplify Partners

Deep Learning Est Morte!  Vive Differentiable Programming


Key Points:

  • Lisha Li has generously posted the slide deck, a recording and a (rough) transcript of her talk here.
  • Lisha made the case for differentiable programming as a conceptual apparatus – a framework within which constituents such as Machine Learning can be applied.  She began with her favorite shape, the gömböc, which has a single stable point of equilibrium; the conjecture that such a shape exists was proved mathematically in 2006 with the help of modern computational optimization.  Differentiable programming uses a similar template of hypothesizing, testing and iterating on ideas rapidly.  It explores a complex, non-intuitive space of solutions to find the right optimization.
  • Interesting reference: Yann LeCun’s quip that differentiable programming is “little more than a rebranding of the modern collection of Deep Learning techniques” – hence the title of the talk.
  • Christopher Olah’s seminal blog post on differentiable programming and other types of programming, in which he imagines how we would look back on Deep Learning thirty years from now and establishes a connection between optimization and functional programming.
  • Neural network models can be viewed as assemblies of building blocks trained with backpropagation.  Traditionally, these blocks have been feedforward, convolutional and recurrent layers.  Adding continuous and differentiable algorithmic elements makes them usable in deep learning.
  • Differentiable programming may be viewed as combining optimizability (differentiability) with chained function composition (credit: Atılım Güneş Baydin), which permits successive transformations, successive levels of distributed representations, and propagation of derivatives via the chain rule of calculus.
  • Differentiable programming is well suited for three areas:
    • Feature engineering is done by optimization, not you.  
    • It’s very suitable for end to end approaches.  
    • It has incredible potential as a generative method.
  • Feature engineering is done by optimization, not you:
    • Can the architecture be flexible enough to handle diverse inputs without explicit feature engineering?  The neural net could then design the appropriate algorithm suited for the input.
    • Zoph and Le used a recurrent network to generate model descriptions of neural networks, training the RNN with reinforcement learning to maximize the generated architectures’ accuracy on a validation set.  Results are impressive – models generated from scratch outperformed human-invented architectures.
    • If indexes are viewed as models that learn the sort order or structure of lookup keys, neural nets outperformed cache-optimized B-Trees by up to 70% in speed while saving significant memory.
    • How does one find clusters in sparse graphs?  A data-driven approach to community detection in graphs required fewer computational steps and performed much better than rigid parametric models.
  • It’s very suitable for end to end approaches
    • Microsoft’s Multiworld Testing Decision Service uses RL to make decisions quickly in real time and incorporate new data (based on context) into learned policies, showing a 70% lift over classical A/B testing.
    • AlphaGo Zero: not only a leapfrog for bypassing data from real human games, but it also used a novel RL approach that combined the policy and value networks.  Check out Seth Weidman’s discourse here on the three tricks that made it shine, or Tim Wheeler’s dive into why it works here.
    • Automated Farming: Lisha used the example of Iron Ox to show a possible end-to-end implementation.
  • It has incredible potential as a generative method:
    • Combining books and movies: a context-aware CNN was used to learn neural sentence embeddings in an unsupervised way; impressive results are shown on this website.
    • Design: given a sketch of the desired outcome as input, the output is the code to generate it.
    • Drawing: using collaborative interfaces, is it possible to have neural networks generate workflows that humans could use, or adapt to current workflows?
    • Music: if a neural network is fed music files of real piano performances, the output is generated music that is hard to distinguish from a human performance.
  • All these success stories do not mean it’s a cakewalk.
    • For example, with adversarial examples (see the Goodfellow paper), we need to understand the weaknesses of these models better.
    • Causality: How are humans different from animals?  How do we understand and infer cause and effect?  Judea Pearl’s work laid the foundation for causal inference, arguing that humans implicitly use Bayesian networks and probability theory to observe, think and infer about cause and effect in this world.  What happens in model-agnostic neural networks?
    • Infrastructure 3.0 – if most of a typical DL codebase goes into feature extraction, cleaning and shaping, not much goes into the modeling.  Do we need to move to a different compute substrate (remember, the human brain is one of the most efficient computation machines)?
  • Q&A: What are the cases where differentiable programming would not work?  Where should it not be applied?
    Its applications are wide, but the reasons differentiable programming may not work are more often practical.  For example, just because I can use it to generate music does not mean all artists will be disenfranchised.
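The “chained function composition” view of differentiable programming can be made concrete with a minimal sketch. The `Block` class and function names below are illustrative choices, not from the talk; the point is only that a program built from differentiable pieces lets the chain rule propagate derivatives backward through the composition.

```python
# Minimal sketch: a program as a composition of differentiable blocks,
# with derivatives propagated backward via the chain rule.
import math

class Block:
    """A differentiable building block: a forward map plus its local derivative."""
    def __init__(self, f, df):
        self.f, self.df = f, df

# Two simple blocks composed into a "program": y = sin(x**2)
square = Block(lambda x: x * x, lambda x: 2 * x)
sine = Block(math.sin, math.cos)

def forward_backward(blocks, x):
    """Run the composition forward, then apply the chain rule backward."""
    values = [x]
    for b in blocks:
        values.append(b.f(values[-1]))
    grad = 1.0
    # Chain rule: multiply local derivatives, last block first.
    for b, v in zip(reversed(blocks), reversed(values[:-1])):
        grad *= b.df(v)
    return values[-1], grad

y, dy_dx = forward_backward([square, sine], 1.5)
# Analytic check: d/dx sin(x^2) = 2x * cos(x^2)
assert abs(dy_dx - 2 * 1.5 * math.cos(1.5 ** 2)) < 1e-12
```

Frameworks like PyTorch or TensorFlow automate exactly this bookkeeping over much larger graphs of blocks.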


Christopher Olah’s blog post

Neural Architecture Search with Reinforcement Learning

The Case for Learned Index Structures

Community Detection with Graph Neural Networks

A Multiworld Testing Decision Service

Mastering the game of Go without human knowledge

End-to-End Training of Deep Visuomotor Policies

Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books

pix2code: Generating Code from a Graphical User Interface Screenshot

Learning to Create Piano Performances
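As a rough illustration of the learned-index result referenced in the talk (neural nets vs. cache-optimized B-Trees), the sketch below treats an index as a model that predicts a key’s position in a sorted array and then corrects within a learned error bound. The two-point linear “model” and the toy key set are assumptions for illustration, not details from the paper, which uses staged models and tuned error bounds.

```python
# Toy learned index: predict a key's position from a fitted model,
# then binary-search only within the model's worst-case error window.
import bisect

keys = sorted(range(0, 10000, 7))  # sorted lookup keys (toy data)

# "Train": fit position ~ a*key + b from the two endpoints (toy model).
a = (len(keys) - 1) / (keys[-1] - keys[0])
b = -a * keys[0]

# Worst-case prediction error bounds the local search window.
err = max(abs(int(a * k + b) - i) for i, k in enumerate(keys))

def lookup(key):
    """Predict a position, then search only within [pred-err, pred+err]."""
    pred = int(a * key + b)
    lo = max(0, pred - err)
    hi = min(len(keys), pred + err + 1)
    j = bisect.bisect_left(keys, key, lo, hi)
    return j if j < len(keys) and keys[j] == key else None

assert lookup(7 * 123) == 123   # present key found at its true position
assert lookup(5) is None        # absent key correctly rejected
```

On uniform data like this the model is nearly exact, which is precisely the regime where a learned index can beat a tree: the “search” collapses to a prediction plus a tiny correction.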


Dr. Radhika Dirks, XLabs

Artificial Alchemy and Impossible Intelligences


Key Points:

  • Radhika divided her talk into two parts: Hardware as Paradigms and Software as Morphologies
  • Alchemy has gotten a negative connotation over time, but it does not have to be so.  What is computing?  The roots of computing are shaky.  Computing is alchemy.  How do we measure complexity?  The roots of information and complexity come from Shannon’s work on communication theory.
  • The XLabs team looks for black swans to invest in.  An example is Seldn: an AI platform that used complexity physics to predict societal disruptions – e.g. predicting labor strikes before anyone else could, and measuring violence across countries, calling out Syria and Iraq two weeks before the media mentioned ISIS.  The team does not look for specific technologies or verticals but for blueprints to the future.
  • AI products either Replace, Aid or Amplify human endeavors.  XLabs focuses on Amplify.
  • These are some of the themes that inform the XLabs’ blueprint:
    • Creatively destroying the internet – how would one envision a completely new portal to experience the web?
    • Amplifying Genetics – how does one move towards super-wellness?  Today’s health is all about death prevention.  How does one predict phenotype from genotype?
    • Real Human Connections – there are a thousand apps for match-making but zero apps for relationship building.  Increased digital connectedness has somehow created increased emotional unavailability.
  • What is Intelligence?  The connection between Alchemy and AI lies in the work of Hubert Dreyfus in 1965.  A 2x2 maps the relationship between the scales of human and computer intelligence.
  • Morphologies need to make sense.  At XLabs – a new kind of Bell Labs – the team invests in futuristic moonshots via AI, UC and NT.  They do not focus on the tech or the vertical, but on the blueprint for the future.



Radhika Dirks Google Scholar page


Claude Shannon, Information Theory

Alchemy and Artificial Intelligence


Stephen Wolfram, Founder and CEO,  Wolfram Research

Invited Talk


Key Points:

  • Stephen Wolfram always invites intense curiosity and interest.  A piece of well-known trivia: the thesis committee for his PhD in particle physics at Caltech consisted of Richard Feynman, Peter Goldreich, Frank Sciulli and Steven Frautschi.  Wolfram Mathematica has served legions of grateful students who have used it in schools and universities for an astounding range of technical computing work.
  • His talk focused on Wolfram Alpha and its features.  Wolfram Alpha is a computational knowledge engine.  In the works for about 13 years now, it has about 15 million lines of code and a curation process with about 10 stages.  About 20 companies are using Wolfram Alpha internally.
  • It uses the Wolfram Language – in development for 31 years now, built for and used in Mathematica.  Interaction is through notebooks; Wolfram invented the notebook concept and has used it for 29 years.  It’s a knowledge-based language.  It’s also a symbolic language.
  • Stephen gave a demo of the capabilities of the Wolfram language using a notebook (like Jupyter).  
  • The principle behind Wolfram Alpha is the theory of cellular automata – the ability to generate complexity from very simple rules, steps and models.  How does one mine this computational behavior?  The typical engineering approach is highly constrained, since the engineer foresees what the output should be.  Behind this lies a very conceptual principle: the principle of computational equivalence.
  • In the universe of computations, there are computations that are:
    • not yet found or conceived by humans
    • not understandable by humans
    • of no interest to humans
  • He also talked about Leibniz, who imagined a philosophical language for everything a human does – an unfinished project to codify all human laws – which is actually very relevant today: expressing everything happening in the world in a symbolic way, e.g. smart contracts.  The ultimate smart contract may be the one that humans write with AI.
  • Stephen Wolfram very graciously and patiently answered audience questions for an hour.  These included questions about his contribution to the movie Arrival, sentient AI, the future of knowledge-based computation, etc.
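The cellular-automaton principle from the talk – complexity emerging from very simple rules – can be illustrated with elementary Rule 30, a well-known example from Wolfram’s work (an illustrative choice here, not necessarily the demo shown on stage):

```python
# Elementary cellular automaton Rule 30: each cell's next state is
# left XOR (center OR right). A single live cell grows into a
# famously complex, irregular triangle.

def rule30_step(cells):
    """One synchronous update with wraparound neighbors."""
    n = len(cells)
    return [cells[(i - 1) % n] ^ (cells[i] | cells[(i + 1) % n])
            for i in range(n)]

# Start from a single live cell and evolve for a few steps.
width, steps = 31, 15
row = [0] * width
row[width // 2] = 1
for _ in range(steps):
    print("".join("#" if c else "." for c in row))
    row = rule30_step(row)
```

Despite the one-line update rule, the printed pattern shows no simple repetition – the behavior Wolfram points to when arguing that simple programs can reach arbitrary computational sophistication.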

Day 1 ended with an open bar and a fabulous live jazz band.  It gave a chance for all to unwind, network, speak to the vendors exhibiting and exchange notes.

Bio: Jitendra Mudhol and his CollaMeta team are passionate about designing and developing Machine Learning applications in Manufacturing.  He is an Executive Fellow at Santa Clara University's Miller Center for Social Entrepreneurship guiding their Data Science strategy and Machine Learning applications.  You may reach him at jsmudhol at collameta dot com.