Age of AI Conference 2018 – Day 1 Highlights
Here are some of the highlights from Day 1 of the Age of AI Conference, held January 31 and February 1, 2018, at the Regency Ballroom in San Francisco.
The Conference owes its origins to the San Francisco Artificial Intelligence meetup that Emil Mikhailov started as a place for interested people to learn, network and share. The community now boasts 4,700+ members and has previously hosted heavyweights like Andrew Ng and Nvidia CEO Jensen Huang.
The Regency Ballroom offers a good location and good acoustics. The best part was the technical focus of the Conference, well punctuated with some ‘global minima’ but thought-provoking touches. I will strive to do some justice to the rich technical content.
Here are the highlights of Day 1, Wednesday, January 31.
Balaji Lakshminarayanan, Senior Research Scientist, DeepMind
Understanding Generative Adversarial Networks
Key Points:
 This talk delves into some theory behind Generative Adversarial Networks (GANs). How do GANs relate to other ideas in probabilistic machine learning? Most models in machine learning and statistics are of the prescribed probabilistic model type: they come with a conditional log-likelihood function, e.g. object recognition classifiers. Implicit probabilistic models merely use a stochastic procedure to generate the data; e.g. if we know broadly how a system works, we can use it to generate data to study ecology, climate and weather patterns.
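The prescribed/implicit distinction can be made concrete with a toy sketch (the models below are hypothetical illustrations, not from the talk): a prescribed model exposes a log-likelihood that can be evaluated pointwise, while an implicit model exposes only a sampling procedure.

```python
import numpy as np

# Prescribed model: a Gaussian with an explicit log-likelihood we can evaluate.
def gaussian_loglik(x, mu=0.0, sigma=1.0):
    return -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)

# Implicit model: a stochastic simulator. We can draw samples from it, but
# there is no tractable density to evaluate, only the generative procedure.
def implicit_simulator(rng, n):
    z = rng.standard_normal(n)                        # latent noise
    return np.tanh(z) + 0.1 * rng.standard_normal(n)  # nonlinear push-forward

rng = np.random.default_rng(0)
samples = implicit_simulator(rng, 1000)  # samples, but no log p(x) available
print(gaussian_loglik(0.0))              # explicit density: evaluable pointwise
```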
 Hypothesis testing is a principle for learning in implicit generative models, and it is done by density ratio estimation. There are four approaches: class probability estimation, density ratio matching, divergence minimization and moment matching. A summary of these approaches is in the figure below.
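The first of these, class probability estimation, can be sketched in a few lines: train a classifier D to separate samples of p from samples of q; with equal class priors, D(x)/(1 - D(x)) estimates the density ratio p(x)/q(x). The toy Gaussians and the bare-bones gradient-descent logistic fit below are illustrative assumptions, not the talk's setup.

```python
import numpy as np

rng = np.random.default_rng(0)
p_samples = rng.normal(0.0, 1.0, size=2000)  # "real" distribution p
q_samples = rng.normal(1.0, 1.0, size=2000)  # "model" distribution q

# Class probability estimation: label p-samples 1 and q-samples 0, then
# fit a logistic classifier D(x) by gradient descent on the log loss.
X = np.concatenate([p_samples, q_samples])
y = np.concatenate([np.ones(2000), np.zeros(2000)])
w, b = 0.0, 0.0
for _ in range(2000):
    d = 1.0 / (1.0 + np.exp(-(w * X + b)))
    w -= 0.1 * np.mean((d - y) * X)
    b -= 0.1 * np.mean(d - y)

# With equal class priors, p(x)/q(x) is recovered as D(x) / (1 - D(x)).
def density_ratio(x):
    d = 1.0 / (1.0 + np.exp(-(w * x + b)))
    return d / (1.0 - d)

# Near x=0 the ratio should exceed 1 (p is denser there); near x=1 it drops.
print(density_ratio(np.array([0.0, 1.0])))
```

For these two unit-variance Gaussians the true log-ratio is linear in x, which is exactly the family a logistic classifier can represent, so the estimate is well-specified.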
 The team trained a generator by maximum likelihood and by Wasserstein GAN (WGAN), and compared them using two tools: Real NVP to compute the exact log-probability densities, and an independent critic to compare the approximate Wasserstein distances on the validation set. They found that the Wasserstein distance can compare models, that it can be approximated by training a critic, and that training by WGAN leads to better samples but worse log-probabilities.
 For learning latent variable models (that is, statistical models that have hidden or unobserved variables), two popular approaches are variational autoencoders (VAEs) and GANs. GANs can train on large datasets, are fast to simulate and, when trained on images, generate visually compelling samples. But they can become unstable in optimization, leading to mode collapse, where the generated data does not represent the diversity of the underlying data distribution. VAEs help with inference of the latent variables, are very useful in representation learning and visualization, and do not suffer from mode collapse, but alas, generate blurry images. Hence the rationale for combining the best of these two methods. They found that gradient penalties stabilize (non-Wasserstein) GANs as well, and that one needs to think about both the ideal loss function and the optimization.
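The gradient-penalty idea (popularized by WGAN-GP) can be sketched numerically. This is a hypothetical NumPy toy, not the talk's implementation: the discriminator is a hand-written tanh unit so its input-gradient is available in closed form and no autograd library is needed.

```python
import numpy as np

# Toy discriminator: D(x) = tanh(w . x + b), with a closed-form input-gradient.
w = np.array([0.8, -0.3])
b = 0.1

def D(x):
    return np.tanh(x @ w + b)

def grad_D(x):
    # dD/dx = (1 - tanh^2(w . x + b)) * w, one row per sample.
    return (1.0 - np.tanh(x @ w + b) ** 2)[:, None] * w

def gradient_penalty(real, fake, lam=10.0):
    # Interpolate between real and fake samples, then penalize deviation of
    # the discriminator's input-gradient norm from 1 (the WGAN-GP recipe).
    eps = np.random.default_rng(0).uniform(size=(real.shape[0], 1))
    x_hat = eps * real + (1.0 - eps) * fake
    norms = np.linalg.norm(grad_D(x_hat), axis=1)
    return lam * np.mean((norms - 1.0) ** 2)

real = np.random.default_rng(1).normal(size=(64, 2))
fake = np.random.default_rng(2).normal(loc=2.0, size=(64, 2))
print(gradient_penalty(real, fake))  # non-negative scalar added to the loss
```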
 GANs for imitation learning: see this YouTube video (link) that summarizes the paper/effort by Josh Merel et al. Balaji concluded with some of the other areas of exciting research, including ideas from convergence of Nash equilibria and connections to Reinforcement Learning (RL) and control theory.
Links:
Learning in Implicit Generative Models
https://arxiv.org/abs/1610.03483
Comparison of Maximum Likelihood and GAN-based training of Real NVPs
https://arxiv.org/abs/1705.05263
Variational Approaches for Auto-Encoding Generative Adversarial Networks
https://arxiv.org/abs/1706.04987
Many Paths to Equilibrium: GANs Do Not Need to Decrease Divergence At Every Step
https://arxiv.org/abs/1710.08446
Learning human behaviors from motion capture by adversarial imitation
https://arxiv.org/abs/1707.02201
Balaji’s website
http://www.gatsby.ucl.ac.uk/~balaji/
Tarin Ziyaee, CTO at Voyage.auto
Voyage
Voyage was cofounded by Udacity Vice President Oliver Cameron and spun out of Udacity’s self-driving car program. Crunchbase says InMotion Ventures has led the $15M investment in this driverless car startup working at ‘Level 4 automation’.
Key Points:
 Autonomous driving today is where flight was about 100 years ago. Progress is going to be incremental. Voyage focuses on the algorithmic part and has partners such as Carmera (HD maps) providing other expertise.
 Voyage has deployed in Florida (link: The Villages) and in a private retirement community in CA as a door-to-door self-driving taxi service for the residents. Currently, the cars all have a safety driver.
 It’s a geofenced area with 150,000 residents and 750 miles of road, which keeps it bounded. Yet it has all the complexities one might find elsewhere: darting deer, waddling ducks, cyclists, weddings, etc. Voyage cars can go up to 25-30 miles/hour.
 Considering the tough competition in the autonomous driving space, Voyage is playing with different monetization models.
 Three tenets have guided Voyage’s ‘hygienic design principles’:
 Do not infer that which you can measure.
 Universal approximators are good, but universally approximating, not so good.
 Don’t boil the ocean.
Links:
Tarin’s Google Scholar:
https://scholar.google.com/citations?user=xA1RnAIAAAAJ&hl=en
Augustus Odena, Researcher at Google Brain
GANs and Geometry
Key Points:
 GAN variants have been spawning like rabbits, but this study pointed out that none outperformed the original. GANs are also hampered by unstable training and by the lack of proper evaluation metrics.
 This paper showed that GAN training can be decomposed into three geometric steps: separating hyperplane search, discriminator parameter update away from the separating hyperplane, and generator update along the normal vector direction of the separating hyperplane. The geometric GAN converges to a Nash equilibrium between the discriminator and generator. However, GANs are usually trained using gradient descent techniques designed to find a low value of the cost function, not to find the Nash equilibrium, and so these algorithms may fail to converge.
 In vector calculus, the matrix of all first-order partial derivatives of a vector-valued function is called the Jacobian matrix. When this matrix is square, both the matrix and its determinant are referred to as the Jacobian in the literature.
 One starts with the Jacobian of the generator in a GAN. The generator takes elements of Z to elements of X. Thus, its Jacobian is of shape dim(X) x dim(Z). There is a different Jacobian J_z for every point z; J_z tells how sensitive G(z) is to changes in z.
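The shape and meaning of J_z can be checked numerically. The toy generator below is a hypothetical stand-in (a tanh of a linear map from R^2 to R^3), and the Jacobian is estimated by central finite differences rather than any particular autograd library.

```python
import numpy as np

# Toy generator G: R^2 (latent Z) -> R^3 (data X), so J_z has shape (3, 2).
A = np.array([[1.0, 0.0], [0.5, -1.0], [0.2, 0.3]])

def G(z):
    return np.tanh(A @ z)

def jacobian(f, z, eps=1e-6):
    # Central finite-difference Jacobian: one column per latent coordinate.
    cols = []
    for i in range(z.size):
        dz = np.zeros_like(z)
        dz[i] = eps
        cols.append((f(z + dz) - f(z - dz)) / (2 * eps))
    return np.stack(cols, axis=1)

z = np.array([0.1, -0.2])
J = jacobian(G, z)
print(J.shape)  # (3, 2) = (dim(X), dim(Z)); entries are sensitivities of G to z
```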
 Two main methods to evaluate GANs: the Inception Score and the Fréchet Inception Distance (FID).
 Unconditioned GAN: no control over the modes of the data generated. It is possible to condition the GAN by feeding extra information (e.g. a class label or multimodal data) to both the generator and the discriminator as an additional input layer. But then, how does one measure this conditioning? Is it related in any way to the Inception Score and/or FID?
 Turns out there’s a (surprising) correspondence and they are causally linked. Here’s how to tell:
 Feed noise z and slightly perturbed noise z’ through the generator.
 Measure how different G(z) and G(z’) are
 If too different, penalty!
 If too similar, penalty!
 Thus, one can measure the conditioning of the generator. It corresponds to the Inception Score and the FID. One can intervene to improve the conditioning which makes the GAN perform better.
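The perturbation test above can be sketched as follows. The generator here is a hypothetical stand-in, and summarizing the spread of perturbation responses by a max/min ratio is an illustrative (assumed) choice, not the exact metric from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))

def G(z):
    return np.tanh(W @ z)  # stand-in generator for illustration

def conditioning_estimate(n=1000, eps=1e-3):
    # Feed z and a slightly perturbed z' through G and measure the ratio
    # ||G(z') - G(z)|| / ||z' - z||. The spread of this ratio across many
    # z's indicates how uniformly conditioned the generator is.
    ratios = []
    for _ in range(n):
        z = rng.normal(size=4)
        dz = eps * rng.normal(size=4)
        ratios.append(np.linalg.norm(G(z + dz) - G(z)) / np.linalg.norm(dz))
    ratios = np.array(ratios)
    return ratios.max() / ratios.min()  # large spread -> poorly conditioned

print(conditioning_estimate())
```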
Links:
Geometric GAN
https://arxiv.org/abs/1705.02894
Improved Techniques for Training GANs
https://arxiv.org/abs/1606.03498
Augustus’ Google Scholar link:
https://scholar.google.com/citations?user=vuwLi4MAAAAJ&hl=en
Roman Trusov, Researcher at XIX.ai
Semantic Segmentation in the Wild
Key Points:
 Semantic segmentation is understanding an image by assigning each pixel an object class. So the task is to group pixels into regions that contain objects of a certain class. Applications include robot vision, autonomous driving and medical imaging.
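In the simplest formulation, a segmentation network emits per-pixel class logits, and the predicted class map is an argmax over the class dimension. A minimal sketch with random stand-in logits (the shapes and class count are made up for illustration):

```python
import numpy as np

# Per-pixel classification: the network's output is a (H, W, C) logit tensor;
# here random values stand in for an actual network's output.
H, W, NUM_CLASSES = 4, 6, 3
rng = np.random.default_rng(0)
logits = rng.normal(size=(H, W, NUM_CLASSES))

# The segmentation map assigns each pixel the class with the highest logit.
segmentation = logits.argmax(axis=-1)  # (H, W) array of class ids
print(segmentation.shape)
```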
 To perform semantic segmentation with neural networks, traditional feature extraction is redundant, since it builds a ‘deep representation’ from the whole image, and it can even hurt quality. Segment the image first and then apply feature extraction.
 There is no consensus on the training routine: use a large batch size or a small learning rate.
 An inference engine is needed. Depending on the architecture, 3x-5x speedups may be seen if some best practices are followed, including conversion to a static graph, dynamic memory allocation, graph optimization, disabling the backward pass, etc.
 There is a trade-off between accuracy and performance, so semantic segmentation does not scale to frame-by-frame execution in real time. Video from a dashcam at 15 fps, or at most 25 fps, is doable.
Links:
Roman Trusov’s Quora page:
https://www.quora.com/profile/RomanTrusov
Christian Szegedy, Research Scientist at Google
Towards Autoformalization of Mathematics
Key Points:
 Expressing solutions in mathematics makes them easier to implement and self-referential. Math has natural reproducibility, is the language of choice for anything related to reasoning, allows the deepest, most hierarchical and complex content ever created, and is required for programming, physics, etc.
 The Mizar Mathematical Library is a system for formalizing and proof-checking mathematics, created by Andrzej Trybulec and collected over 44 years. Its verification engine is designed to preserve human understanding of proof steps.
 Do computers really understand text? Recurrent neural networks have improved machine translation. The idea is to use an autoformalization approach to NLP. The hope is that at the end of this process it will become a strong translator between formal and informal language. Once you have this kind of mathematical-language interpreter, it could be extended to almost anything.
 The challenges are twofold. One, premise selection: picking a few of the possible 150,000 premises needs 100% recall. A previously proposed approach uses k-NN search with hand-crafted features. Two, the large search space means one has to use brute force, so a fast hand-crafted heuristic is used for selecting the next proof step.
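The classical k-NN premise-selection step can be sketched as follows. The feature vectors and dimensions here are made up for illustration; in the hand-crafted-feature approach, each premise would be encoded as such a vector.

```python
import numpy as np

# Hypothetical premise library: each premise is a feature vector, and a
# conjecture is encoded in the same feature space. k-NN retrieval then
# proposes the k most similar premises as candidates for the prover.
rng = np.random.default_rng(0)
premise_features = rng.normal(size=(500, 32))  # stand-in premise library
conjecture = rng.normal(size=32)               # stand-in conjecture features

k = 5
distances = np.linalg.norm(premise_features - conjecture, axis=1)
top_idx = np.argsort(distances)[:k]  # indices of the k nearest premises
print(top_idx)
```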
 Using deep learning for premise selection avoids hand-engineering features and is an important step towards automatic theorem proving.
 Using a few proof guidance strategies with deep neural networks, they found first-order proofs for 7.36% of the first-order logic translations of Mizar Mathematical Library theorems that previously had no ATP-generated proofs.
 Humans prefer higher-order logic, and there are four major theorem provers:
 Isabelle (SML)
 Coq (OCaml)
 HOL4 (Poly/ML)
 HOL Light (OCaml)
Christian’s team uses the HOL Light (OCaml) theorem prover.
Links:
Google’s Multilingual Neural Machine Translation System
https://arxiv.org/abs/1611.04558
Kaliszyk, Cezary, and Josef Urban. “MizAR 40 for Mizar 40.” Journal of Automated Reasoning 55.3 (2015): 245-256
Schulz, Stephan. “E - a brainiac theorem prover.” AI Communications 15.2-3 (2002): 111-126
DeepMath - Deep Sequence Models for Premise Selection
https://arxiv.org/abs/1606.04442
Deep Network Guided Proof Search
https://arxiv.org/abs/1701.06972
Christian’s Google Scholar page: https://scholar.google.com/citations?user=3QeF7mAAAAAJ&hl=en