Age of AI Conference 2018 – Day 2 Highlights

Here are some of the highlights from Day 2 of the Age of AI Conference, held on January 31 and February 1, 2018, at the Regency Ballroom in San Francisco.

The Conference grew out of the San Francisco Artificial Intelligence meetup that Emil Mikhailov started as a place for interested people to learn, network, and share.  The community now boasts 4,700+ members and has previously hosted heavyweights like Andrew Ng and Nvidia CEO Jensen Huang.

The Regency Ballroom offers a good location and good acoustics.  The best part was the technical focus of the Conference, punctuated with a few lighter ‘global minima’ touches that were still thought-provoking.  I will strive to do some justice to the rich technical content.

Below are the highlights of Day 2, Thursday, February 1.  Click here (insert hyperlink here) for the Day 1 highlights.

 

Nicolas Papernot, Google PhD Fellow, Penn State University

 
Security and Privacy in Machine Learning

Image

  • Nicolas Papernot is a pleasure to listen to.  He explains a lot of complex concepts very clearly.  You may recollect that he spoke, along with Nicholas Carlini, at the recent ODSC West 2017 on “Tutorial on Adversarial Machine Learning with CleverHans”, which you can re-read here.
  • Just to recap a little: In 2014, Szegedy et al. found that adversarial examples can transfer between machine learning models.  In 2015, Goodfellow et al. found that a network trained on MNIST had a lower, but still substantial, error rate on transferred adversarial examples than on white-box examples.  In 2017, Papernot et al. showed that black-box attacks could succeed without any access to the training data.
  • This talk, which focused on deep learning in an adversarial context, was split into two parts:
    • Integrity at the interface
    • Privacy
  • The attack surface spans the physical domain, the digital representation, the Machine Learning (ML) model, and then the physical domain again.  This talk focused on the ML model.
  • Types of adversaries and the threat model: In the case of the white-box adversary, the attacker may see and inspect the model and then plan the attack.  In the case of the black-box adversary, the attacker cannot see the model but is still able to mount an attack by querying it.
  • The team designed a map to answer the question: which input features of x produce the most significant changes to the output?  The resulting Jacobian-based Saliency Map Attack (JSMA) showed that a small perturbation can induce large output variations (see the saliency sketch after this list).  This is different from the Kullback-Leibler (KL) divergence approach suggested by Miyato et al., 2015 and from elastic net optimization (Goodfellow et al., 2016).  Check out the cleverhans repository if you want to explore these.
  • Integrity at the interface
    • Not restricted to a specific ML approach
    • Not restricted to images only.  E.g., malware classifiers can also be attacked
    • The strength of the Jacobian approach is that it allows control over which specific features are modified.  E.g., only a few lines of code added to the XML manifest file were needed to achieve the modification
    • Using Reinforcement Learning (RL), he showed an attack on an agent playing the video game Pong.  The surprising fact was that there was no need to introduce a perturbation in every frame.  Implications: since RL is used in robotics, these attacks could make robots dangerous.
    • Nicolas then dived into black-box attacks.  The black-box adversary is only able to observe the labels given by the deep neural network, but is able to train a substitute model by using the target deep neural network as an oracle to construct a synthetic dataset.  Using this synthetic dataset, the attacker proceeds to build an approximation of the model learned by the oracle.  This approximation is the substitute network that mounts the attack (see the substitute-model sketch after this list).
    • Adversarial example transferability: samples crafted to mislead model A are likely to mislead model B, even though the attacker has no access to B.  When a decision tree was used instead, it also misclassified the inputs.  Ensemble methods are not a good defense and do not help.
    • Using novel methods for estimating the previously unknown dimensionality of the space of adversarial inputs, the team found that higher-dimensional adversarial subspaces are more likely to intersect across models, which helps explain transferability.
    • To check if the provider of the remote platform mattered, the team tested on MetaMind, Amazon Web Services and Google Cloud Platform and found no significant difference:
      Image

    • It is easier to steal someone’s model than it is to train your own for adversarial attacks
    • Defenses against such attacks fall into two categories: (a) reactive: detecting the adversarial samples, and (b) proactive: improving the training phase to make the model more robust.  However, note that strong attacks can subvert all detection methods (Carlini & Wagner, 2017).
    • Adversarial examples offer insights into many fundamental ML challenges: fairness, model based optimization, safety, etc.
  • Privacy in ML
    • Saltzer and Schroeder defined privacy as: The term “privacy” denotes a socially defined ability of an individual (or organization) to determine whether, when, and to whom personal (or organizational) information is to be released.
    • Differential Privacy has established itself as a standard of what an algorithm should do with private data: A natural approach to defining privacy is to require that accessing the database teaches the analyst nothing about any individual. But this is problematic: the whole point of a statistical database is to teach general truths, for example, that smoking causes cancer. Learning this fact teaches the data analyst something about the likelihood with which certain individuals, not necessarily in the database, will develop cancer.  We therefore need a definition that separates the utility of the database (learning that smoking causes cancer) from the increased risk of harm due to joining the database. This is the intuition behind differential privacy [Source: Cynthia Dwork’s classic: A Firm Foundation for Private Data Analysis; to view this source, click here]

      Image

    • The team’s definition of the problem: preserve the privacy of training data when learning classifiers.  Goals: (a) differential privacy protection guarantees, (b) intuitive privacy protection guarantees, and (c) generic guarantees that are independent of the learning algorithm
    • Using the Private Aggregation of Teacher Ensembles (PATE) strategy: an ensemble of teacher models is trained on disjoint subsets of the sensitive data.  Then a student model is trained on “auxiliary unlabeled non-sensitive data” with the aggregate output of the ensemble, so that “the student learns to accurately mimic the ensemble.”  The strategy is “strengthened by restricting student training to a limited number of teacher votes, and by revealing only the topmost vote after carefully adding random noise.”  (A minimal sketch of the noisy aggregation step follows this list.)
    • A big advantage of this approach is that it establishes a precise guarantee of training data privacy.
    • There is a basic tension or tradeoff “between security or privacy and precision of ML predictions in machine learning systems with finite capacity”.  This paper (the last link below) is due to appear in ICLR ’18.
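
For readers who want to make the JSMA idea concrete, here is a minimal sketch of a Jacobian-based saliency map, assuming a small PyTorch classifier.  The model, shapes, and function names below are my own illustrative choices, not code from the talk or from cleverhans.

```python
import torch
import torch.nn as nn

def jacobian_saliency(model, x, target_class):
    """Score each input feature by how promising it is to perturb
    if we want to push the prediction toward target_class."""
    # Jacobian of the class logits with respect to the input features.
    jac = torch.autograd.functional.jacobian(model, x)  # (num_classes, num_features)
    d_target = jac[target_class]                         # gradient of the target logit
    d_others = jac.sum(dim=0) - d_target                 # summed gradients of the rest
    # A feature is salient if nudging it up raises the target logit
    # while lowering the competing logits (the JSMA criterion).
    mask = (d_target > 0) & (d_others < 0)
    return torch.where(mask, d_target * d_others.abs(), torch.zeros_like(d_target))

# Toy usage with a made-up 10-feature, 3-class model.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 3))
x = torch.rand(10)
print(jacobian_saliency(model, x, target_class=2).topk(3))  # most attack-worthy features
```

A full JSMA attack would iteratively perturb the top-scoring features and re-check the prediction; the cleverhans repository mentioned above contains reference implementations.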
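
Below is a stylized sketch of the black-box substitute-model loop: query the target only for labels, fit a local model on those labels, then craft attacks against the local model.  The oracle interface, architecture, and training details are simplified assumptions of mine; the actual attack also grows the synthetic dataset with Jacobian-based augmentation.

```python
import torch
import torch.nn as nn

def train_substitute(query_oracle, queries, num_classes, epochs=50, lr=0.05):
    """Fit a local substitute using only the labels the black-box oracle returns."""
    labels = query_oracle(queries)                  # black-box access: labels only
    substitute = nn.Sequential(nn.Linear(queries.shape[1], 64),
                               nn.ReLU(), nn.Linear(64, num_classes))
    opt = torch.optim.SGD(substitute.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        nn.functional.cross_entropy(substitute(queries), labels).backward()
        opt.step()
    return substitute  # white-box attacks on this model tend to transfer to the oracle

# Toy usage: a hidden "remote" classifier stands in for the real oracle.
remote = nn.Sequential(nn.Linear(20, 5))
oracle = lambda x: remote(x).argmax(dim=1)          # only output labels are exposed
synthetic_queries = torch.rand(256, 20)             # attacker-chosen inputs
substitute = train_substitute(oracle, synthetic_queries, num_classes=5)
```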
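
And here is a minimal sketch of the noisy-aggregation step at the heart of PATE: count the teachers' votes, add noise, and release only the noisy winner.  The Laplace noise, the gamma scale, and the teacher counts are illustrative assumptions; the papers linked below work out the exact differential-privacy budget this mechanism spends.

```python
import numpy as np

def noisy_aggregate(teacher_votes, num_classes, gamma=0.05, rng=None):
    """Aggregate one query: per-class vote counts + Laplace noise, then argmax.
    Releasing only the noisy winner is what gives the differential-privacy
    guarantee: changing one sensitive training record can change at most
    one teacher's vote, and the noise hides that change."""
    rng = rng or np.random.default_rng()
    counts = np.bincount(teacher_votes, minlength=num_classes).astype(float)
    counts += rng.laplace(loc=0.0, scale=1.0 / gamma, size=num_classes)
    return int(np.argmax(counts))

# Toy usage: 50 teachers, trained on disjoint sensitive partitions, vote on a query.
votes = np.array([2] * 30 + [1] * 15 + [0] * 5)     # made-up teacher predictions
print(noisy_aggregate(votes, num_classes=3))        # label used to train the student
```

The student model then trains only on these noisy labels over non-sensitive data, so it never touches the sensitive records directly.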

 
Links:

Google Scholar link:
https://scholar.google.com/citations?user=cGxq0cMAAAAJ&hl=en

The Limitations of Deep Learning in Adversarial Settings
https://arxiv.org/abs/1511.07528

Practical Black-Box Attacks against Machine Learning
https://arxiv.org/abs/1602.02697

The Space of Transferable Adversarial Examples
https://arxiv.org/abs/1704.03453

Adversarial Classification
https://homes.cs.washington.edu/~pedrod/papers/kdd04.pdf

On the Protection of Private Information in Machine Learning Systems: Two Recent Approaches
https://arxiv.org/abs/1708.08022

Semi-Supervised Knowledge Transfer for Deep Learning from Private Training Data
https://arxiv.org/abs/1610.05755

SoK: Towards the Science of Security and Privacy in Machine Learning
https://arxiv.org/abs/1611.03814

 

Alexey Kurakin, Research Software Engineer @ Google Brain

 
Some Adversarial Examples

Image

  • What are adversarial examples?
    Alexey began with the Szegedy, Goodfellow, et al. 2014 paper, which covered two main properties of neural networks: one, semantic information is stored in the activation space as a whole rather than in individual units; and two, deep neural networks “learn input-output mappings that are fairly discontinuous to a significant extent” – meaning that, by maximizing a network’s prediction error, one can find an imperceptible perturbation of an image that causes the network to misclassify it.  Moreover, another network trained on a different subset of the dataset will often misclassify the same image when the perturbation is applied to it.  So the nature of these perturbations is not an artifact of the learning.
    He showed short videos. Click here to see them.  
    First video:

    • Bookshelf vs prison cells or bannisters
    • Washing machine vs. something else

    Second video:

      • Banana, add a toaster → still a banana
      • Banana, add a messed-up toaster image → toaster
      • Turtle – a 3D model misclassified as a rifle
  • Taxonomy of adversarial examples
    1. Targeted vs. Non-targeted: Non-targeted means you are not targeting a specific output
    2. White box and Black box
    3. Physical vs Digital attack

    Do check out the link to a paper below about other taxonomy dimensions, like Perturbation Scope, Perturbation Measurement, etc.

  • How to craft adversarial attacks?
    Most are based on white-box digital attacks.  Different methods of crafting adversarial examples include the Fast Gradient Sign Method (FGSM), the Basic Iterative Method (BIM), Zeroth Order Optimization (ZOO), universal adversarial perturbations, the one pixel attack, etc.  (A hedged FGSM/BIM sketch follows this list.)

  • Some theory and interesting observations:
    • It is possible to defend against adversarial attacks.
    • Most models are linear, or their behavior can be approximated pretty well as linear.
    • The problem with linear models is that if you keep increasing the input, the output also grows without bound – so many tiny, coordinated changes to the input can add up to a large change in the output (see the arithmetic sketch after this list).
  • Almost all defenses are effective only against weak attacks.
  • How to be robust to adversarial attacks?
    • Flawed defense: non-differentiable transformation: White box methods rely on computing gradients.  What if the adversary cannot compute the gradients?
    • Flawed defense: Image preprocessing: May work if the adversary does not know about it.  Won’t work if the adversary knows about the transformation.
    • Flawed defense – stochasticity: What if the adversary can never be sure about the exact function used by the classifier?  Basically, the classifier applies a random transformation.  Still, this is not protected: a Carlini paper showed that this can be broken by de-randomization.
    • Flawed defense – a detector of adversarial examples: a two-stage system in which the detector can be trained to work very well.  However, the entire construction can be fooled if the adversary is aware of the detector, so it won’t work in a white-box setting.
    • Adversarial Training: Let’s just add adversarial images to the training set.  This requires generating adversarial examples on the fly (a sketch of one such training step follows this list).
    • Gradient Masking: With plain adversarial training, the model can learn to ‘mask’ its gradients instead of becoming robust; ensemble adversarial training (Tramer et al. 2017 – paper submitted to ICLR) is meant to address this.
    • Thermometer Encoding: Since (near-)linearity causes this vulnerability, let’s make models more nonlinear.  But highly nonlinear models are too difficult to train.  Idea: what if only the first layer is strongly nonlinear?  (See the encoding sketch after this list.)
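
To make the crafting methods above concrete, here is a hedged sketch of FGSM and its iterative variant BIM, assuming a PyTorch classifier and inputs scaled to [0, 1]; the epsilon values and helper names are illustrative, not from the talk.

```python
import torch
import torch.nn as nn

def fgsm(model, x, y, eps=0.03):
    """Fast Gradient Sign Method: one step of size eps in the direction
    that increases the classification loss."""
    x = x.clone().detach().requires_grad_(True)
    nn.functional.cross_entropy(model(x), y).backward()
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()

def bim(model, x, y, eps=0.03, step=0.01, iters=10):
    """Basic Iterative Method: repeated small FGSM steps, clipping back
    into the eps-ball around the original input after each step."""
    x_orig, x_adv = x.clone().detach(), x.clone().detach()
    for _ in range(iters):
        x_adv = fgsm(model, x_adv, y, eps=step)
        x_adv = torch.min(torch.max(x_adv, x_orig - eps), x_orig + eps).clamp(0, 1)
    return x_adv

# Toy usage with a made-up 784-feature, 10-class linear model.
model = nn.Sequential(nn.Linear(784, 10))
x, y = torch.rand(4, 784), torch.randint(0, 10, (4,))
x_adv = bim(model, x, y)
```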
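
The linearity observation has a simple arithmetic core (this is the argument from the “Explaining and Harnessing Adversarial Examples” paper linked below, not something shown on stage): for a linear score w·x, the FGSM-style perturbation eps·sign(w) shifts the score by eps times the sum of |w_i|, which grows with the input dimension even though no individual feature changes by more than eps.  A quick numerical check, with arbitrary dimensions and eps:

```python
import numpy as np

rng = np.random.default_rng(0)
eps = 0.01                                  # imperceptible per-feature change
for dim in (100, 10_000, 1_000_000):
    w = rng.normal(size=dim)                # weights of a linear score w . x
    shift = eps * np.abs(w).sum()           # score change caused by eps * sign(w)
    print(f"{dim:>9} features -> score shift {shift:,.1f}")
```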
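
A hedged sketch of the “generate adversarial examples on the fly” idea behind adversarial training, again assuming a PyTorch classifier with inputs in [0, 1]; the FGSM perturbation, mixing scheme, and hyperparameters are my own illustrative choices.

```python
import torch
import torch.nn as nn

def adversarial_training_step(model, opt, x, y, eps=0.03):
    """One minibatch of adversarial training: perturb the current batch with
    FGSM against the current model, then train on clean + adversarial inputs."""
    # Craft adversarial copies of the batch on the fly.
    x_pert = x.clone().detach().requires_grad_(True)
    nn.functional.cross_entropy(model(x_pert), y).backward()
    x_adv = (x_pert + eps * x_pert.grad.sign()).clamp(0, 1).detach()
    # Fit on the mixed batch (zero_grad clears the gradients from crafting).
    opt.zero_grad()
    loss = nn.functional.cross_entropy(model(torch.cat([x, x_adv])), torch.cat([y, y]))
    loss.backward()
    opt.step()
    return loss.item()

# Toy usage.
model = nn.Sequential(nn.Linear(784, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.rand(32, 784), torch.randint(0, 10, (32,))
adversarial_training_step(model, opt, x, y)
```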
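
Finally, a sketch of the thermometer-encoding idea: replace each scalar input with a vector of threshold bits so that the very first operation on the input is strongly nonlinear.  This assumes inputs in [0, 1] and k quantization levels; the ICLR paper’s exact encoding and training recipe differ in details.

```python
import numpy as np

def thermometer_encode(pixels, k=10):
    """Encode each value in [0, 1] as k bits: every level the value exceeds is 1.
    Small input changes now either do nothing or flip an entire bit, which
    breaks the (near-)linear relationship FGSM-style attacks exploit."""
    thresholds = np.arange(k) / k                  # 0.0, 0.1, ..., 0.9
    return (pixels[..., None] > thresholds).astype(np.float32)

print(thermometer_encode(np.array([0.37])))        # [[1. 1. 1. 1. 0. 0. 0. 0. 0. 0.]]
```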

Image

Links:

Google Scholar link:
https://scholar.google.com/citations?user=nCh4qyMAAAAJ

Intriguing properties of neural networks
https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/42503.pdf

Deep Neural Networks are Easily Fooled (Nguyen, Yosinski, and Clune, 2015)
https://arxiv.org/abs/1412.1897

Adversarial Examples: Attacks and Defenses for Deep Learning
https://arxiv.org/abs/1712.07107

Explaining and Harnessing Adversarial Examples
https://arxiv.org/abs/1412.6572

Towards Evaluating the Robustness of Neural Networks
https://arxiv.org/abs/1608.04644

Adversarial Spheres
https://arxiv.org/abs/1801.02774