This repository is about adversarial attacks and robustness in machine learning.

Based on the tutorial presented at NeurIPS 2018 by J. Z. Kolter and A. Madry.


  1. Basic attack: maximize the loss of the correct class label. (FGSM: take the gradient of the loss with respect to the perturbation, keep only its sign, and scale it to the boundary of the allowed perturbation region.)

  2. Targeted attack: maximize the loss of the correct class label while minimizing the loss of the target class label. (Both objectives are sketched below.)
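
A minimal sketch of these two attack objectives, assuming a PyTorch classifier `model` that returns logits; the names `model`, `x`, `y`, `y_targ`, and `delta` are illustrative placeholders, not code from the tutorial:

```python
import torch
import torch.nn as nn

loss_fn = nn.CrossEntropyLoss()

def untargeted_objective(model, x, delta, y):
    # Basic attack: make the loss of the correct class y as large as possible.
    return loss_fn(model(x + delta), y)

def targeted_objective(model, x, delta, y, y_targ):
    # Targeted attack: raise the loss of the true class y while lowering
    # the loss of the target class y_targ.
    logits = model(x + delta)
    return loss_fn(logits, y) - loss_fn(logits, y_targ)
```

The attacker maximizes these objectives over `delta`, subject to a norm constraint such as ||delta||_inf <= epsilon.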

  3. Binary Classification using Linear models on MNIST

    • Basic model after 10 epochs:

    An error rate of 0.0004 corresponds to a single misclassified example in the test set.

    • Noise created for linear models:

    A vertical line (like a 1) in black pixels and a circle (like a 0) in white. The intuition is that moving in the black direction makes the classifier think the image is more like a 1, while moving in the white direction makes it look more like a 0. (See the perturbation sketch after this list.)

    • Model error rate under attack:

    The attack raises the error rate from 0.0004 to 82.8%.

    • Training a robust classifier:
    • Robust (adversarial) error:

    No adversarial attack can lead to more than 2.5% error on the test set.

    • Clean (non-adversarial) error:

      We get 0.3% error on the clean test set. This is good, but not as good as standard training; we now make 8 mistakes on the test set instead of the 1 we made before.

      There is a trade-off between clean accuracy and robust accuracy: doing better on the robust error leads to higher clean error. (See the robust-loss sketch after this list.)

    • Optimal perturbation for this robust model:
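
A minimal sketch of the worst-case perturbation described in the "Noise created for linear models" bullet, assuming a linear binary classifier h(x) = w·x + b with labels y in {+1, -1}; the function name and tensor shapes are illustrative, not the tutorial's code:

```python
import torch

def optimal_linear_perturbation(w, y, epsilon):
    # For a linear binary classifier h(x) = w.x + b with labels y in {+1, -1},
    # the l_inf-bounded perturbation that most decreases y * h(x + delta) is
    # delta* = -y * epsilon * sign(w); note that it does not depend on x.
    return -y.view(-1, 1).float() * epsilon * torch.sign(w).view(1, -1)
```

Visualized as an image, sign(w) gives exactly the vertical-line / circle pattern described above, and the same formula gives the optimal perturbation for the robust model, just with that model's own weights w.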
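A sketch of the exact robust training loss for the same linear model, which is where the robust-classifier numbers above come from; this assumes the binary logistic loss and is a sketch under that assumption, not the tutorial's exact code:

```python
import torch
import torch.nn.functional as F

def robust_logistic_loss(w, b, X, y, epsilon):
    # Exact worst-case l_inf logistic loss for a linear binary classifier:
    #   max_{||delta||_inf <= eps} log(1 + exp(-y * (w.(x + delta) + b)))
    #     = log(1 + exp(-y * (w.x + b) + eps * ||w||_1))
    margin = y * (X @ w + b)                          # y in {+1, -1}
    return F.softplus(-margin + epsilon * w.abs().sum()).mean()
```

Training on this loss instead of the standard logistic loss is what trades a little clean accuracy for the 2.5% robust error reported above.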

  4. Neural Networks

    1. Solving the inner maximization problem

      1. Lower bounding techniques:

        1. FGSM: takes the gradient of the loss with respect to the perturbation and steps in the direction of its sign

          Constructing adversarial examples using FGSM on the Conv2D model (see the FGSM sketch after this outline).

          Error rate with FGSM:

        2. Projected Gradient Descent (see the PGD sketch after this outline):

        3. Steepest Descent:

        4. Randomization:

          Error rate with randomization:

        5. Targeted Attack

          Target class = 2: the actual 2s are unchanged, because the objective in this case is identically zero.

          Target class = 0: we are maximizing the class-0 logit minus the logit of the true class. But this objective does not care what happens to the other classes, and in some cases the easiest way to make the class-0 logit high is to make another class's logit even higher. (See the targeted-PGD sketch after this outline.)

        6. Targeted Attack (minimizing all other classes)

        7. Non-ℓ∞ norms

      2. Exactly solving

        1. Mixed integer formulation

        2. Finding upper bound and lower bound

        3. Final integer programming formulation

        4. Certifying robustness

      3. Upper bounding techniques

        1. Convex relaxation

        2. Interval-propagation-based bounds

    2. Solving the outer minimization problem

      1. Adversarial training with adversarial examples (see the adversarial-training sketch after this outline)
      2. Relaxation-based robust training
      3. Training using provable criteria
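
The sketches below illustrate some of the items in the outline above. They assume PyTorch, a classifier `model` that returns logits, and image batches `x` with labels `y`; all function names and signatures are illustrative, not the tutorial's code. First, FGSM on the Conv2D model:

```python
import torch
import torch.nn as nn

def fgsm(model, x, y, epsilon):
    # One gradient step on the perturbation; taking the sign and scaling by
    # epsilon lands exactly on a corner of the l_inf ball of radius epsilon.
    delta = torch.zeros_like(x, requires_grad=True)
    loss = nn.CrossEntropyLoss()(model(x + delta), y)
    loss.backward()
    return epsilon * delta.grad.detach().sign()
```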
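Projected gradient descent repeats the signed step with a smaller step size `alpha` and projects back onto the l_inf ball after every step (a sketch under the same assumptions):

```python
import torch
import torch.nn as nn

def pgd_linf(model, x, y, epsilon, alpha, num_iter):
    # Iterated signed gradient ascent on the loss, projected (clamped) back
    # onto ||delta||_inf <= epsilon after every step.
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(num_iter):
        loss = nn.CrossEntropyLoss()(model(x + delta), y)
        loss.backward()
        delta.data = (delta.data + alpha * delta.grad.detach().sign()).clamp(-epsilon, epsilon)
        delta.grad.zero_()
    return delta.detach()
```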
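The targeted attack discussed above replaces the loss with the target-class logit minus the true-class logit; a sketch, where `y_targ` is a single target class index:

```python
import torch

def pgd_linf_targeted(model, x, y, y_targ, epsilon, alpha, num_iter):
    # Maximize (logit of target class) - (logit of true class). If
    # y_targ equals the true class, the objective is identically zero,
    # so the image is left unchanged; and nothing stops another class's
    # logit from growing even faster than the target's.
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(num_iter):
        logits = model(x + delta)
        obj = (logits[:, y_targ] - logits.gather(1, y[:, None])[:, 0]).sum()
        obj.backward()
        delta.data = (delta.data + alpha * delta.grad.detach().sign()).clamp(-epsilon, epsilon)
        delta.grad.zero_()
    return delta.detach()
```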
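Finally, the outer minimization: adversarial training plugs an attack such as the PGD sketch above into the inner loop and minimizes the loss at the perturbed points. A sketch, assuming `attack` has the signature of `pgd_linf` and `loader` yields (x, y) batches:

```python
import torch
import torch.nn as nn

def adversarial_training_epoch(model, loader, optimizer, attack, **attack_kwargs):
    loss_fn = nn.CrossEntropyLoss()
    for x, y in loader:
        delta = attack(model, x, y, **attack_kwargs)   # inner maximization
        loss = loss_fn(model(x + delta), y)            # outer minimization
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```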