On this page I collect neural network defenses against adversarial examples that have been shown to be at most as robust as a simple baseline (the corresponding undefended model or random guessing), or not robust at all. Many of the defenses listed here were proposed when there was not yet a consensus on how to thoroughly evaluate neural network defenses. With On Evaluating Adversarial Robustness and On Adaptive Attacks to Adversarial Example Defenses, Carlini et al. and Tramèr et al. gave recommendations on how to properly support robustness claims for a newly developed neural network defense mechanism. A key recommendation was to attack the defense under consideration adaptively.

An attacker model that assumes the attacker is oblivious to the protection mechanism in place is sufficient only for sanity checks, not for reliable claims about robustness. Indeed, when faced with an adaptive attacker, many proposed defenses turned out to be considerably less robust than originally claimed.
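To make the difference between an oblivious and an adaptive attacker concrete, here is a minimal toy sketch. Everything in it is an assumption made for illustration: the linear classifier `W`, the linear detector `D` with threshold `TAU`, and the L2 budget `EPS` are hypothetical and do not correspond to any defense listed on this page. The oblivious attacker only follows the classifier's gradient and gets caught by the detector; the adaptive attacker knows the detector and only moves in directions the detector cannot see.

```python
import math

# Hypothetical toy setup (not one of the defenses listed on this page):
# a linear classifier plus a linear "detector" that rejects suspicious inputs.
W = (1.0, 1.0)   # classifier weights: predict class 1 if W.x > 0, else class 0
D = (1.0, 0.0)   # detector direction (unit vector)
TAU = 0.3        # detector flags x as adversarial if |D.x| > TAU
EPS = 1.0        # attacker's L2 perturbation budget


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def predict(x):
    return 1 if dot(W, x) > 0 else 0


def flagged(x):
    return abs(dot(D, x)) > TAU


def attack_succeeds(x_adv, true_label):
    """An attack only counts if the input is misclassified AND not flagged."""
    return predict(x_adv) != true_label and not flagged(x_adv)


def oblivious_attack(x):
    """Lowers W.x by stepping against the classifier gradient, ignoring the detector."""
    scale = EPS / norm(W)
    return tuple(xi - scale * wi for xi, wi in zip(x, W))


def adaptive_attack(x):
    """Uses only the gradient component orthogonal to D, so D.x stays unchanged."""
    w_dot_d = dot(W, D)  # valid projection because D has unit length
    w_perp = tuple(wi - w_dot_d * di for wi, di in zip(W, D))
    scale = EPS / norm(w_perp)
    return tuple(xi - scale * wi for xi, wi in zip(x, w_perp))


x_clean = (0.0, 0.5)  # classified as class 1 and not flagged
oblivious_result = attack_succeeds(oblivious_attack(x_clean), true_label=1)
adaptive_result = attack_succeeds(adaptive_attack(x_clean), true_label=1)
print("oblivious attack succeeds:", oblivious_result)  # False: the detector fires
print("adaptive attack succeeds:", adaptive_result)    # True: the detector stays silent
```

Evaluated only against the oblivious attacker, this toy defense would look perfectly robust, although the same perturbation budget is enough to break it once the attacker takes the detector into account.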

My goal in curating this collection is by no means to blame the authors of the listed defenses. As mentioned above, the methodology for properly evaluating a defense had to develop over the past few years. Instead, my goals are to

  1. urge practitioners who are not familiar with the state of the art in machine learning robustness research not to blindly use the collected defense mechanisms in their applications
  2. gather in a single place ideas that (in the way they were implemented) do not enhance the robustness of neural networks

If you are instead looking for state-of-the-art robust models, I refer you to the collections hosted at Robust-ML and RobustBench.

These are the sources I used to curate my collection:

This is the actual collection of non-robust (or at least not as robust as claimed) defenses:

| Defense | Basic Description | Defense Author(s) | Defense Year | Attack Publication | Attack Author(s) | Attack Year |
|---|---|---|---|---|---|---|
| A Kernelized Manifold Mapping to Diminish the Effect of Adversarial Perturbations | | Taghanaki et al. | 2019 | Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks | Croce and Hein | 2020 |
| A New Defense Against Adversarial Images: Turning a Weakness into a Strength | Randomly perturbing input, checking if closest adversarial example (AE) is further away than some threshold | Yu et al. | 2019 | On Adaptive Attacks to Adversarial Example Defenses | Tramèr et al. | 2020 |
| Adversarial and Clean Data Are Not Twins | Adversarial retraining (binary classifier) | Gong et al. | 2017 | Adversarial Examples Are Not Easily Detected: Bypassing Ten Detection Methods | Carlini and Wagner | 2017 |
| Adversarial Defense by Restricting the Hidden Space of Deep Neural Networks | | Mustafa et al. | 2019 | Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks | Croce and Hein | 2020 |
| Adversarial Example Detection and Classification With Asymmetrical Adversarial Training | Combine base classifier with robust, binary »class-predicate-classifiers« | Yin et al. | 2019 | On Adaptive Attacks to Adversarial Example Defenses | Tramèr et al. | 2020 |
| Adversarial Examples Detection in Deep Networks with Convolutional Filter Statistics | Hidden layer statistics (PCA on conv) | Li and Li | 2016 | Adversarial Examples Are Not Easily Detected: Bypassing Ten Detection Methods | Carlini and Wagner | 2017 |
| Adversarial Logit Pairing | Adversarial retraining (force logit similarity of benign and adversarial image pairs) | Kannan et al. | 2018 | Evaluating and Understanding the Robustness of Adversarial Logit Pairing | Engstrom et al. | 2018 |
| APE-GAN: Adversarial Perturbation Elimination with GAN | APE-GAN (similar to MagNet) | Shen et al. | 2017 | MagNet and "Efficient Defenses Against Adversarial Attacks" are Not Robust to Adversarial Examples | Carlini and Wagner | 2017 |
| Are Generative Classifiers More Robust to Adversarial Attacks? | | Li et al. | 2018 | On Adaptive Attacks to Adversarial Example Defenses | Tramèr et al. | 2020 |
| Attacks Meet Interpretability: Attribute-steered Detection of Adversarial Samples | Retraining by amplifying »important« neuron weights | Tao et al. | 2018 | Is AmI (Attacks Meet Interpretability) Robust to Adversarial Examples? | Carlini | 2019 |
| Barrage of Random Transforms for Adversarially Robust Defense (BaRT) | »… stochastically combining a large number of individually weak defenses into a single barrage of randomized transformations to build a strong defense …« | Raff et al. | 2019 | Demystifying the Adversarial Robustness of Random Transformation Defenses | Sitawarin et al. | 2021 |
| Characterizing Adversarial Subspaces Using Local Intrinsic Dimensionality | Input statistics (local intrinsic dimensionality, LID) | Ma et al. | 2018 | Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples | Athalye et al. | 2018 |
| Countering Adversarial Images using Input Transformations | Input preprocessing (cropping, scaling, compression, …) | Guo et al. | 2017 | Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples | Athalye et al. | 2018 |
| Deep k-Nearest Neighbors: Towards Confident, Interpretable and Robust Deep Learning | | Papernot and McDaniel | 2018 | On the Robustness of Deep K-Nearest Neighbors | Sitawarin and Wagner | 2019 |
| Defense against Adversarial Attacks Using High-Level Representation Guided Denoiser | Input preprocessing (denoising based on network trained on latent vectors) | Liao et al. | 2017 | On the Robustness of the CVPR 2018 White-Box Adversarial Example Defenses | Athalye and Carlini | 2018 |
| Defense-GAN: Protecting Classifiers Against Adversarial Attacks Using Generative Models | Defense-GAN (like PixelDefend, but with a GAN instead of a PixelCNN) | Samangouei et al. | 2018 | Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples | Athalye et al. | 2018 |
| Deflecting Adversarial Attacks with Pixel Deflection | Pixel Deflection (input preprocessing) | Prakash et al. | 2018 | On the Robustness of the CVPR 2018 White-Box Adversarial Example Defenses | Athalye and Carlini | 2018 |
| Detecting Adversarial Examples from Sensitivity Inconsistency of Spatial-Transform Domain | Primal & dual classifier + sensitivity statistics | Tian et al. | 2021 | Evading Adversarial Example Detection Defenses with Orthogonal Projected Gradient Descent | Bryniarski et al. | 2021 |
| Detecting Adversarial Samples from Artifacts | Distributional detection (density) | Feinman et al. | 2017 | Adversarial Examples Are Not Easily Detected: Bypassing Ten Detection Methods | Carlini and Wagner | 2017 |
| Detecting Adversarial Samples from Artifacts | Distributional detection (Bayesian uncertainty) | Feinman et al. | 2017 | Adversarial Examples Are Not Easily Detected: Bypassing Ten Detection Methods | Carlini and Wagner | 2017 |
| Detection based Defense against Adversarial Examples from the Steganalysis Point of View | Input preprocessing (analyze images for »hidden features« and train a binary classifier to detect AEs) | Liu et al. | 2018 | Evading Adversarial Example Detection Defenses with Orthogonal Projected Gradient Descent | Bryniarski et al. | 2021 |
| Dimensionality Reduction as a Defense against Evasion Attacks on Machine Learning Classifiers | Dimensionality reduction | Bhagoji et al. | 2017 | Adversarial Examples Are Not Easily Detected: Bypassing Ten Detection Methods | Carlini and Wagner | 2017 |
| Distillation as a Defense to Adversarial Perturbations Against Deep Neural Networks | Retraining with soft labels | Papernot et al. | 2016 | Towards Evaluating the Robustness of Neural Networks | Carlini and Wagner | 2017 |
| DLA: Dense-Layer-Analysis for Adversarial Example Detection | DLA: adversarial retraining on benign/adversarial pairs, binary classifier on hidden layer activations | Sperl et al. | 2019 | Evading Adversarial Example Detection Defenses with Orthogonal Projected Gradient Descent | Bryniarski et al. | 2021 |
| Early Methods for Detecting Adversarial Images | Dimensionality reduction (PCA) | Hendrycks and Gimpel | 2016 | Adversarial Examples Are Not Easily Detected: Bypassing Ten Detection Methods | Carlini and Wagner | 2017 |
| Efficient Defenses Against Adversarial Attacks | Training data augmentation, BReLU activation | Zantedeschi et al. | 2017 | MagNet and "Efficient Defenses Against Adversarial Attacks" are Not Robust to Adversarial Examples | Carlini and Wagner | 2017 |
| EMPIR: Ensembles of Mixed Precision Deep Networks for Increased Robustness against Adversarial Attacks | Training an ensemble of classifiers with different precision in weights and activation | Sen et al. | 2020 | On Adaptive Attacks to Adversarial Example Defenses | Tramèr et al. | 2020 |
| Ensemble Adversarial Training: Attacks and Defenses | | Tramèr et al. | 2017 | GenAttack: Practical Black-box Attacks with Gradient-Free Optimization | Alzantot et al. | 2018 |
| Error Correcting Output Codes Improve Probability Estimation and Adversarial Robustness of Deep Neural Networks | Training a diverse ensemble of binary classifiers (on partitions of the classes) | Verma and Swami | 2019 | On Adaptive Attacks to Adversarial Example Defenses | Tramèr et al. | 2020 |
| Feature Squeezing: Detecting Adversarial Examples in Deep Neural Networks | Dimensionality reduction (color space and median filter) | Xu et al. | 2017 | Adversarial Example Defenses: Ensembles of Weak Defenses are not Strong | He et al. | 2017 |
| Gotta Catch 'Em All: Using Honeypots to Catch Adversarial Attacks on Neural Networks | Honeypot: lure attackers to generate obvious AEs | Shan et al. | 2019 | Evading Adversarial Example Detection Defenses with Orthogonal Projected Gradient Descent | Bryniarski et al. | 2021 |
| Improving Adversarial Robustness via Promoting Ensemble Diversity | | Pang et al. | 2019 | Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks | Croce and Hein | 2020 |
| Improving Adversarial Robustness via Promoting Ensemble Diversity | Training a diverse ensemble of classifiers via regularization | Pang et al. | 2019 | On Adaptive Attacks to Adversarial Example Defenses | Tramèr et al. | 2020 |
| Jacobian Adversarially Regularized Networks for Robustness | | Chan et al. | 2019 | Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks | Croce and Hein | 2020 |
| MagNet: a Two-Pronged Defense against Adversarial Examples | MagNet | Meng and Chen | 2017 | MagNet and "Efficient Defenses Against Adversarial Attacks" are Not Robust to Adversarial Examples | Carlini and Wagner | 2017 |
| ME-Net: Towards Effective Adversarial Robustness with Matrix Estimation | Preprocess input by randomly discarding parts of it, use matrix estimation on gaps, train on that input | Yang et al. | 2019 | On Adaptive Attacks to Adversarial Example Defenses | Tramèr et al. | 2020 |
| Mitigating Adversarial Effects Through Randomization | Input preprocessing (randomized rescaling and randomized padding) | Xie et al. | 2017 | Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples | Athalye et al. | 2018 |
| Mitigating Evasion Attacks to Deep Neural Networks via Region-based Classification | | Cao and Gong | 2017 | Decision Boundary Analysis of Adversarial Examples | He et al. | 2018 |
| Mixup Inference: Better Exploiting Mixup to Defend Adversarial Attacks | Inject randomness into inference (interpolate input multiple times with random samples and average prediction on those) | Pang et al. | 2019 | On Adaptive Attacks to Adversarial Example Defenses | Tramèr et al. | 2020 |
| On Detecting Adversarial Perturbations | Adversarial retraining (binary classifier) | Metzen et al. | 2017 | Adversarial Examples Are Not Easily Detected: Bypassing Ten Detection Methods | Carlini and Wagner | 2017 |
| On the (Statistical) Detection of Adversarial Examples | Adversarial retraining (»trash class«) | Grosse et al. | 2017 | Adversarial Examples Are Not Easily Detected: Bypassing Ten Detection Methods | Carlini and Wagner | 2017 |
| On the (Statistical) Detection of Adversarial Examples | Distributional detection (maximum mean discrepancy) | Grosse et al. | 2017 | Adversarial Examples Are Not Easily Detected: Bypassing Ten Detection Methods | Carlini and Wagner | 2017 |
| PixelDefend: Leveraging Generative Models to Understand and Defend against Adversarial Examples | PixelDefend (use a generative model [PixelCNN] to project data back to the manifold) | Song et al. | 2017 | Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples | Athalye et al. | 2018 |
| Randomization matters. How to defend against strong adversarial attacks | Boosted Adversarial Training (BAT): combine an adversarially trained network AT and a regular network trained on adversarial examples for AT | Pinot et al. | 2020 | Adversarial Vulnerability of Randomized Ensembles | Dbouk and Shanbhag | 2022 |
| Resisting Adversarial Attacks by k-Winners-Take-All | Retraining, k-Winners-Take-All layers (instead of ReLU) | Xiao et al. | 2019 | On Adaptive Attacks to Adversarial Example Defenses | Tramèr et al. | 2020 |
| Rethinking Softmax Cross-Entropy Loss for Adversarial Robustness | Training using MMC loss | Pang et al. | 2019 | On Adaptive Attacks to Adversarial Example Defenses | Tramèr et al. | 2020 |
| Robustness to Adversarial Examples through an Ensemble of Specialists | Ensemble of classifiers operating on class subsets | Abbasi and Gagné | 2017 | Adversarial Example Defenses: Ensembles of Weak Defenses are not Strong | He et al. | 2017 |
| Stochastic Activation Pruning for Robust Adversarial Defense | Inject randomness into inference (»weighted dropout«) | Dhillon et al. | 2018 | Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples | Athalye et al. | 2018 |
| The Odds are Odd: A Statistical Test for Detecting Adversarial Examples | Inject randomness into inference (noisy input), statistical test | Roth et al. | 2019 | On Adaptive Attacks to Adversarial Example Defenses | Tramèr et al. | 2020 |
| Thermometer Encoding: One Hot Way To Resist Adversarial Examples | Retraining with discretized inputs | Buckman et al. | 2018 | Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples | Athalye et al. | 2018 |
| Thwarting Adversarial Examples: An L0-Robust Sparse Fourier Transform | Input preprocessing (»compression« and projection to discrete cosine transformation coefficients) | Bafna et al. | 2018 | On Adaptive Attacks to Adversarial Example Defenses | Tramèr et al. | 2020 |
| Your Classifier is Secretly an Energy Based Model and You Should Treat it Like One | | Grathwohl et al. | 2019 | Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks | Croce and Hein | 2020 |

If you are aware of a defense that has been shown to be essentially no more robust than the corresponding undefended neural network or random guessing, please let me know! I would be very glad if you opened an issue or submitted a pull request with the necessary information (most importantly, the defense publication and the attacker publication). You may also send an email to [email protected]. This collection is meant to be updated over time.


