
On this page I collect neural network defenses against adversarial examples that have been shown to be at most as robust as a simple baseline (the corresponding undefended model, or random guessing), or not robust at all. Many of the defenses listed here were proposed before there was a consensus on how to thoroughly evaluate neural network defenses. By publishing On Evaluating Adversarial Robustness and On Adaptive Attacks to Adversarial Example Defenses, Carlini et al. and Tramèr et al. gave recommendations on how to properly substantiate the robustness claims of a newly developed defense mechanism. A key recommendation is to attack the considered defense adaptively.

An attacker model that assumes the attacker to be oblivious to the protection mechanism in place is enough for sanity checks, but not enough to make reliable claims about robustness. Accordingly, when faced with an adaptive attacker, many proposed defenses turned out to be far less robust than previously claimed.
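To make this difference concrete, here is a minimal sketch (PyTorch, using a toy stand-in classifier and a placeholder differentiable purification step, not any specific defense from the list below) contrasting an oblivious PGD attack on the bare classifier with an adaptive PGD attack that differentiates through the whole defended pipeline:

```python
import torch
import torch.nn as nn

def pgd_attack(loss_fn, x, y, eps=8 / 255, alpha=2 / 255, steps=40):
    """L-infinity PGD: maximize loss_fn(x_adv, y) within an eps-ball around x."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        grad = torch.autograd.grad(loss_fn(x_adv, y), x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()          # ascent step on the gradient sign
            x_adv = x + (x_adv - x).clamp(-eps, eps)     # project back into the eps-ball
            x_adv = x_adv.clamp(0.0, 1.0)                # stay in the valid image range
    return x_adv.detach()

# Hypothetical stand-ins (NOT a specific defense from the table):
classifier = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))  # toy classifier
purify = lambda z: z                                                   # placeholder differentiable defense
criterion = nn.CrossEntropyLoss()

x = torch.rand(4, 3, 32, 32)    # dummy batch of images in [0, 1]
y = torch.randint(0, 10, (4,))  # dummy labels

# Oblivious attacker: ignores the defense and only attacks the bare classifier.
x_adv_oblivious = pgd_attack(lambda a, b: criterion(classifier(a), b), x, y)

# Adaptive attacker: differentiates through the *whole* defended pipeline,
# so the defense itself becomes part of the attack objective.
x_adv_adaptive = pgd_attack(lambda a, b: criterion(classifier(purify(a)), b), x, y)
```

A real adaptive evaluation goes further than this (e.g., approximating non-differentiable components or averaging over a defense's randomness), but the core point is that the defense itself has to be part of the attack objective.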

My goal in curating this collection is not at all to blame the authors of the listed defenses. As mentioned above, the methodology for properly evaluating a defense had to develop over the past few years. Instead, my goals are to

  1. urge practitioners who are not aware of the state of the art in machine learning robustness research not to blindly use the collected defense mechanisms in their applications, and
  2. gather in a single place ideas which (in the way they were implemented) do not enhance the robustness of neural networks.

If you are instead looking for state-of-the-art robust models, I refer you to the collections hosted at Robust-ML and RobustBench.


These are the sources I used to curate my collection:


This is the actual collection of non-robust (or at least not as robust as claimed) defenses:

| Defense | Basic Description | Defense Author(s) | Year | Attacker Publication | Attack Author(s) | Year |
| --- | --- | --- | --- | --- | --- | --- |
| A Kernelized Manifold Mapping to Diminish the Effect of Adversarial Perturbations | | Taghanaki et al. | 2019 | Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks | Croce and Hein | 2020 |
| A New Defense Against Adversarial Images: Turning a Weakness into a Strength | randomly perturbing input, checking if closest AE is further away than some threshold | Yu et al. | 2019 | On Adaptive Attacks to Adversarial Example Defenses | Tramèr et al. | 2020 |
| Adversarial and Clean Data Are Not Twins | adversarial retraining (binary classifier) | Gong et al. | 2017 | Adversarial Examples Are Not Easily Detected: Bypassing Ten Detection Methods | Carlini and Wagner | 2017 |
| Adversarial Defense by Restricting the Hidden Space of Deep Neural Networks | | Mustafa et al. | 2019 | Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks | Croce and Hein | 2020 |
| Adversarial Example Detection and Classification With Asymmetrical Adversarial Training | combine base classifier with robust, binary »class-predicate-classifiers« | Yin et al. | 2019 | On Adaptive Attacks to Adversarial Example Defenses | Tramèr et al. | 2020 |
| Adversarial Examples Detection in Deep Networks with Convolutional Filter Statistics | hidden layer statistics (PCA on conv) | Li and Li | 2016 | Adversarial Examples Are Not Easily Detected: Bypassing Ten Detection Methods | Carlini and Wagner | 2017 |
| Adversarial Logit Pairing | adversarial retraining (force logit similarity of benign and adversarial image pairs) | Kannan et al. | 2018 | Evaluating and Understanding the Robustness of Adversarial Logit Pairing | Engstrom et al. | 2018 |
| APE-GAN: Adversarial Perturbation Elimination with GAN | APE-GAN (similar to MagNet) | Shen et al. | 2017 | MagNet and "Efficient Defenses Against Adversarial Attacks" are Not Robust to Adversarial Examples | Carlini and Wagner | 2017 |
| Are Generative Classifiers More Robust to Adversarial Attacks? | | Li et al. | 2018 | On Adaptive Attacks to Adversarial Example Defenses | Tramèr et al. | 2020 |
| Attacks Meet Interpretability: Attribute-steered Detection of Adversarial Samples | retraining by amplifying »important« neuron weights | Tao et al. | 2018 | Is AmI (Attacks Meet Interpretability) Robust to Adversarial Examples? | Carlini | 2019 |
| Barrage of Random Transforms for Adversarially Robust Defense | BaRT: »… stochastically combining a large number of individually weak defenses into a single barrage of randomized transformations to build a strong defense …« | Raff et al. | 2019 | Demystifying the Adversarial Robustness of Random Transformation Defenses | Sitawarin et al. | 2021 |
| Characterizing Adversarial Subspaces Using Local Intrinsic Dimensionality | input statistics (local intrinsic dimensionality, LID) | Ma et al. | 2018 | Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples | Athalye et al. | 2018 |
| Countering Adversarial Images using Input Transformations | input preprocessing (cropping, scaling, compression, …) | Guo et al. | 2017 | Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples | Athalye et al. | 2018 |
| Deep k-Nearest Neighbors: Towards Confident, Interpretable and Robust Deep Learning | | Papernot and McDaniel | 2018 | On the Robustness of Deep K-Nearest Neighbors | Sitawarin and Wagner | 2019 |
| Defense against Adversarial Attacks Using High-Level Representation Guided Denoiser | input preprocessing (denoising based on network trained on latent vectors) | Liao et al. | 2017 | On the Robustness of the CVPR 2018 White-Box Adversarial Example Defenses | Athalye and Carlini | 2018 |
| Defense-GAN: Protecting Classifiers Against Adversarial Attacks Using Generative Models | Defense-GAN (like PixelDefend, but with a GAN instead of a PixelCNN) | Samangouei et al. | 2018 | Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples | Athalye et al. | 2018 |
| Deflecting Adversarial Attacks with Pixel Deflection | Pixel Deflection (input preprocessing) | Prakash et al. | 2018 | On the Robustness of the CVPR 2018 White-Box Adversarial Example Defenses | Athalye and Carlini | 2018 |
| Detecting Adversarial Examples from Sensitivity Inconsistency of Spatial-Transform Domain | primal & dual classifier + sensitivity statistics | Tian et al. | 2021 | Evading Adversarial Example Detection Defenses with Orthogonal Projected Gradient Descent | Bryniarski et al. | 2021 |
| Detecting Adversarial Samples from Artifacts | distributional detection (density) | Feinman et al. | 2017 | Adversarial Examples Are Not Easily Detected: Bypassing Ten Detection Methods | Carlini and Wagner | 2017 |
| Detecting Adversarial Samples from Artifacts | distributional detection (Bayesian uncertainty) | Feinman et al. | 2017 | Adversarial Examples Are Not Easily Detected: Bypassing Ten Detection Methods | Carlini and Wagner | 2017 |
| Detection based Defense against Adversarial Examples from the Steganalysis Point of View | input preprocessing (analyze images for »hidden features« and train a binary classifier to detect AEs) | Liu et al. | 2018 | Evading Adversarial Example Detection Defenses with Orthogonal Projected Gradient Descent | Bryniarski et al. | 2021 |
| Dimensionality Reduction as a Defense against Evasion Attacks on Machine Learning Classifiers | dimensionality reduction | Bhagoji et al. | 2017 | Adversarial Examples Are Not Easily Detected: Bypassing Ten Detection Methods | Carlini and Wagner | 2017 |
| Distillation as a Defense to Adversarial Perturbations Against Deep Neural Networks | retraining with soft labels | Papernot et al. | 2016 | Towards Evaluating the Robustness of Neural Networks | Carlini and Wagner | 2017 |
| DLA: Dense-Layer-Analysis for Adversarial Example Detection | DLA: adversarial retraining on benign/adversarial pairs, binary classifier on hidden layer activations | Sperl et al. | 2019 | Evading Adversarial Example Detection Defenses with Orthogonal Projected Gradient Descent | Bryniarski et al. | 2021 |
| Early Methods for Detecting Adversarial Images | dimensionality reduction (PCA) | Hendrycks and Gimpel | 2016 | Adversarial Examples Are Not Easily Detected: Bypassing Ten Detection Methods | Carlini and Wagner | 2017 |
| Efficient Defenses Against Adversarial Attacks | training data augmentation, BReLU activation | Zantedeschi et al. | 2017 | MagNet and "Efficient Defenses Against Adversarial Attacks" are Not Robust to Adversarial Examples | Carlini and Wagner | 2017 |
| EMPIR: Ensembles of Mixed Precision Deep Networks for Increased Robustness against Adversarial Attacks | training an ensemble of classifiers with different precision in weights and activations | Sen et al. | 2020 | On Adaptive Attacks to Adversarial Example Defenses | Tramèr et al. | 2020 |
| Ensemble Adversarial Training: Attacks and Defenses | | Tramèr et al. | 2017 | GenAttack: Practical Black-box Attacks with Gradient-Free Optimization | Alzantot et al. | 2018 |
| Error Correcting Output Codes Improve Probability Estimation and Adversarial Robustness of Deep Neural Networks | training a diverse ensemble of binary classifiers (on partitions of the classes) | Verma and Swami | 2019 | On Adaptive Attacks to Adversarial Example Defenses | Tramèr et al. | 2020 |
| Feature Squeezing: Detecting Adversarial Examples in Deep Neural Networks | dimensionality reduction (color space and median filter) | Xu et al. | 2017 | Adversarial Example Defenses: Ensembles of Weak Defenses are not Strong | He et al. | 2017 |
| Gotta Catch 'Em All: Using Honeypots to Catch Adversarial Attacks on Neural Networks | honeypot: lure attackers to generate obvious AEs | Shan et al. | 2019 | Evading Adversarial Example Detection Defenses with Orthogonal Projected Gradient Descent | Bryniarski et al. | 2021 |
| Improving Adversarial Robustness via Promoting Ensemble Diversity | | Pang et al. | 2019 | Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks | Croce and Hein | 2020 |
| Improving Adversarial Robustness via Promoting Ensemble Diversity | training a diverse ensemble of classifiers via regularization | Pang et al. | 2019 | On Adaptive Attacks to Adversarial Example Defenses | Tramèr et al. | 2020 |
| Jacobian Adversarially Regularized Networks for Robustness | | Chan et al. | 2019 | Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks | Croce and Hein | 2020 |
| MagNet: a Two-Pronged Defense against Adversarial Examples | MagNet | Meng and Chen | 2017 | MagNet and "Efficient Defenses Against Adversarial Attacks" are Not Robust to Adversarial Examples | Carlini and Wagner | 2017 |
| ME-Net: Towards Effective Adversarial Robustness with Matrix Estimation | preprocess input by randomly discarding parts of it, use matrix estimation on the gaps, train on that input | Yang et al. | 2019 | On Adaptive Attacks to Adversarial Example Defenses | Tramèr et al. | 2020 |
| Mitigating Adversarial Effects Through Randomization | input preprocessing (randomized rescaling and randomized padding) | Xie et al. | 2017 | Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples | Athalye et al. | 2018 |
| Mitigating Evasion Attacks to Deep Neural Networks via Region-based Classification | | Cao and Gong | 2017 | Decision Boundary Analysis of Adversarial Examples | He et al. | 2018 |
| Mixup Inference: Better Exploiting Mixup to Defend Adversarial Attacks | inject randomness into inference (interpolate the input multiple times with random samples and average the predictions) | Pang et al. | 2019 | On Adaptive Attacks to Adversarial Example Defenses | Tramèr et al. | 2020 |
| On Detecting Adversarial Perturbations | adversarial retraining (binary classifier) | Metzen et al. | 2017 | Adversarial Examples Are Not Easily Detected: Bypassing Ten Detection Methods | Carlini and Wagner | 2017 |
| On the (Statistical) Detection of Adversarial Examples | adversarial retraining (»trash class«) | Grosse et al. | 2017 | Adversarial Examples Are Not Easily Detected: Bypassing Ten Detection Methods | Carlini and Wagner | 2017 |
| On the (Statistical) Detection of Adversarial Examples | distributional detection (maximum mean discrepancy) | Grosse et al. | 2017 | Adversarial Examples Are Not Easily Detected: Bypassing Ten Detection Methods | Carlini and Wagner | 2017 |
| PixelDefend: Leveraging Generative Models to Understand and Defend against Adversarial Examples | PixelDefend (use a generative model [PixelCNN] to project data back to the manifold) | Song et al. | 2017 | Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples | Athalye et al. | 2018 |
| Randomization matters. How to defend against strong adversarial attacks | Boosted Adversarial Training (BAT): combine an adversarially trained network AT and a regular network trained on adversarial examples for AT | Pinot et al. | 2020 | Adversarial Vulnerability of Randomized Ensembles | Dbouk and Shanbhag | 2022 |
| Resisting Adversarial Attacks by k-Winners-Take-All | retraining, k-Winners-Take-All layers (instead of ReLU) | Xiao et al. | 2019 | On Adaptive Attacks to Adversarial Example Defenses | Tramèr et al. | 2020 |
| Rethinking Softmax Cross-Entropy Loss for Adversarial Robustness | training using MMC loss | Pang et al. | 2019 | On Adaptive Attacks to Adversarial Example Defenses | Tramèr et al. | 2020 |
| Robustness to Adversarial Examples through an Ensemble of Specialists | ensemble of classifiers operating on class subsets | Abbasi and Gagné | 2017 | Adversarial Example Defenses: Ensembles of Weak Defenses are not Strong | He et al. | 2017 |
| Stochastic Activation Pruning for Robust Adversarial Defense | inject randomness into inference (»weighted dropout«) | Dhillon et al. | 2018 | Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples | Athalye et al. | 2018 |
| The Odds are Odd: A Statistical Test for Detecting Adversarial Examples | inject randomness into inference (noisy input), statistical test | Roth et al. | 2019 | On Adaptive Attacks to Adversarial Example Defenses | Tramèr et al. | 2020 |
| Thermometer Encoding: One Hot Way To Resist Adversarial Examples | retraining with discretized inputs | Buckman et al. | 2018 | Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples | Athalye et al. | 2018 |
| Thwarting Adversarial Examples: An L0-Robust Sparse Fourier Transform | input preprocessing (»compression« and projection onto discrete cosine transform coefficients) | Bafna et al. | 2018 | On Adaptive Attacks to Adversarial Example Defenses | Tramèr et al. | 2020 |
| Your Classifier is Secretly an Energy Based Model and You Should Treat it Like One | | Grathwohl et al. | 2019 | Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks | Croce and Hein | 2020 |
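Many of the defenses above inject randomness into inference and were broken by attacks that account for that randomness. One such technique, used by Athalye et al. in Obfuscated Gradients Give a False Sense of Security, is Expectation over Transformation (EOT): instead of treating a single draw of the defense's randomness as fixed, the attacker averages the loss gradient over many draws. A minimal sketch (PyTorch, with a hypothetical additive-noise transform standing in for the actual defenses):

```python
import torch
import torch.nn as nn

def eot_gradient(model, random_transform, x, y, n_samples=30):
    """Expectation over Transformation (EOT): average the loss gradient over many
    draws of the defense's randomness instead of fixing a single draw."""
    criterion = nn.CrossEntropyLoss()
    x = x.clone().detach().requires_grad_(True)
    loss = sum(criterion(model(random_transform(x)), y) for _ in range(n_samples)) / n_samples
    loss.backward()
    return x.grad.detach()

# Hypothetical stand-ins (NOT the actual defenses from the table):
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))   # toy classifier
random_transform = lambda z: z + 0.03 * torch.randn_like(z)       # toy randomized preprocessing

x = torch.rand(4, 3, 32, 32)    # dummy batch of images in [0, 1]
y = torch.randint(0, 10, (4,))  # dummy labels

g = eot_gradient(model, random_transform, x, y)
x_adv = (x + (2 / 255) * g.sign()).clamp(0.0, 1.0)  # one PGD step using the EOT gradient
```

The same averaging idea applies to any defense whose randomness is differentiable; for non-differentiable components, Athalye et al. additionally use BPDA, replacing the component with a differentiable approximation on the backward pass.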

If you are aware of a defense for which it has been shown that it is essentially less robust than the corresponding undefended neural network, or than random guessing, please let me know! I would be very glad if you opened an issue or submitted a pull request, providing the necessary information (most importantly, the defense publication and the attacker publication). You may also send me an email at [email protected]. This collection is meant to be updated over time.