On this page I collect neural network defenses against adversarial examples that have been shown to be at most as robust as a simple baseline (the corresponding undefended model, or random guessing), or not robust at all. Many of the defenses listed here were proposed before there was consensus on how to thoroughly evaluate neural network defenses. By publishing On Evaluating Adversarial Robustness and On Adaptive Attacks to Adversarial Example Defenses, Carlini et al. and Tramèr et al. gave recommendations on how to properly support the robustness claims of a newly developed defense mechanism. A key recommendation is to adaptively attack the defense under consideration.
An attacker model that assumes the attacker is oblivious to the protection mechanism in place suffices for sanity checks, but not for reliable robustness claims. Indeed, when faced with an adaptive attacker, many proposed defenses turned out to be far less robust than previously claimed.
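To make the distinction concrete, here is a minimal sketch of an L∞ PGD attacker in both settings. It assumes a hypothetical PyTorch classifier `model`, a differentiable input-transformation defense `defense`, and a labeled batch `x`, `y`; none of these names come from a specific paper, and real adaptive attacks often need more work (e.g. approximating non-differentiable defenses):

```python
import torch
import torch.nn.functional as F

# `model`, `defense`, `x`, `y` below are placeholders for your own
# classifier, defense, and data batch (inputs assumed in [0, 1]).

def pgd(forward, x, y, eps=8 / 255, alpha=2 / 255, steps=40):
    """L-infinity PGD: maximize the cross-entropy loss of `forward` around x."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(forward(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Gradient ascent step, then project back into the eps-ball around x.
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = (x + (x_adv - x).clamp(-eps, eps)).clamp(0, 1)
    return x_adv

# Oblivious attacker: optimizes against the bare model; the defense is only
# applied afterwards, at test time. Good for a sanity check, nothing more.
x_obl = pgd(model, x, y)
acc_obl = (model(defense(x_obl)).argmax(1) == y).float().mean()

# Adaptive attacker: optimizes end-to-end through the defense, so the
# perturbation directly targets the full protected pipeline.
x_ada = pgd(lambda z: model(defense(z)), x, y)
acc_ada = (model(defense(x_ada)).argmax(1) == y).float().mean()
```

For many of the defenses collected below, the oblivious accuracy looked encouraging while the adaptive accuracy collapsed to (or below) the undefended baseline.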
My goal in curating this collection is not at all to blame the authors of the listed defenses. As mentioned above, the methodology for properly evaluating a defense has only matured over the past few years. Instead, my goals are to
- urge practitioners who are unaware of the state of the art in machine learning robustness research not to blindly use the collected defense mechanisms in their applications
- gather in a single place ideas which, in the way they were implemented, do not enhance the robustness of neural networks
If you are instead looking for state-of-the-art robust models, I refer you to the collections hosted at Robust-ML and RobustBench.
These are the sources I used to curate my collection:
- Adversarial Examples Are Not Easily Detected: Bypassing Ten Detection Methods, Carlini and Wagner 2017
- MagNet and "Efficient Defenses Against Adversarial Attacks" are Not Robust to Adversarial Examples, Carlini and Wagner 2017
- Towards Evaluating the Robustness of Neural Networks, Carlini and Wagner 2017
- Adversarial Example Defense: Ensembles of Weak Defenses are not Strong, He et al. 2017
- On the Robustness of the CVPR 2018 White-Box Adversarial Example Defenses, Athalye and Carlini 2018
- Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples, Athalye et al. 2018
- Evaluating and Understanding the Robustness of Adversarial Logit Pairing, Engstrom et al. 2018
- Adversarial Risk and the Dangers of Evaluating Against Weak Attacks, Uesato et al. 2018
- Is AmI (Attacks Meet Interpretability) Robust to Adversarial Examples?, Carlini 2019
- Logit Pairing Methods Can Fool Gradient-Based Attacks, Mosbach et al. 2019
- Reliable Evaluation of Adversarial Robustness with an Ensemble of Diverse Parameter-Free Attacks, Croce and Hein 2020 (AutoAttack; see the usage sketch after this list)
- Adversarial Machine Learning in Image Classification: A Survey Towards the Defender's Perspective, Machado et al. 2020
- On Adaptive Attacks to Adversarial Example Defenses, Tramèr et al. 2020
- Evading Adversarial Example Detection Defenses with Orthogonal Projected Gradient Descent, Bryniarski et al. 2021
- Indicators of Attack Failure: Debugging and Improving Optimization of Adversarial Examples, Pintor et al. 2021
- Demystifying the Adversarial Robustness of Random Transformation Defenses, Sitawarin et al. 2021
- Adversarial Vulnerability of Randomized Ensembles, Dbouk and Shanbhag 2022
- Robust-ML
- RobustBench
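The attack ensemble from Croce and Hein 2020 is available as the `autoattack` package (reference implementation at github.com/fra31/auto-attack). A minimal usage sketch, with `model`, `x_test`, and `y_test` as placeholders for your own classifier and test batch:

```python
# pip install git+https://github.com/fra31/auto-attack
from autoattack import AutoAttack

# `model`, `x_test`, `y_test` are placeholders for your own classifier and data.
adversary = AutoAttack(model, norm='Linf', eps=8 / 255, version='standard')
x_adv = adversary.run_standard_evaluation(x_test, y_test, bs=128)
```

Robust accuracy under this parameter-free ensemble is a useful first estimate, but, as the papers above stress, it does not replace a defense-specific adaptive attack.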
This is the actual collection of non-robust (or at least not as robust as claimed) defenses:
If you are aware of a defense that has been shown to be essentially less robust than the corresponding undefended neural network, or than random guessing, please let me know! I would be very glad if you opened an issue or submitted a pull request providing the necessary information (most importantly, the publications of the defense and of the attack that broke it). You may also send me an email at [email protected]. This collection is meant to be updated over time.