
v0.3.0 Updated API and Bug Fixes

@jinyongyoo jinyongyoo released this 25 Jun 12:50
· 642 commits to master since this release
7855b9e

New Updated API

We have added two new classes, Attacker and Trainer, for performing adversarial attacks and adversarial training with full logging support and multi-GPU parallelism. They are intended to provide an alternative to the CLI for running attacks and training on custom models and datasets.

Attacker: Running Adversarial Attacks

Below is an example that uses Attacker to attack a BERT model fine-tuned on the IMDB dataset with the TextFooler method. The AttackArgs class sets the parameters of the attack, including the number of examples to attack, the CSV file to log results to, and the interval at which to save checkpoints.

[Screenshot: example Attacker usage]
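In code, the example above looks roughly like the following sketch. The `textattack/bert-base-uncased-imdb` checkpoint name and the specific argument values (`num_examples`, `checkpoint_interval`, etc.) are illustrative assumptions, not values prescribed by the release notes:

```python
import transformers
import textattack

# Load a BERT model fine-tuned on IMDB (illustrative checkpoint name)
# and wrap it so TextAttack can query it.
model = transformers.AutoModelForSequenceClassification.from_pretrained(
    "textattack/bert-base-uncased-imdb"
)
tokenizer = transformers.AutoTokenizer.from_pretrained(
    "textattack/bert-base-uncased-imdb"
)
model_wrapper = textattack.models.wrappers.HuggingFaceModelWrapper(model, tokenizer)

# Dataset to attack and the TextFooler attack recipe.
dataset = textattack.datasets.HuggingFaceDataset("imdb", split="test")
attack = textattack.attack_recipes.TextFoolerJin2019.build(model_wrapper)

# AttackArgs controls how many examples to attack, where to log
# results, and how often to save checkpoints.
attack_args = textattack.AttackArgs(
    num_examples=20,
    log_to_csv="log.csv",
    checkpoint_interval=5,
    checkpoint_dir="checkpoints",
    disable_stdout=True,
)

attacker = textattack.Attacker(attack, dataset, attack_args)
attacker.attack_dataset()
```

Note that running this sketch downloads the model and dataset, so it requires network access and a working `textattack` installation.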

More details about Attacker and AttackArgs can be found here.

Trainer: Running Adversarial Training

Previously, TextAttack supported adversarial training only in a limited manner: users could train models only via the CLI command, and not every aspect of training was available for tuning.

The Trainer class introduces an easy way to train custom PyTorch/Transformers models on a custom dataset. Below is an example where we fine-tune BERT on the IMDB dataset while generating adversarial examples with the DeepWordBug attack.

[Screenshot: example Trainer usage]
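A rough sketch of that training setup is below. The hyperparameter values (epochs, learning rate, batch size, number of adversarial examples) are illustrative assumptions rather than recommended settings:

```python
import transformers
import textattack

# Start from a pretrained BERT and wrap it for TextAttack.
model = transformers.AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased"
)
tokenizer = transformers.AutoTokenizer.from_pretrained("bert-base-uncased")
model_wrapper = textattack.models.wrappers.HuggingFaceModelWrapper(model, tokenizer)

# IMDB train/eval splits.
train_dataset = textattack.datasets.HuggingFaceDataset("imdb", split="train")
eval_dataset = textattack.datasets.HuggingFaceDataset("imdb", split="test")

# DeepWordBug generates the adversarial examples used during training.
attack = textattack.attack_recipes.DeepWordBugGao2018.build(model_wrapper)

# TrainingArgs exposes the knobs that were previously unavailable
# through the CLI (all values here are illustrative).
training_args = textattack.TrainingArgs(
    num_epochs=3,
    num_clean_epochs=1,
    num_train_adv_examples=1000,
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,
    log_to_tb=True,
)

trainer = textattack.Trainer(
    model_wrapper,
    "classification",
    attack,
    train_dataset,
    eval_dataset,
    training_args,
)
trainer.train()
```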

Dataset

Previously, datasets passed to TextAttack were simply expected to be an iterable of (input, target) tuples. While this offers flexibility, it prevents users from passing key information about the dataset that TextAttack could use to provide a better experience (e.g. label names, label remapping, and the input column names used for printing).

Instead, we now explicitly define a Dataset class that users can use directly or subclass for their own datasets.

Bug Fixes:

  • #467: Don't check self.target_max_score when it is already known to be None.
  • #417: Fixed bug where in masked_lm transformations only subwords were candidates for top_words.