It's been a long time since our last release (0.9.0) nearly a year ago! There have been numerous changes and new features added since then, which we've tried to summarize below. While this release carries the same major version as our previous release (0.x.x), if you have code that relies on 0.9.0, it is likely you'll need to adapt it before updating to 0.10.0.

Looking forward, this will also be the last significant release with the 0.x.x numbering. The next release will be 1.0.0 and will include a major migration to the [Hydra configuration system](https://github.com/facebookresearch/hydra), with an eye towards modularizing fairseq to be more usable as a library.

Changelog:

New papers:
- [Reducing Transformer Depth on Demand with Structured Dropout (Fan et al., 2019)](https://github.com/pytorch/fairseq/tree/master/examples/layerdrop/README.md)
- [MBART: Multilingual Denoising Pre-training for Neural Machine Translation ({Liu*,Gu*,Goyal*} et al., 2020)](https://github.com/pytorch/fairseq/blob/master/examples/mbart/README.md)
- [Neural Machine Translation with Byte-Level Subwords (Wang et al., 2019)](https://github.com/pytorch/fairseq/blob/master/examples/byte_level_bpe/README.md)
- [Training with Quantization Noise for Extreme Model Compression ({Fan*,Stock*} et al., 2019)](https://github.com/pytorch/fairseq/blob/master/examples/quant_noise/README.md)
- [Monotonic Multihead Attention (Ma et al., 2020)](https://github.com/pytorch/fairseq/blob/master/examples/simultaneous_translation/README.md)
- [Unsupervised Quality Estimation for Neural Machine Translation (Fomicheva et al., 2020)](examples/unsupervised_quality_estimation/README.md)
- [wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations (Baevski et al., 2020)](https://github.com/pytorch/fairseq/blob/master/examples/wav2vec/README.md)
- [Lexically constrained decoding with dynamic beam allocation](examples/constrained_decoding/README.md)
- [Generating Medical Reports from Patient-Doctor Conversations Using Sequence-to-Sequence Models (Enarvi et al., 2020)](examples/pointer_generator/README.md)
- [Linformer: Self-Attention with Linear Complexity (Wang et al., 2020)](examples/linformer/README.md)
- [Cross-lingual Retrieval for Iterative Self-Supervised Training (Tran et al., 2020)](examples/criss/README.md)
- [Deep Transformers with Latent Depth (Li et al., 2020)](examples/latent_depth/README.md)
- [Better Fine-Tuning by Reducing Representational Collapse (Aghajanyan et al. 2020)](examples/rxf/README.md)

Major new features:
- TorchScript support for Transformer and SequenceGenerator (PyTorch 1.6+ only)
- Model parallel training support (see [Megatron-11b](https://github.com/pytorch/fairseq/tree/master/examples/megatron_11b))
- TPU support via `--tpu` and `--bf16` options (7751229)
- Added [VizSeq (a visual analysis toolkit for evaluating fairseq models)](https://facebookresearch.github.io/vizseq/docs/getting_started/fairseq_example)
- Migrated to Python logging (fb76dac)
- Added “SlowMo” distributed training backend (0dac0ff)
- Added Optimizer State Sharding (ZeRO) (5d7ed6a)
- Added several features to improve speech recognition support in fairseq: CTC criterion, external ASR decoder support (currently only wav2letter decoder) with KenLM and fairseq language model fusion

Minor features:
- Added `--patience` for early stopping
- Added `--shorten-method=[none|truncate|random_crop]` to language modeling (and other) tasks
- Added `--eval-bleu` for computing BLEU scores during training (60fbf64)
- Added support for training huggingface models (e.g. `hf_gpt2`) (2728f9b)
- Added FusedLAMB optimizer (`--optimizer=lamb`) (f75411a)
- Added LSTM-based language model (`lstm_lm`) (9f4256e)
- Added dummy tasks and models for benchmarking (91f0534; a541b19)
- Added tutorial and pretrained models for paraphrasing (630701e)
- Support quantization for Transformer (6379573)
- Support multi-GPU validation in fairseq-validate (2f7e3f3)
- Support batched inference in hub interface (3b53962)
- Support for language model fusion in standard beam search (5379461)

Breaking changes:
- Updated requirements to Python 3.6+ and PyTorch 1.5+
- Main entry point scripts (eval_lm.py, generate.py, etc.) removed from root directory into `fairseq_cli`
- Changed format for generation output; `H-` now corresponds to tokenized system outputs and newly added `D-` lines correspond to detokenized outputs (f353913)
- We now log the stats from the log-interval (displayed as `train_inner`) instead of a rolling average over each epoch.
- SequenceGenerator/Scorer does not print alignment by default, re-enable with `--print-alignment`
- Print base 2 scores in generation scripts (660d69f)
- Incremental decoding interface changed to use `FairseqIncrementalState` (4e48c4a; 88185fc)
- Refactor namespaces in Criterions to support library usage (introduce `LegacyFairseqCriterion` for BC) (46b773a)
- Deprecate `FairseqCriterion::aggregate_logging_outputs` interface, use `FairseqCriterion::reduce_metrics` instead (8679339)
- Moved `fairseq.meters` to `fairseq.logging.meters` and added new metrics aggregation module (`fairseq.logging.metrics`) (1e324a5; f8b795f)
- Reset mid-epoch stats every log-interval steps (244835d)
- Ignore duplicate entries in dictionary files (dict.txt) and support manual overwrite with `#fairseq:overwrite` option (dd1298e; 937535d)
- Use 1-based indexing for epochs everywhere (aa79bb9)

Minor interface changes:
- Added `FairseqTask::begin_epoch` hook (122fc1d)
- `FairseqTask::build_generator` interface changed (cd2555a)
- Change `RobertaModel` base class to `FairseqEncoder` (307df56)
- Expose `FairseqOptimizer.param_groups` property (8340b2d)
- Deprecate `--fast-stat-sync` and replace with `FairseqCriterion::logging_outputs_can_be_summed` interface (fe6c2ed)
- `--raw-text` and `--lazy-load` are fully deprecated; use `--dataset-impl` instead
- Mixture of expert tasks moved to `examples/` (8845dcf)

Performance improvements:
- Use cross entropy from apex for improved memory efficiency (5065077)
- Added buffered dataloading (`--data-buffer-size`) (4115317)
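
One of the breaking changes above is that generation scripts now print base 2 scores. If you have tooling that compares new scores against numbers saved from 0.9.0, a small conversion helper may be useful. The sketch below assumes the older scores were natural-log values (in nats); the function names `nats_to_bits` and `bits_to_nats` are hypothetical and not part of fairseq.

```python
import math

# ln(2), the factor relating natural-log scores to base 2 scores.
LN2 = math.log(2)


def nats_to_bits(score: float) -> float:
    """Convert a natural-log score to a base 2 score (hypothetical helper)."""
    return score / LN2


def bits_to_nats(score: float) -> float:
    """Convert a base 2 score back to a natural-log score (hypothetical helper)."""
    return score * LN2


if __name__ == "__main__":
    # Assumption: old_score is a natural-log score recorded before the change.
    old_score = math.log(0.5)
    print(f"nats: {old_score:.4f}  bits: {nats_to_bits(old_score):.4f}")
```

The conversion is a constant rescaling, so rankings of hypotheses are unchanged; only stored thresholds or hard-coded score comparisons would need adjusting.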