Summary:

Possibly breaking changes:
- Set global numpy seed (4a7cd58)
- Split `in_proj_weight` into separate k, v, q projections in MultiheadAttention (fdf4c3e); see the migration sketch below
- TransformerEncoder returns namedtuples instead of dict (27568a7)

New features:
- Add `--fast-stat-sync` option (e1ba32a)
- Add `--empty-cache-freq` option (315c463)
- Support criterions with parameters (ba5f829)

New papers:
- Simple and Effective Noisy Channel Modeling for Neural Machine Translation (49177c9)
- Levenshtein Transformer (86857a5, ...)
- Cross+Self-Attention for Transformer Models (4ac2c5f)
- Jointly Learning to Align and Translate with Transformer Models (1c66792)
- Reducing Transformer Depth on Demand with Structured Dropout (dabbef4)
- Unsupervised Cross-lingual Representation Learning at Scale (XLM-RoBERTa) (e23e5ea)
- BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension (a92bcda)
- CamemBERT: a French BERT (b31849a)

Speed improvements:
- Add CUDA kernels for LightConv and DynamicConv (f840564)
- Cythonization of various dataloading components (4fc3953, ...)
- Don't project mask tokens for MLM training (718677e)

Pull Request resolved: #1452

Differential Revision: D18798409

Pulled By: myleott

fbshipit-source-id: 860a0d5aaf7377c8c9bd63cdb3b33d464f0e1727
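The `in_proj_weight` split is the change most likely to affect code that loads older checkpoints. The sketch below is a minimal illustration, not the actual fairseq upgrade code: it assumes hypothetical key names (`in_proj_weight`, `q_proj.weight`, etc., under an example `prefix`) and shows how a combined projection matrix of shape `(3 * embed_dim, embed_dim)` could be sliced into separate q, k, v weights.

```python
import torch


def split_in_proj(state_dict, prefix="encoder.layers.0.self_attn."):
    """Hypothetical helper: split a combined in_proj_weight/bias from an
    older checkpoint into separate q/k/v projection parameters.

    Key names and the prefix are illustrative assumptions; the real
    upgrade logic lives inside fairseq's MultiheadAttention.
    """
    # Combined projection: rows 0..embed_dim are q, then k, then v.
    w = state_dict.pop(prefix + "in_proj_weight")
    q_w, k_w, v_w = w.chunk(3, dim=0)
    state_dict[prefix + "q_proj.weight"] = q_w
    state_dict[prefix + "k_proj.weight"] = k_w
    state_dict[prefix + "v_proj.weight"] = v_w

    # Bias is optional in older checkpoints; split it the same way if present.
    b = state_dict.pop(prefix + "in_proj_bias", None)
    if b is not None:
        q_b, k_b, v_b = b.chunk(3, dim=0)
        state_dict[prefix + "q_proj.bias"] = q_b
        state_dict[prefix + "k_proj.bias"] = k_b
        state_dict[prefix + "v_proj.bias"] = v_b
    return state_dict
```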