pNLP-Mixer - Unofficial PyTorch Implementation

pNLP-Mixer: an Efficient all-MLP Architecture for Language

Implementation of pNLP-Mixer in PyTorch and PyTorch Lightning.

pNLP-Mixer is the first successful application of the MLP-Mixer architecture to NLP. Thanks to a novel embedding-free projection layer, pNLP-Mixer achieves performance comparable to transformer-based models (e.g. mBERT, RoBERTa) with a significantly smaller parameter count and no expensive pretraining.
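As a rough illustration of what an embedding-free projection can look like, the minimal sketch below maps tokens to fixed-size feature vectors by hashing their character n-grams. It is only a sketch under assumed names and sizes, not necessarily the exact projection implemented in projection.py.

# Illustrative sketch only: a hashed character-n-gram projection that produces
# token features without an embedding table. Feature size, n-gram size, and the
# hashing scheme here are assumptions, not the repository's exact implementation.
import numpy as np

def hashed_projection(tokens, feature_size=256, ngram_size=3):
    """Return a (len(tokens), feature_size) float32 matrix of hashed n-gram features."""
    features = np.zeros((len(tokens), feature_size), dtype=np.float32)
    for i, token in enumerate(tokens):
        padded = f"#{token}#"  # mark word boundaries
        for j in range(len(padded) - ngram_size + 1):
            ngram = padded[j:j + ngram_size]
            # Python's built-in hash is enough for a sketch; a real implementation
            # would use a stable hash precomputed over the vocab and cached to .npy.
            features[i, hash(ngram) % feature_size] = 1.0
    return features

if __name__ == "__main__":
    print(hashed_projection(["mixer", "language"]).shape)  # (2, 256)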

Requirements

  • Python >= 3.6.10
  • PyTorch >= 1.8.0
  • PyTorch Lightning >= 1.4.3
  • All other requirements are listed in the requirements.txt file.
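Assuming a standard pip setup, the dependencies listed in requirements.txt can be installed with:

pip install -r requirements.txt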

Configurations

Please check the configuration examples and the accompanying comments in the cfg directory.

Commands

Caching Vocab Hashes

python projection.py -v VOCAB_FILE -c CFG_PATH -g NGRAM_SIZE -o OUTPUT_FILE
  • VOCAB_FILE: path to the vocab file that contains the tokens to be hashed
  • CFG_PATH: path to the configurations file
  • NGRAM_SIZE: size of n-grams used during hashing
  • OUTPUT_FILE: path where the resulting .npy file will be stored
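For example (the file names and the n-gram size below are placeholders, not paths shipped with the repository):

python projection.py -v vocab.txt -c cfg/example_cfg.yml -g 3 -o vocab_hashes.npy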

Training / Testing

python run.py -c CFG_PATH -n MODEL_NAME -m MODE -p CKPT_PATH
  • CFG_PATH: path to the configurations file
  • MODEL_NAME: model name to be used for PyTorch Lightning logging
  • MODE: train or test (default: train)
  • CKPT_PATH: (optional) checkpoint path to resume training from or to use for testing
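For example, to train a model and then test it from a saved checkpoint (config name, model name, and checkpoint path below are placeholders):

python run.py -c cfg/example_cfg.yml -n pnlp_mixer_base -m train
python run.py -c cfg/example_cfg.yml -n pnlp_mixer_base -m test -p checkpoints/last.ckpt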

Results

The checkpoints used for evaluation are available here.

MTOP

Model       Size      Reported  Ours
pNLP-Mixer  X-Small   76.9%     79.3%
pNLP-Mixer  Base      80.8%     79.4%
pNLP-Mixer  X-Large   82.3%     82.1%

MultiATIS

Model       Size      Reported  Ours
pNLP-Mixer  X-Small   90.0%     91.3%
pNLP-Mixer  Base      92.1%     92.8%
pNLP-Mixer  X-Large   91.3%     92.9%

* Note that the paper reports performance on the MultiATIS dataset using an 8-bit quantized model, whereas our numbers were measured with a 32-bit float model.

IMDB

Model       Size      Reported  Ours
pNLP-Mixer  X-Small   81.9%     81.5%
pNLP-Mixer  Base      78.6%     82.2%
pNLP-Mixer  X-Large   82.9%     82.9%

Paper

@article{fusco2022pnlp,
  title={pNLP-Mixer: an Efficient all-MLP Architecture for Language},
  author={Fusco, Francesco and Pascual, Damian and Staar, Peter},
  journal={arXiv preprint arXiv:2202.04350},
  year={2022}
}

Contributors

Special thanks to:

TODO

  • 8-bit quantization
