he1ght/BiBERT_CE

This is the repository for our paper "Impact of Sentence Representation Matching in Neural Machine Translation" and the original BiBERT paper "BERT, mBERT, or BiBERT? A Study on Contextualized Embeddings for Neural Machine Translation".

@article{jung2022impact,
  title={Impact of Sentence Representation Matching in Neural Machine Translation},
  author={Jung, Heeseung and Kim, Kangil and Shin, Jong-Hun and Na, Seung-Hoon and Jung, Sangkeun and Woo, Sangmin},
  journal={Applied Sciences},
  volume={12},
  number={3},
  pages={1313},
  year={2022},
  publisher={MDPI},
  abstract = "Most neural machine translation models are implemented as a conditional language model framework composed of encoder and decoder models. This framework learns complex and long-distant dependencies, but its deep structure causes inefficiency in training. Matching vector representations of source and target sentences improves the inefficiency by shortening the depth from parameters to costs and generalizes NMTs with different perspective to cross-entropy loss. In this paper, we propose matching methods to derive the cost based on constant word embedding vectors of source and target sentences. To find the best method, we analyze impact of the methods with varying structures, distance metrics, and model capacity in a French to English translation task. An optimally configured method is applied to English from and to French, Spanish, and German translation tasks. In the tasks, the method showed performance improvement by 3.23 BLEU in maximum, 0.71 in average. We evaluated the robustness of this method to various embedding distributions and models as conventional gated structures and transformer network, and empirical results showed that it has higher chance to improve performance in those variety."
}
@inproceedings{xu-etal-2021-bert,
    title = "{BERT}, m{BERT}, or {B}i{BERT}? A Study on Contextualized Embeddings for Neural Machine Translation",
    author = "Xu, Haoran  and
      Van Durme, Benjamin  and
      Murray, Kenton",
    booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2021",
    address = "Online and Punta Cana, Dominican Republic",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.emnlp-main.534",
    pages = "6663--6675",
    abstract = "The success of bidirectional encoders using masked language models, such as BERT, on numerous natural language processing tasks has prompted researchers to attempt to incorporate these pre-trained models into neural machine translation (NMT) systems. However, proposed methods for incorporating pre-trained models are non-trivial and mainly focus on BERT, which lacks a comparison of the impact that other pre-trained models may have on translation performance. In this paper, we demonstrate that simply using the output (contextualized embeddings) of a tailored and suitable bilingual pre-trained language model (dubbed BiBERT) as the input of the NMT encoder achieves state-of-the-art translation performance. Moreover, we also propose a stochastic layer selection approach and a concept of a dual-directional translation model to ensure the sufficient utilization of contextualized embeddings. In the case of without using back translation, our best models achieve BLEU scores of 30.45 for En→De and 38.61 for De→En on the IWSLT{'}14 dataset, and 31.26 for En→De and 34.94 for De→En on the WMT{'}14 dataset, which exceeds all published numbers.",
}

Prerequisites

conda create -n bibert python=3.7
conda activate bibert
  • transformers >= 4.4.2
    pip install transformers
    
  • Install our fairseq repo
    cd BiBERT
    pip install --editable ./
    
  • hydra-core == 1.0.3
    pip install hydra-core==1.0.3
    

Training

Training a translation model follows the same procedure as the original BiBERT repository. Add the --concept_equalization flag to enable our proposed matching method; the method is applied only during training, not at inference.
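
As a rough sketch, a training run might look like the following. The data directory, architecture name, and hyperparameters are illustrative placeholders drawn from typical fairseq setups, not values prescribed by this repository; only the --concept_equalization flag is specific to our method.

# placeholders: adjust data-bin path, --arch, and hyperparameters to your setup
fairseq-train data-bin/iwslt14.de-en \
    --arch transformer --share-decoder-input-output-embed \
    --optimizer adam --adam-betas '(0.9, 0.98)' --clip-norm 0.0 \
    --lr 5e-4 --lr-scheduler inverse_sqrt --warmup-updates 4000 \
    --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
    --max-tokens 4096 --save-dir checkpoints/de-en \
    --concept_equalization

Inference and evaluation do not need the flag, since the matching loss is used only while training.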
