Multilingual Question Answering

This repository contains resources for question answering in multiple languages.

Data

The data folder contains 9 monolingual and cross-lingual datasets, based on the SQuAD v1.1 dev set (https://rajpurkar.github.io/SQuAD-explorer/explore/1.1/dev/) and its translations into French and Japanese from https://github.com/AkariAsai/extractive_rc_by_runtime_mt . In the SQuAD dataset, the input is a question-paragraph pair in English and the output is the location of the answer in the paragraph. The 9 datasets correspond to the 9 combinations in which the paragraph is in one language (French, Japanese or English) and the question is in any of the same three languages, including the same one.
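As a rough illustration of the setup described above, the sketch below enumerates the 9 context/question language combinations and walks a SQuAD v1.1-style JSON structure to recover answer spans. The language codes, the function names and the sample record are assumptions made for this example, and it assumes the files in the data folder follow the standard SQuAD v1.1 layout; check the actual files before relying on it.

```python
from itertools import product

# Hypothetical language codes, used only for this illustration.
LANGUAGES = ["en", "fr", "ja"]


def language_pairs(languages=LANGUAGES):
    """Return every (context_language, question_language) combination."""
    return list(product(languages, repeat=2))


def iter_examples(squad_dict):
    """Yield (question, context, start, end) from a SQuAD v1.1-style dict.

    `start` and `end` are character offsets into `context`, so that
    context[start:end] is the answer text.
    """
    for article in squad_dict["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                for answer in qa["answers"]:
                    start = answer["answer_start"]
                    end = start + len(answer["text"])
                    yield qa["question"], context, start, end


# Made-up cross-lingual record: French context, English question.
CONTEXT = "La capitale de la France est Paris."
sample = {
    "version": "1.1",
    "data": [{
        "title": "example",
        "paragraphs": [{
            "context": CONTEXT,
            "qas": [{
                "id": "0",
                "question": "What is the capital of France?",
                "answers": [{"text": "Paris",
                             "answer_start": CONTEXT.index("Paris")}],
            }],
        }],
    }],
}

pairs = language_pairs()
examples = list(iter_examples(sample))

if __name__ == "__main__":
    print(len(pairs), "language combinations")
    for question, context, start, end in examples:
        print(question, "->", context[start:end])
```

The answer is stored as a character offset rather than as a token index, which is why the same structure can hold contexts in any of the three languages.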

If you use these datasets, please cite this paper: https://arxiv.org/abs/1910.04659 and the references mentioned above.

Results

In our paper "Multilingual Question Answering from Formatted Text applied to Conversational Agents", we trained multilingual BERT (https://github.com/google-research/bert) on the English SQuAD v2.0 training set and tested it on the 9 datasets mentioned above.

We compare the performance of mBERT on the monolingual French and Japanese test sets with a previously published baseline (https://github.com/AkariAsai/extractive_rc_by_runtime_mt):

| Model | French F1 | French EM | Japanese F1 | Japanese EM |
|---|---|---|---|---|
| Baseline | 61.88 | 40.67 | 52.19 | 37.00 |
| Multilingual BERT | 76.65 | 61.77 | 61.83 | 59.94 |

We also discuss the impressive results of mBERT on the cross-lingual datasets:

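Rows give the context language and columns give the question language.

| Context \ Question | En F1 | En EM | Fr F1 | Fr EM | Jap F1 | Jap EM |
|---|---|---|---|---|---|---|
| En | 90.57 | 81.96 | 78.55 | 67.28 | 66.22 | 52.91 |
| Fr | 81.10 | 65.14 | 76.65 | 61.77 | 60.28 | 42.20 |
| Jap | 58.95 | 57.49 | 47.19 | 45.26 | 61.83 | 59.93 |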
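
For readers unfamiliar with the F1 and EM columns, here is a simplified sketch of SQuAD-style exact match and token-level F1 for a single prediction. It is not the evaluation code used in the paper: the function names are assumptions, the normalization omits the English article removal done by the official SQuAD script, and whitespace tokenization is a poor fit for Japanese, which usually needs a dedicated tokenizer.

```python
import string
from collections import Counter


def normalize(text):
    """Lowercase, drop ASCII punctuation and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    return " ".join(text.split())


def exact_match(prediction, reference):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def f1_score(prediction, reference):
    """Token-level F1 between a predicted and a reference answer."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(exact_match("Paris", "paris."))
    print(f1_score("in Paris", "Paris"))
```

Dataset-level scores such as those in the tables are typically obtained by averaging the per-question values (taking the best match when a question has several reference answers) and reporting the result as a percentage.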