This repository contains the code for our (Jacob Dudek and Gerrit Bartels) bachelor thesis. We implemented five neural networks and compared their performance on the task of unconditional text generation. We set up a thorough evaluation scheme using common automatic evaluation methods and supplemented these results with a human survey. Furthermore, we propose two evaluation metrics based on the Jensen-Shannon distance that help judge how well the underlying data distribution has been learned.
As the training dataset for all models, we used a monolingual news crawl from the Fifth Conference on Machine Translation (WMT20), which can be obtained directly from the conference website. It contains approximately 44M English news sentences extracted from online newspaper articles published throughout 2019. After applying our preprocessing steps (see the preprocessing notebook), we obtained a dataset comprising approximately 240k sentences with an average length of 18.65 and a vocabulary size of 6801.
To assess the capabilities of our models in unconditional text generation, we employed methods that evaluate the model outputs with respect to sentence diversity (D) and quality (Q), as well as how well the underlying data distribution was captured (C). We also conducted a survey to gain an additional perspective on model performance.
- JS Distance Sentence Lengths (C)
- JS Distance Token Counts (C)
- Test BLEU-4 (Q)
- Self BLEU-4 (D)
- Fréchet InferSent Distance (Q & D)
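
The two JS distance metrics above compare a distribution computed from generated sentences with the same distribution computed from reference data. The following is a minimal sketch of that idea, not the thesis implementation: the helper names (`js_distance`, `sentence_length_counts`, `token_counts`), the whitespace tokenization, and the base-2 logarithm are all assumptions made for illustration.

```python
import math
from collections import Counter


def js_distance(p_counts, q_counts, base=2.0):
    """Jensen-Shannon distance (square root of the JS divergence)
    between two count dictionaries over a shared event space.

    With base 2 (an assumption here), the result lies in [0, 1].
    """
    p_total = sum(p_counts.values())
    q_total = sum(q_counts.values())
    if p_total == 0 or q_total == 0:
        raise ValueError("both distributions need at least one observation")

    divergence = 0.0
    for key in set(p_counts) | set(q_counts):
        p = p_counts.get(key, 0) / p_total
        q = q_counts.get(key, 0) / q_total
        m = 0.5 * (p + q)  # mixture distribution
        if p > 0:
            divergence += 0.5 * p * math.log(p / m, base)
        if q > 0:
            divergence += 0.5 * q * math.log(q / m, base)
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(divergence, 0.0))


def sentence_length_counts(sentences):
    """Histogram of sentence lengths (hypothetical whitespace tokenization)."""
    return Counter(len(s.split()) for s in sentences)


def token_counts(sentences):
    """Histogram of token frequencies (hypothetical whitespace tokenization)."""
    return Counter(tok for s in sentences for tok in s.split())


# Toy data for illustration only.
generated = ["the cat sat", "a dog barked loudly"]
reference = ["the cat sat", "the dog barked"]

length_distance = js_distance(
    sentence_length_counts(generated), sentence_length_counts(reference)
)
token_distance = js_distance(token_counts(generated), token_counts(reference))
```

A distance of 0 means the two distributions are identical, and larger values mean the generated text deviates more from the reference data.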
The survey was implemented in _magpie, a minimal architecture for generating portable interactive experiments, and made available as a web-based online survey through the hosting service Netlify. We defined two tasks: one to elicit judgments about the overall quality of generation (Likert-scale rating), and one to measure how likely participants were to detect whether a sentence was artificially generated (2-alternative forced-choice task).
| | LSTM LM | cVAE LM | GS GAN | LaText GAN | GPT-2 Small | Real Data |
|---|---|---|---|---|---|---|
| Average Sent Length | 16.83 | 12.91 | 16.89 | 18.06 | 17.25 | 16.66 |
| JS Distance Sent Length | 0.1471 | 0.4334 | 0.1677 | 0.3487 | 0.1206 | 0.0205 |
| JS Distance Token Counts | 0.1441 | 0.2963 | 0.1437 | 0.5111 | 0.2444 | 0.1286 |
| Top 12 Token Overlap | 12/12 | 10/12 | 12/12 | 7/12 | 11/12 | 12/12 |
| Test BLEU-4 | 0.3136 | 0.0544 | 0.3258 | 0.2563 | 0.4536 | 0.3301 |
| Self BLEU-4 | 0.3235 | 0.0904 | 0.3463 | 0.6746 | 0.5374 | 0.3282 |
| FID | 0.369 | 0.9932 | 0.3606 | 1.9926 | 0.7368 | 0.3456 |

Results of the automatic evaluation methods applied to all models and, for reference, to the test data itself. The best results are highlighted in bold.

| | LSTM LM | cVAE LM | GS GAN | LaText GAN | GPT-2 Small | Real Data |
|---|---|---|---|---|---|---|
| Average Fluency Rating | 3.0704 | 1.9296 | 3.1861 | 1.704 | 3.9025 | 4.3948 |
| Confusion Rate in % | 23.81 | 9.93 | 20.37 | 9.03 | 29.82 | - |

Results of the human evaluation applied to all models and, for reference, to the test data itself. The best results are highlighted in bold.
- elizabeth warren suggests trump would win the u.s through congress, whereas president trump by his th year race as he staged a century.
- unable to read live in recent times, china is not long term.
- please note that radio had a site of panic and pre recorded surveillance books in the afternoon little body.
- and in san diego that may have been trumps remarks after a bitter short tournament.
- government employees, women and organisations have been focused on improving care and role to ensure guests be held responsible for their personal data.
- should multi billion dollar corporations zero emissions by ?
- the mother of a girl next to her was pushed too hard.
- labour responded that they should not vote by the snp, then we would need to get brexit done.
- but another west london, royal republic, won european international womens semi finals
- our future brexit will turn us once again, he said during his three day visit.
- when the new government started being introduced in october, there was no such thing as a result that could ever take place.
- some lawmakers are going to move forward in the next phase of the senate in a week, as congress does.
- she said: it did not feel right and i did not want this to be happening at all.
- however, he was left with a six year old who left with the job over £ .
- ministers way.
- twitter will you fell aside an additional public supply chain of women if
- nothing every divided on february on me on, but we for.
- it once certainly normally neither this all their scores remain on.
- professional annually.
- thirds he need kong he rt.com lanka rt.com lanka rt.com lanka rt.com lanka rt.com lanka rt.com lanka rt.com lanka rt.com lanka rt.com lanka rt.com lanka rt.com lanka rt.com lanka
- dow is not united do well in europe, but will be an interview.
- rt.com rt.com just angeles rt.com feel angeles thrones am angeles have knew the people, in that am not on .
- there two do in trump and an emergency and go on an emergency services to %.
- $ president a need to and no deal to climate change u.s border on monday.