Sampiran is a dataset for pantun generation. It consists of 7.8K Indonesian pantun collected from various online sources. Pantun is a traditional Malay poem of four lines: two opening lines (the sampiran) followed by two lines that carry the message (the isi). The collected pantun were filtered to follow the general rules of the form: four lines, an ABAB rhyme scheme, and eight to twelve syllables per line.
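The filtering rules above (four lines, ABAB rhyme, eight to twelve syllables per line) can be approximated with a short check. This is only a minimal sketch under stated assumptions, not the procedure actually used to build Sampiran: the function names are hypothetical, syllables are estimated by counting vowel letters (which overcounts diphthongs such as "au"), and two lines are taken to rhyme when they match from the last vowel onward. The example text is a well-known traditional pantun used for illustration and is not necessarily part of the dataset.

```python
import re

VOWELS = "aiueo"


def count_syllables(line: str) -> int:
    """Rough estimate (assumption): one syllable per vowel letter."""
    return sum(ch in VOWELS for ch in line.lower())


def rhyme_key(line: str) -> str:
    """Return the line's ending from its last vowel onward, e.g. 'ang' or 'i'."""
    letters = re.sub(r"[^a-z]", "", line.lower())
    match = re.search(r"[aiueo][^aiueo]*$", letters)
    return match.group(0) if match else letters


def is_valid_pantun(text: str, min_syl: int = 8, max_syl: int = 12) -> bool:
    """Check four lines, the syllable range, and an ABAB rhyme pattern."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if len(lines) != 4:
        return False
    if not all(min_syl <= count_syllables(ln) <= max_syl for ln in lines):
        return False
    a1, b1, a2, b2 = (rhyme_key(ln) for ln in lines)
    # Lines 1 and 3 must rhyme, and lines 2 and 4 must rhyme.
    # AAAA poems also pass this check; add `a1 != b1` for a stricter test.
    return a1 == a2 and b1 == b2


# Illustrative traditional pantun (not necessarily taken from Sampiran).
EXAMPLE = """Kalau ada sumur di ladang
Boleh kita menumpang mandi
Kalau ada umur yang panjang
Boleh kita berjumpa lagi"""

if __name__ == "__main__":
    print(is_valid_pantun(EXAMPLE))
```

A real pipeline would likely need a proper Indonesian syllabifier and a looser notion of rhyme; the vowel-counting heuristic is chosen here only because it needs no external dependencies.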
The dataset is not split into training, validation, and test sets.
- 2022-12-01 v1.0
- Initial dataset
- Sampiran v1.0 was built by Emmanuella Anggi Siallagan for her "Studi Mandiri" course at MIK, Faculty of Computer Science, Universitas Indonesia in 2022.
Please cite the following paper if you use this dataset for your project/publication:
@article{siallagan2023,
author = {Siallagan, Emmanuella Anggi and Alfina, Ika},
journal = {Jurnal Ilmu Komputer dan Informasi (Journal of Computer Science and Information)},
keywords = {gpt-2,pantun,poetry generation,seqgan,text generation},
pages = {59--67},
title = {{Poetry Generation for Indonesian Pantun : Comparison Between SeqGAN and GPT-2}},
volume = {16},
number = {1},
year = {2023}
}
This dataset is free to use, and you do not need our permission to use it. Please cite our paper if your publication uses our data. Note that you may not copy this dataset and share it publicly in your own repository without our permission.
ika.alfina [at] cs.ui.ac.id