ir-nlp-csui/sampiran

Sampiran: An Indonesian Pantun Dataset

Summary

Sampiran is a dataset for pantun generation. It consists of 7.8K Indonesian pantun collected from various online sources. Pantun is a traditional Malay poem of four lines: two opening lines (the sampiran) followed by two lines that carry the message (the isi). The collected pantun were filtered to follow the general rules of the form: four lines, an ABAB rhyme scheme, and eight to twelve syllables per line.
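The filtering rules above can be sketched in Python. This is only an illustrative approximation, not the procedure used to build the dataset: the function names are hypothetical, syllables are estimated by counting vowel letters (which over-counts diphthongs such as "ai" and "au"), and rhyme is approximated by comparing the last two letters of each line. The sample is a commonly quoted pantun used for illustration and is not drawn from the dataset.

```python
import re

VOWELS = "aeiou"

# A commonly quoted pantun, used for illustration only (not from the dataset).
SAMPLE = """Berakit-rakit ke hulu,
Berenang-renang ke tepian.
Bersakit-sakit dahulu,
Bersenang-senang kemudian."""


def count_syllables(line: str) -> int:
    """Estimate syllables as the number of vowel letters (an assumption;
    diphthongs such as 'ai' or 'au' are over-counted)."""
    return sum(1 for ch in line.lower() if ch in VOWELS)


def rhyme_key(line: str, n: int = 2) -> str:
    """Return the last n letters of the line, ignoring case and punctuation.
    Comparing suffixes is a rough stand-in for a real rhyme check."""
    letters = re.sub(r"[^a-z]", "", line.lower())
    return letters[-n:]


def follows_pantun_rules(text: str, min_syl: int = 8, max_syl: int = 12) -> bool:
    """Check four lines, 8 to 12 estimated syllables per line, and ABAB rhyme."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if len(lines) != 4:
        return False
    if not all(min_syl <= count_syllables(ln) <= max_syl for ln in lines):
        return False
    a1, b1, a2, b2 = (rhyme_key(ln) for ln in lines)
    # Lines 1 and 3 must share an ending, as must lines 2 and 4.
    # Note: this does not reject AAAA, where all four endings match.
    return a1 == a2 and b1 == b2


if __name__ == "__main__":
    print(follows_pantun_rules(SAMPLE))
```

A stricter filter would need a proper Indonesian syllabifier and a phonetic rhyme comparison; the thresholds are exposed as parameters so they can be adjusted.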

Dataset Split

No split.

Changelog

  • 2022-12-01 v1.0
    • Initial dataset

Acknowledgments

  • Sampiran v1.0 was built by Emmanuella Anggi Siallagan for her "Studi Mandiri" course at MIK, Faculty of Computer Science, Universitas Indonesia in 2022.

References

Please cite the following paper if you use this dataset for your project/publication:

@article{siallagan2023,
author = {Siallagan, Emmanuella Anggi and Alfina, Ika},
journal = {Jurnal Ilmu Komputer dan Informasi (Journal of Computer Science and Information)},
keywords = {gpt-2,pantun,poetry generation,seqgan,text generation},
pages = {59--67},
title = {{Poetry Generation for Indonesian Pantun : Comparison Between SeqGAN and GPT-2}},
volume = {16},
number = {1},
year = {2023}
}

Licence

You can use this dataset for free. You don't need our permission to use it. Please cite our paper if your work uses our data in your publication. Please note that you are not allowed to create a copy of this dataset and share it publicly in your own repository without our permission.

Contact

ika.alfina [at] cs.ui.ac.id
