ETOS: A Spelling Correction Dataset for Formal Indonesian Text

Summary

ETOS (Ejaan oTOmatiS) is a dataset for automatic spelling correction of formal Indonesian text. It consists of 200 sentences, each containing at least one typo. The dataset has 4,323 tokens, 288 of which are non-word errors.

Dataset Split

Since this dataset is very small, we do not define any split.
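
With no official split, one option for held-out evaluation is k-fold cross-validation over the 200 sentences. The following is only a minimal sketch, not part of the dataset: the function name `k_fold_splits`, the choice of 5 folds, the fixed seed, and the placeholder sentences are all assumptions made for illustration. Loading the real sentences from the repository files is left out because their format is not described here.

```python
import random


def k_fold_splits(sentences, k=5, seed=0):
    """Yield (train, test) sentence lists for k-fold cross-validation.

    Hypothetical helper: not shipped with ETOS.
    """
    indices = list(range(len(sentences)))
    # Shuffle once with a fixed seed so the folds are reproducible.
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of (nearly) equal size.
    folds = [indices[i::k] for i in range(k)]
    for held_out in range(k):
        test_ids = set(folds[held_out])
        train = [sentences[i] for i in indices if i not in test_ids]
        test = [sentences[i] for i in folds[held_out]]
        yield train, test


# Placeholder data standing in for the 200 ETOS sentences (assumption).
sentences = [f"sentence {i}" for i in range(200)]
splits = list(k_fold_splits(sentences, k=5))
```

Each sentence appears in exactly one test fold, so every item is evaluated once while the remaining folds serve as training or tuning data.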

Changelog

2022-12-01 v1.0
- Initial dataset

Acknowledgments

ETOS v1.0 was built by M. Nirwan Samsuri for his master's thesis at the Faculty of Computer Science, Universitas Indonesia, in 2022.

References

Please cite the following paper if you use this dataset in your project or publication:

Mukhlizar Nirwan Samsuri, Arlisa Yuliawati, and Ika Alfina. A Comparison of Distributed, PAM, and Trie Data Structure Dictionaries in Automatic Spelling Correction for Indonesian Formal Text. In Proceedings of the 2022 5th International Seminar on Research of Information Technology and Intelligent Systems (ISRITI) (accepted).

Licence

You can use this dataset for free, and you do not need our permission to do so. Please cite our paper if your publication uses our data. Note that you may not copy this dataset and share it publicly in your own repository without our permission.

Contact

ika.alfina [at] cs.ui.ac.id

