ir-nlp-csui/etos


ETOS: A Spelling Correction Dataset for Formal Indonesian Text

Summary

ETOS (Ejaan oTOmatiS) is a dataset for automatic spelling correction of formal Indonesian text. It consists of 200 sentences, each containing at least one typo. In total it has 4,323 tokens, 288 of which are non-word errors.
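A non-word error is a misspelled token that is not a valid word at all (e.g. "psar" instead of "pasar"), so it can be detected by a dictionary lookup. In ETOS, 288 of the 4,323 tokens (roughly 6.7%) fall into this category. The sketch below illustrates the idea only; it assumes a tiny hypothetical lexicon (`LEXICON`) and a simple lowercase regex tokenizer, neither of which is part of the dataset or the authors' system.

```python
import re

# Hypothetical toy lexicon; a real checker would load a full Indonesian word list.
LEXICON = {"saya", "pergi", "ke", "pasar", "kemarin"}


def tokenize(sentence: str) -> list[str]:
    # Assumed tokenization: lowercase the text and keep runs of letters a-z.
    return re.findall(r"[a-z]+", sentence.lower())


def non_word_errors(sentence: str, lexicon: set[str]) -> list[str]:
    # A token is flagged as a non-word error when it is absent from the lexicon.
    return [tok for tok in tokenize(sentence) if tok not in lexicon]


print(non_word_errors("Saya pergi ke psar kemarin.", LEXICON))  # ['psar']
```

Because detection is a pure membership test, this approach cannot catch real-word errors, where a typo happens to produce another valid word.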

Dataset Split

Since this dataset is very small, we do not define any training/validation/test split.

Changelog

  • 2022-12-01 v1.0
    • Initial dataset

Acknowledgments

  • ETOS v1.0 was built by M. Nirwan Samsuri for his master's thesis at the Faculty of Computer Science, Universitas Indonesia, in 2022.

References

Please cite the following paper if you use this dataset for your project/publication:

  • Mukhlizar Nirwan Samsuri, Arlisa Yuliawati, and Ika Alfina. A Comparison of Distributed, PAM, and Trie Data Structure Dictionaries in Automatic Spelling Correction for Indonesian Formal Text. In Proceedings of the 2022 5th International Seminar on Research of Information Technology and Intelligent Systems (ISRITI) (Accepted).

Licence

You can use this dataset for free. You don't need our permission to use it. Please cite our paper if you use our data in a publication. Please note that you are not allowed to create a copy of this dataset and share it publicly in your own repository without our permission.

Contact

ika.alfina [at] cs.ui.ac.id
