idn-tagged-corpus-CSUI

Summary

The idn-tagged-corpus-CSUI dataset is a manually tagged Indonesian part-of-speech (POS) corpus consisting of 10,000 sentences.

Data Format

Each line consists of a token and its part-of-speech tag, separated by a tab character (\t). Sentences are separated by an empty line.
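The format described above can be read with a few lines of Python. The following is a minimal sketch, not an official loader: the function name `read_sentences` and the sample tokens and tags are made up for illustration, and the code assumes every non-empty line contains exactly one token, a tab, and one tag.

```python
from typing import Iterable, Iterator, List, Tuple

# A sentence is a list of (token, tag) pairs.
Sentence = List[Tuple[str, str]]


def read_sentences(lines: Iterable[str]) -> Iterator[Sentence]:
    """Yield sentences from lines in the token<TAB>tag format."""
    sentence: Sentence = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            # An empty line closes the current sentence, if any.
            if sentence:
                yield sentence
                sentence = []
            continue
        # Assumption: exactly one tab separates the token from its tag.
        token, tag = line.split("\t", 1)
        sentence.append((token, tag))
    # Emit the last sentence when the input does not end with an empty line.
    if sentence:
        yield sentence


# Illustrative sample only; these lines are not copied from the corpus.
sample = "Saya\tPRP\nmakan\tVB\nnasi\tNN\n\nDia\tPRP\ntidur\tVB\n"
sentences = list(read_sentences(sample.splitlines()))
print(len(sentences), sentences[0])
```

To read the real data, pass an open file object for one of the corpus files instead of `sample.splitlines()`. Opening the file with UTF-8 encoding is an assumption here, not something the repository states.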

Data Format (translated from the Indonesian version)

This corpus uses the tab-separated file format (.tsv). Each line contains a token together with its part-of-speech tag, separated by a single tab character (\t). Sentences are separated by a single empty line.

References

Authors

Ruli Manurung
Arawinda Dinakaramani
Fam Rashel
Andry Luthfi

@inproceedings{Dinakaramani2014,
author = {Dinakaramani, Arawinda and Rashel, Fam and Luthfi, Andry and Manurung, Ruli},
booktitle = {Proceedings of the International Conference on Asian Language Processing 2014, IALP 2014},
doi = {10.1109/IALP.2014.6973519},
pages = {66--69},
title = {{Designing an Indonesian part of speech tagset and manually tagged Indonesian corpus}},
year = {2014}
}

Page

For more details about this work, please visit http://bahasa.cs.ui.ac.id/postag/corpus

Changelog

2022
- The dataset was moved to the IR-NLP Lab repository
- The dataset name was changed from idn-tagged-corpus to idn-tagged-corpus-CSUI
2014
- Initial release at Fam Rashel's repository

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

License

This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/4.0/.

You can use this dataset for free. You don't need our permission to use it. Please cite our paper if your work uses our data in your publication. Please note that you are not allowed to create a copy of this dataset and share it publicly in your own repository without our permission.

Contact

arawinda [at] cs.ui.ac.id

