singgalang

An automatically generated Indonesian NER dataset of 48K sentences.

The dataset follows the data format used by Stanford NER.

Four classes are used (three entity types plus "O"):

"Person" for person names
"Place" for place names
"Organisation" for organization names
"O" for others
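To check that a local copy of the data matches this description, the Python sketch below reads the file and counts how many tokens carry each label. It assumes the usual Stanford NER training layout (one token per line, token and label separated by a tab, a blank line between sentences); the function names are illustrative and are not part of the dataset or of Stanford NER.

```python
from collections import Counter

# The four labels listed in this README.
EXPECTED_LABELS = {"Person", "Place", "Organisation", "O"}


def read_sentences(lines):
    """Parse tab-separated rows into sentences of (token, label) pairs.

    Assumes one token per line and a blank line between sentences.
    A row with fewer than two tab-separated fields raises ValueError.
    """
    sentences, current = [], []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            # A blank line closes the current sentence, if there is one.
            if current:
                sentences.append(current)
                current = []
            continue
        token, label = line.split("\t")[:2]
        current.append((token, label))
    if current:
        # The file may not end with a blank line.
        sentences.append(current)
    return sentences


def label_counts(sentences):
    """Count how many tokens carry each label."""
    return Counter(label for sent in sentences for _, label in sent)


def unexpected_labels(counts):
    """Return any label that is not one of the four documented classes."""
    return set(counts) - EXPECTED_LABELS
```

For example, open SINGGALANG.tsv with `encoding="utf-8"`, pass the file object to `read_sentences`, and give the result to `label_counts`; `unexpected_labels` then lists any label other than the four above.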

References

The dataset is free to use, but if you publish a paper or other work that uses it, please cite these publications:

Ika Alfina, Septiviana Savitri, and Mohamad Ivan Fanany, "Modified DBpedia Entities Expansion for Tagging Automatically NER Dataset", in Proceedings of the 9th International Conference on Advanced Computer Science and Information Systems 2017 (ICACSIS 2017).
Ika Alfina, Ruli Manurung, and Mohamad Ivan Fanany, "DBpedia Entities Expansion in Automatically Building Dataset for Indonesian NER", in Proceedings of the 8th International Conference on Advanced Computer Science and Information Systems 2016 (ICACSIS 2016).

How do I create an NER model with this dataset?

We suggest using the Stanford NER library.
The steps to create an NER model with the Stanford NER library are as follows:

Download Stanford NER.
Download the dataset and its properties file (the file with the .prop extension).
Train the model with the Stanford NER CRF classifier, for example:
java -cp stanford-ner.jar edu.stanford.nlp.ie.crf.CRFClassifier -prop singgalang.prop

I recommend raising the maximum Java heap size so that training does not fail for lack of memory. Add an option such as "-Xmx1024m" to the command, for example:

java -Xmx1024m -cp stanford-ner.jar edu.stanford.nlp.ie.crf.CRFClassifier -prop singgalang.prop

If this still doesn't work, increase the value, for example "-Xmx8000m". This works for me :)
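If you end up retrying with larger heaps, it can help to script the command so the heap size is a single parameter. The Python sketch below only rebuilds the training command shown above; the helper names `train_command` and `train` are hypothetical, and it assumes `java` is on your PATH and that stanford-ner.jar and singgalang.prop are in the working directory.

```python
import subprocess

# Jar and class names taken from the commands in this README.
JAR = "stanford-ner.jar"
CRF_CLASS = "edu.stanford.nlp.ie.crf.CRFClassifier"


def train_command(prop_file="singgalang.prop", heap="1024m", jar=JAR):
    """Build the training command with a configurable maximum heap (-Xmx)."""
    return ["java", f"-Xmx{heap}", "-cp", jar, CRF_CLASS, "-prop", prop_file]


def train(prop_file="singgalang.prop", heap="1024m"):
    """Run training; raises CalledProcessError if Java exits with an error."""
    subprocess.run(train_command(prop_file, heap), check=True)
```

For example, `train(heap="8000m")` runs the same command as above with "-Xmx8000m". Passing the arguments as a list avoids shell quoting issues with file names.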

Suppose the training step creates a model file named "idner-model-singgalang.ser.gz".
Create or obtain a testing dataset in the same format. Suppose its file name is "testing.txt".
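If you do not have a separate testing dataset, one option is to hold out a random part of SINGGALANG.tsv. The Python sketch below splits the file by whole sentences with a fixed random seed; the 10% ratio, the seed, and the function names are arbitrary illustrative choices, and it assumes sentences are separated by blank lines. If you do this, train on the training part only (for example by pointing the training-file setting in singgalang.prop, assuming it uses Stanford NER's standard trainFile property, at the new file) and rerun training, so that test sentences are not seen during training.

```python
import random
import re


def _join(blocks):
    # One blank line between sentences; end the file with a newline.
    return "\n\n".join(blocks) + "\n" if blocks else ""


def split_blocks(text, test_ratio=0.1, seed=42):
    """Split blank-line-delimited sentences into (train_text, test_text)."""
    text = text.replace("\r\n", "\n")  # normalise Windows line endings
    blocks = [b.strip("\n") for b in re.split(r"\n\s*\n", text) if b.strip()]
    random.Random(seed).shuffle(blocks)  # fixed seed keeps the split reproducible
    n_test = int(len(blocks) * test_ratio)
    return _join(blocks[n_test:]), _join(blocks[:n_test])


def split_file(src, train_path, test_path, test_ratio=0.1, seed=42):
    """Read src and write the two splits to train_path and test_path."""
    with open(src, encoding="utf-8") as f:
        train_text, test_text = split_blocks(f.read(), test_ratio, seed)
    with open(train_path, "w", encoding="utf-8") as f:
        f.write(train_text)
    with open(test_path, "w", encoding="utf-8") as f:
        f.write(test_text)
```

For example, `split_file("SINGGALANG.tsv", "train.tsv", "testing.txt")` writes a hypothetical training file train.tsv and the testing file used below.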
Evaluate the NER model with the Stanford NER library, for example:
java -cp stanford-ner.jar edu.stanford.nlp.ie.crf.CRFClassifier -loadClassifier idner-model-singgalang.ser.gz -testFile testing.txt
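The evaluation typically summarises results as precision, recall, and F1 for each entity class. To show what those numbers mean, the Python sketch below computes exact-match, entity-level scores from gold and predicted label sequences. It is an illustration, not Stanford NER's own scorer, and may differ from it in details; it assumes IO-style labels as in this dataset, where a run of identical non-"O" labels counts as one entity. The function names are hypothetical.

```python
from collections import Counter


def entities(labels):
    """Return (type, start, end) spans for runs of identical non-"O" labels.

    Assumes IO-style labels without B-/I- prefixes, so two adjacent
    entities of the same type are merged into one span.
    """
    seq = list(labels) + ["O"]  # sentinel "O" closes an entity at sentence end
    spans, start = [], None
    for i, label in enumerate(seq):
        if start is not None and label != seq[start]:
            spans.append((seq[start], start, i))
            start = None
        if start is None and label != "O":
            start = i
    return spans


def prf(tp, fp, fn):
    """Precision, recall and F1 from counts, using 0.0 when a ratio is undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def score(gold_sents, pred_sents):
    """Exact-match entity scores per type plus an "ALL" row over every type."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(gold_sents, pred_sents):
        g, p = set(entities(gold)), set(entities(pred))
        for etype, _, _ in g & p:
            tp[etype] += 1  # same type and same token span
        for etype, _, _ in p - g:
            fp[etype] += 1  # predicted but not in gold
        for etype, _, _ in g - p:
            fn[etype] += 1  # in gold but missed
    types = sorted(set(tp) | set(fp) | set(fn))
    results = {t: prf(tp[t], fp[t], fn[t]) for t in types}
    results["ALL"] = prf(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return results
```

A predicted entity with the right span but the wrong type counts as a false positive for the predicted type and a false negative for the gold type.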

Licence

You can use this dataset for free. You don't need our permission to use it. Please cite our papers if you use this data in a publication. Please note that you are not allowed to create a copy of this dataset and share it publicly in your own repository without our permission.

Contact

ika.alfina [at] cs.ui.ac.id
