DataScience, Spring 2019

Project Stage 1 - Named Entity Recognition

Prerequisites

Tools and Libraries

Python 3.6
Pandas, Numpy
NLTK

Training data

Annotated and raw documents are to be present in the required directories and format

Usage

Run setup.sh as shown below to download pretrained GloVe [1] word vectors:
```
$ bash setup.sh
```
Run model.py with Python 3.6 as follows. This file performs 4 fold cross validation on the training data (~200 documents) to choose the best model among SVM, Logistic Regression, Decision Tree and Random Forest. Then it evaluates the best model on the testing data (~100 documents) and reports its Precision, Recall and F1 score.
```
$ python model.py
```

References

[1] Jeffrey Pennington and Richard Socher and Christopher D. Manning. 2014. GloVe: Global Vectors for Word Representation. Empirical Methods in Natural Language Processing (EMNLP)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

DataScience, Spring 2019

Project Stage 1 - Named Entity Recognition

Prerequisites

Tools and Libraries

Training data

Usage

References

Files

README.md

Latest commit

History

README.md

File metadata and controls

DataScience, Spring 2019

Project Stage 1 - Named Entity Recognition

Prerequisites

Tools and Libraries

Training data

Usage

References