Name		Name	Last commit message	Last commit date
parent directory ..
README.md		README.md
Report_Assignment_1.pdf		Report_Assignment_1.pdf
TEXTCZ2.ptg		TEXTCZ2.ptg
TEXTEN2.ptg		TEXTEN2.ptg
code_brill_demo.py		code_brill_demo.py
main.py		main.py

README.md

Brill's Tagger and Tagger Evaluation

Download Eric Brill's supervised tagger from UFAL's course assignment space. Install it (i.e., uncompress (gunzip), untar, and make). You might need to make some changes in his makefile of course (it's and OLD program, in this fast changing world...).

Alternativelly, you can use the NLTK implementation in Python (pip install --user -U nltk).

After installation, get the data, train it on as much data from T as time allows (in the package, there is an extensive documentation on how to train it on new data), and evaluate on data S. Tabulate the results.

Do cross-validation of the results: split the data into S', [H',] T' such that S' is the first 40,000 words, and T' is the last but the first 20,000 words from the rest. Train Eric Brill's tagger on T' (again, use as much data as time allows) and evaluate on S'. Again, tabulate the results.

Do three more splits of your data (using the same formula: 40k/20k/the rest) in some way or another (as different as possible), and get another three sets of results. Compute the mean (average) accuracy and the standard deviation of the accuracy. Tabulate all results.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Brill's Tagger

Brill's Tagger

README.md

Brill's Tagger and Tagger Evaluation

Files

Brill's Tagger

Directory actions

More options

Directory actions

More options

Latest commit

History

Brill's Tagger

Folders and files

parent directory

README.md

Brill's Tagger and Tagger Evaluation