GitHub - hsgodhia/agm_language_model: Aggregate markov language model (Saul & Pereira)

About

An implementation of Aggregate and mixed-order Markov models for statistical language processing as described in the Saul & Pereira 1997 paper.

Author

Harshal Godhia
[email protected]

Files

lm.py
Contains all the python code for training and testing the model.

model_params.pkl
A pre-trained model whose parameters are serialized and saved into this file

test_sentences.txt
Sample sentences used througout the questions for the homework

FAQ

Q1 Which file do I run?
A1 lm.py will use a pre-trained model and sample sentences from the file test_sentences.txt to evaluate sentence probability

Q2 What packages do i need to run?
A2 This code is tested in the environment Python 3.6.0 64-bit. You would need the following packages installed nltk, nltk.data(brown corpus), numpy, matplotlib

Q3Where do I add new test sentence whose probability I want?
A3Add new sentences, one sentence per line, in the file test_sentences.txt. the code will report sentence probability for the sentence and its reverse ordering. Incase you want a custom ordering then use add the delimiter || on the same line and another sentence.

Some details
There are lots of sentences already prepopulated (that can serve as examples if you like to add more test cases)
Please ensure all words are in the vocabulary, it will throw an exception otherwise (not handled by code)

Q4 What is model_params.pkl
A4 It is a pre-trained model (emission and transition matrices) which I found to give good numbers for the questions in the assignment. Since EM approaches a local optimum and depends on random initialization, the results like the ratio for the colorless sentence may not be reproduced. Hence I saved the best model (after different tries) in this file.

Q5 How do I train a new model?
A5 Uncomment line #353. It will train a new model and save it into the file model_params.pkl replacing the pre-trained model

Some details
Training may take upto 4 mins overall
Please re-run incase the results don't match as reported in the writeup

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
Pereira1997.pdf		Pereira1997.pdf
README.md		README.md
hw1_sol.docx		hw1_sol.docx
lm.py		lm.py
model_params.pkl		model_params.pkl
test_sentences.txt		test_sentences.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

About

Author

Files

FAQ

About

Releases

Packages

Languages

hsgodhia/agm_language_model

Folders and files

Latest commit

History

Repository files navigation

About

Author

Files

FAQ

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages