PyInfVoc is a topic modeling package implementing Online Latent Dirichlet Allocation with Infinite Vocabulary, based on a variational Bayesian learning approach in the online setting. It was developed by the Cloud Computing Research Team at the [University of Maryland, College Park](http://www.umd.edu). You can find more details about this project in our paper [Online Latent Dirichlet Allocation with Infinite Vocabulary](http://kzhai.github.io/paper/2013_icml.pdf), which appeared at ICML 2013.
Please download the latest version from our GitHub repository.
Please report any bugs or problems to Ke Zhai ([email protected]).
This package depends on several external Python libraries, such as numpy, scipy, and nltk. After downloading the source code package, unzip the datasets to the 'input' directory. The package includes a few sample datasets --- the ap, de-news, and 20-newsgroups datasets.
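If any of these libraries are missing, they can usually be installed with pip (the package names below assume the standard PyPI distributions),
pip install numpy scipy nltk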
Assume the PyInfVoc package is downloaded under the directory $PROJECT_SPACE/src/, i.e.,
$PROJECT_SPACE/src/PyInfVoc
To prepare the example dataset,
tar zxvf de-news.tar.gz
To launch PyInfVoc, first change to the directory containing the PyInfVoc source code,
cd $PROJECT_SPACE/src/PyInfVoc
and run the following command on the example dataset,
python -m launch_train --input_directory=./de-news/ --output_directory=./ --truncation_level=4000 --number_of_topics=10 --number_of_documents=9800 --training_iterations=100 --vocab_prune_interval=10 --batch_size=98 --alpha_beta=100
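As a rough guide to this example (based on the flag names, not an exhaustive description): the 9800 de-news documents are processed in minibatches of 98 documents, so 100 training iterations amount to about one pass over the corpus; truncation_level caps the vocabulary at 4000 words, and vocab_prune_interval prunes the vocabulary every 10 minibatches. See the --help output below for the full list of options, including alpha_beta.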
The generic command to run PyInfVoc is
python -m launch_train --input_directory=$INPUT_DIRECTORY/$CORPUS_NAME --output_directory=$OUTPUT_DIRECTORY --number_of_topics=$NUMBER_OF_TOPICS --number_of_documents=$NUMBER_OF_DOCUMENTS --training_iterations=$TRAINING_ITERATIONS --batch_size=$BATCH_SIZE
You should be able to find the output under the directory $OUTPUT_DIRECTORY/$CORPUS_NAME.
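For instance, you can list the files produced by training with,
ls $OUTPUT_DIRECTORY/$CORPUS_NAME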
At any time, you can also get help information and usage hints by running the following command,
python -m launch_train --help