- kitty now takes a stopword list as input instead of a language (from which it previously gathered the stopwords)
- solving a bug in the whitespace preprocessing function
- adding a new preprocessing function that supports passing the stopwords as a list
- deprecating whitespace preprocessing
- minor fixes to kitty API
- breaking change to the kitty API, which now uses WhiteSpacePreprocessingStopwords
- introducing kitty
- substantially improving the documentation
- introduced a new model: SuperCTM
- introduced a new model: β-CTM
- warning: breaking changes were introduced:
- the order of the parameters in CTMDataset was changed (contextual embeddings now come first)
- CTM now takes bow_size and contextual_size as input instead of input_size and bert_size
- changed the name of the parameters in the dataset
- introduced early stopping
- introduced visualization with pyldavis
- removed the constraint on the PyTorch version; this should solve problems for Windows users
- new way to handle text: training and testing data are now easier to use
- better visualization of the training progress and of the sampling process
- removed outdated material from the documentation
- some minor updates to the documentation
- adding a new method to visualize the topic using a wordcloud
- save and load will now generate a warning since the feature has not been tested
- adding a new and much simpler way to handle text for topic modeling
- introducing the two different classes for ZeroShotTM and CombinedTM
- deprecating the CTM class in favor of ZeroShotTM and CombinedTM
- adding support for Windows encoding by defaulting file load to UTF-8
- updated sentence-transformers version to 0.3.6
- beta support for model saving and loading
- new evaluation metrics based on coherence
- Introduced a method to predict the topics for a set of documents (supports multiple sampling to reduce variation)
- Adding some features to bert embeddings creation like increased batch size and progress bar
- Supporting training directly from lists without the need to deal with files
- Adding a simple quick preprocessing pipeline
- Updating sentence-transformers package to avoid errors
- Changed the encoding on file load for the SBERT embedding function
- Fixed a bug with sparse matrices
- New feature handling sparse bow for optimized processing
- New method to return topic distributions for words
- Released models with the main features implemented
- First release on PyPI.
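
Several entries above describe preprocessing that takes the stopwords as an explicit list rather than deriving them from a language name. The sketch below illustrates that idea in plain Python only. It is not the library's implementation: the function name `whitespace_preprocess`, its parameters, and its return values are hypothetical and chosen for illustration.

```python
import string


def whitespace_preprocess(documents, stopwords_list, min_words=1):
    """Hypothetical helper: lowercase, strip punctuation, drop stopwords.

    Returns the cleaned documents together with the indices of the
    documents that were kept, so callers can realign the original texts.
    """
    stopwords = {word.lower() for word in stopwords_list}
    # Map every ASCII punctuation character to a space.
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))

    preprocessed, kept_indices = [], []
    for index, doc in enumerate(documents):
        tokens = doc.lower().translate(table).split()
        tokens = [token for token in tokens if token not in stopwords]
        # Documents left with too few tokens are dropped.
        if len(tokens) >= min_words:
            preprocessed.append(" ".join(tokens))
            kept_indices.append(index)
    return preprocessed, kept_indices


docs = ["The cat sat on the mat.", "Of the and", "Topic models are fun!"]
clean, kept = whitespace_preprocess(docs, ["the", "on", "of", "and", "are"])
```

Passing the list explicitly means the caller controls exactly which words are removed, and the kept indices make it possible to keep the raw and preprocessed texts aligned when a document becomes empty.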