diff --git a/LICENSE b/LICENSE
index e69de29bb2..191ec87b9d 100644
--- a/LICENSE
+++ b/LICENSE
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2024 Anton Frederik Thielmann
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
\ No newline at end of file
diff --git a/README.md b/README.md
index bbac07c099..a081977f46 100644
--- a/README.md
+++ b/README.md
@@ -16,77 +16,87 @@
-# STREAM
-We present STREAM, a Simplified Topic Retrieval, Exploration, and Analysis Module for user-friendly topic modelling and especially subsequent interactive topic visualization and analysis. Our paper can be found [here](https://aclanthology.org/2024.acl-short.41.pdf).
-
-# Table of Contents
-- [STREAM](#stream)
-- [Table of Contents](#table-of-contents)
- - [Speed](#speed)
-- [Installation](#installation)
-- [Available Models](#available-models)
-- [Available Metrics](#available-metrics)
-- [Available Datasets](#available-datasets)
-- [Usage](#usage)
- - [Preprocessing](#preprocessing)
- - [Model fitting](#model-fitting)
- - [Evaluation](#evaluation)
- - [Expressivity](#expressivity)
- - [Intruder Accuracy (INT)](#intruder-accuracy-int)
- - [Average Intruder Similarity (ISIM)](#average-intruder-similarity-isim)
- - [Intruder Shift (ISH)](#intruder-shift-ish)
- - [Hyperparameter optimization](#hyperparameter-optimization)
- - [Visualization](#visualization)
- - [Downstream Tasks](#downstream-tasks)
- - [How to use](#how-to-use)
- - [Contributing and Testing New Models](#contributing-and-testing-new-models)
- - [Steps for Contributing](#steps-for-contributing)
- - [Example Model Structure](#example-model-structure)
- - [Testing Your Model](#testing-your-model)
- - [Validation Criteria](#validation-criteria)
- - [Submitting Your Contribution](#submitting-your-contribution)
-- [Citation](#citation)
- - [STREAM](#stream-1)
- - [Metrics and CEDC](#metrics-and-cedc)
- - [TNTM](#tntm)
- - [DCTE](#dcte)
- - [CBC](#cbc)
-
-
-
-For better topic analysis, we implement multiple intruder-word based topic evaluation metrics. Additionally, we publicize multiple new datasets that can extend the so far very limited number of publicly available benchmark datasets in topic modeling. We integrate downstream interpretable analysis modules to enable users to easily analyse the created topics in downstream tasks together with additional tabular information.
-
-#### Speed
-Since most of STREAMs models are centered around Document embeddings, STREAM comes along with a set of pre-embedded datasets.
-Additionally, once a user fits a model that leverages document embeddings, the embeddings are saved and automatically loaded the next time the user wants to fit any model with the same set of embeddings.
+
+<h1>STREAM: Simplified Topic Retrieval, Exploration, and Analysis Module</h1>
+<h3>- Topic Modeling Made Easy in Python -</h3>
-
-
-
-Installation
-=============
-stream_topic is available on PyPI. To install STREAM, you can either install it directly from the GitHub repository using the following command:
+We present STREAM, a Simplified Topic Retrieval, Exploration, and Analysis Module for User-Friendly and Interactive Topic Modeling and Visualization. Our paper can be found [here](https://aclanthology.org/2024.acl-short.41.pdf).
-```sh
-pip install git+https://github.com/AnFreTh/STREAM.git
-```
-or simply install via:
-```sh
-pip install stream_topic
-```
-Make additionally sure to download the necessary [nltk](https://www.nltk.org/) ressources, e.g. via:
+
+<h3> Table of Contents </h3>
+
+- [🏃 Quick Start](#-quick-start)
+- [🚀 Installation](#-installation)
+- [📦 Available Models](#-available-models)
+- [📊 Available Metrics](#-available-metrics)
+- [🗂️ Available Datasets](#️-available-datasets)
+- [🔧 Usage](#-usage)
+ - [🛠️ Preprocessing](#️-preprocessing)
+ - [🚀 Model fitting](#-model-fitting)
+ - [✅ Evaluation](#-evaluation)
+ - [🔍 Hyperparameter optimization](#-hyperparameter-optimization)
+ - [🖼️ Visualization](#️-visualization)
+ - [📈 Downstream Tasks](#-downstream-tasks)
+ - [🤝 Contributing and Testing New Models](#-contributing-and-testing-new-models)
+- [📜 Citation](#-citation)
+- [📝 License](#-license)
+
+
+# 🏃 Quick Start
+
+Get started with STREAM in just a few lines of code:
```python
-import nltk
-nltk.download('stopwords')
-nltk.download('punkt')
-nltk.download('wordnet')
-nltk.download('averaged_perceptron_tagger')
+from stream_topic.models import KmeansTM
+from stream_topic.utils import TMDataset
+
+dataset = TMDataset()
+dataset.fetch_dataset("BBC_News")
+dataset.preprocess(model_type="KmeansTM")
+
+model = KmeansTM()
+model.fit(dataset, n_topics=20)
+
+topics = model.get_topics()
+print(topics)
```
-# Available Models
+
+# 🚀 Installation
+
+You can install STREAM directly from PyPI or from the GitHub repository:
+
+1. **PyPI (Recommended)**:
+ ```bash
+ pip install stream_topic
+ ```
+
+2. **GitHub**:
+ ```bash
+ pip install git+https://github.com/AnFreTh/STREAM.git
+ ```
+
+3. **Download NLTK Resources**:
+ Ensure you have the necessary NLTK resources installed:
+ ```python
+ import nltk
+ nltk.download('stopwords')
+ nltk.download('punkt')
+ nltk.download('wordnet')
+ nltk.download('averaged_perceptron_tagger')
+ ```
+
+# 📦 Available Models
STREAM offers a variety of neural as well as non-neural topic models, and we are always working to incorporate new ones. If you wish to contribute your own model, or want another model included, please raise an issue with the required information. Currently, the following models are implemented:
@@ -160,7 +170,7 @@ STREAM offers a variety of neural as well as non-neural topic models and we are
-# Available Metrics
+# 📊 Available Metrics
Since evaluating topic models, especially automatically, is notoriously difficult, STREAM implements numerous evaluation metrics. In particular, the intruder-based metrics, while they may take some time to compute, have shown strong correlation with human evaluation.
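
All metrics share a common `score` interface. A minimal sketch, assuming an `ISIM` class exposed under `stream_topic.metrics` that takes the dataset at construction (import path and constructor are assumptions; see the Usage section below for the exact API):

```python
from stream_topic.metrics import ISIM  # import path is an assumption

metric = ISIM(dataset)              # constructor signature is an assumption
score = metric.score(model_output)  # model_output as produced in the Usage section
```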
@@ -206,7 +216,7 @@ Since evaluating topic models, especially automatically, STREAM implements numer
-# Available Datasets
+# 🗂️ Available Datasets
To integrate custom datasets for modeling with STREAM, please follow the example notebook in the examples folder. For benchmarking new models, STREAM already includes the following datasets:
@@ -297,7 +307,7 @@ If you wish to include and publish one of your datasets directly into the packag
-# Usage
+# 🔧 Usage
To use one of the available models, follow the simple steps below:
1. Import the necessary modules:
@@ -305,7 +315,7 @@ To use one of the available models, follow the simple steps below:
from stream_topic.models import KmeansTM
from stream_topic.utils import TMDataset
```
-## Preprocessing
+## 🛠️ Preprocessing
2. Get your dataset and preprocess for your model:
```python
dataset = TMDataset()
@@ -316,7 +326,7 @@ To use one of the available models, follow the simple steps below:
The specified model_type is optional, and further arguments can be passed. Default preprocessing steps are predefined for all included models.
Steps such as stopword removal and lemmatization are performed automatically for models that require them, e.g. LDA. A sketch of overriding these defaults is shown below.
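
The keyword names in this sketch mirror the `TextPreprocessor` options in `stream_topic/preprocessor/_preprocessor.py`; it is an assumption that `preprocess()` forwards them:

```python
dataset.preprocess(
    model_type="KmeansTM",
    custom_stopwords=["said", "also"],  # extra stopwords on top of the defaults
    min_word_length=3,                  # drop very short tokens
    remove_words_with_numbers=True,     # drop tokens containing digits
)
```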
-## Model fitting
+## 🚀 Model fitting
Fitting a model from STREAM follows a simple, sklearn-like logic, and every model is fitted identically.
3. Choose the model you want to use and train it:
@@ -333,58 +343,9 @@ Depending on the model, check the documentation for hyperparameter settings. To
topics = model.get_topics()
```
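
Every model is also expected to implement `predict` (one of the required methods listed in the contributing section below). A sketch, assuming it takes the dataset and returns per-document topic assignments:

```python
# return type is an assumption: one topic assignment per document
assignments = model.predict(dataset)
```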
-## Evaluation
-
-In this section, we describe the three metrics used to evaluate topic models' performance: **Intruder Shift (ISH)**, **Intruder Accuracy (INT)**, and **Average Intruder Similarity (ISIM)**.
-
-### Expressivity
-**Expressivity**, evaluates the meaningfulness of a topic by leveraging stopwords. Stopwords primarily serve a grammatical role and don't contribute to the document's meaning. The steps to calculate Expressivity are as follows:
-
-1. Compute vector embeddings for all stopwords and calculate their centroid embedding, ${\psi}$.
-2. For each topic, compute the weighted centroid of the top $Z$ words, normalized so that their weights sum up to 1: ${\gamma}_k = \frac{1}{Z}\sum_{i=1}^{Z} \phi_{k,i}{\omega_i}$.
-3. Calculate the cosine similarity between each topic centroid ${\gamma}_k$ and the stopword centroid ${\psi}$.
-4. The Expressivity metric is then defined as the average similarity across all $K$ topics:
-
-$$\small{EXPRS({\gamma}, {\psi}) = \frac{1}{K} \sum_{k=1}^{K} sim({\gamma}_k, {\psi})}$$
-
-Note that ${\gamma}_k$ is different from ${\mu}_k$, where the latter is the centroid of the document cluster associated with topic $t_k$. Expressivity can vary based on the chosen stopwords, allowing for domain-specific adjustments to evaluate a topic's expressivity based on a custom stopword set.
-
-This approach provides a quantifiable measure of how well a topic conveys meaningful information, distinct from grammatical structure alone.
-
-
-### Intruder Accuracy (INT)
-
-The **Intruder Accuracy (INT)** metric aims to improve the identification of intruder words within a topic. Here's how it works:
-
-1. Given the top Z words of a topic, randomly select an intruder word from another topic.
-2. Calculate the cosine similarity between all possible pairs of words within the set of the top Z words and the intruder word.
-3. Compute the fraction of top words for which the intruder has the least similar word embedding using the following formula:
-
-$$\small{INT(t_k) = \frac{1}{Z}\sum_{i=1}^Z {1}(\forall j: sim({\omega}_i, {\hat{\omega}}) < sim({\omega}_i, {\omega}_j))}$$
-
+## ✅ Evaluation
-INT measures how effectively the intruder word can be distinguished from the top words in a topic. A larger value is better.
-
-### Average Intruder Similarity (ISIM)
-
-The **Average Intruder Similarity (ISIM)** metric calculates the average cosine similarity between each word in a topic and an intruder word:
-$$ISIM(t_k) = \frac{1}{Z} \sum_{i=1}^{Z} sim({\omega}_i, {\hat{\omega}})$$
-
-To enhance the metrics' robustness against the specific selection of intruder words, ISH, INT, and ISIM are computed multiple times with different randomly chosen intruder words, and the results are averaged.
-
-These metrics provide insights into the performance of topic models and their ability to maintain topic coherence and diversity. A smaller value is better.
-
-### Intruder Shift (ISH)
-
-The **Intruder Shift (ISH)** metric quantifies the shift in a topic's centroid when an intruder word is substituted. This process involves the following steps:
-
-1. Compute the unweighted centroid of a topic and denote it as $\tilde{\boldsymbol{\gamma}}_i$.
-2. Randomly select a word from that topic and replace it with a randomly selected word from a different topic.
-3. Recalculate the centroid of the resulting words and denote it as $\hat{\boldsymbol{\gamma}}_i$.
-4. Calculate the ISH score for a topic by averaging the cosine similarity between $\tilde{{\gamma}}_i$ and $\hat{\boldsymbol{\gamma}}_i$ for all topics using the formula:
-5.
-$$ISH(T) = \frac{1}{K} \sum_{i=1}^{K} sim(\tilde{{\gamma}}_i, \hat{{\gamma}}_i)$$
-A lower ISH score indicates a more coherent and diverse topic model.
+stream-topic implements various evaluation metrics, mostly centered on the intruder-word task. The implemented metrics achieve high correlations with human evaluation. See [here](https://direct.mit.edu/coli/article/50/2/619/118990/Topics-in-the-Haystack-Enhancing-Topic-Quality) for a detailed description of the metrics.
To evaluate your model, simply use one of the implemented metrics.
```python
@@ -413,7 +374,7 @@ topic_diversity_score = metric.score(model_output)
Similarly, to use one of STREAM's metrics with any model, pass the topics and, where required, the $\beta$ (topic-word matrix) of the model to calculate the score.
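
For a model that was not trained with STREAM, a minimal sketch of assembling such a model output follows; the key names are assumptions based on the conventions above:

```python
# my_topics and my_beta are hypothetical variables from your own model
model_output = {
    "topics": my_topics,  # list of lists: top words per topic
    "beta": my_beta,      # topic-word matrix, only needed by some metrics
}
score = metric.score(model_output)
```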
-## Hyperparameter optimization
+## 🔍 Hyperparameter optimization
If you want to optimize the hyperparameters, simply run:
```python
model.optimize_and_fit(
@@ -424,7 +385,7 @@ model.optimize_and_fit(
n_trials=20,
)
```
-## Visualization
+## 🖼️ Visualization
You can also choose to optimize with respect to any evaluation metric from stream_topic.
Visualize the results:
@@ -438,10 +399,10 @@ visualize_topic_model(
```
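
Besides `visualize_topic_model`, the visuals module (`stream_topic/visuals/visuals.py`, see the diff below) also exposes `visualize_topics` and `visualize_topics_as_wordclouds`. A sketch, with the import path as an assumption:

```python
from stream_topic.visuals import visualize_topics, visualize_topics_as_wordclouds  # path assumed

# word clouds require a successfully trained model with topic_dict populated
visualize_topics_as_wordclouds(model)

# launches a Dash server for the topic plot (port defaults to 8050 per the docstring)
visualize_topics(model, dataset=dataset)
```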
-
+
-## Downstream Tasks
+## 📈 Downstream Tasks
The general formulation of a Neural Additive Model (NAM) can be summarized by the equation:
@@ -463,7 +424,6 @@ In this setup, visualizing the shape function `k` reveals the impact of a topic
Fitting a downstream model with a pre-trained topic model is straightforward using the PyTorch Trainer class. Subsequently, visualizing all shape functions can be done similarly to the approach described by Agarwal et al. (2021).
-### How to use
```python
from lightning import Trainer
@@ -490,11 +450,19 @@ plot_downstream_model(downstream_model)
-## Contributing and Testing New Models
+## 🤝 Contributing and Testing New Models
+
+We welcome contributions! Before you start, please:
+
+1. **Check Existing Issues**: Look for existing issues or discussions that may cover your idea.
+2. **Fork and Clone**: Fork the repository and clone it to your local machine.
+3. **Create a Branch**: Work on a new branch to keep your changes organized.
+4. **Develop and Test**: Develop your model and validate it using our provided testing script.
+5. **Submit a Pull Request**: Once ready, submit a PR with a clear description of your changes.
-We welcome contributions to enhance the functionality of our topic modeling package. To ensure your new models integrate seamlessly, please follow the guidelines and testing instructions provided below.
+For detailed guidelines on how to structure your contributions, see below.
-### Steps for Contributing
+<h3> Steps for Contributing </h3>
1. **Fork the Repository**:
- Fork the repository to your GitHub account.
@@ -516,7 +484,7 @@ We welcome contributions to enhance the functionality of our topic modeling pack
- Create your model class file, ensuring it follows the expected structure and naming conventions.
- Implement the required methods (`get_info`, `fit`, `predict`) and attributes (`topic_dict`). Optionally, implement `beta`, `theta`, or corresponding methods (`get_beta`, `get_theta`).
-#### Example Model Structure
+<h4> Example Model Structure </h4>
Here is an example of how your model class should be structured:
@@ -552,7 +520,7 @@ class ExampleModel(BaseModel):
return self.theta
```
-#### Testing Your Model
+<h4> Testing Your Model </h4>
1. **Install Dependencies**:
- Ensure all dependencies are installed.
@@ -569,7 +537,7 @@ class ExampleModel(BaseModel):
```
If this validation fails, it will tell you which checks did not pass.
-#### Validation Criteria
+<h4> Validation Criteria </h4>
The following checks are performed during validation:
- Presence of required methods (`get_info`, `fit`, `predict`).
@@ -581,7 +549,7 @@ The following checks are performed during validation:
Refer to the `tests/model_validation.py` script for detailed validation logic.
-#### Submitting Your Contribution
+<h4> Submitting Your Contribution </h4>
1. **Commit Your Changes**:
- Commit your changes to your branch.
@@ -607,11 +575,11 @@ If you want to include a new model where these guidelines are not appropriate ple
-# Citation
+# 📜 Citation
If you use this project in your research, please consider citing:
-### STREAM
+<h3> STREAM </h3>
```bibtex
@inproceedings{thielmann-etal-2024-stream,
@@ -624,7 +592,7 @@ If you use this project in your research, please consider citing:
}
```
-### Metrics and CEDC
+<h3> Metrics and CEDC </h3>
```bibtex
@article{thielmann2024topics,
@@ -637,7 +605,7 @@ If you use this project in your research, please consider citing:
}
```
-### TNTM
+<h3> TNTM </h3>
```bibtex
@article{reuter2024probabilistic,
@@ -649,7 +617,7 @@ If you use this project in your research, please consider citing:
```
-### DCTE
+<h3> DCTE </h3>
```bibtex
@inproceedings{thielmann2024human,
@@ -661,7 +629,7 @@ If you use this project in your research, please consider citing:
}
```
-### CBC
+<h3> CBC </h3>
```bibtex
@inproceedings{thielmann2023coherence,
@@ -686,4 +654,8 @@ If you use one of the Reddit or GME datasets, consider citing:
year={2024},
publisher={Springer}
}
-```
\ No newline at end of file
+```
+
+# 📝 License
+
+STREAM is released under the [MIT License](./LICENSE). © 2024 Anton Frederik Thielmann
\ No newline at end of file
diff --git a/docs/images/gif1.gif b/docs/images/gif1.gif
new file mode 100644
index 0000000000..4de8c9d4a0
Binary files /dev/null and b/docs/images/gif1.gif differ
diff --git a/docs/images/gif2.gif b/docs/images/gif2.gif
new file mode 100644
index 0000000000..618f5be6d7
Binary files /dev/null and b/docs/images/gif2.gif differ
diff --git a/stream_topic/preprocessor/_preprocessor.py b/stream_topic/preprocessor/_preprocessor.py
index 09444d14c4..951b326e36 100644
--- a/stream_topic/preprocessor/_preprocessor.py
+++ b/stream_topic/preprocessor/_preprocessor.py
@@ -57,7 +57,7 @@ class TextPreprocessor:
remove_words_with_numbers : bool, optional
Whether to remove words containing numbers from the text data (default is False).
remove_words_with_special_chars : bool, optional
- Whether to remove words containing special characters from the text data (default is False).
+ Whether to remove words containing special characters from the text data (default is False).
"""
@@ -73,15 +73,18 @@ def __init__(self, **kwargs):
self.remove_html_tags = kwargs.get("remove_html_tags", True)
self.remove_special_chars = kwargs.get("remove_special_chars", True)
self.remove_accents = kwargs.get("remove_accents", True)
- self.custom_stopwords = set(kwargs.get("custom_stopwords", []))
+ self.custom_stopwords = (
+ set(kwargs.get("custom_stopwords", []))
+ if kwargs.get("custom_stopwords")
+ else set()
+ )
self.detokenize = kwargs.get("detokenize", False)
self.min_word_freq = kwargs.get("min_word_freq", 2)
self.max_word_freq = kwargs.get("max_word_freq", None)
self.min_word_length = kwargs.get("min_word_length", 3)
self.max_word_length = kwargs.get("max_word_length", None)
self.dictionary = set(kwargs.get("dictionary", []))
- self.remove_words_with_numbers = kwargs.get(
- "remove_words_with_numbers", False)
+ self.remove_words_with_numbers = kwargs.get("remove_words_with_numbers", False)
self.remove_words_with_special_chars = kwargs.get(
"remove_words_with_special_chars", False
)
@@ -186,23 +189,19 @@ def _clean_text(self, text):
]
if self.min_word_length is not None:
- words = [word for word in words if len(
- word) >= self.min_word_length]
+ words = [word for word in words if len(word) >= self.min_word_length]
if self.max_word_length is not None:
- words = [word for word in words if len(
- word) <= self.max_word_length]
+ words = [word for word in words if len(word) <= self.max_word_length]
if self.dictionary != set():
words = [word for word in words if word in self.dictionary]
if self.remove_words_with_numbers:
- words = [word for word in words if not any(
- char.isdigit() for char in word)]
+ words = [word for word in words if not any(char.isdigit() for char in word)]
if self.remove_words_with_special_chars:
- words = [word for word in words if not re.search(
- r"[^a-zA-Z0-9\s]", word)]
+ words = [word for word in words if not re.search(r"[^a-zA-Z0-9\s]", word)]
if self.detokenize:
text = TreebankWordDetokenizer().detokenize(words)
diff --git a/stream_topic/preprocessor/_tf_idf.py b/stream_topic/preprocessor/_tf_idf.py
index 25de4f1170..78df170470 100644
--- a/stream_topic/preprocessor/_tf_idf.py
+++ b/stream_topic/preprocessor/_tf_idf.py
@@ -21,7 +21,7 @@ def c_tf_idf(documents, m, ngram_range=(1, 1)):
w = t.sum(axis=1)
# Suppress divide by zero warning
- with np.errstate(divide='ignore', invalid='ignore'):
+ with np.errstate(divide="ignore", invalid="ignore"):
tf = np.divide(t.T, w)
if np.any(np.isnan(tf)) or np.any(np.isinf(tf)):
logger.warning("NaNs or inf in tf matrix")
@@ -34,7 +34,7 @@ def c_tf_idf(documents, m, ngram_range=(1, 1)):
return tf_idf, count
-def extract_tfidf_topics(tf_idf, count, docs_per_topic, n=10):
+def extract_tfidf_topics(tf_idf, count, docs_per_topic, n=100):
"""class based tf_idf retrieval from cluster of documents
Args:
@@ -51,8 +51,7 @@ def extract_tfidf_topics(tf_idf, count, docs_per_topic, n=10):
tf_idf_transposed = tf_idf.T
indices = tf_idf_transposed.argsort()[:, -n:]
top_n_words = {
- label: [((words[j]), (tf_idf_transposed[i][j]))
- for j in indices[i]][::-1]
+ label: [((words[j]), (tf_idf_transposed[i][j])) for j in indices[i]][::-1]
for i, label in enumerate(labels)
}
diff --git a/stream_topic/visuals/visuals.py b/stream_topic/visuals/visuals.py
index 8fc868190d..58842b0a76 100644
--- a/stream_topic/visuals/visuals.py
+++ b/stream_topic/visuals/visuals.py
@@ -3,9 +3,12 @@
from ..models.abstract_helper_models.base import BaseModel, TrainingStatus
from ..utils import TMDataset
-from ._interactive import (_visualize_topic_model_2d,
- _visualize_topic_model_3d, _visualize_topics_2d,
- _visualize_topics_3d)
+from ._interactive import (
+ _visualize_topic_model_2d,
+ _visualize_topic_model_3d,
+ _visualize_topics_2d,
+ _visualize_topics_3d,
+)
from ._octis_visuals import OctisWrapperVisualModel
@@ -43,7 +46,7 @@ def visualize_topics_as_wordclouds(
hasattr(model, "topic_dict") and model._status == TrainingStatus.SUCCEEDED
), "Model must have been trained with topics extracted."
- topics = model.get_topics()
+ topics = model.topic_dict
for topic_id, topic_words in topics.items():
# Generate a word frequency dictionary for the topic
@@ -153,24 +156,24 @@ def visualize_topics(
use_average: bool = True,
):
"""
- Visualize topics in either 2D or 3D space using UMAP, t-SNE, or PCA dimensionality reduction techniques.
-
- Args:
- model (AbstractModel): The trained topic model instance.
- model_output (dict, optional): The output of the topic model, typically including topic-word distributions and document-topic distributions. Required if the model does not have an 'output' attribute.
- dataset (TMDataset, optional): The dataset used for training the topic model. Required if the model does not have an 'output' attribute.
- three_dim (bool, optional): Flag to visualize in 3D if True, otherwise in 2D. Defaults to False.
- reduce_first (bool, optional): Indicates whether to perform dimensionality reduction on embeddings before computing topic centroids. Defaults to False.
- reducer (str, optional): Choice of dimensionality reduction technique. Supported values are 'umap', 'tsne', and 'pca'. Defaults to 'umap'.
- port (int, optional): The port number on which the visualization dashboard will run. Defaults to 8050.
- embedding_model_name (str, optional): Name of the embedding model used for generating document embeddings. Defaults to "all-MiniLM-L6-v2".
- embeddings_folder_path (str, optional): Path to the folder containing precomputed embeddings. If not provided, embeddings will be computed on the fly.
- embeddings_file_path (str, optional): Path to the file containing precomputed embeddings. If not provided, embeddings will be computed on the fly.
-
-
- Returns:
- None
- The function launches a Dash server to visualize the topic model.
+ Visualize topics in either 2D or 3D space using UMAP, t-SNE, or PCA dimensionality reduction techniques.
+
+ Args:
+ model (AbstractModel): The trained topic model instance.
+ model_output (dict, optional): The output of the topic model, typically including topic-word distributions and document-topic distributions. Required if the model does not have an 'output' attribute.
+ dataset (TMDataset, optional): The dataset used for training the topic model. Required if the model does not have an 'output' attribute.
+ three_dim (bool, optional): Flag to visualize in 3D if True, otherwise in 2D. Defaults to False.
+ reduce_first (bool, optional): Indicates whether to perform dimensionality reduction on embeddings before computing topic centroids. Defaults to False.
+ reducer (str, optional): Choice of dimensionality reduction technique. Supported values are 'umap', 'tsne', and 'pca'. Defaults to 'umap'.
+ port (int, optional): The port number on which the visualization dashboard will run. Defaults to 8050.
+ embedding_model_name (str, optional): Name of the embedding model used for generating document embeddings. Defaults to "all-MiniLM-L6-v2".
+ embeddings_folder_path (str, optional): Path to the folder containing precomputed embeddings. If not provided, embeddings will be computed on the fly.
+ embeddings_file_path (str, optional): Path to the file containing precomputed embeddings. If not provided, embeddings will be computed on the fly.
+
+
+ Returns:
+ None
+ The function launches a Dash server to visualize the topic model.
"""
if not isinstance(model, BaseModel):