I am working with BERTopic and trying to evaluate topic models trained on Marathi (an Indic language) using some metrics. I found this code written by MaartenGr (author of BERTopic), but unfortunately I was not able to install the dependencies of the setup he describes here (https://github.com/MaartenGr/BERTopic_evaluation/tree/main). The author recommended using OCTIS, as it provides more metrics. I tried calculating the topic diversity and the NPMI score. The topic diversity is calculated, but I keep getting an error when calculating the NPMI score.
Here is my code
from octis.evaluation_metrics.coherence_metrics import Coherence
from octis.evaluation_metrics.diversity_metrics import TopicDiversity
# This is what the sentence array looks like
sentence_array = ['तीन दिवस झाले, पण गाडी अजून सापडली नाही. पोलिसांचा कडक तपास सुरु आहे.', 'डाळी भारतीय थाळीमध्ये सामील असलेले मुख्य भोजन आहेत.']
# This is what the topics look like
topics_list = [
    ['ठाकरे', 'एक', 'भारतीय', 'दिवस', 'शिंदे', 'सांगितले', 'दोन', 'माहिती', 'देण्यात', 'जात'],
    ['भारतीय', 'शिंदे', 'ठाकरे', 'मुख्यमंत्री', 'उद्धव', 'एक', 'पोलीस', 'धावा', 'दोन', 'सरकार'],
    ['देण्यात', 'फोन', 'डेटा', 'कॅमेरा', 'स्मार्टफोन', 'सादर', 'डिस्प्ले', 'सेन्सर', 'सपोर्ट', 'बॅटरी']
]
octis_texts = [sentence_array]
npmi = Coherence(texts=octis_texts, topk=10, measure='c_npmi')
octis_output = {"topics": list1}
topic_diversity = TopicDiversity(topk=10)
topic_diversity_score = topic_diversity.score(octis_output)
print("Topic diversity: "+str(topic_diversity_score))
npmi_score = npmi.score(octis_output)
print("Coherence: "+str(npmi_score))
Error
This is the error I get.
Topic diversity: 0.8857142857142857
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-68-c000efdb667a> in <cell line: 5>()
3 print("Topic diversity: "+str(topic_diversity_score))
4
----> 5 npmi_score = npmi.score(octis_output)
6 print("Coherence: "+str(npmi_score))
3 frames
/usr/local/lib/python3.10/dist-packages/gensim/models/coherencemodel.py in _ensure_elements_are_ids(self, topic)
452 return np.array(ids_from_ids)
453 else:
--> 454 raise ValueError('unable to interpret topic as either a list of tokens or a list of ids')
455
456 def _update_accumulator(self, new_topics):
ValueError: unable to interpret topic as either a list of tokens or a list of ids
Can anyone point out what exactly is wrong here, and how I can evaluate BERTopic models trained on Indic languages?
Thanks.
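A possible cause, judging from the traceback: the `Coherence` metric ends up in gensim's `CoherenceModel`, which expects `texts` to be a list of tokenized documents, each one a list of word strings. `[sentence_array]` is instead a single document whose "tokens" are whole sentences, so none of the topic words can be matched against the corpus vocabulary. The sketch below uses only the standard library to illustrate the difference. `tokenize` is a hypothetical helper doing a naive whitespace/punctuation split; a real Marathi pipeline would probably need the same tokenization that produced the topic words.

```python
import re

# Raw documents from the question (sentences, not yet tokenized).
sentence_array = [
    'तीन दिवस झाले, पण गाडी अजून सापडली नाही. पोलिसांचा कडक तपास सुरु आहे.',
    'डाळी भारतीय थाळीमध्ये सामील असलेले मुख्य भोजन आहेत.',
]

# First topic from the question.
topic = ['ठाकरे', 'एक', 'भारतीय', 'दिवस', 'शिंदे',
         'सांगितले', 'दोन', 'माहिती', 'देण्यात', 'जात']


def tokenize(doc):
    """Hypothetical helper: naive split on whitespace and common punctuation."""
    return [tok for tok in re.split(r"[\s,.!?;:।]+", doc) if tok]


# What the question passes: one "document" whose tokens are full sentences.
wrong_texts = [sentence_array]
wrong_vocab = {tok for doc in wrong_texts for tok in doc}

# What a coherence metric needs: one list of word tokens per document.
tokenized_docs = [tokenize(doc) for doc in sentence_array]
right_vocab = {tok for doc in tokenized_docs for tok in doc}

# Topic words that can be found in each vocabulary.
covered_wrong = [w for w in topic if w in wrong_vocab]
covered_right = [w for w in topic if w in right_vocab]

print("matched with [sentence_array]:", covered_wrong)
print("matched with tokenized docs:  ", covered_right)
```

If this is indeed the issue, passing `tokenized_docs` rather than `[sentence_array]` as `texts` to `Coherence` should let gensim map topic words to vocabulary ids. Topic words that never occur in the corpus may still cause trouble, so the evaluation corpus should ideally be the full set of documents the model was trained on, tokenized the same way.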