
Classifying the topics of news transcriptions using BERT and TF-IDF.


arawxx/transcriptions-EDA-and-classification


Transcriptions EDA, Data Cleaning, and Topic Classification

The first task is to retrieve each complete paragraph from the transcriptions given only its first and last few words. The second task is to classify the topics of the retrieved paragraphs with machine learning models. Because a paragraph can belong to more than one topic, this is a multilabel classification problem.
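The retrieval step can be sketched roughly as below. This is a minimal illustration, not the code in main.ipynb: the function name `fetch_paragraph`, the regex-based matching, and the sample transcript are all assumptions made for the example.

```python
import re
from typing import Optional


def fetch_paragraph(transcript: str, start_words: str, end_words: str) -> Optional[str]:
    """Return the first span of `transcript` that begins with `start_words`
    and ends with `end_words`, or None if no such span exists.

    Hypothetical helper for illustration; the notebook may work differently.
    """

    def flexible(phrase: str) -> str:
        # Allow any run of whitespace between words, so line breaks in the
        # transcript do not prevent a match.
        return r"\s+".join(re.escape(word) for word in phrase.split())

    pattern = re.compile(
        flexible(start_words) + r".*?" + flexible(end_words),
        flags=re.DOTALL | re.IGNORECASE,
    )
    match = pattern.search(transcript)
    if match is None:
        return None
    # Collapse internal whitespace so the result reads as one paragraph.
    return " ".join(match.group(0).split())


# Invented sample text, not taken from the dataset.
transcript = (
    "Good evening and welcome.\n"
    "Tonight we look at the\neconomy and rising prices. "
    "Markets fell sharply today. "
    "In sports, the final was decided on penalties."
)
paragraph = fetch_paragraph(transcript, "Tonight we look", "fell sharply today.")
print(paragraph)
```

The non-greedy `.*?` stops at the first occurrence of the closing words, which keeps the match from running into a later paragraph that happens to end the same way.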

The models used are (initially) Random Forest and BERT.
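For the Random Forest side, a TF-IDF multilabel baseline could look something like the following. It is a sketch under stated assumptions: it uses scikit-learn, and the texts, topic names, and hyperparameters are invented for illustration and are not taken from the repository.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer

# Toy paragraphs and topic labels, invented for illustration only.
texts = [
    "Markets fell sharply as inflation rose again this quarter.",
    "The home team won the final after a penalty shootout.",
    "Parliament debated the new budget and tax proposals.",
    "The striker signed a record contract before the season.",
    "The election campaign focused on jobs and the economy.",
    "Central bank officials raised interest rates once more.",
]
topics = [
    ["economy"],
    ["sports"],
    ["economy", "politics"],
    ["sports"],
    ["economy", "politics"],
    ["economy"],
]

# Turn the list of topic lists into a binary indicator matrix,
# one column per topic, so a paragraph can carry several labels.
binarizer = MultiLabelBinarizer()
y = binarizer.fit_transform(topics)

# TF-IDF features feed a Random Forest, which accepts a
# multilabel indicator matrix as its target directly.
model = make_pipeline(
    TfidfVectorizer(),
    RandomForestClassifier(n_estimators=50, random_state=0),
)
model.fit(texts, y)

predicted = model.predict(["The government announced new tax cuts."])
predicted_topics = binarizer.inverse_transform(predicted)
print(binarizer.classes_, predicted, predicted_topics)
```

With only six training examples the predictions themselves are not meaningful. The point is the shape of the pipeline: binarize the labels, vectorize the text, then fit one model over all topic columns.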

The main notebook and the finalized dataframe are main.ipynb and to_fill_finalized_BERT.csv respectively.

The topic_classification_BERT.ipynb notebook contains the full training code and predictions of the BERT model.
