Transcriptions EDA, Data Cleaning and their Topics Classification

The project has two tasks. The first is to retrieve each complete paragraph from the transcriptions, given only its first and last few words. The second is to classify the topics of the retrieved paragraphs with machine learning models. Because a paragraph can carry more than one topic, this is a multilabel classification problem.
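The paragraph-retrieval task can be illustrated with a short sketch. This is not the code from main.ipynb; it is a minimal example that assumes the transcription is available as a single string and that the given start and end words appear in it verbatim (apart from whitespace and letter case). The function name `fetch_paragraph` and its parameters are hypothetical.

```python
import re


def fetch_paragraph(transcript, start_words, end_words):
    """Return the text from the first occurrence of `start_words` up to
    the nearest following `end_words`, or None if no such span exists.

    Hypothetical helper for illustration only.
    """
    def to_pattern(words):
        # Escape each word and allow any run of whitespace between words,
        # so line breaks inside the transcript do not prevent a match.
        return r"\s+".join(re.escape(w) for w in words.split())

    # Non-greedy middle part: stop at the first end marker after the start.
    pattern = to_pattern(start_words) + r".*?" + to_pattern(end_words)
    match = re.search(pattern, transcript, flags=re.DOTALL | re.IGNORECASE)
    return match.group(0) if match else None


# Made-up transcript used only to demonstrate the call.
transcript = (
    "Good morning everyone. Today we\ntalk about budgets and how to "
    "plan them well. Thank you all."
)
paragraph = fetch_paragraph(transcript, "Today we talk", "plan them well.")
```

Exact matching like this breaks down if the first or last words contain transcription errors; in that case a fuzzy comparison would be needed instead.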

Two models are used initially: Random Forest and BERT.
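In a multilabel setup, the target for each paragraph is typically a multi-hot vector with one position per topic, rather than a single class index. The sketch below shows that encoding in plain Python. It is an assumption about how the labels could be prepared, not a description of what the notebooks do; the function name `multi_hot` and the topic names are made up for illustration.

```python
def multi_hot(topic_sets):
    """Encode a list of topic collections as multi-hot rows.

    Returns (labels, rows): `labels` is the sorted list of all topics
    seen, and rows[i][j] is 1 if paragraph i has topic labels[j].
    Hypothetical helper for illustration only.
    """
    labels = sorted({topic for topics in topic_sets for topic in topics})
    index = {label: i for i, label in enumerate(labels)}

    rows = []
    for topics in topic_sets:
        row = [0] * len(labels)
        for topic in topics:
            row[index[topic]] = 1
        rows.append(row)
    return labels, rows


# Three paragraphs: two topics, one topic, and no topic at all.
labels, rows = multi_hot([["health", "finance"], ["sports"], []])
```

A row may contain several ones or none, which is what distinguishes this from ordinary multiclass classification.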

The main notebook is main.ipynb, and the finalized dataframe is to_fill_finalized_BERT.csv.

The topic_classification_BERT.ipynb notebook contains the full training code and predictions of the BERT model.


arawxx/transcriptions-EDA-and-classification
