Predict tags for posts from StackOverflow with multilabel classification approach.

partoftheorigin/multilabel-classification-stack-overflow

Multilabel classification on Stack Overflow tags

Predict tags for Stack Overflow posts using a multilabel classification approach.

Dataset

  • Post titles from Stack Overflow

Transforming text to a vector

  • Transformed text data to numeric vectors using bag-of-words and TF-IDF.
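
The two representations can be sketched as follows. This is a minimal illustration, assuming scikit-learn's `CountVectorizer` and `TfidfVectorizer`; the repository may build its vectors differently, and the titles below are hypothetical stand-ins for the real dataset.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Hypothetical post titles standing in for the real dataset.
titles = [
    "how to sort a list in python",
    "sort array in javascript",
    "python list comprehension syntax",
]

# Bag-of-words: each column counts occurrences of one vocabulary word.
# The default tokenizer drops one-character tokens such as "a".
bow = CountVectorizer()
X_bow = bow.fit_transform(titles)

# TF-IDF: counts are reweighted so that words shared by many titles
# contribute less than words specific to a few titles.
tfidf = TfidfVectorizer()
X_tfidf = tfidf.fit_transform(titles)

print(sorted(bow.vocabulary_))  # vocabulary words in column order
print(X_bow.toarray())          # raw counts, one row per title
print(X_tfidf.toarray())        # TF-IDF weights, rows L2-normalized
```

Both vectorizers return sparse matrices with one row per title and one column per vocabulary word, which keeps memory use manageable when the vocabulary is large.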

MultiLabel classifier

Labels are converted to binary form with MultiLabelBinarizer, so each prediction is a mask of 0s and 1s with one position per tag.
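
A small sketch of that label transformation, using scikit-learn's `MultiLabelBinarizer`. The tag sets are hypothetical examples, not taken from the dataset.

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical tag sets, one list of tags per post.
tags = [["python", "list"], ["javascript"], ["python"]]

mlb = MultiLabelBinarizer()
y = mlb.fit_transform(tags)

# classes_ gives the tag that each column of the mask refers to.
print(mlb.classes_)
print(y)

# A 0/1 mask (for example a model prediction) maps back to tag names.
decoded = mlb.inverse_transform(y)
print(decoded)
```

Because the columns follow the sorted order of the tags, the same fitted binarizer should be reused to decode predictions.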

Logistic Regression for Multilabel classification

  • Regularization coefficient = 10
  • L2-regularization technique
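
One way such a classifier could be assembled is sketched below. It rests on assumptions the README does not confirm: that the coefficient of 10 corresponds to the `C` parameter of scikit-learn's `LogisticRegression` (the inverse of regularization strength), and that one binary classifier is trained per tag through `OneVsRestClassifier`. The titles and tags are hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical training data standing in for the real dataset.
titles = [
    "how to sort a list in python",
    "python list comprehension syntax",
    "sort array in javascript",
    "javascript array map function",
    "python dictionary iterate keys",
    "javascript promise async await",
]
tags = [
    ["python", "list"],
    ["python", "list"],
    ["javascript", "arrays"],
    ["javascript", "arrays"],
    ["python"],
    ["javascript"],
]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(titles)

mlb = MultiLabelBinarizer()
y = mlb.fit_transform(tags)

# Assumption: "coefficient = 10" is read as C=10. L2 is the default
# penalty of LogisticRegression, so it is not passed explicitly.
clf = OneVsRestClassifier(LogisticRegression(C=10, max_iter=1000))
clf.fit(X, y)

# Predictions are 0/1 masks with one column per tag.
pred = clf.predict(X)

# Decode the mask predicted for a new, unseen title.
new_mask = clf.predict(vectorizer.transform(["sort a python list"]))
print(mlb.inverse_transform(new_mask))
```

In scikit-learn, a larger `C` means weaker regularization, so this reading of the coefficient fits the training data more closely than the default `C=1.0` would.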

Evaluation

Results were evaluated using several classification metrics.

Libraries

  • NumPy: a package for scientific computing.
  • Pandas: a library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.
  • scikit-learn: a tool for data mining and data analysis.
  • NLTK: a platform for working with natural language.
