Social Media Post Classification within the MediaEval 2015 dataset

TASKS

Data Analysis
Algorithm Design
Evaluation

MODULE

Machine Learning Technologies (COMP32222)

DESCRIPTION

This individual project explored ways of automatically classifying news-related Twitter content as real or fake. For this coursework, I designed a machine learning algorithm that classifies Twitter posts from the MediaEval 2015 "Verifying Multimedia Use" challenge dataset.

The project was built in Python with Jupyter Notebook, using the scikit-learn, numpy, pandas and deep-translator libraries. The dataset contained 14,277 training entries and 3,755 testing entries, and each entry had the following features:

tweetId / tweetText / userId / imageId / username / timestamp / label

DATA ANALYSIS

Here are some of the graphs produced throughout the data analysis.

ALGORITHM DESIGN

The algorithm design began with preprocessing. This consisted of cleaning the data by removing punctuation from tweets, lowercasing the text, removing stop words and emojis, and translating the tweets. Once preprocessing was complete, the tweetText feature was vectorized and transformed into a term frequency-inverse document frequency (TF-IDF) matrix.
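The cleaning steps above can be sketched roughly as follows. This is a minimal illustration and not the notebook's actual code: the helper name `clean_tweet`, the tiny stop-word set and the emoji ranges are all assumptions made for the example, and the translation step is left out because it needs network access.

```python
import re
import string

# Illustrative stop-word set (assumption); the project's actual list is not stated.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "in", "on", "to", "and", "this"}

# Rough emoji ranges (assumption): common pictographs, symbols and flag letters.
EMOJI_PATTERN = re.compile(
    "[\U0001F300-\U0001FAFF\U00002600-\U000027BF\U0001F1E6-\U0001F1FF]+"
)

# Translation table that deletes every ASCII punctuation character.
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_tweet(text: str) -> str:
    """Remove emojis, lowercase, strip punctuation and drop stop words."""
    text = EMOJI_PATTERN.sub(" ", text)
    text = text.lower()
    text = text.translate(PUNCT_TABLE)
    tokens = [token for token in text.split() if token not in STOP_WORDS]
    return " ".join(tokens)


cleaned = clean_tweet("This is a SHARK on the street!!! \U0001F631")
print(cleaned)
```

The cleaned strings would then be passed to a vectorizer such as scikit-learn's `TfidfVectorizer` to build the TF-IDF matrix.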

Considering the constraints and characteristics of the data, three starting classifiers were chosen: MultinomialNB / LinearSVC / SGDClassifier.
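As a rough sketch of how the three candidates could be compared on TF-IDF features, the snippet below fits each one in a scikit-learn pipeline and records its accuracy. The toy tweets and labels are invented for illustration only, and default hyperparameters are assumed; the notebook's actual setup may differ.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy data invented for this example; not taken from the MediaEval dataset.
train_texts = [
    "shark swimming on flooded highway",
    "unbelievable photo of shark in street",
    "fake storm cloud picture goes viral",
    "officials confirm flooding in city centre",
    "rescue teams arrive after the storm",
    "power restored to homes after hurricane",
]
train_labels = ["fake", "fake", "fake", "real", "real", "real"]
test_texts = ["viral shark photo in street", "rescue teams confirm flooding"]
test_labels = ["fake", "real"]

candidates = {
    "MultinomialNB": MultinomialNB(),
    "LinearSVC": LinearSVC(),
    "SGDClassifier": SGDClassifier(random_state=0),
}

scores = {}
for name, classifier in candidates.items():
    # Each candidate gets its own TF-IDF vectorizer fitted on the training text.
    model = make_pipeline(TfidfVectorizer(), classifier)
    model.fit(train_texts, train_labels)
    scores[name] = model.score(test_texts, test_labels)  # mean accuracy

for name, accuracy in scores.items():
    print(f"{name}: {accuracy:.2f}")
```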

EVALUATION AND RESULT

The classifiers were evaluated, and the strongest learner (MultinomialNB in this project) was selected for hyperparameter tuning through GridSearch. Additional features such as imageId and username were then tested in an iterative process.
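A grid search over a TF-IDF plus MultinomialNB pipeline could look like the sketch below, using scikit-learn's `GridSearchCV`. The parameter grid, the two-fold cross-validation and the toy data are assumptions chosen to keep the example small; the values actually searched in the project are not stated here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Toy data invented for this example.
texts = [
    "shark swimming on flooded highway",
    "unbelievable photo of shark in street",
    "fake storm cloud picture goes viral",
    "edited image of statue underwater",
    "officials confirm flooding in city centre",
    "rescue teams arrive after the storm",
    "power restored to homes after hurricane",
    "mayor announces shelter locations",
]
labels = ["fake", "fake", "fake", "fake", "real", "real", "real", "real"]

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("nb", MultinomialNB()),
])

# Hypothetical grid: step name, double underscore, then the parameter name.
param_grid = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    "nb__alpha": [0.1, 0.5, 1.0],
}

search = GridSearchCV(pipeline, param_grid, cv=2, scoring="accuracy")
search.fit(texts, labels)

print(search.best_params_)
print(search.best_score_)
```

Tuning the vectorizer and the classifier inside one pipeline keeps the TF-IDF vocabulary fitted only on each training fold, which avoids leaking test-fold text into the features.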

The best performance was achieved by using the tweetText and username features, which resulted in an accuracy score of 89.26%.
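One possible way to combine the tweetText and username features is scikit-learn's `ColumnTransformer`, shown in the sketch below. How the two features were actually combined in the notebook is not described here, so the one-hot encoding of usernames, the toy DataFrame and the invented usernames are all assumptions.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Toy rows invented for this example; column names follow the dataset's features.
df = pd.DataFrame({
    "tweetText": [
        "shark swimming on flooded highway",
        "unbelievable photo of shark in street",
        "officials confirm flooding in city centre",
        "rescue teams arrive after the storm",
    ],
    "username": ["viral_pics", "viral_pics", "city_news", "city_news"],
    "label": ["fake", "fake", "real", "real"],
})

features = ColumnTransformer([
    # A plain string selects a 1-D column, which is what TfidfVectorizer expects.
    ("text", TfidfVectorizer(), "tweetText"),
    # A list selects a 2-D frame; unseen usernames are encoded as all zeros.
    ("user", OneHotEncoder(handle_unknown="ignore"), ["username"]),
])

model = Pipeline([
    ("features", features),
    ("nb", MultinomialNB()),
])

X = df[["tweetText", "username"]]
model.fit(X, df["label"])
predictions = model.predict(X)
print(list(predictions))
```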

After the project was submitted, I received detailed feedback from the module leader highlighting the strengths of this work, mainly the data analysis and code quality, as well as areas for improvement such as additional feature selection. This project was awarded a first-class mark of 70%.

edelmans/Twitter-news-post-ML-classification