
This repository contains a list of available fake news datasets for data mining.


pmacinec/fake-news-datasets


Fake News Datasets

License: MIT

Introduction

This project provides a basic analysis of public fake news datasets. The main idea is to make each analysis replicable, so that anyone can add their own analysis and reuse it in their experiments and data mining. Every dataset has its own Python Jupyter notebook with a simple analysis, which can help in choosing an appropriate dataset.

Prerequisites

Installation and running

To run all Jupyter notebooks with the required libraries installed, we recommend using Docker.

With Docker installed, run the following command to build the Docker image and start the container:

./scripts/run.sh -b

Note: On subsequent runs, when no rebuild is needed (because the image has already been built), you can start the container by omitting the -b argument.

Datasets

A list of all processed datasets with a simple comparison is stored in the datasets/README.md file.

All dataset analyses are stored in the datasets/ folder. Each dataset has its own folder containing a short description in a README file and a Jupyter notebook; the folder can also include other files, e.g. the data itself.
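As an illustration of the folder layout described above, the following minimal Python sketch walks the datasets/ folder and reports which expected files are missing from each dataset folder. It assumes that every folder datasets/{name} should contain README.md and {name}.ipynb, and that top-level files and hidden folders (e.g. .ipynb_checkpoints) should be ignored; the function name check_dataset_folders is hypothetical and not part of the repository.

```python
from __future__ import annotations

from pathlib import Path


def check_dataset_folders(datasets_dir: str = "datasets") -> dict[str, list[str]]:
    """Map each dataset folder name to the list of expected files it lacks.

    Assumed layout: datasets_dir/{name}/README.md and datasets_dir/{name}/{name}.ipynb.
    A complete folder maps to an empty list.
    """
    report: dict[str, list[str]] = {}
    for folder in sorted(Path(datasets_dir).iterdir()):
        # Skip top-level files (e.g. datasets/README.md) and hidden folders.
        if not folder.is_dir() or folder.name.startswith("."):
            continue
        expected = ["README.md", f"{folder.name}.ipynb"]
        report[folder.name] = [f for f in expected if not (folder / f).is_file()]
    return report
```

Run from the repository root, check_dataset_folders("datasets") would map every complete dataset folder to an empty list, which makes incomplete folders easy to spot.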

Dataset files (e.g. .csv or .tsv files) are stored using Git LFS (see Git LFS for more information).
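Because the data files are stored with Git LFS, a clone made without Git LFS installed contains small text pointer files in place of the actual .csv or .tsv data, and loading them in a notebook then fails in confusing ways. The following is a minimal sketch of a check, assuming the pointer format from the Git LFS specification, whose first line is version https://git-lfs.github.com/spec/v1; the helper name is_lfs_pointer is hypothetical.

```python
# First line of a Git LFS pointer file (Git LFS specification, v1).
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path: str) -> bool:
    """Return True if the file at `path` is a Git LFS pointer, not real data."""
    with open(path, "rb") as fh:
        # Only the first few bytes are needed to recognise a pointer file.
        head = fh.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX
```

If the check returns True, running git lfs install and then git lfs pull inside the repository should replace the pointers with the real data files.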

Adding a new dataset

When adding a new dataset, please follow these steps:

  1. Run the ./scripts/create_structure.sh {name} script, supplying the name argument in snake_case (e.g. fake_news_detection_kaggle). The script creates the required folders and files in the datasets/{name} folder.
  2. Add the data to the datasets/{name}/data directory.
  3. Update the datasets/{name}/README.md file with a link, potential tasks, a description, and descriptions of the attributes. Please follow the template file structure.
  4. Update the datasets/{name}/{name}.ipynb file with an analysis of the dataset. Please follow the template file structure.
  5. Add the dataset and its details to the table of datasets in the datasets/README.md file (please keep the table in alphabetical order).
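Step 1 can be pictured with the following Python sketch. It is a hypothetical stand-in for ./scripts/create_structure.sh, not its actual implementation (the real script is a shell script and presumably fills the new files from the templates). It assumes that snake_case means lowercase words of letters and digits joined by single underscores; it creates datasets/{name}/data, a placeholder README.md, and an empty but valid Jupyter notebook (nbformat 4), and it refuses to reuse an existing data folder.

```python
import json
import re
from pathlib import Path

# Assumed snake_case rule: lowercase words of letters/digits joined by single underscores.
SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(?:_[a-z0-9]+)*")

# Smallest valid nbformat 4 notebook: no cells, empty metadata.
EMPTY_NOTEBOOK = {"cells": [], "metadata": {}, "nbformat": 4, "nbformat_minor": 5}


def create_structure(name: str, datasets_dir: str = "datasets") -> Path:
    """Create datasets_dir/{name}/ with data/, README.md and {name}.ipynb.

    Hypothetical Python stand-in for scripts/create_structure.sh.
    """
    if not SNAKE_CASE.fullmatch(name):
        raise ValueError(f"dataset name must be snake_case, got {name!r}")
    root = Path(datasets_dir) / name
    # exist_ok=False raises FileExistsError if the data folder already exists.
    (root / "data").mkdir(parents=True, exist_ok=False)
    # Placeholder README; the real template content is not reproduced here.
    (root / "README.md").write_text(f"# {name}\n", encoding="utf-8")
    (root / f"{name}.ipynb").write_text(
        json.dumps(EMPTY_NOTEBOOK, indent=1), encoding="utf-8"
    )
    return root
```

For example, create_structure("fake_news_detection_kaggle") would create datasets/fake_news_detection_kaggle/ with the three entries used in steps 2 to 4, while a name such as "FakeNews" is rejected with a ValueError.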

TODO

Finish the prepared datasets:

  • coaid
  • that_is_a_known_lie
  • fake_health
  • fake_covid
