AliValiyev/Utility-of-Synthetic-Data-in-Machine-learning-tasks.

Utility-of-Synthetic-Data-in-Machine-learning-tasks.

This repository investigates the utility of synthetic data generated by DataSynthesizer and Synthetic Data Vault (SDV) in machine learning tasks. I applied the Random Forest, Logistic Regression, Support Vector Machine, K-Nearest Neighbor, and Naive Bayes algorithms to the synthetic data and compared the results.

I used the Adult (Census Income), Banknote Authentication, Iris, Social Network Ads, and Titanic datasets. My main motivation was the paper "On the Utility of Synthetic Data: An Empirical Evaluation on Machine Learning Tasks" by M. Hittmeir, A. Ekelhart, and R. Mayer.
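The comparison described above can be sketched roughly as follows. This is a minimal illustration, not the code from this repository: it assumes scikit-learn and NumPy are installed, uses only the Iris dataset, and assumes a "train on synthetic, test on real" protocol. The `toy_synthesizer` function is a hypothetical stand-in (independent per-class Gaussians) for DataSynthesizer and Synthetic Data Vault, whose APIs are not shown here. The `compare_utility` helper and all hyperparameters are likewise illustrative choices.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC


def toy_synthesizer(X, y, seed=0):
    """Hypothetical stand-in for DataSynthesizer / SDV.

    Samples each class from independent per-feature Gaussians fitted to
    the real rows of that class, producing as many rows as the input.
    """
    rng = np.random.default_rng(seed)
    X_parts, y_parts = [], []
    for label in np.unique(y):
        rows = X[y == label]
        sampled = rng.normal(rows.mean(axis=0), rows.std(axis=0), size=rows.shape)
        X_parts.append(sampled)
        y_parts.append(np.full(len(rows), label))
    return np.vstack(X_parts), np.concatenate(y_parts)


def compare_utility(X, y, seed=0):
    """Return {model name: (accuracy trained on real, accuracy trained on synthetic)}.

    Both accuracies are measured on the same held-out real test set.
    """
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=seed, stratify=y
    )
    # Synthetic data is generated from the real training split only.
    X_syn, y_syn = toy_synthesizer(X_train, y_train, seed=seed)

    # Factories so that each fit starts from a fresh, untrained model.
    models = {
        "Random Forest": lambda: RandomForestClassifier(n_estimators=100, random_state=seed),
        "Logistic Regression": lambda: LogisticRegression(max_iter=1000),
        "Support Vector Machine": lambda: SVC(),
        "K-Nearest Neighbor": lambda: KNeighborsClassifier(),
        "Naive Bayes": lambda: GaussianNB(),
    }

    scores = {}
    for name, make_model in models.items():
        real_acc = make_model().fit(X_train, y_train).score(X_test, y_test)
        syn_acc = make_model().fit(X_syn, y_syn).score(X_test, y_test)
        scores[name] = (real_acc, syn_acc)
    return scores


if __name__ == "__main__":
    X, y = load_iris(return_X_y=True)
    for name, (real_acc, syn_acc) in compare_utility(X, y).items():
        print(f"{name:<24} real={real_acc:.3f} synthetic={syn_acc:.3f}")
```

A small gap between the two accuracies for a given model would suggest that the synthetic data preserves most of the signal that model relies on. Swapping `toy_synthesizer` for output from an actual synthesizer would be the natural next step.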

Links to datasets:

  1. https://archive.ics.uci.edu/ml/datasets/Adult
  2. https://archive.ics.uci.edu/ml/datasets/census+income
  3. http://archive.ics.uci.edu/ml/datasets/banknote+authentication
  4. https://archive.ics.uci.edu/ml/datasets/iris
  5. https://www.kaggle.com/rakeshrau/social-network-ads
  6. https://www.kaggle.com/c/titanic/data

Reference: Markus Hittmeir, Andreas Ekelhart, and Rudolf Mayer. 2019. On the Utility of Synthetic Data: An Empirical Evaluation on Machine Learning Tasks.
