kirralabs/data-process

Learning how to process data

This repo may not be a good or correct solution for the case described here. It is meant as a step-by-step walkthrough of how to process data. Thanks.

Step

  1. Basic Terms and Definitions
    1. Data Structures, Types and Values
  2. Preprocessing
    1. Preparing the preparation
    2. Exploratory Data Analysis
    3. Scaling data
    4. Dealing with Missing Values
    5. Dealing with Outliers
    6. Dealing with Imbalanced Data
    7. Data Transformations
    8. Finishing Touches & Moving Ahead
    9. Choosing features
  3. Process
  4. Validation
  5. Summary and conclusions
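
As a rough illustration of two of the preprocessing steps above (missing values and scaling), here is a minimal sketch using pandas and scikit-learn. The toy DataFrame, its column names, and the binarization threshold are assumptions made up for this example; they are not taken from the notebook in this repo.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import Binarizer, MinMaxScaler, Normalizer, StandardScaler

# Hypothetical toy data: two numeric features, one missing value.
df = pd.DataFrame({
    "amount": [10.0, 200.0, np.nan, 50.0],
    "age": [25.0, 40.0, 31.0, 58.0],
})

# Missing values, option 1: drop every row that contains a NaN.
dropped = df.dropna()

# Missing values, option 2: replace each NaN with its column's global mean.
imputed = df.fillna(df.mean())

X = imputed.to_numpy()

# Rescale: map each column onto the [0, 1] range.
rescaled = MinMaxScaler().fit_transform(X)

# Standardize: give each column zero mean and unit variance.
standardized = StandardScaler().fit_transform(X)

# Normalize: scale each row to unit Euclidean length.
normalized = Normalizer().fit_transform(X)

# Binarize: values above the (arbitrary) threshold become 1, the rest 0.
binarized = Binarizer(threshold=45.0).fit_transform(X)
```

Which scaler fits best depends on the model used later; the four variants are shown side by side only to make the differences visible.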

Method

  1. Basic Term
    1. Data Structures
    2. Data Types
      1. nominal (qualitative)
      2. ordinal (qualitative)
      3. interval (quantitative)
      4. ratio/scale (quantitative)
    3. Values
      1. decimal
      2. integer
      3. boolean
      4. date/time/time stamp
      5. string
  2. Preprocessing
    1. Exploration
    2. Scaling
      1. Rescale data.
      2. Standardize data.
      3. Normalize data.
      4. Binarize data.
    3. Missing Values
      1. delete rows that contain NaN values
      2. replace NaN with the global mean
      3. predict the missing values
    4. Outliers
    5. Imbalanced Data
      1. upsampling
      2. downsampling
    6. Data Transformation
    7. Feature Selection
  3. Process
    1. Logistic Regression
    2. Decision Tree
    3. Random Forest
    4. Neural Networks
  4. Validation
    1. ROC
    2. AUC
    3. Confusion matrix
  5. Summary & Conclusion
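
The Process and Validation items above can be sketched end to end with scikit-learn. This is only an assumed setup: the data is synthetic (generated with `make_classification`), logistic regression stands in for any of the listed models, and the split ratio and random seeds are arbitrary choices, not values from the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for the real data (roughly 10% positives).
X, y = make_classification(
    n_samples=1000, n_features=10, weights=[0.9, 0.1], random_state=0
)

# Stratify so the train and test sets keep the same class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Process: fit one of the listed models.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Validation: ROC curve and AUC use the predicted probability of class 1.
scores = model.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, scores)
auc = roc_auc_score(y_test, scores)

# Validation: the confusion matrix uses the hard class predictions.
cm = confusion_matrix(y_test, model.predict(X_test))
print(f"AUC: {auc:.3f}")
print(cm)
```

Swapping in `DecisionTreeClassifier` or `RandomForestClassifier` should leave the validation part unchanged, since both also expose `predict_proba`.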

How to run project

  1. Set up the environment
virtualenv -p python3 venv
source venv/bin/activate
pip install -r req.txt
  2. Run the project
jupyter notebook
  3. Open Fraud_process.ipynb

Images

  1. Data: [image: Data]

  2. Exploration: [images: Exploration 1, Exploration 2]

  3. Outliers: [2 images: Outliers 1]

  4. Imbalance: [image: Imbalance]

  5. Feature Selection: [9 images: Feature selection]

  6. Validation: [images: validation 1, validation 2, validation 3]
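
For readers without access to the images, the outlier and imbalance steps could look roughly like the sketch below. Everything in it is an assumption for illustration: the data is randomly generated, the column names are hypothetical, and the 3-standard-deviation rule is just one common way to flag outliers, not necessarily the one used in the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.utils import resample

rng = np.random.default_rng(0)

# Hypothetical data: 95 majority rows (label 0) and 5 minority rows (label 1).
df = pd.DataFrame({
    "amount": rng.normal(100.0, 10.0, size=100),
    "label": [0] * 95 + [1] * 5,
})
df.loc[0, "amount"] = 1000.0  # plant one extreme value

# Outliers: keep only rows within 3 standard deviations of the mean.
mean, std = df["amount"].mean(), df["amount"].std()
clean = df[(df["amount"] - mean).abs() <= 3 * std]

majority = clean[clean["label"] == 0]
minority = clean[clean["label"] == 1]

# Upsampling: draw minority rows with replacement until both classes match.
minority_up = resample(
    minority, replace=True, n_samples=len(majority), random_state=0
)
upsampled = pd.concat([majority, minority_up])

# Downsampling: draw majority rows without replacement down to minority size.
majority_down = resample(
    majority, replace=False, n_samples=len(minority), random_state=0
)
downsampled = pd.concat([majority_down, minority])
```

Upsampling keeps every original row but repeats minority rows; downsampling discards majority rows, which is cheaper but loses information.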

References

  1. Basic Terms and Definitions
    1. https://towardsdatascience.com/data-preprocessing-for-non-techies-basic-terms-and-definitions-ea517038a4e5
  2. Preprocessing: preparing the preparation
    1. http://scikit-learn.org/stable/modules/preprocessing.html
    2. https://github.com/pandas-dev/pandas/blob/master/doc/cheatsheet/Pandas_Cheat_Sheet.pdf
    3. https://www.kaggle.com/ekami66/detailed-exploratory-data-analysis-with-python
    4. https://www.kdnuggets.com/2017/06/7-steps-mastering-data-preparation-python.html/2
    5. https://github.com/pandas-profiling/pandas-profiling
    6. https://machinelearningmastery.com/prepare-data-machine-learning-python-scikit-learn/
    7. https://www.geeksforgeeks.org/data-preprocessing-machine-learning-python/
    8. https://www.analyticsvidhya.com/blog/2016/07/practical-guide-data-preprocessing-python-scikit-learn/
    9. https://medium.com/open-machine-learning-course/open-machine-learning-course-topic-1-exploratory-data-analysis-with-pandas-de57880f1a68
    10. https://github.com/ajaymache/data-analysis-using-python
  3. Preprocessing: scaling data
    1. https://machinelearningmastery.com/prepare-data-machine-learning-python-scikit-learn/
    2. https://machinelearningmastery.com/rescaling-data-for-machine-learning-in-python-with-scikit-learn/
    3. http://scikit-learn.org/stable/auto_examples/preprocessing/plot_scaling_importance.html
    4. http://scikit-learn.org/stable/modules/preprocessing.html
  4. Preprocessing: dealing with missing values
    1. https://clevertap.com/blog/how-to-treat-missing-values-in-your-data-part-i/
    2. https://clevertap.com/blog/how-to-treat-missing-values-in-your-data-part-ii/
    3. http://pandas.pydata.org/pandas-docs/stable/missing_data.html
    4. http://scikit-learn.org/stable/modules/preprocessing.html#imputation-of-missing-values
    5. KNN https://towardsdatascience.com/the-use-of-knn-for-missing-values-cf33d935c637
    6. KNN sklearn scikit-learn/scikit-learn#9212
    7. KNN Kaggle https://www.kaggle.com/dan195/knn-imputation-gbregression-an-plsregression
  5. Preprocessing: dealing with outliers
    1. https://www.kdnuggets.com/2017/02/removing-outliers-standard-deviation-python.html
    2. https://medium.com/@dhwajraj/learning-python-regression-analysis-part-7-handling-outliers-in-data-d36ee9e2130b
    3. https://www.kaggle.com/general/24617
  6. Preprocessing: dealing with imbalanced data
  7. Preprocessing: feature selection
    1. https://www.kaggle.com/kanncaa1/feature-selection-and-data-visualization
    2. https://machinelearningmastery.com/feature-selection-in-python-with-scikit-learn/
    3. https://machinelearningmastery.com/feature-selection-machine-learning-python/
    4. https://towardsdatascience.com/a-feature-selection-tool-for-machine-learning-in-python-b64dd23710f0