This repo may not be the best or the right solution for the case described here. However, it is meant to show, step by step, how to process data. Thanks.
- Basic Terms and Definitions
- Data Structures, Types and Values
- Preprocessing
- Preparing for the preparation
- Exploratory Data Analysis
- Scaling data
- Dealing with Missing Values
- Dealing with Outliers
- Dealing with Imbalanced Data
- Data Transformations
- Finishing Touches & Moving Ahead
- Choosing features
- Process
- Validation
- Summary and conclusions
- Basic Terms
- Data Structures
- Data Types
- nominal (qualitative)
- ordinal (qualitative)
- interval (quantitative)
- ratio/scale (quantitative)
- Values
- decimal
- integer
- boolean
- date/time/time stamp
- string
- Preprocessing
- Exploration
- Scaling
- Rescale data.
- Standardize data.
- Normalize data.
- Binarize data.
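
The four scaling operations listed above can be sketched in plain Python to show what each one does to the numbers. This is a minimal, dependency-free illustration, not the notebook's own code: the function names and the sample amounts are made up for the example. In practice, scikit-learn's `MinMaxScaler`, `StandardScaler`, `Normalizer` and `Binarizer` (all in `sklearn.preprocessing`) cover the same ground.

```python
import math
import statistics


def rescale(values, low=0.0, high=1.0):
    """Min-max rescaling: the smallest value maps to `low`, the largest to `high`."""
    v_min, v_max = min(values), max(values)
    span = v_max - v_min
    if span == 0:  # constant column: nothing to spread out
        return [low for _ in values]
    return [low + (v - v_min) * (high - low) / span for v in values]


def standardize(values):
    """Shift a column to mean 0 and scale it to unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def normalize(row):
    """Scale one row (one sample) to unit Euclidean (L2) length."""
    norm = math.sqrt(sum(v * v for v in row))
    if norm == 0:
        return list(row)
    return [v / norm for v in row]


def binarize(values, threshold=0.0):
    """Map values above the threshold to 1 and all other values to 0."""
    return [1 if v > threshold else 0 for v in values]


# Hypothetical transaction amounts, used only for illustration.
amounts = [10.0, 20.0, 30.0, 40.0]

rescaled = rescale(amounts)                # values now lie in [0, 1]
standardized = standardize(amounts)        # mean 0, standard deviation 1
unit_row = normalize([3.0, 4.0])           # row with L2 length 1
flags = binarize(amounts, threshold=25.0)  # 0/1 indicator per value
```

Note that rescaling and standardizing work per column (feature), while normalizing works per row (sample).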
- Missing Values
- Delete rows that contain NaN values
- Replace NaN with the global average of the column
- Predict the missing values (for example with KNN)
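
The three strategies above can be sketched as follows. This is a small pure-Python illustration under stated assumptions: the helper names and the toy rows are hypothetical, missing values are represented as `float("nan")`, and the prediction step uses a simple k-nearest-neighbours average, which is only one of several possible predictors. With pandas and scikit-learn the usual tools would be `DataFrame.dropna`, `DataFrame.fillna`, `SimpleImputer` and `KNNImputer`.

```python
import math

NAN = float("nan")


def is_missing(value):
    """True when the value is a float NaN."""
    return isinstance(value, float) and math.isnan(value)


def drop_incomplete_rows(rows):
    """Strategy 1: keep only rows without any missing value."""
    return [row for row in rows if not any(is_missing(v) for v in row)]


def impute_column_mean(rows):
    """Strategy 2: replace each NaN with the mean of the observed values in its column."""
    means = []
    for col in range(len(rows[0])):
        observed = [row[col] for row in rows if not is_missing(row[col])]
        means.append(sum(observed) / len(observed))
    return [[means[c] if is_missing(v) else v for c, v in enumerate(row)]
            for row in rows]


def impute_knn(rows, target_col, k=2):
    """Strategy 3: predict a missing target as the mean of the k nearest complete rows.

    Distance is Euclidean over the other columns, which are assumed to be
    observed for the row being filled.
    """
    complete = drop_incomplete_rows(rows)
    others = [c for c in range(len(rows[0])) if c != target_col]
    filled = []
    for row in rows:
        new_row = list(row)
        if is_missing(row[target_col]):
            here = [row[c] for c in others]
            nearest = sorted(
                complete,
                key=lambda r: math.dist([r[c] for c in others], here),
            )[:k]
            new_row[target_col] = sum(r[target_col] for r in nearest) / len(nearest)
        filled.append(new_row)
    return filled


# Toy data: the second column is missing in the third row.
rows = [
    [1.0, 10.0],
    [2.0, 20.0],
    [3.0, NAN],
    [4.0, 40.0],
]

dropped = drop_incomplete_rows(rows)
mean_filled = impute_column_mean(rows)
knn_filled = impute_knn(rows, target_col=1, k=2)
```

Deleting rows is the simplest option but throws data away; the two imputation strategies keep every row at the cost of introducing estimated values.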
- Outliers
- Imbalanced Data
- upsampling
- downsampling
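
As a rough sketch of the two resampling options above: upsampling repeats minority-class rows at random until the classes are the same size, while downsampling keeps only a random subset of the majority class. The function names, the fixed seed and the toy transactions below are assumptions made for the example, not taken from the notebook.

```python
import random


def upsample(minority, majority, rng):
    """Draw extra minority rows (with replacement) until both classes match in size."""
    extra = rng.choices(minority, k=len(majority) - len(minority))
    return minority + extra, majority


def downsample(minority, majority, rng):
    """Keep a random subset of the majority class, as large as the minority class."""
    kept = rng.sample(majority, k=len(minority))
    return minority, kept


rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical labelled transactions: (id, label), where 1 = fraud and 0 = legitimate.
fraud = [("tx%d" % i, 1) for i in range(3)]
legit = [("tx%d" % i, 0) for i in range(3, 13)]

up_fraud, up_legit = upsample(fraud, legit, rng)
down_fraud, down_legit = downsample(fraud, legit, rng)
```

Resampling is usually applied to the training split only, so that duplicated or discarded rows do not distort the validation results.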
- Data Transformation
- Feature Selection
- Process
- Logistic Regression
- Decision Tree
- Random Forest
- Neural Networks
- Validation
- ROC
- AUC
- Confusion matrix
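
To make the three validation tools above concrete, here is a minimal pure-Python sketch for binary labels. The function names and the toy labels and scores are illustrative assumptions. The confusion matrix counts the four outcome types, the ROC curve records the false and true positive rates as the decision threshold is lowered, and AUC is the area under that curve (trapezoidal rule). scikit-learn offers `confusion_matrix`, `roc_curve` and `roc_auc_score` in `sklearn.metrics` for real use.

```python
def confusion_matrix(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels where 1 is the positive class."""
    tn = fp = fn = tp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 1:
            fn += 1
        elif pred == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def roc_points(y_true, scores):
    """(false positive rate, true positive rate) pairs, one per distinct threshold.

    Assumes y_true contains at least one positive and one negative label.
    """
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        y_pred = [1 if s >= threshold else 0 for s in scores]
        tn, fp, fn, tp = confusion_matrix(y_true, y_pred)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under a curve given as (x, y) points sorted by x, via the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Toy example: true labels and predicted fraud scores.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

# Confusion matrix at a fixed threshold of 0.5.
matrix = confusion_matrix(y_true, [1 if s >= 0.5 else 0 for s in scores])
curve = roc_points(y_true, scores)
area = auc(curve)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking of positives above negatives.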
- Summary & Conclusion
- Setup environment
virtualenv -p python3 venv
source venv/bin/activate
pip install -r req.txt
- Run the project
jupyter notebook
- Open Fraud_process.ipynb
- Basic Terms and Definitions
- Preprocessing: preparing for the preparation
- http://scikit-learn.org/stable/modules/preprocessing.html
- https://github.com/pandas-dev/pandas/blob/master/doc/cheatsheet/Pandas_Cheat_Sheet.pdf
- https://www.kaggle.com/ekami66/detailed-exploratory-data-analysis-with-python
- https://www.kdnuggets.com/2017/06/7-steps-mastering-data-preparation-python.html/2
- https://github.com/pandas-profiling/pandas-profiling
- https://machinelearningmastery.com/prepare-data-machine-learning-python-scikit-learn/
- https://www.geeksforgeeks.org/data-preprocessing-machine-learning-python/
- https://www.analyticsvidhya.com/blog/2016/07/practical-guide-data-preprocessing-python-scikit-learn/
- https://medium.com/open-machine-learning-course/open-machine-learning-course-topic-1-exploratory-data-analysis-with-pandas-de57880f1a68
- https://github.com/ajaymache/data-analysis-using-python
- Preprocessing: scaling data
- https://machinelearningmastery.com/prepare-data-machine-learning-python-scikit-learn/
- https://machinelearningmastery.com/rescaling-data-for-machine-learning-in-python-with-scikit-learn/
- http://scikit-learn.org/stable/auto_examples/preprocessing/plot_scaling_importance.html
- http://scikit-learn.org/stable/modules/preprocessing.html
- Preprocessing: dealing with missing values
- https://clevertap.com/blog/how-to-treat-missing-values-in-your-data-part-i/
- https://clevertap.com/blog/how-to-treat-missing-values-in-your-data-part-ii/
- http://pandas.pydata.org/pandas-docs/stable/missing_data.html
- http://scikit-learn.org/stable/modules/preprocessing.html#imputation-of-missing-values
- KNN: https://towardsdatascience.com/the-use-of-knn-for-missing-values-cf33d935c637
- KNN in scikit-learn: scikit-learn/scikit-learn#9212
- KNN on Kaggle: https://www.kaggle.com/dan195/knn-imputation-gbregression-an-plsregression
- Preprocessing: dealing with outliers
- Preprocessing: dealing with imbalanced data
- Preprocessing: feature selection
- https://www.kaggle.com/kanncaa1/feature-selection-and-data-visualization
- https://machinelearningmastery.com/feature-selection-in-python-with-scikit-learn/
- https://machinelearningmastery.com/feature-selection-machine-learning-python/
- https://towardsdatascience.com/a-feature-selection-tool-for-machine-learning-in-python-b64dd23710f0