Data Mining Complete Bootcamp

Dataset Installation

Use pip to install the dmba package from PyPI (https://pypi.org/project/dmba/).

pip install dmba

If this does not work (for example, when you are behind a firewall), download the package from PyPI and install it from the file, e.g.:

pip install dmba-0.0.14.tar.gz
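After either install method, you can confirm that Python finds the package in the active environment. This is a small sketch using the standard library's `importlib.util.find_spec`; the helper name `is_installed` is illustrative only and is not part of dmba.

```python
import importlib.util

def is_installed(package_name):
    """Return True if package_name is importable in the current environment (illustrative helper)."""
    return importlib.util.find_spec(package_name) is not None

print("dmba installed:", is_installed("dmba"))
```

If this prints `False`, the package was most likely installed into a different Python environment than the one running your notebooks.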

The Steps in Data Mining

Develop an understanding of the purpose of the data mining project
Obtain the dataset to be used in the analysis
Explore, clean, and preprocess the data
Reduce the data dimension, if necessary
Determine the data mining task (classification, prediction, clustering, etc.)
Partition the data (for supervised tasks)
Choose the data mining techniques to be used (regression, neural nets, hierarchical clustering, and so on).
Use algorithms to perform the task: This is typically an iterative process, trying multiple algorithms and often multiple variants of the same algorithm (choosing different variables or settings within the algorithm). Where appropriate, feedback from the algorithm’s performance on validation data is used to refine the settings.
Interpret the results of the algorithms: This involves choosing the best algorithm to deploy and, where possible, testing the final choice on the test data to get an idea of how well it will perform. (Recall that each algorithm may also be tested on the validation data for tuning purposes; in this way, the validation data become part of the fitting process, and the error measured on them is likely to underestimate the error of the finally chosen model once it is deployed.)
Deploy the model: This step involves integrating the model into operational systems and running it on real records to produce decisions or actions. For example, the model might be applied to a purchased list of possible customers, and the action might be “include in the mailing if the predicted amount of purchase is > $10.” A key step here is “scoring” the new records, that is, using the chosen model to predict the outcome value (the “score”) for each new record.
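The partition, iterate, test, and deploy steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions: the data are synthetic (one hypothetical predictor, such as past spending, and a purchase amount), a simple one-dimensional k-nearest-neighbors regressor stands in for whatever technique a real project would choose, and the 50/30/20 split, the helper names `partition`, `knn_predict` and `rmse`, and the reuse of the $10 mailing rule on synthetic scores are illustrative choices, not part of the book or the dmba package. In practice, libraries such as scikit-learn (`train_test_split`, `KNeighborsRegressor`) would do this work.

```python
import random
import statistics

def partition(records, fractions=(0.5, 0.3, 0.2), seed=1):
    """Shuffle records and split them into training, validation and test sets."""
    rows = list(records)
    random.Random(seed).shuffle(rows)
    n_train = round(fractions[0] * len(rows))
    n_valid = round(fractions[1] * len(rows))
    # Whatever remains after training and validation becomes the test set.
    return rows[:n_train], rows[n_train:n_train + n_valid], rows[n_train + n_valid:]

def knn_predict(train, x, k):
    """Predict the outcome for x as the mean outcome of its k nearest training records."""
    nearest = sorted(train, key=lambda rec: abs(rec[0] - x))[:k]
    return statistics.mean(y for _, y in nearest)

def rmse(train, data, k):
    """Root-mean-squared error of k-NN predictions on a list of (x, y) records."""
    sq_errors = [(y - knn_predict(train, x, k)) ** 2 for x, y in data]
    return (sum(sq_errors) / len(sq_errors)) ** 0.5

# Hypothetical synthetic data: x is a predictor (e.g. past spending), y a purchase amount.
rng = random.Random(0)
data = [(x / 10, 2 * x / 10 + rng.gauss(0, 1)) for x in range(100)]
train, valid, test = partition(data)

# Try several settings of the same algorithm; keep the k with the lowest validation error.
best_k = min(range(1, 11), key=lambda k: rmse(train, valid, k))

# The validation error of best_k is optimistic (it was used to choose k),
# so estimate deployment error on the untouched test set instead.
print("chosen k:", best_k)
print("test RMSE:", round(rmse(train, test, best_k), 2))

# Deployment: score new (hypothetical) records and apply the mailing rule.
for x_new in [2.0, 8.0]:
    score = knn_predict(train, x_new, best_k)
    action = "include in mailing" if score > 10 else "skip"
    print(f"x={x_new}: predicted purchase {score:.2f} -> {action}")
```

The key design point is that the test set is touched only once, after `best_k` has been fixed, so its error is a fairer estimate of deployment performance than the validation error.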

The foregoing steps encompass the steps in SEMMA, a methodology developed by the software company SAS:

Sample: Take a sample from the dataset; partition it into training, validation, and test datasets.

Explore: Examine the dataset statistically and graphically.

Modify: Transform the variables and impute missing values.

Model: Fit predictive models (e.g., regression tree, neural network).

Assess: Compare models using a validation dataset.
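The statistical half of Explore and the Modify step can be illustrated with the standard library alone. This is a minimal sketch, assuming missing values are encoded as `None` and that median imputation and z-score standardization are acceptable choices for a hypothetical `income` column; other imputation or transformation methods may suit real data better. The helper names `explore`, `impute_median` and `standardize` are illustrative. In a real project, the imputation value and scaling parameters should be computed on the training partition only and then applied to the validation and test data.

```python
import statistics

def explore(values):
    """Explore: summarize a numeric column, treating None as a missing value."""
    present = [v for v in values if v is not None]
    return {
        "count": len(present),
        "missing": len(values) - len(present),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        "stdev": statistics.stdev(present),
    }

def impute_median(values):
    """Modify: replace missing entries (None) with the median of the observed values."""
    fill = statistics.median(v for v in values if v is not None)
    return [fill if v is None else v for v in values]

def standardize(values):
    """Modify: transform a complete column to z-scores (mean 0, standard deviation 1)."""
    mu = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mu) / sd for v in values]

# Hypothetical income column (in $1,000s) with two missing entries.
income = [52.0, None, 61.5, 48.0, None, 75.0]
print(explore(income))
filled = impute_median(income)
print(filled)  # -> [52.0, 56.75, 61.5, 48.0, 56.75, 75.0]
print(standardize(filled))
```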


