pruned-cv

Introduction

The package implements Pruned Cross-Validation technique, which verifies whether all folds are worth calculating. It's components may be used as a standalone methods or as a part of hyperparameter optimization frameworks like Hyperopt or Optuna.

It proved to be more less three times faster than Scikit-Learn GridSearchCV and RandomizedSearchCV yielding the same results (see benchmarks section).

You can find a broader overview of the motivation an methodology under this directory or alternatively on Medium.

Motivation

The idea was to improve speed of hyperparameter optimization. All the methods which are based on Cross-Validation require big folds number (8 is an absolute minimum) to assure that the surrogate model (whether it's GridSearch, RandomSearch or a Bayesian model) does not overfit to the training set.

On the other hand Optuna proposes a mechanism of pruned learning for Artificial Neural Networks and Gradient Boosting Algorithms. It speeds the search process greatly but one issue with the method is that is prunes the trials based on a single validation sample. With relatively small datasets the model's quality variance may be high and lead to suboptimal hyperparameters choices. In addition it can only help to optimize an estimator and not the whople ML pipeline.

Pruned-cv is a compromise between brut-force methods like GridSearch and more elaborate, but vulnerable ones like Optuna's pruning.

How does it work?

You can see example of correlations between cumulative scores on folds with the final score:

You may find the whole study notebook here.

The package uses the fact that cumulative scores are highly correlated with the final score. In most cases after calculating 2 folds it's possible to predict the final score very accurately. If the partial score is very poor the cross-validation is stopped (pruned) and the final scores value is predicted based on best till now scores. If the partial score fits within some tolerance limit, next folds are evaluated.

Installation

The package works with Python 3. To install it clone the repository:

git clone [email protected]:PiotrekGa/pruned-cv.git

and run:

pip install -e pruned-cv

Examples

You can find example notebooks in the examples section of the repository.

Usage with Optuna

https://github.com/PiotrekGa/pruned-cv/blob/master/examples/Usage_with_Optuna.ipynb

Usage with Hyperopt

https://github.com/PiotrekGa/pruned-cv/blob/master/examples/Usage_with_Hyperopt.ipynb

Benchmarks

You can find benchmarks in examples section.

Grid Search CV

https://github.com/PiotrekGa/pruned-cv/blob/master/examples/GridSearchCV_Benchmark.ipynb

Randmized Search CV

https://github.com/PiotrekGa/pruned-cv/blob/master/examples/RandomizedSearchCV_Benchmark.ipynb

Name		Name	Last commit message	Last commit date
Latest commit History 96 Commits
examples		examples
prunedcv		prunedcv
tests		tests
LICENSE		LICENSE
README.md		README.md
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

pruned-cv

Introduction

Motivation

How does it work?

Installation

Examples

Usage with Optuna

Usage with Hyperopt

Benchmarks

Grid Search CV

Randmized Search CV

About

Releases

Packages

Languages

License

PiotrekGa/pruned-cv

Folders and files

Latest commit

History

Repository files navigation

pruned-cv

Introduction

Motivation

How does it work?

Installation

Examples

Usage with Optuna

Usage with Hyperopt

Benchmarks

Grid Search CV

Randmized Search CV

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages