Error when using sklearn StratifiedKFold in Evaluator CV #135

Open
oberlage opened this issue Sep 24, 2020 · 1 comment

Comments

@oberlage

Hi there,

First of all, thanks for providing this nice library, it's really helpful in our project!

We are implementing the Evaluator class to do a grid search but our data needs stratification. We were happy to read in the documentation that the Evaluator class also accepts "a KFold class that obeys the Scikit-learn API". This would allow us to use the sklearn.model_selection.StratifiedKFold class and easily stratify our data in the cross validation.

However, when implementing this, we get the following error:

[MLENS] backend: threading
Traceback (most recent call last):
  File "mlens_kfol_cv.py", line 29, in <module>
    n_iter=10
  File "/Users/user/.python-virtualenvs/some_env/lib/python3.7/site-packages/mlens/model_selection/model_selection.py", line 492, in fit
    self._fit(X, y, job)
  File "/Users/user/.python-virtualenvs/some_env/lib/python3.7/site-packages/mlens/model_selection/model_selection.py", line 180, in _fit
    manager.process(self, job, X, y)
  File "/Users/user/.python-virtualenvs/some_env/lib/python3.7/site-packages/mlens/parallel/backend.py", line 855, in process
    caller.indexer.fit(self.job.predict_in, self.job.targets, self.job.job)
  File "/Users/user/.python-virtualenvs/some_env/lib/python3.7/site-packages/mlens/index/fold.py", line 147, in fit
    check_full_index(n, self.folds, self.raise_on_exception)
  File "/Users/user/.python-virtualenvs/some_env/lib/python3.7/site-packages/mlens/index/_checks.py", line 19, in check_full_index
    "type(%s) was passed." % type(folds))
ValueError: 'folds' must be an integer. type(<class 'sklearn.model_selection._split.KFold'>) was passed.   

The error seems to contradict the documentation of the Evaluator class, which states that a KFold object is accepted.

The error can be reproduced with the following (dummy) code:

import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import mean_absolute_error
from mlens.model_selection import Evaluator
from mlens.metrics import make_scorer
from sklearn.linear_model import Lasso
from scipy.stats import uniform

scorer = make_scorer(mean_absolute_error, greater_is_better=False)
estimators = [('lasso',Lasso())]
param_dicts = {
    'lasso':
        {'alpha': uniform(1e-6, 1e-5)},
}

x_train = np.random.rand(10,1)
y_train = np.random.rand(10)

evl = Evaluator(
    scorer,
    cv=StratifiedKFold(),
    verbose=5,
)
evl.fit(
    x_train, y_train,
    estimators=estimators,
    param_dicts=param_dicts,
    n_iter=10
)

We're using Python 3.7.6 with the following library versions:

mlens==0.2.3
scikit-learn==0.22.1
numpy==1.18.1
scipy==1.4.1

Do you have any insights on how to get this solved?

@oberlage oberlage changed the title Error on using sklearn StratifiedKFold in Evaluator CV Error when using sklearn StratifiedKFold in Evaluator CV Sep 24, 2020
@agartland

Were you able to implement a KFold object with mlens? I'm hoping to be able to use stratified k-fold CV for the Evaluator as well as the SuperLearner. Workarounds would be OK too! Thanks!
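
One possible workaround, sketched here with scikit-learn alone rather than mlens: precompute stratified splits and hand them to `RandomizedSearchCV`. This does not fix the `Evaluator` error above and is not mlens's API. It only shows how the same randomized search over `alpha` could run with stratified folds. Because `StratifiedKFold` needs class labels and the dummy target in the reproduction is continuous, the sketch assumes the target is binned into quartiles first. The names `y_bins` and `splits`, the sample size of 40, and the choice of two folds are illustrative assumptions.

```python
import numpy as np
from scipy.stats import uniform
from sklearn.linear_model import Lasso
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold

# Dummy data in the spirit of the reproduction (sizes are an assumption).
rng = np.random.RandomState(0)
x_train = rng.rand(40, 1)
y_train = rng.rand(40)

# Assumption: stratify a continuous target by binning it into quartiles.
# StratifiedKFold rejects continuous targets, so it is given the bin labels.
bin_edges = np.quantile(y_train, [0.25, 0.5, 0.75])
y_bins = np.digitize(y_train, bin_edges)

# Precompute the stratified folds as (train_indices, test_indices) pairs.
skf = StratifiedKFold(n_splits=2, shuffle=True, random_state=0)
splits = list(skf.split(x_train, y_bins))

# Randomized search over alpha, scored by negated mean absolute error,
# using the precomputed stratified folds instead of a plain K-fold.
search = RandomizedSearchCV(
    Lasso(),
    param_distributions={"alpha": uniform(1e-6, 1e-5)},
    n_iter=10,
    scoring="neg_mean_absolute_error",
    cv=splits,
    random_state=0,
)
search.fit(x_train, y_train)
print(search.best_params_)
```

Passing the splits as a list of index pairs keeps the binned labels out of the model fit: the folds are stratified on `y_bins`, while the estimator still trains on the original continuous `y_train`.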
