Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug? mlens.utils.check_instances returns wrong pipeline orders #122

Open
lunluen opened this issue Sep 6, 2019 · 4 comments
Open

Bug? mlens.utils.check_instances returns wrong pipeline orders #122

lunluen opened this issue Sep 6, 2019 · 4 comments

Comments

@lunluen
Copy link

lunluen commented Sep 6, 2019

If I understand it right, check_instances seems to tend to cluster transformers with same types without regard to the list order.
So the results from make_group and BaseEnsemble are also wrong.

Reproduce - Case 1

Code

from mlens.utils import check_instances
from sklearn.preprocessing import (
    StandardScaler as SS, FunctionTransformer as FT)
from xgboost import XGBClassifier


clf = XGBClassifier(n_estimators=10)
def f1(): return 1
def f2(): return 2

estimators = [clf]
preprocessing = [FT(f1), SS(), FT(f2)]

check_instances(estimators, preprocessing)[0][0][1]

Result

[('functiontransformer-1',
  FunctionTransformer(accept_sparse=False, check_inverse=True,
                      func=<function f1 at 0x000002532FDB0C80>, inv_kw_args=None,
                      inverse_func=None, kw_args=None, pass_y='deprecated',
                      validate=None)),
 ('functiontransformer-2',
  FunctionTransformer(accept_sparse=False, check_inverse=True,
                      func=<function f2 at 0x000002532FDB0D90>, inv_kw_args=None,
                      inverse_func=None, kw_args=None, pass_y='deprecated',
                      validate=None)),
 ('standardscaler', StandardScaler(copy=True, with_mean=True, with_std=True))]

Expected Behavior

The order should be [FT(f1), SS(), FT(f2)] rather than FT(f1), FT(f2), SS()


Reproduce - Case 2

Code

from mlens.utils import check_instances
from sklearn.preprocessing import (
    StandardScaler as SS, FunctionTransformer as FT)
from xgboost import XGBClassifier


clf = XGBClassifier(n_estimators=10)
def f1(): return 1
def f2(): return 2

estimators = {
    'key-1': [clone(clf)],
    'key-2': [clone(clf)],
}
preprocessing = {
    'key-1': [FT(f2), SS(), FT(f1)],
    'key-2': [SS(), FT(f2), FT(f1)],
}

check_instances(estimators, preprocessing)[0]

Result

[('key-1',
  [('functiontransformer-1',
    FunctionTransformer(accept_sparse=False, check_inverse=True,
                        func=<function f2 at 0x0000025336E1C840>, inv_kw_args=None,
                        inverse_func=None, kw_args=None, pass_y='deprecated',
                        validate=None)),
   ('functiontransformer-2',
    FunctionTransformer(accept_sparse=False, check_inverse=True,
                        func=<function f1 at 0x0000025336E1C9D8>, inv_kw_args=None,
                        inverse_func=None, kw_args=None, pass_y='deprecated',
                        validate=None)),
   ('standardscaler-1',
    StandardScaler(copy=True, with_mean=True, with_std=True))]),
 ('key-2',
  [('functiontransformer-3',
    FunctionTransformer(accept_sparse=False, check_inverse=True,
                        func=<function f2 at 0x0000025336E1C840>, inv_kw_args=None,
                        inverse_func=None, kw_args=None, pass_y='deprecated',
                        validate=None)),
   ('functiontransformer-4',
    FunctionTransformer(accept_sparse=False, check_inverse=True,
                        func=<function f1 at 0x0000025336E1C9D8>, inv_kw_args=None,
                        inverse_func=None, kw_args=None, pass_y='deprecated',
                        validate=None)),
   ('standardscaler-2',
    StandardScaler(copy=True, with_mean=True, with_std=True))])]

key1 should be [FT(f2), SS(), FT(f1)] rather than [FT(f2), FT(f1), SS()]
key2 should be [FT(f2), SS(), FT(f1)] rather than [FT(f2), FT(f1), SS()]

@lunluen lunluen changed the title mlens.utils.check_instances returns a wrong pipeline order Bug? mlens.utils.check_instances returns wrong pipeline orders Sep 6, 2019
@lunluen
Copy link
Author

lunluen commented Sep 6, 2019

Sorry the correct last line is key2 should be [SS(), FT(f2), FT(f1)] rather than [FT(f2), FT(f1), SS()]

@lunluen
Copy link
Author

lunluen commented Sep 6, 2019

>>> import mlens
>>> mlens.__version__
'0.2.3'

@flennerhag
Copy link
Owner

Hey! Thanks for flagging this, definitely looks like a bug. It's reordering based on name, the offending line seems to be the sorted() call applied to the output of _format_instances, on L99, which will sort the list of named preprocessors according to their names.

I'll try to fix this some time this week, but feel free to make a PR.

@lunluen
Copy link
Author

lunluen commented Sep 9, 2019

Hi @flennerhag , Thanks for the reply. And... I found another bug here. 😲

Code

from mlens.utils import check_instances
from sklearn.preprocessing import (
    StandardScaler as SS, FunctionTransformer as FT)
from xgboost import XGBClassifier


clf = XGBClassifier(n_estimators=10)
def f1(): return 1
def f2(): return 2

s = SS()

estimators = {
    'key-1': [clone(clf)],
    'key-2': [clone(clf)],
}
preprocessing = {
    'key-1': [FT(f2), s, FT(f1)],
    'key-2': [s, FT(f2), FT(f1)],
}

check_instances(estimators, preprocessing)[0]

Result

[('key-1',
  [('functiontransformer-1',
    FunctionTransformer(accept_sparse=False, check_inverse=True,
                        func=<function f2 at 0x000001C9B3C389D8>, inv_kw_args=None,
                        inverse_func=None, kw_args=None, pass_y='deprecated',
                        validate=None)),
   ('functiontransformer-2',
    FunctionTransformer(accept_sparse=False, check_inverse=True,
                        func=<function f1 at 0x000001C9B3C38BF8>, inv_kw_args=None,
                        inverse_func=None, kw_args=None, pass_y='deprecated',
                        validate=None))]),
 ('key-2',
  [('functiontransformer-3',
    FunctionTransformer(accept_sparse=False, check_inverse=True,
                        func=<function f2 at 0x000001C9B3C389D8>, inv_kw_args=None,
                        inverse_func=None, kw_args=None, pass_y='deprecated',
                        validate=None)),
   ('functiontransformer-4',
    FunctionTransformer(accept_sparse=False, check_inverse=True,
                        func=<function f1 at 0x000001C9B3C38BF8>, inv_kw_args=None,
                        inverse_func=None, kw_args=None, pass_y='deprecated',
                        validate=None)),
   ('standardscaler-1',
    StandardScaler(copy=True, with_mean=True, with_std=True)),
   ('standardscaler-2',
    StandardScaler(copy=True, with_mean=True, with_std=True))])]

The two standard scaler (singleton) are grouped together.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants