Is there a reason why the object in learner logs isn't inside the learner key? #108

Open
robotenique opened this issue Nov 7, 2019 · 1 comment
Labels
bug Something isn't working

Comments

@robotenique
Contributor

Code sample

Taking a look at the return logs of the learners, e.g. the logistic regression one:

    log = {'logistic_classification_learner': {
        'features': features,
        'target': target,
        'parameters': merged_params,
        'prediction_column': prediction_column,
        'package': "sklearn",
        'package_version': sk_version,
        'feature_importance': dict(zip(features, clf.coef_.flatten())),
        'training_samples': len(df)},
        'object': clf}

Problem description

Is there a reason why the 'object' key sits at the top level of the log instead of inside the logistic_classification_learner dictionary? Because every learner writes to the same top-level key, a pipeline with multiple learners keeps only the object of the last learner, and the objects of the earlier learners are lost.
For example, my pipeline is (logistic_regression, isotonic_calibration). Since the build_pipeline function merges the logs of the two learners, the final log contains only the isotonic calibration object, and the logistic_regression object is lost.
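The collision described above can be reproduced with plain dictionaries. This is a minimal sketch, not fklearn code: `merge_logs` is a hypothetical stand-in that assumes the pipeline merges logs shallowly with the last value winning, and the strings stand in for fitted model objects.

```python
def merge_logs(logs):
    """Hypothetical shallow merge: later logs overwrite earlier keys."""
    merged = {}
    for log in logs:
        merged.update(log)
    return merged


# Placeholder logs shaped like the one in the code sample above.
logistic_log = {
    "logistic_classification_learner": {"features": ["x1", "x2"]},
    "object": "fitted logistic regression",  # placeholder for clf
}
isotonic_log = {
    "isotonic_calibration_learner": {"output_column": "calibrated"},
    "object": "fitted isotonic regression",  # placeholder for the calibrator
}

merged = merge_logs([logistic_log, isotonic_log])

# Both learner dictionaries survive because their keys differ,
# but the shared top-level 'object' key holds only the last learner's object.
print(merged["object"])
```

Under that assumption, swapping the order of the two learners changes which object survives.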

Expected behavior

Access all learner objects of the pipeline, not just the last one.

Possible solutions

Put the learner object inside the dictionary of the logs:

    log = {'logistic_classification_learner': {
        'features': features,
        'target': target,
        'parameters': merged_params,
        'prediction_column': prediction_column,
        'package': "sklearn",
        'package_version': sk_version,
        'feature_importance': dict(zip(features, clf.coef_.flatten())),
        'training_samples': len(df),
        'object': clf}}
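With the object nested under each learner's own key, a shallow merge no longer causes a collision. The sketch below illustrates this with placeholder data; `merge_nested_logs` and `collect_objects` are hypothetical helpers, not part of fklearn, and the merge is again assumed to be shallow.

```python
def merge_nested_logs(logs):
    """Hypothetical shallow merge of learner logs (last value wins per key)."""
    merged = {}
    for log in logs:
        merged.update(log)
    return merged


def collect_objects(merged_log):
    """Return {learner_name: object} for every learner log that carries one."""
    return {
        name: learner_log["object"]
        for name, learner_log in merged_log.items()
        if isinstance(learner_log, dict) and "object" in learner_log
    }


# Placeholder logs with 'object' nested inside each learner's dictionary.
nested_logs = [
    {"logistic_classification_learner": {"features": ["x1"], "object": "clf"}},
    {"isotonic_calibration_learner": {"output_column": "cal", "object": "iso"}},
]

objects = collect_objects(merge_nested_logs(nested_logs))
print(objects)
```

Each learner's object stays reachable regardless of the order of the learners in the pipeline.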
robotenique added the bug label Nov 7, 2019
@caique-lima
Contributor

caique-lima commented Nov 7, 2019

I'll double-check this, but it seems we have a typo. Looking at the code, this "object" key should be dropped to avoid a huge training log. I'm basing this on this line: https://github.com/nubank/fklearn/blob/master/src/fklearn/training/pipeline.py#L75

If the key were 'obj' instead of 'object', it would be dropped from your learner's log and be available only under the '__fkml__' key, inside the learners key. But since the name is 'object', nothing happens.
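The behavior described in this comment can be sketched roughly as follows. This is an illustration of the comment's description only, not the actual code in pipeline.py: `collect_logs`, the `"log"` entry name, and the sample data are assumptions; only the 'obj', 'object', '__fkml__' and learners names come from the discussion.

```python
from collections import defaultdict


def collect_logs(named_logs):
    """Hypothetical sketch: move 'obj' out of each log, then merge the rest."""
    merged = {}
    learners = defaultdict(list)
    for learner_name, log in named_logs:
        log = dict(log)  # shallow copy so the caller's log is not mutated
        entry = {}
        if "obj" in log:
            # Only a key named exactly 'obj' is moved out of the log.
            entry["obj"] = log.pop("obj")
        entry["log"] = log
        learners[learner_name].append(entry)
        merged.update(log)
    merged["__fkml__"] = {"learners": dict(learners)}
    return merged


named_logs = [
    # Misnamed key: stays in the merged log.
    ("logistic_classification_learner",
     {"logistic_classification_learner": {"features": ["x1"]}, "object": "clf"}),
    # Expected key name: removed from the log, kept under '__fkml__'.
    ("isotonic_calibration_learner",
     {"isotonic_calibration_learner": {"output_column": "cal"}, "obj": "iso"}),
]

result = collect_logs(named_logs)
print("object" in result, "obj" in result)
```

In this sketch the 'object' key leaks into the merged log, while 'obj' is reachable only through `result["__fkml__"]["learners"]`.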
