Missing multiclass-multioutput support #292

Closed
mitar opened this issue May 29, 2017 · 22 comments
Labels
enhancement A new improvement or feature

Comments

@mitar

mitar commented May 29, 2017

I see that multiclass-multioutput support is missing. In that case, couldn't the code just split learning into one model per output?

@mfeurer
Contributor

mfeurer commented May 29, 2017

Yes, that is correct. The feature is not implemented because there is no metric for it in scikit-learn, none of the contributors has needed it so far, and we do not have any data to test it on. Do you have a reference for multiclass-multioutput?

@mitar
Author

mitar commented May 29, 2017

I think my issue was in fact #293. A continuous-multioutput target was misclassified as multiclass-multioutput, which is why I thought I needed it.
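For reference, a small sketch of how scikit-learn's `type_of_target` labels two-dimensional targets. The arrays are made up for illustration, and whether whole-number floats were the actual cause in #293 is an assumption that this thread does not confirm.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Two real-valued targets per sample: a regression problem with two outputs.
y_real = np.array([[0.5, 1.2], [2.3, 0.1], [1.7, 3.4]])

# Same shape, but every value is a whole number stored as a float
# (illustrative data, not taken from #293).
y_whole = np.array([[1.0, 2.0], [3.0, 1.0], [2.0, 3.0]])

kind_real = type_of_target(y_real)
kind_whole = type_of_target(y_whole)

print(kind_real)   # continuous-multioutput
print(kind_whole)  # multiclass-multioutput
```

Targets that contain only whole numbers are treated as class labels, so a regression target with integer-valued entries can end up in the multiclass-multioutput category.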

I see that there is an active pull request to add a metric, though: scikit-learn/scikit-learn#3681 (addressing scikit-learn/scikit-learn#3453). But it looks stale.

@mfeurer
Contributor

mfeurer commented May 30, 2017

Okay, I see. However, continuous-multioutput is not supported by auto-sklearn. This type of class is only supported by the tree-based models in scikit-learn, right? What would be an application of this kind of data?

@mitar
Author

mitar commented May 30, 2017

No, it seems it is supported by any regressor:

> Multioutput regression support can be added to any regressor with MultiOutputRegressor. This strategy consists of fitting one regressor per target. Since each target is represented by exactly one regressor it is possible to gain knowledge about the target by inspecting its corresponding regressor. As MultiOutputRegressor fits one regressor per target it can not take advantage of correlations between targets.

BTW, the same can be done for classifiers as well, using MultiOutputClassifier.
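A minimal sketch of the one-regressor-per-target strategy described in the quote, using `MultiOutputRegressor` around a regressor that only handles a single target. The synthetic data and the choice of `GradientBoostingRegressor` are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.RandomState(0)
X = rng.rand(100, 4)
# Two targets per sample (hypothetical synthetic data).
Y = np.column_stack([X[:, 0] + X[:, 1], X[:, 2] - X[:, 3]])

# GradientBoostingRegressor fits a single target only, so it is wrapped:
# the wrapper clones it and fits one copy per target column.
base = GradientBoostingRegressor(n_estimators=20, random_state=0)
model = MultiOutputRegressor(base)
model.fit(X, Y)

n_fitted = len(model.estimators_)
pred_shape = model.predict(X[:5]).shape

print(n_fitted)    # one fitted regressor per target
print(pred_shape)  # one prediction column per target
```

Each fitted copy is available in `model.estimators_`, which is what makes per-target inspection possible. `MultiOutputClassifier` follows the same pattern for classifiers.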

I think it would be a good first start, just to have something.

> What would be an application of this kind of data?

I mean, the application is that I can run auto-sklearn on any dataset I get, no matter what task it has. ;-)

@mfeurer
Contributor

mfeurer commented May 31, 2017

The reason why we didn't implement anything together with the MultiOutputClassifier and MultiOutputRegressor is that they didn't nicely support partial_fit and models with warmstarts. As partial_fit seems to be added in sklearn==0.19 we can hopefully add this feature then.
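To make the `partial_fit` point concrete, here is a hedged sketch of incremental training through the wrapper. It assumes a scikit-learn version in which `MultiOutputRegressor.partial_fit` exists and a base estimator that implements `partial_fit` itself (`SGDRegressor` here); the data and batch count are made up.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.RandomState(0)
X = rng.rand(200, 3)
# Two linear targets (hypothetical synthetic data).
Y = np.column_stack([X @ np.array([1.0, 2.0, 0.0]),
                     X @ np.array([0.0, -1.0, 3.0])])

model = MultiOutputRegressor(SGDRegressor(random_state=0))

# Feed the data in four batches; each call updates both per-target regressors.
for batch in np.array_split(np.arange(len(X)), 4):
    model.partial_fit(X[batch], Y[batch])

n_fitted = len(model.estimators_)
pred_shape = model.predict(X[:3]).shape
print(n_fitted, pred_shape)
```

The wrapper only exposes `partial_fit` when the wrapped estimator has it, so this does not extend to every model.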

@BrechtBa

Is there any progress on this issue?
I also have several multi-output regressions to which I would like to apply auto-sklearn.

@mfeurer
Contributor

mfeurer commented Dec 13, 2017

No, there's no one working on this. Do you want to contribute this feature?

@berisfu

berisfu commented Jan 12, 2018

If the y label is (longitude, latitude), meaning I want to predict a location, can auto-sklearn support that? I think there are numerous use cases involving geolocation, for example predicting where a user will be dropped off or picked up for Uber.

@mfeurer
Contributor

mfeurer commented Jan 12, 2018

Auto-sklearn currently does not implement this feature. While we think that this feature would be good to have, I doubt that anyone from our team will implement this in the near future. Therefore, any contribution in getting this feature into Auto-sklearn would be greatly appreciated.

@Skylion007

What would be needed to get this feature in? This library is missing crucial functionality without supporting these two problem types. I'd be willing to look into this if I can get an idea of how much effort it would take.

@mfeurer
Contributor

mfeurer commented Jan 22, 2018

Off the top of my head:

  • making sure that the metrics work with this kind of data
  • making sure that all the models are able to train on this kind of data
  • making sure that ensembles work on this kind of data

And then doing an integration check.

Point 2 is somewhat optional if having only unsupervised preprocessing and random forests for classification/regression is fine.

@Skylion007

I mean, theoretically every regressor can support it via this kludge? Of course, that's not optimal, since it cannot exploit correlations between outputs, but it will give some functionality instead of just erroring. There is a multi-output classification class that does the same thing.

For metrics, the following are supported for multioutput according to the docs:

mean_squared_error, mean_absolute_error, explained_variance_score and r2_score.
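A short sketch of how these metrics behave on two-dimensional targets, using the `multioutput` argument to switch between the averaged score and per-output scores. The numbers are made up for illustration.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Three samples, two outputs; only the second output has errors.
y_true = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
y_pred = np.array([[1.0, 12.0], [2.0, 18.0], [3.0, 30.0]])

# Default: scores are averaged uniformly over the outputs.
mse_avg = mean_squared_error(y_true, y_pred)
r2_avg = r2_score(y_true, y_pred)

# "raw_values" returns one score per output column instead.
mse_each = mean_squared_error(y_true, y_pred, multioutput="raw_values")
mae_each = mean_absolute_error(y_true, y_pred, multioutput="raw_values")

print(mse_avg, r2_avg)
print(mse_each, mae_each)
```

With the default averaging, outputs on very different scales can dominate the score, which is worth keeping in mind when a single number is used to compare models.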

The following classifiers support multilabel without the kludge.

It's difficult to find a list of regressors that support multioutput, but it looks like decision trees and forms of linear regression, along with their variants, support it out of the box without the multioutput kludge.
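As a quick check of the out-of-the-box case, the sketch below fits `LinearRegression` and `DecisionTreeRegressor` directly on a two-column target, with no wrapper. The data is synthetic and chosen only for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = rng.rand(50, 3)
# Two targets per sample (hypothetical synthetic data).
Y = np.column_stack([2.0 * X[:, 0], X[:, 1] - X[:, 2]])

shapes = {}
for est in (LinearRegression(), DecisionTreeRegressor(random_state=0)):
    est.fit(X, Y)  # a 2-D target is accepted directly
    shapes[type(est).__name__] = est.predict(X[:4]).shape

print(shapes)
```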

That is what I can find in the latest documentation about which metrics and models work. I haven't found much about ensembles for multioutput regression, but at least some of them support multiclass models. Given the list of regressors and models that support it, what would need to be done?

@Skylion007

Alternatively, RegressorChain will be in the next version of Sklearn so that might be easier to work with: https://github.com/scikit-learn/scikit-learn/pull/9257/files

We already have ClassifierChain after all.
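A hedged sketch of what `RegressorChain` does differently from the one-model-per-target wrapper: each regressor in the chain also sees the earlier targets as extra features. The data, the `Ridge` base estimator, and the chain order are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.multioutput import RegressorChain

rng = np.random.RandomState(0)
X = rng.rand(100, 3)
y0 = X[:, 0] + X[:, 1]
y1 = 2.0 * y0 + X[:, 2]  # the second target depends on the first
Y = np.column_stack([y0, y1])

# The base estimator is passed positionally; order=[0, 1] fits target 0 first.
chain = RegressorChain(Ridge(alpha=1e-3), order=[0, 1])
chain.fit(X, Y)

pred_shape = chain.predict(X[:5]).shape
# The second regressor is fitted on the 3 features plus the first target.
second_n_coef = chain.estimators_[1].coef_.shape[0]

print(pred_shape, second_n_coef)
```

The chain order is a modelling choice in its own right, which is one more thing a search over configurations would have to handle.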

@mfeurer
Contributor

mfeurer commented Jan 24, 2018

Yes, that's the wrapper I meant, which you'd need to put around all kinds of classifiers, but I think that's secondary to getting some basic functionality. Also, I don't think that ClassifierChain and RegressorChain are easily applicable in Auto-sklearn, as it would be unclear how to tune their hyperparameters in a fast way.

Regarding the ensemble: Auto-sklearn builds a post-hoc ensemble from the models it has trained. Ideally that would still work afterwards.
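To illustrate why a post-hoc ensemble can carry over to multiple outputs, here is a sketch of weighted averaging over predictions of shape (n_samples, n_outputs). This is not Auto-sklearn's ensemble code; the function name `weighted_average` and the weights are hypothetical.

```python
import numpy as np

def weighted_average(predictions, weights):
    """Combine per-model predictions, each of shape (n_samples, n_outputs)."""
    stacked = np.stack(predictions)  # (n_models, n_samples, n_outputs)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalise so the weights sum to 1
    # Contract over the model axis; the result is (n_samples, n_outputs).
    return np.tensordot(weights, stacked, axes=1)

# Two hypothetical models, two samples, two outputs.
preds_a = np.array([[1.0, 2.0], [3.0, 4.0]])
preds_b = np.array([[3.0, 6.0], [5.0, 8.0]])

combined = weighted_average([preds_a, preds_b], weights=[3, 1])
print(combined)
```

The averaging itself is indifferent to the number of output columns; what needs care is that the metric used to choose the weights accepts two-dimensional targets.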

I think it would be easiest to start by adding multilabel regression (if that's of interest to you) by:

Please excuse that this is rather complicated and not all in one place, but we didn't design Auto-sklearn to be extensible for different tasks.

@charlesfu4
Contributor

My thesis topic is quite related to this issue, which is forecasting multi-output electricity load. I wonder if the team is working on multi-output auto regressor. If not, I will be willing to try that.

@charlesfu4
Contributor

> The reason why we didn't implement anything together with the MultiOutputClassifier and MultiOutputRegressor is that they didn't nicely support partial_fit and models with warmstarts. As partial_fit seems to be added in sklearn==0.19 we can hopefully add this feature then.

Does that mean the meta-learning part of your pipeline will not work well if I try to use MultiOutputClassifier directly in auto-sklearn? I ask because I read the paper you published and understood that meta-learning makes use of warm starts.

This was referenced Mar 17, 2020
@mfeurer
Contributor

mfeurer commented Jul 3, 2020

Thanks to @charlesfu4 we now have multi-output regression available in the development branch, and it will be available in the next release of Auto-sklearn.

franchuterivera added the enhancement label Feb 17, 2021
@mfeurer
Contributor

mfeurer commented Sep 6, 2021

Closing this as scikit-learn still doesn't support multiclass-multioutput. We can create a new, clean issue for this once scikit-learn provides metrics to evaluate multioutput-multiclass predictions.
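A small sketch of the metric gap: `accuracy_score` rejects a multiclass-multioutput target, so any score has to be assembled by hand. The per-column averaging shown is a hypothetical workaround, not a scikit-learn metric, and the behaviour reflects scikit-learn versions known at the time of writing.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Four samples, two outputs, three classes per output (made-up labels).
y_true = np.array([[0, 2], [1, 1], [2, 0], [1, 2]])
y_pred = np.array([[0, 2], [1, 0], [2, 0], [0, 2]])

try:
    accuracy_score(y_true, y_pred)
    raised = False
except ValueError as exc:
    raised = True
    print(exc)

# Hypothetical workaround: score each output column on its own, then average.
per_output = [accuracy_score(y_true[:, j], y_pred[:, j])
              for j in range(y_true.shape[1])]
mean_accuracy = float(np.mean(per_output))
print(per_output, mean_accuracy)
```

Averaging per-column accuracy is only one possible definition; requiring every output of a sample to be correct would give a stricter score.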

mfeurer closed this as completed Sep 6, 2021
@tron27

tron27 commented Aug 28, 2023

Matthias,
It seems as though scikit-learn now supports multiclass-multioutput. See the link below:
https://scikit-learn.org/stable/modules/multiclass.html

I see that you mentioned almost two years ago that you might open a clean issue for this once scikit-learn provides metrics to evaluate multioutput-multiclass predictions. Do you know where you and your team are with this? As mentioned in some of the earlier messages, I would also like to apply AutoML to predict a location (latitude/longitude). Thanks for all that you do! It's greatly appreciated.

Eric

@eddiebergman
Contributor

Hi @tron27,

Can you make a new issue about this? I can then add it to #1677.

Best,
Eddie

@tron27

tron27 commented Aug 29, 2023

Good morning Eddie,
Sure, I can make a new issue about this so that you can add it to #1677.

Thanks,
Eric

@tron27

tron27 commented Aug 29, 2023

Good day Eddie,
I have created a new issue about this. You can find it here:

#1685

Thanks,
Eric


9 participants