Missing multiclass-multioutput support #292
Yes, that is correct. The feature is not implemented because there is no metric for it in scikit-learn, none of the contributors so far has needed it, and we do not have any data to test it. Do you have any reference for multiclass-multioutput?
I think my issue was in fact #293. I see that there is an open pull request to add a metric, though: scikit-learn/scikit-learn#3681 (addressing scikit-learn/scikit-learn#3453). But it looks stale.
Okay, I see. However,
No, it seems it is supported by any regressor:
BTW, something similar can be done for classifiers as well, using … I think it would be a good first step, just to have something.
I mean, the use case is that I can run auto-sklearn on any dataset I get, no matter the task. ;-)
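As a minimal sketch of the per-output approach, assuming the wrapper in question is scikit-learn's `MultiOutputRegressor`: it fits one clone of a single-output regressor per target column. The data are synthetic, and `GradientBoostingRegressor` is only an example of a regressor that accepts a 1-D target.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.multioutput import MultiOutputRegressor

# Synthetic data: 4 features, 2 targets (e.g. two coordinates).
rng = np.random.RandomState(0)
X = rng.rand(200, 4)
Y = np.column_stack([X[:, 0] + X[:, 1], X[:, 2] - X[:, 3]])

# GradientBoostingRegressor only accepts a 1-D target; the wrapper
# fits one independent clone per column of Y.
model = MultiOutputRegressor(GradientBoostingRegressor(random_state=0))
model.fit(X, Y)

pred = model.predict(X[:5])        # shape (5, 2): one column per target
n_models = len(model.estimators_)  # one fitted regressor per target
```

Each target gets its own model, so correlations between targets are not exploited; the benefit is that any single-output regressor becomes usable.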
The reason why we didn't implement anything together with the
Is there any progress on this issue?
No, there's no one working on this. Do you want to contribute this feature?
If the y label is (longitude, latitude), which means I want to predict a location, can auto-sklearn support that? I think there are numerous use cases involving geolocation, for example predicting where a user will be picked up or dropped off for Uber.
Auto-sklearn currently does not implement this feature. While we think that this feature would be good to have, I doubt that anyone from our team will implement it in the near future. Therefore, any contribution in getting this feature into Auto-sklearn would be greatly appreciated.
What would be needed to get this feature in? This library is missing crucial functionality without supporting these two problem types. I'd be willing to look into this if I can get an idea of how much effort it would take.
Off the top of my head:
Point 2 is somewhat optional if having only unsupervised preprocessing and random forests for classification/regression is fine.
I mean, theoretically every regressor can support it via this kludge, right? Of course, that's not optimal, since it cannot exploit correlations between the outputs, but it will give some functionality instead of just raising an error. There is a classification counterpart that does the same thing. For metrics, the following metrics are supported for multioutput according to the docs:
The following classifiers support multilabel without the kludge. It's difficult to find a list of regressors that support multioutput, but it looks like DecisionTrees and forms of linear regression, along with their variants, support it out of the box without the multioutput kludge. That is what I could find in the latest documentation about which metrics and models work. I haven't found much about ensembles for multioutput regression, but at least some of them support multiclass models. Given the list of regressors and models that support it, what would need to be done?
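A minimal sketch of the native support described above, assuming scikit-learn's `LinearRegression` and `DecisionTreeRegressor` (both accept a 2-D target without a wrapper) and the `multioutput` parameter of `mean_squared_error` and `r2_score`. All arrays are toy values chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.tree import DecisionTreeRegressor

# Toy data: one feature, two targets. Both models take the 2-D Y as-is.
X = np.arange(10, dtype=float).reshape(-1, 1)
Y = np.column_stack([2 * X[:, 0] + 1, -X[:, 0]])

lin = LinearRegression().fit(X, Y)
tree = DecisionTreeRegressor(random_state=0).fit(X, Y)
lin_pred = lin.predict(np.array([[20.0]]))  # approx. [[41.0, -20.0]]
tree_pred = tree.predict(X)                 # shape (10, 2)

# Metrics with a `multioutput` parameter report one score per target
# ("raw_values") or their mean ("uniform_average", the default).
y_true = np.array([[0.0, 1.0], [1.0, 2.0], [2.0, 3.0]])
y_pred = np.array([[0.0, 1.5], [1.0, 2.5], [2.0, 3.5]])
mse_per_target = mean_squared_error(y_true, y_pred, multioutput="raw_values")  # [0.0, 0.25]
r2_per_target = r2_score(y_true, y_pred, multioutput="raw_values")             # [1.0, 0.625]
r2_mean = r2_score(y_true, y_pred)                                             # 0.8125
```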
Alternatively, RegressorChain will be in the next version of scikit-learn, so that might be easier to work with: https://github.com/scikit-learn/scikit-learn/pull/9257/files (we already have ClassifierChain, after all).
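For reference, a minimal sketch of how a chain differs from the per-output wrapper, assuming scikit-learn's `sklearn.multioutput.RegressorChain` (available in released versions since the linked PR). The data are synthetic, and `DecisionTreeRegressor` is an arbitrary base estimator.

```python
import numpy as np
from sklearn.multioutput import RegressorChain
from sklearn.tree import DecisionTreeRegressor

# Synthetic data where the second target depends on the first.
rng = np.random.RandomState(0)
X = rng.rand(200, 3)
y0 = X[:, 0] + 2 * X[:, 1]
y1 = np.sin(3 * y0) - X[:, 2]
Y = np.column_stack([y0, y1])

# Target 0 is fitted on X alone; target 1 is fitted on X plus one extra
# column for target 0. With cv=None (the default) that column holds the
# true y0 during fit and the chain's own y0 prediction at predict time.
chain = RegressorChain(DecisionTreeRegressor(max_depth=5, random_state=0), order=[0, 1])
chain.fit(X, Y)
pred = chain.predict(X[:5])  # shape (5, 2)
```

The extra input column is what lets later targets use earlier ones, which the independent per-output wrapper cannot do; the price is an additional hyperparameter (the chain order).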
Yes, that's what I meant: the wrapper you'd need to put around all kinds of classifiers. But I think that's secondary to getting some basic functionality. Also, I don't think that ClassifierChain and RegressorChain are easily applicable in Auto-sklearn, as it would be unclear how to tune their hyperparameters in a fast way. Regarding the ensemble: Auto-sklearn combines the models post hoc into an ensemble, and ideally that still works afterwards. I think it would be easiest to start by adding multilabel regression (if that's of interest to you) by:
Please excuse that this is rather complicated and not all in one place, but we didn't design Auto-sklearn to be extensible to different tasks.
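Auto-sklearn builds its post-hoc ensemble with greedy ensemble selection (Caruana et al.). The function below is a hypothetical, simplified sketch of that idea, not Auto-sklearn's implementation. It only averages prediction arrays and scores them, so it runs unchanged on 2-D multi-output predictions as long as the loss accepts them; mean squared error over all targets is an assumed loss.

```python
import numpy as np


def ensemble_selection(predictions, y_true, n_rounds):
    """Greedy forward selection with replacement (hypothetical sketch).

    predictions: list of arrays shaped like y_true, e.g. (n_samples, n_outputs).
    Returns one weight per model: how often it was picked, normalised to sum to 1.
    """
    counts = np.zeros(len(predictions))
    running_sum = np.zeros_like(y_true, dtype=float)
    for step in range(1, n_rounds + 1):
        # Loss of the averaged ensemble if each candidate were added next;
        # the mean squared error runs over every sample and every output.
        losses = [np.mean(((running_sum + p) / step - y_true) ** 2) for p in predictions]
        best = int(np.argmin(losses))  # ties go to the lowest index
        counts[best] += 1
        running_sum += predictions[best]
    return counts / counts.sum()


# Toy 2-output targets and three fake "models" with constant offsets.
y = np.array([[0.0, 1.0], [1.0, 2.0], [2.0, 0.0]])
weights = ensemble_selection([y + 1.0, y - 1.0, y + 0.1], y, n_rounds=6)
```

With these toy inputs the model offset by +0.1 is picked five times and the model offset by -1 once, because averaging them cancels part of the bias.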
My thesis topic, forecasting multi-output electricity load, is closely related to this issue. I wonder whether the team is working on a multi-output auto regressor. If not, I would be willing to try it.
Does this mean that the meta-learning part of your pipeline will not work well if I apply MultiOutputClassifier directly to auto-sklearn? I ask because I read the paper you published and learned that meta-learning is used for warm-starting.
Thanks to @charlesfu4, we now have multi-output regression available in the development branch, and it will be available in the next release of Auto-sklearn.
Closing this, as scikit-learn still doesn't provide multiclass-multioutput support. We can create a new, clean issue for this once scikit-learn provides metrics to evaluate multioutput-multiclass predictions.
Matthias, I see that you mentioned almost two years ago that you might create a new, clean issue for this once scikit-learn provides metrics to evaluate multioutput-multiclass predictions. Do you know where you and your team are with this? As mentioned in some of the earlier messages, I would also like to apply AutoML to predict a location (latitude/longitude). Thanks for all that you do! It's greatly appreciated. Eric
Good morning Eddie, Thanks,
Good day Eddie, Thanks,
I see that multiclass-multioutput support is missing. Couldn't the code, in that case, just split learning into one model per output?
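Splitting into one model per output is what scikit-learn's `MultiOutputClassifier` does. A minimal sketch on synthetic data, with `LogisticRegression` as an arbitrary base classifier. Since scikit-learn has no combined metric for this target type (the blocker raised earlier in the thread), each output is scored separately here, and how to aggregate those scores is left open.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.multioutput import MultiOutputClassifier

# Synthetic multiclass-multioutput target: output 0 has 3 classes, output 1 has 2.
rng = np.random.RandomState(0)
X = rng.rand(300, 2)
Y = np.column_stack([
    np.digitize(X[:, 0], [1 / 3, 2 / 3]),  # classes 0, 1, 2
    (X[:, 1] > 0.5).astype(int),           # classes 0, 1
])

# One independent classifier per output column.
clf = MultiOutputClassifier(LogisticRegression(max_iter=1000)).fit(X, Y)
pred = clf.predict(X)  # shape (300, 2)

# accuracy_score does not accept a 2-D multiclass-multioutput target,
# so score each output separately and report the per-output values.
per_output_acc = [accuracy_score(Y[:, j], pred[:, j]) for j in range(Y.shape[1])]
```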