Cannot get the number of selected classifiers by a DES algorithm #130
Comments
Hello, I agree this would be a useful feature in the library. In the current version, the information is computed in the select method (which is called inside the classify_with_ds method), but it is never accessible to the user. We need to define a way to return the selected classifiers to the user while still maintaining the library standards. I will think about how to make that easily accessible. This should be a good feature to add for the v0.4 release.
@Menelau - here are some ideas for this issue. As you mentioned:
Alternatives:
Note that both solutions have some problems. In (1), the functions "predict" and "predict_proba" would return either one value (the normal case) or two (when "return_selected_classifiers=True"). Option (2), on the other hand, may be misused: in general it is not good practice to store this type of "temporary" value as an instance variable. Look at this example: pred = knop.predict(x) If "some_other_func" uses knop.predict again (e.g. with a different x), then "selected_classifiers" would hold the incorrect value. I think option (1) is still preferable: the only case in which the function would return two values is when the user is actually asking for the value to be returned.
Well, I also think option 1 is better. Do you know if there is any other estimator in scikit-learn that can return more than one value?
I did a search for "return_" in the sklearn code base, and it seems that this strategy is used in a lot of cases. For instance, the KNN method "kneighbors" has a "return_distance" argument that changes what is returned (just the indices, or also the distances). https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.NearestNeighbors.html#sklearn.neighbors.NearestNeighbors.kneighbors That being said, I think we should implement option 1. I will take care of this issue.
Great! One thing to think about is the case where the DS mechanism is not used to classify a certain example (either all classifiers agree or it is being classified by the KNN method). In these cases maybe we should have a special marker indicating that the DS mechanism was not used for this example.
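To make the trade-off concrete, here is a minimal sketch of both options. Everything in it is an assumption for illustration: "ToyDES", its toy selection rule, and the "selected_classifiers_" attribute are hypothetical and are not DESlib code; only the "return_selected_classifiers" name comes from the discussion above.

```python
class ToyDES:
    """Hypothetical stand-in for a DES model; NOT the DESlib API."""

    def __init__(self, n_classifiers=3):
        self.n_classifiers = n_classifiers
        # Option (2): selection stored as instance state (hypothetical name).
        self.selected_classifiers_ = None

    def _select(self, x):
        # Toy rule: classifier i is selected when i <= x % n_classifiers.
        return [i for i in range(self.n_classifiers)
                if i <= x % self.n_classifiers]

    def predict(self, X, return_selected_classifiers=False):
        selected = [self._select(x) for x in X]
        # Overwritten on every call, which is what makes option (2) fragile.
        self.selected_classifiers_ = selected
        preds = [len(s) % 2 for s in selected]  # dummy "prediction"
        if return_selected_classifiers:
            # Option (1): two return values, only when explicitly requested.
            return preds, selected
        return preds


model = ToyDES()

# Option (1): the selection travels with the predictions it belongs to.
preds, selected = model.predict([0, 1, 2], return_selected_classifiers=True)

# Option (2): a later call silently replaces the stored selection.
model.predict([0, 1, 2])
first = model.selected_classifiers_
model.predict([2])  # e.g. a call made inside some_other_func
stale = model.selected_classifiers_  # no longer describes [0, 1, 2]
```

With option (1) the caller cannot confuse which query the selection belongs to; with option (2) the attribute always reflects only the most recent call.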
That complicates things a little bit. Some ideas:
Both options may be misused. For (1), if someone computes the average number of classifiers in selected_classifiers without taking "used_ds" into consideration, the value will be incorrect. The same happens in the second case if the user does not properly disregard the examples with an empty list. Another way is to return "used_ds" as a list of indexes, and have "selected_classifiers" be an array of shape [n_selected x n_classifiers].
May I suggest adding a "debug" or "experimental" mode, which would allow DESlib models to store data that is not critical for production but could greatly help researchers, such as the history of selected_classifiers or other information? I understand that this is not a standardized solution, but it could at least make this kind of information easier to obtain.
@maffei2443 Hello, I think having this functionality as a "debug" mode would be the best way of solving this issue for now, as we haven't figured out a way of adding it while respecting other constraints/design patterns from scikit-learn. Would you be interested in working on adding this functionality? Unfortunately I'm quite busy until the end of the year, with little time to dedicate to coding, so I can't guarantee that I could add it myself in a short period of time.
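A small sketch of the pitfall described above, using made-up data. The shapes and values are assumptions for illustration only: "selected_classifiers" is taken to be one boolean row per query sample, and "used_ds" a boolean flag per sample.

```python
# Hypothetical outputs for 4 query samples and a pool of 3 classifiers.
selected_classifiers = [
    [True, False, True],     # DS used, 2 classifiers selected
    [False, False, False],   # DS not used (e.g. all classifiers agreed)
    [True, True, True],      # DS used, 3 classifiers selected
    [False, False, False],   # DS not used (e.g. classified by KNN)
]
used_ds = [True, False, True, False]

counts = [sum(row) for row in selected_classifiers]

# Naive average: samples where DS was skipped are counted as "0 selected".
naive_mean = sum(counts) / len(counts)

# Correct average: only samples that actually went through DS.
ds_counts = [c for c, used in zip(counts, used_ds) if used]
masked_mean = sum(ds_counts) / len(ds_counts)

# Index-based alternative: keep the indexes of the DS samples and a
# compact [n_selected x n_classifiers] table with only those rows.
ds_indices = [i for i, used in enumerate(used_ds) if used]
compact = [selected_classifiers[i] for i in ds_indices]

print(naive_mean, masked_mean, ds_indices)
```

In the index-based layout there are no placeholder rows to forget about, at the cost of the user having to map rows back to the original sample positions through "ds_indices".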
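One possible shape for such a "debug" mode, as a rough sketch only. The class, the "debug" flag, the "debug_history_" attribute and the selection rule are all hypothetical names invented here; nothing below is existing DESlib code.

```python
class DebugToyDES:
    """Hypothetical model with an opt-in debug log; NOT the DESlib API."""

    def __init__(self, n_classifiers=3, debug=False):
        self.n_classifiers = n_classifiers
        self.debug = debug
        # Grows across predict calls instead of being overwritten.
        self.debug_history_ = []

    def _select(self, x):
        # Toy rule: classifier i is selected when bit i of x is set.
        return [i for i in range(self.n_classifiers) if (x >> i) & 1]

    def predict(self, X):
        preds = []
        for x in X:
            selected = self._select(x)
            if self.debug:
                # Record the selection only when debugging is enabled.
                self.debug_history_.append(selected)
            preds.append(len(selected))  # dummy "prediction"
        return preds  # always a single return value


debug_model = DebugToyDES(debug=True)
debug_model.predict([5, 3])
debug_model.predict([0])
history = debug_model.debug_history_

plain_model = DebugToyDES()
plain_model.predict([5, 3])
```

The point of this layout is that "predict" keeps its usual single return value, so the scikit-learn-style interface is untouched, while researchers who opt in can inspect the accumulated selections afterwards.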
Hello good people
I have been trying to get the number of classifiers selected by a DES algorithm, but I could not figure out how this can be achieved. I think it would be a nice and useful feature to have.
Cheers
Zahid