Fix: tuneThreshold - Minimization for measures needing maximization #2857

jokokojote

When tuneThreshold is used to find the best threshold for a measure that must be maximized, it does not work: it finds the minimum (the worst threshold) and returns its performance with the wrong sign.

Reproduce:

model = train(makeLearner("classif.rpart", predict.type = "prob"), sonar.task)
preds = predict(model, sonar.task)

performance(preds, bac)
# bac
# 0.8763815

tuneThreshold(preds, bac) # minimum instead of maximum returned (with wrong sign)
# $th
# [1] 0.9309793
#
# $perf
# bac
# -0.545045

d = generateThreshVsPerfData(preds, bac)$data
min(d$bac[2:99]) # min bac
# 0.545045
max(d$bac[2:99]) # max bac
# 0.8763815

Cause

The callback function defined in tuneThreshold already turns every measure into a minimization problem, including measures that need to be maximized:
ifelse(measure$minimize, 1, -1) * performance(setThreshold(pred, x), measure, task, model, simpleaggr = TRUE)

When optimizeSubInts is called, its maximum flag is set from the measure's minimize flag, even though the sign was already handled in the callback and the problem is always a minimization at this point. For a measure that must be maximized (e.g. bac or acc), the search therefore returns the overall minimum of the measure instead of its maximum.
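
The double sign handling can be reproduced in a toy setting without mlr. The sketch below is only an illustration, not the mlr code: `score` is a made-up higher-is-better function standing in for a measure such as bac, `minimize` stands in for `measure$minimize`, and base R's `optimize` stands in for optimizeSubInts. Flipping the sign back at the end is an assumption of this sketch.

```r
# Toy "measure" where higher is better; it peaks at x = 0.3 on [0, 1].
score = function(x) 1 - (x - 0.3)^2
minimize = FALSE  # hypothetical stand-in for measure$minimize

# Sign-adjusted objective, as in the callback: smaller is always better.
fitn = function(x) ifelse(minimize, 1, -1) * score(x)

# Sign handled twice: maximizing the negated score searches for the worst x
# and reports the objective with a negative sign.
buggy = optimize(fitn, lower = 0, upper = 1, maximum = !minimize)

# Sign handled once: always minimize the sign-adjusted objective.
fixed = optimize(fitn, lower = 0, upper = 1, maximum = FALSE)

# Undo the sign adjustment to report the score on its own scale
# (an assumption of this sketch).
best_perf = ifelse(minimize, 1, -1) * fixed$objective

print(buggy$maximum)    # close to the upper bound, where score is lowest
print(buggy$objective)  # negative: the lowest score with the wrong sign
print(fixed$minimum)    # close to 0.3, where score is highest
print(best_perf)        # the highest score, with the correct sign
```

With `maximum = !minimize` the optimizer ends up at the end of the interval where the toy score is lowest, which mirrors the behaviour reported for bac above. Minimizing unconditionally recovers the location of the peak.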

Fix

After changing the call to optimizeSubInts so that it always searches for the minimum, tuneThreshold works correctly:

> model = train(makeLearner("classif.rpart", predict.type = "prob"), sonar.task)
> preds = predict(model, sonar.task)
> performance(preds, bac)
      bac 
0.8763815 
> tuneThreshold(preds, bac) 
$th
[1] 0.5309993

$perf
      bac 
0.8763815 

> d = generateThreshVsPerfData(preds, bac)$data
> min(d$bac[2:99]) 
[1] 0.545045
> max(d$bac[2:99]) 
[1] 0.8763815

@jokokojote

@pat-s @larskotthoff @mllg @berndbischl please review, thank you.
