[R-package] lgb.cv mixes booster verbosity with evaluation metric monitoring #6162

Open
david-cortes opened this issue Oct 30, 2023 · 3 comments · May be fixed by #6172
Comments

@david-cortes
Contributor

When calling lgb.cv, one usually wants to monitor the evaluation metric on the training and validation sets as fitting proceeds, and usually does not want to be told about every internal detail of what happens inside each boosting round.

lgb.cv takes a verbosity parameter with levels for fatal/warning/info/debug, but the messages reporting the metric of interest on the training and validation data do not correspond to any of those levels. Currently they are lumped together with info, even though the messages themselves carry no level tag.

But there is currently no configuration under which one would get only the metric messages and not the internal messages from each boosting round - example:

library(lightgbm)
data(mtcars)
y <- mtcars$mpg
x <- mtcars[, -1] |> as.matrix()
result <- lgb.cv(
    data = lgb.Dataset(x, label=y),
    params = list(
        objective = "regression",
        metric = "l2",
        min_data_in_leaf = 5
    ),
    nrounds = 5,
    nfold = 3,
    verbose = 1,
    eval_train_metric = TRUE
)
[LightGBM] [Warning] Auto-choosing col-wise multi-threading, the overhead of testing was 0.000055 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 74
[LightGBM] [Info] Number of data points in the train set: 22, number of used features: 10
[LightGBM] [Warning] Auto-choosing col-wise multi-threading, the overhead of testing was 0.000044 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 74
[LightGBM] [Info] Number of data points in the train set: 21, number of used features: 10
[LightGBM] [Warning] Auto-choosing col-wise multi-threading, the overhead of testing was 0.000041 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 74
[LightGBM] [Info] Number of data points in the train set: 21, number of used features: 10
[LightGBM] [Info] Start training from score 20.604546
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Start training from score 19.761905
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Start training from score 19.880952
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[1] "[1]:  train's l2:29.7871+2.01926  valid's l2:31.7638+4.84005"
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[1] "[2]:  train's l2:25.4878+1.93599  valid's l2:28.1109+4.93162"
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[1] "[3]:  train's l2:21.9533+1.92091  valid's l2:25.3463+5.08398"
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[1] "[4]:  train's l2:18.9155+1.72977  valid's l2:23.1842+5.24103"
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[1] "[5]:  train's l2:16.4374+1.55234  valid's l2:21.0872+5.31824"

Whereas one would usually want to see only these:

[1] "[1]:  train's l2:29.7871+2.01926  valid's l2:31.7638+4.84005"
[1] "[2]:  train's l2:25.4878+1.93599  valid's l2:28.1109+4.93162"
[1] "[3]:  train's l2:21.9533+1.92091  valid's l2:25.3463+5.08398"
[1] "[4]:  train's l2:18.9155+1.72977  valid's l2:23.1842+5.24103"
[1] "[5]:  train's l2:16.4374+1.55234  valid's l2:21.0872+5.31824"

And without the [1] that gets added by R's print function.

Ideally there should be a separate parameter controlling whether to print the evaluation metrics, decoupled from the booster's internal verbosity.
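Until such a parameter exists, one possible workaround (a sketch, not a documented interface) is to run lgb.cv fully silenced and reconstruct the per-round lines afterwards from the returned record_evals. The `$train$l2$eval` / `$eval_err` list structure assumed below is how the R package has recorded CV evaluations in recent versions, but it may differ across releases:

```r
library(lightgbm)

data(mtcars)
y <- mtcars$mpg
x <- as.matrix(mtcars[, -1])

# Run CV fully silenced: verbose = -1 suppresses both the booster's
# internal messages and the per-round metric lines.
result <- lgb.cv(
    data = lgb.Dataset(x, label = y),
    params = list(
        objective = "regression",
        metric = "l2",
        min_data_in_leaf = 5
    ),
    nrounds = 5,
    nfold = 3,
    verbose = -1,
    eval_train_metric = TRUE
)

# Print the metrics afterwards from the recorded evaluations, using
# cat() so no "[1]" prefix from print() appears.
# NOTE: the record_evals structure accessed here is an assumption
# and may differ between lightgbm versions.
evals <- result$record_evals
n_rounds <- length(evals$valid$l2$eval)
for (i in seq_len(n_rounds)) {
    cat(sprintf(
        "[%d]:  train's l2:%g+%g  valid's l2:%g+%g\n",
        i,
        evals$train$l2$eval[[i]], evals$train$l2$eval_err[[i]],
        evals$valid$l2$eval[[i]], evals$valid$l2$eval_err[[i]]
    ))
}
```

This produces the desired metric-only output, though it prints only after training finishes rather than live during fitting, so it is not a full substitute for a proper parameter.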

@bennyjg

bennyjg commented Nov 6, 2023

Please review and fix this quickly. This mixing of log messages is very annoying. I am using the Python API (not R), and it is frustrating to see all those info and warning messages that are of no interest to me.

@jameslamb
Collaborator

I am using the Python API (not R)

@bennyjg Thanks for using LightGBM.

The issue you're commenting on here is specific to the R package.

If you'd like to suggest different behavior for the Python package, please first check https://github.com/microsoft/LightGBM/issues for existing issues tracking what you're asking about, then open a new issue with a minimal, reproducible example showing the behavior you currently see and explaining the behavior you'd expect or prefer.

@jameslamb
Collaborator

Thanks very much for taking the time to write this up!

I agree with the proposal that these things should be separated.
