Training on GPU is not successful using XGBClassifier when training data is too large #10301

Open
madakkmi opened this issue May 20, 2024 · 6 comments

madakkmi commented May 20, 2024

I have X_train and y_train with shapes (483903, 2897) and (483903,) respectively. Training XGBoost is successful on GPU using the following code:

import xgboost as xgb
fit_kwargs = {'tree_method': 'hist', 'device': 'cuda'}
dtrain = xgb.DMatrix(X_train, label=y_train, feature_names=list(X_train.columns))
model = xgb.train(fit_kwargs, dtrain)

However, the following code does not run on GPU successfully:

from xgboost import XGBClassifier
model = XGBClassifier(**fit_kwargs)
model.fit(X_train, y_train)

It throws the error:

XGBoostError: [16:43:48] C:\buildkite-agent\builds\buildkite-windows-cpu-autoscaling-group-i-0b3782d1791676daf-1\xgboost\xgboost-ci-windows\src\tree\updater_gpu_hist.cu:781: Exception in gpu_hist: [16:43:48] C:\buildkite-agent\builds\buildkite-windows-cpu-autoscaling-group-i-0b3782d1791676daf-1\xgboost\xgboost-ci-windows\src\data\../common/device_helpers.cuh:431: Memory allocation error on worker 0: bad allocation: cudaErrorMemoryAllocation: out of memory
- Free memory: 1997537280
- Requested memory: 5558567356
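
As a rough sanity check against these numbers, assuming the (483903, 2897) shape reported above, a dense float32 copy of X_train alone is roughly the size of the requested allocation:

# Back-of-the-envelope estimate only, using the reported shape (483903, 2897)
rows, cols = 483903, 2897
approx_bytes = rows * cols * 4          # 4 bytes per float32 value
print(f"{approx_bytes / 1e9:.2f} GB")   # ~5.6 GB, close to the requested allocation; only ~2.0 GB was free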

If xgb.train(fit_kwargs, dtrain) runs successfully on the GPU, the expectation is that fitting with XGBClassifier using the same parameters should also succeed on the GPU.

xgboost version = 2.0.3

@trivialfis (Member)

Hi, if you replace the DMatrix object with QuantileDMatrix in the native interface snippet, does it work? In addition, what's the type of X_train?
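
For reference, a minimal sketch of that suggestion, assuming the same pandas X_train and y_train from the first snippet:

import xgboost as xgb

# QuantileDMatrix pre-bins the data for the hist method, which usually needs
# less device memory than materialising a plain DMatrix first.
params = {'tree_method': 'hist', 'device': 'cuda'}
dtrain = xgb.QuantileDMatrix(X_train, label=y_train, feature_names=list(X_train.columns))
model = xgb.train(params, dtrain)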

@madakkmi (Author)

@trivialfis, if DMatrix is replaced with QuantileDMatrix, then training with the native interface fails with the following error:

XGBoostError: [09:56:51] C:\buildkite-agent\builds\buildkite-windows-cpu-autoscaling-group-i-0b3782d1791676daf-1\xgboost\xgboost-ci-windows\src\tree\updater_gpu_hist.cu:781: Exception in gpu_hist: [09:56:51] C:\buildkite-agent\builds\buildkite-windows-cpu-autoscaling-group-i-0b3782d1791676daf-1\xgboost\xgboost-ci-windows\src\data\../common/device_helpers.cuh:431: Memory allocation error on worker 0: bad allocation: cudaErrorMemoryAllocation: out of memory
- Free memory: 1974468608
- Requested memory: 5558567356

The dtypes of X_train are shown below:

X_train.dtypes.value_counts()
Out[78]: 
float16    2683
float64     214
Name: count, dtype: int64

trivialfis (Member) commented May 21, 2024

That makes sense; thank you for sharing. Could you please share the type of input, such as whether it's a pandas dataframe or a cudf dataframe?

@madakkmi (Author)

@trivialfis , thank you for your quick reply. Here is what you requested:

X_train.__class__
Out[79]: pandas.core.frame.DataFrame

y_train.__class__
Out[80]: pandas.core.series.Series

trivialfis (Member) commented May 21, 2024

Hi, I noticed that with the native interface you are training a regression model with the default objective (reg:squarederror), while it's a classification model when the sklearn interface is used. Could you please fix that?
Classification uses more memory since it needs to train one model for each class.
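
As a rough illustration of that point, assuming y_train from the snippets above: the sklearn wrapper infers the objective from the label, and with more than two classes it boosts one tree per class per round.

import numpy as np

# Hypothetical check of which objective XGBClassifier would infer for y_train
n_classes = np.unique(y_train).shape[0]
objective = 'binary:logistic' if n_classes == 2 else 'multi:softprob'
print(n_classes, objective)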

madakkmi (Author) commented May 21, 2024

@trivialfis, thanks for noticing that. I've modified the code (as below), and it runs successfully on the GPU.

import xgboost as xgb
# objective now set explicitly so the native interface trains the same binary classifier
fit_kwargs_native = {'objective': 'binary:logistic', 'tree_method': 'hist', 'device': 'gpu'}
dtrain = xgb.DMatrix(X_train, label=y_train, feature_names=list(X_train.columns))
model = xgb.train(fit_kwargs_native, dtrain)
