Issue with Categorical Feature Encoding in Binary Classification #2636
Comments
Hello!
Hi Ekaterina, thank you for your response. One-hot encoding definitely works in this toy case.
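One-hot encoding turns each category into its own 0/1 column. A single tree split such as "is the value C?" can then isolate the zero-conversion category directly, without relying on target statistics. Below is a minimal pure-Python sketch of the transformation. The category names "A", "B", "C" are hypothetical placeholders.

```python
def one_hot(values):
    """Encode each categorical value as a 0/1 indicator vector.

    Returns the sorted list of categories (the column order) and the encoded rows.
    """
    categories = sorted(set(values))
    # One column per category: 1 where the row has that value, 0 elsewhere.
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, encoded


categories, encoded = one_hot(["A", "B", "C", "A"])
print(categories)  # ['A', 'B', 'C']
print(encoded)     # [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
```

In CatBoost itself, one-hot encoding is controlled by the `one_hot_max_size` parameter. Categorical features with at most that many distinct values are one-hot encoded instead of being converted to CTRs. For a three-valued feature, something like `CatBoostClassifier(cat_features=[0], one_hot_max_size=3)` should therefore apply it.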
Hello,
First, I would like to express my appreciation for the CatBoost library; it has been a fantastic tool for numerous machine learning tasks. However, I've encountered an encoding anomaly with categorical features that I cannot explain.
Reproduction Steps:
I created a simplified dataset for a binary classification task with a single categorical feature that has three unique values. Two of these values have conversions in the dataset, while the third has zero conversions. When CatBoost is used "out of the box" (default settings), the model fails to differentiate between the categories: it outputs the same prediction for every feature value at test time.
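To make the setup concrete, here is a minimal pure-Python sketch of a comparable dataset. The counts are hypothetical and are not taken from the attached notebook: 10 rows per category, with 3 conversions for "A", 2 for "B", and none for "C". The sketch also computes a simplified ordered target statistic in the style described in the CatBoost paper. This is the kind of encoding CatBoost applies by default to categorical features that have more distinct values than `one_hot_max_size`. The prior of 0.5 and prior weight of 1 are assumptions. Real CatBoost also shuffles rows with random permutations and combines several CTR types, so this only approximates its behavior.

```python
from collections import defaultdict

# Hypothetical toy data mirroring the description: one categorical feature
# with three values; "A" and "B" have some conversions, "C" has none.
# Rows are grouped by category here; CatBoost would use random permutations.
rows = (
    [("A", 1)] * 3 + [("A", 0)] * 7
    + [("B", 1)] * 2 + [("B", 0)] * 8
    + [("C", 0)] * 10
)


def conversion_rates(data):
    """Return the empirical conversion rate per category value."""
    positives = defaultdict(int)
    totals = defaultdict(int)
    for category, label in data:
        positives[category] += label
        totals[category] += 1
    return {c: positives[c] / totals[c] for c in totals}


def ordered_target_stats(data, prior=0.5, prior_weight=1.0):
    """Simplified ordered target statistic (assumed prior and weight).

    Each row is encoded using only the rows that precede it:
    (sum of earlier labels for the same category + prior_weight * prior)
    / (number of earlier rows with the same category + prior_weight).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    encoded = []
    for category, label in data:
        encoded.append(
            (sums[category] + prior_weight * prior) / (counts[category] + prior_weight)
        )
        # Update the running statistics only after encoding the current row.
        sums[category] += label
        counts[category] += 1
    return encoded


rates = conversion_rates(rows)
print(rates)  # {'A': 0.3, 'B': 0.2, 'C': 0.0}
encoded = ordered_target_stats(rows)
print(encoded[0], encoded[10], encoded[20])  # first row of each category
```

In this sketch, the first row of every category is encoded as the prior (0.5). The encodings for "C" also overlap those of "A" and "B", even though "C" never converts. On data this small, such overlap is one plausible reason a default model might not separate the categories, though the sketch does not confirm that this is the cause here.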
What I've Tried:
Attachments:
I am attaching a Jupyter notebook with the example for your reference: catboost_debug_encoding.ipynb.zip
Could you please help me understand why the default settings fail to distinguish between these categories, and suggest any steps to resolve this?
Thank you for your assistance and for developing such a powerful tool.