Variable importance for nominal variables with few categories #24

Open
nhejazi opened this issue May 2, 2019 · 0 comments

In theory, variable importance assessment based on a grid of counterfactual shift values is sound for nominal variables; in practice, however, such variables have few unique values, even after conversion via as.numeric. This triggers a downstream bug: sl3's Variable_Type classifies such variables as categorical rather than continuous. The bug is non-trivial to track down and can be distressing to users. A simple but naive workaround is to add mean-zero noise to nominal variables so that they appear to have more than 20 or so unique values, which is sufficient to trick sl3 into recognizing the variable as continuous. For example, the variable u below has only 4 (ordered) categories and so will be recognized as categorical:

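n <- 10000
# draw uniform values used to assign each observation to a category
u_idx <- runif(n)
u <- rep(NA_character_, n)
u[u_idx <= 0.1] <- "A"
u[u_idx > 0.1 & u_idx <= 0.3] <- "B"
u[u_idx > 0.3 & u_idx <= 0.95] <- "C"
u[u_idx > 0.95] <- "D"
# convert the labels to numeric codes 1 to 4
u <- as.numeric(as.factor(u))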

To have it recognized as continuous, one could implement

u <- u + runif(n, -0.001, 0.001)

which yields many more unique values than the original u while leaving it unchanged in expectation, since the added noise has mean zero.
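As a rough illustration of the workaround, the sketch below wraps the jittering step in a small helper and checks its effect using base R only. The helper name jitter_if_few_levels and its arguments min_unique and half_width are hypothetical and not part of sl3; the default of 20 is just the approximate figure mentioned above, not a value taken from sl3's source, and the seed is there only for reproducibility.

```r
# Hypothetical helper (not part of sl3 or any package): add small mean-zero
# uniform noise to a numeric vector only when it has few distinct values.
jitter_if_few_levels <- function(x, min_unique = 20, half_width = 0.001) {
  stopifnot(is.numeric(x), half_width > 0)
  if (length(unique(x)) > min_unique) {
    return(x)  # already has enough distinct values; leave untouched
  }
  x + runif(length(x), min = -half_width, max = half_width)
}

set.seed(1)  # assumption: seed chosen arbitrarily for reproducibility
# numeric codes 1 to 4 with the same category probabilities as the example
u <- sample(1:4, size = 10000, replace = TRUE,
            prob = c(0.1, 0.2, 0.65, 0.05))
u_noisy <- jitter_if_few_levels(u)

length(unique(u))         # few distinct values
length(unique(u_noisy))   # many distinct values
all(round(u_noisy) == u)  # original codes are recoverable by rounding
abs(mean(u_noisy) - mean(u)) < 0.001  # mean moves by less than half_width
```

Because the noise half-width is far smaller than the gap between adjacent codes, rounding recovers the original categories, so the jitter can be undone after the analysis if needed.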

@nhejazi nhejazi self-assigned this May 2, 2019
@nhejazi nhejazi added the bug Something isn't working label May 2, 2019