Add Warning if User Passes Categorical Columns to h2o.cor() #12903
Comments
Lauren DiPerna commented: This question originally came up here: https://stackoverflow.com/questions/53265386/how-does-h2o-cor-deal-with-categorical-data
JIRA Issue Migration Info. Jira Issue: PUBDEV-6057
Can you tell me where the code file is? I am unable to find it.
@tomasfryda I have raised this PR to close this issue. Please review and merge.
We should add a warning to the Python/R method h2o.cor() that tells users the method is intended only for numeric columns when they pass a categorical column.
We should also add a runit/pyunit test that covers what happens when a user passes a categorical column. Right now it seems that we return NA for categorical columns with more than two levels.
{code}
library(h2o)
h2o.init()

# create a categorical column called k2 with 5 levels and 20 values
k2 <- rep(c('her', 'him', 'cat', 'mouse', 'dog'), 4)
# create a categorical column with two levels
k <- rep(c('her', 'him'), 10)
# create a numeric column with 20 values
n <- 20
h <- runif(n)

# see what happens if you try to calculate the correlation of a numeric with a binary categorical
h2o.cor(as.h2o(k), as.h2o(h))
# 0.07981525 (output from one run; h is random)
# see what happens when you try to calculate the correlation of a numeric with a multi-level categorical
h2o.cor(as.h2o(k2), as.h2o(h))
# NA
{code}
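The check requested above could look something like the following on the Python side. This is only a minimal sketch, not the actual h2o implementation: `warn_if_categorical` and `categorical_columns` are hypothetical helper names, and the sketch assumes column types arrive as a dict of column name to type string with `"enum"` marking a categorical column (the shape that `H2OFrame.types` is understood to return). It does not call h2o at all, so it runs without a cluster.

```python
import warnings

# Assumption: "enum" is the type string used for categorical columns,
# as in the dict returned by H2OFrame.types.
CATEGORICAL_TYPES = {"enum"}


def categorical_columns(types):
    """Return the names of columns whose type string marks them as categorical."""
    return [name for name, kind in types.items() if kind in CATEGORICAL_TYPES]


def warn_if_categorical(types):
    """Hypothetical pre-check to run before computing a correlation.

    Emits a UserWarning naming any categorical columns and returns their names.
    """
    offending = categorical_columns(types)
    if offending:
        warnings.warn(
            "cor() is intended for numeric columns only; "
            "categorical columns passed: " + ", ".join(offending),
            UserWarning,
            stacklevel=2,
        )
    return offending


if __name__ == "__main__":
    # Mirrors the example above: k2 is categorical, h is numeric.
    warn_if_categorical({"k2": "enum", "h": "real"})
```

A pyunit test along the lines proposed above could then assert that the warning fires for a frame containing an `enum` column and stays silent for an all-numeric frame.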