
Lasso with SL.glmnet #15

Open
PFMB opened this issue Jan 29, 2018 · 1 comment
PFMB commented Jan 29, 2018

Hi,

I tried to estimate the average treatment effect with 'ltmle', using only
SL.library = "SL.glmnet" (the lasso, for variable selection), which results in:

Error in lognet(x, is.sparse, ix, jx, y, weights, offset, alpha, nobs, : one
multinomial or binomial class has 1 or 0 observations; not allowed
Additional warning:
In FUN(X[[i]], ...) : Error in algorithm SL.glmnet
The Algorithm will be removed from the Super Learner (i.e. given weight 0)

Error occured during call to SuperLearner:
Q.kplus1 ~ A.1 + L.1 + Y.1 + A.2 + L.2 + Y.2 + A.3
Note that some SuperLearner libraries crash when called with continuous dependent variables, as in the case of initial Q regressions with continuous Y or subsequent Q regressions even if Y is binary.
The error reported is:
Error in system.time({ : All algorithms dropped from library

The R code used:

library(ltmle)

set.seed(123)
n <- 36 # no. of obs is particularly low
t <- 3  # points in time

A <- data.frame(matrix(rbinom(n * t, 1, 0.6), n, t))
L <- A + data.frame(matrix(rnorm(n * t, 0, 1), n, t))
Y <- L * data.frame(matrix(rgamma(n * t, 2, 4), n, t))
df <- data.frame(A[, 1], L[, 1], Y[, 1], A[, 2], L[, 2], Y[, 2], A[, 3], L[, 3], Y[, 3]) # assume A -> L -> Y
colnames(df) <- c("A.1", "L.1", "Y.1", "A.2", "L.2", "Y.2", "A.3", "L.3", "Y.3")
YRANGE <- c(min(Y), max(Y))

SL.lib1 <- c("SL.glmnet")
SL.lib2 <- c("SL.stepAIC", "SL.knn", "SL.gam", "SL.glm.interaction") # alternative library, not used below

ltmle_est <- ltmle(df, Lnodes = c(2, 5, 8), Anodes = c(1, 4, 7), Cnodes = NULL,
                   Ynodes = c(3, 6, 9), Yrange = YRANGE, estimate.time = FALSE,
                   gbounds = c(0.05, 1),
                   abar = list(treatment = rep(1L, t),
                               control = rep(0L, t)),
                   SL.library = SL.lib1)

I assume that the internal transformation of the continuous response variable
to [0, 1] according to Yrange is the origin of the problem. I guess a change in the family argument is needed? I tried to define my own wrapper by modifying https://github.com/ecpolley/SuperLearner/blob/master/R/SL.glmnet.R, but without success. What do I need to change to fix this?
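For reference, a minimal sketch of the kind of modified wrapper I have in mind, assuming the fix is to force glmnet to fit a gaussian (least-squares) lasso so the [0,1]-scaled Q.kplus1 is treated as continuous; the name SL.glmnet.gaussian and the bounding of the predictions are my own choices, not part of SuperLearner:

# hypothetical wrapper sketch; requires the glmnet package
SL.glmnet.gaussian <- function(Y, X, newX, family, obsWeights, id,
                               alpha = 1, nfolds = 10, ...) {
  # glmnet needs a numeric matrix, not a data.frame
  if (!is.matrix(X)) {
    X <- model.matrix(~ -1 + ., data = X)
    newX <- model.matrix(~ -1 + ., data = newX)
  }
  # always fit a gaussian lasso, ignoring the family argument
  # that ltmle/SuperLearner passes in (family and id are unused here)
  fitCV <- glmnet::cv.glmnet(x = X, y = Y, weights = obsWeights,
                             family = "gaussian", alpha = alpha,
                             nfolds = nfolds, ...)
  pred <- predict(fitCV, newx = newX, s = "lambda.min", type = "response")
  # keep predictions inside [0, 1], since ltmle works on the scaled outcome
  pred <- pmin(pmax(pred, 0), 1)
  fit <- list(object = fitCV, useMin = TRUE)
  class(fit) <- "SL.glmnet"
  list(pred = pred, fit = fit)
}

With such a function in the workspace, it would be passed via SL.library = "SL.glmnet.gaussian".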

Many thanks and regards!

@ck37

joshuaschwab (Owner) commented

Hi,
I don't know glmnet especially well, but I think the problem is that glmnet does not work with family="binomial" and continuous Y.
library(SuperLearner)

n <- 100
p <- 5
X <- matrix(rnorm(n * p), n, p)
beta <- rnorm(p)
Y.continuous <- plogis(X %*% beta + rnorm(n))  # continuous outcome in (0, 1)
Y.binary <- rbinom(n, 1, Y.continuous)         # binary outcome
m1 <- SL.glmnet(Y.binary, X, X, family = binomial(), obsWeights = rep(1, n))     # ok
m2 <- SL.glmnet(Y.continuous, X, X, family = binomial(), obsWeights = rep(1, n)) # error
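For comparison, a hedged sketch: pairing the continuous outcome with family = gaussian() should go through, since cv.glmnet then fits a least-squares model rather than a logistic one.

m3 <- SL.glmnet(Y.continuous, X, X, family = gaussian(), obsWeights = rep(1, n)) # should run: gaussian fit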

You could use a different SL.library, but if you only have 36 observations, you may be better off using glm instead of SuperLearner (SL.library = "glm", the default).
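A hedged usage sketch of that suggestion, reusing df, t, and YRANGE from the code above (exact defaults may differ by ltmle version):

ltmle_glm <- ltmle(df, Lnodes = c(2, 5, 8), Anodes = c(1, 4, 7), Cnodes = NULL,
                   Ynodes = c(3, 6, 9), Yrange = YRANGE, estimate.time = FALSE,
                   gbounds = c(0.05, 1),
                   abar = list(treatment = rep(1L, t),
                               control = rep(0L, t)),
                   SL.library = "glm") # plain glm fits instead of SuperLearner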

Josh
