stratified estimator #17

Open
benkeser opened this issue Jul 14, 2017 · 3 comments
@benkeser
Owner

Adding a way to compute a stratified Kaplan-Meier-style estimator would also be a nice addition. The current code can be hacked to do this as follows:

set.seed(1234)
n <- 100
t_0 <- 6
trt <- rbinom(n, 1, 0.5)
# e.g., study site
adjustVars <- data.frame(site = (rbinom(n,1,0.5) + 1))
ftime <- round(1 + runif(n, 1, 350) - trt + adjustVars$site)
ftype <- round(runif(n, 0, 1))

#' Make a formula that implements an empirical hazard estimator
#' when called in estimateHazards (reads ftime and ftype from the
#' global environment)
get.ftimeForm <- function(trt, site){
	form <- "-1"
	for(i in unique(trt)){
		for(s in unique(site)){
			form <- c(form, 
			  paste0("I(trt==",i,"& site == ",s," & t==",
			         unique(ftime[ftype>0 & trt==i & site == s]),")",
			         collapse="+"))
		}
	}
	return(paste(form,collapse="+"))
}


#' Make a formula that implements an empirical censoring hazard
#' estimator when called in estimateCensoring (reads ftime and ftype
#' from the global environment)
get.ctimeForm <- function(trt, site){
	form <- "-1"
	for(i in unique(trt)){
		for(s in unique(site)){
			form <- c(form, 
			  paste0("I(trt==",i,"& site == ",s," & t==",
			         unique(ftime[ftype==0 & trt==i & site == s]),")",
			         collapse="+"))
		}
	}
	return(paste(form,collapse="+"))
}

form.ftime <- get.ftimeForm(trt = trt, site = adjustVars$site)
form.ctime <- get.ctimeForm(trt = trt, site = adjustVars$site)

fit <- survtmle(ftime = ftime, 
                ftype = ftype,
                trt = trt,
                adjustVars = adjustVars,
                glm.ftime = form.ftime,
                glm.ctime = form.ctime,
                glm.trt = "1", t0 = 300)
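
The two helpers above differ only in the condition on ftype, so they could be folded into one. Below is a minimal sketch of that idea, not part of survtmle: `make_strat_form` and the toy vectors are hypothetical names made up for illustration. It takes the data as explicit arguments rather than reading globals, and skips any (trt, site) stratum with no matching times, on the assumption that such a stratum would otherwise produce an incomplete `t ==` term.

```r
# Hypothetical helper (not part of survtmle): builds one indicator term per
# observed (trt, site, time) cell, in the same style as the helpers above.
# `keep` is a logical vector selecting the rows of interest
# (e.g. ftype > 0 for failures, ftype == 0 for censoring).
make_strat_form <- function(time, keep, trt, site) {
  terms <- character(0)
  for (i in sort(unique(trt))) {
    for (s in sort(unique(site))) {
      times <- sort(unique(time[keep & trt == i & site == s]))
      if (length(times) == 0) next  # no matching times in this stratum
      terms <- c(terms,
                 paste0("I(trt == ", i, " & site == ", s,
                        " & t == ", times, ")"))
    }
  }
  paste(c("-1", terms), collapse = " + ")
}

# Toy data, made up for illustration only
toy_trt  <- c(0, 0, 1, 1, 1)
toy_site <- c(1, 2, 1, 1, 2)
toy_time <- c(3, 5, 2, 2, 4)
toy_type <- c(1, 0, 1, 1, 0)

form_f <- make_strat_form(toy_time, toy_type > 0,  toy_trt, toy_site)
form_c <- make_strat_form(toy_time, toy_type == 0, toy_trt, toy_site)
print(form_f)
print(form_c)
```

The resulting strings have the same shape as form.ftime and form.ctime, so they could be passed as glm.ftime and glm.ctime in the same way.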

The survtmle call above runs fine with n <- 100 but not with n <- 1000. Changing the glm calls within estimateCensoring and estimateHazards to use family = gaussian() allows larger sample sizes to run, albeit still somewhat slowly (e.g., ~1 minute for n <- 1000). Using speedglm brings this down to around 10 seconds.
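
As a rough illustration of why family = gaussian() can stand in for the binomial fit here: with mutually exclusive 0/1 indicator columns and no intercept, least squares returns the mean of the outcome in each cell, which is the empirical hazard. The sketch below checks this on made-up data. It is a simplification, assuming numeric indicator columns (`cell1` to `cell4`, hypothetical names) rather than the logical I() terms used above, and it does not call survtmle or reproduce its internals.

```r
# Made-up long-format data: 20 at-risk rows per (trt, t) cell,
# with 2, 5, 8 and 10 events respectively
cells  <- expand.grid(t = c(1, 2), trt = c(0, 1))
events <- c(2, 5, 8, 10)
long   <- cells[rep(1:4, each = 20), ]
long$y <- unlist(lapply(events, function(k) rep(c(1, 0), c(k, 20 - k))))

# One numeric 0/1 indicator column per cell (mutually exclusive)
for (k in 1:4) {
  long[[paste0("cell", k)]] <-
    as.numeric(long$t == cells$t[k] & long$trt == cells$trt[k])
}

form <- y ~ -1 + cell1 + cell2 + cell3 + cell4

fit_gauss <- glm(form, data = long, family = gaussian())
fit_binom <- glm(form, data = long, family = binomial())

haz_emp   <- events / 20                       # empirical hazard per cell
haz_gauss <- unname(coef(fit_gauss))           # identity link: coefficients are hazards
haz_binom <- unname(plogis(coef(fit_binom)))   # logit link: back-transform

print(rbind(empirical = haz_emp, gaussian = haz_gauss, binomial = haz_binom))
```

In this toy setting all three rows agree up to numerical tolerance, while the gaussian fit needs no iteration to get there.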

@benkeser benkeser self-assigned this Jul 14, 2017
@nhejazi nhejazi self-assigned this Jul 24, 2017
@nhejazi
Collaborator

nhejazi commented Aug 23, 2017

We have the speedglm functionality working now, but did we maybe want to write a wrapper for the main survtmle function to make computing the stratified estimator more straightforward?

@benkeser
Owner Author

Why don't we not worry about it and instead include an example in the vignette of how to do it with direct calls to hazard_tmle (or mean_tmle, which I suspect may be faster)?

@nhejazi
Collaborator

nhejazi commented Aug 24, 2017

Sure, sounds good to me. I'll re-open this as a reminder to keep it on the radar and add it to the vignette soon.

@nhejazi nhejazi reopened this Aug 24, 2017