
Fit treatment propensity based only on initial time point #40

Open
ck37 opened this issue Apr 21, 2018 · 8 comments


ck37 commented Apr 21, 2018

Hello,

I'm trying to conduct an intent-to-treat analysis and am having difficulty fitting the treatment propensity score using only t == 0 (baseline) and then using that estimated baseline treatment propensity across all time points with time-varying censoring.

It looks like when I stratify and only estimate it at t == 0, the treatment propensity score for all remaining time points (1 to 170) may be set to 1 by default. Any recommendations or hints on how to apply the t == 0 propensity score to all future time points? Censoring, on the other hand, I do want estimated at each time point (or in a pooled manner). Also, FYI: at t == 0, 8.7% of the sample is treated.

This is using scrambled data from Romain and attempting to replicate Marcus, J. L., Neugebauer, R. S., Leyden, W. A., Chao, C. R., Xu, L., Quesenberry Jr, C. P., ... & Silverberg, M. J. (2016). Use of abacavir and risk of cardiovascular disease among HIV-infected individuals. JAIDS Journal of Acquired Immune Deficiency Syndromes, 71(4), 413-419.

Here is my rough code:

vars = list(
  id = "ID",
  time = "intnum",
  treatment = "exposure",
  censoring = "censor",
  outcome = "outcome",
  covars_time_varying =
    c("cd4", "I.cd4", "vl", "I.vl", "egfr.low", "I.egfr.low", "diabetes", "htn.meds",
      "llt.meds")
  # covars_baseline will be defined after factors_to_indicators conversion.
)

odata =
  importData(dt,
             ID = vars$id,
             t = vars$time,
             covars = c(vars$covars_baseline, vars$covars_time_varying),
             CENS = vars$censoring,
             TRT = vars$treatment,
             OUTCOME = vars$outcome)

# Try simpler propensity score specification to get a better distribution.
(gform_TRT = paste(vars$treatment, "~", paste(c("eversmoke", "everdrug", "sex_m", "art.naive"), collapse = " + ")))

# For ITT parameter only estimate treatment propensity at baseline.
# (uses rlang::list2 and !! so that we can substitute the treatment variable into the list name)
(stratify_TRT = list2(!!vars$treatment := paste0(vars$time, " == 0L")))

(gform_CENS = paste(vars$censoring, "~", paste(c(vars$covars_baseline, vars$covars_time_varying), collapse = " + ")))

odata = fitPropensity(odata,
                      gform_TRT = gform_TRT,
                      gform_CENS = gform_CENS,
                      stratify_TRT = stratify_TRT)

# Excessive right skewing: 1st quartile to max are all 1.0
# Min is 6.7%.
# Perhaps because we need to examine only t = 0?
summary(odata$g_preds$g0.A)
qplot(odata$g_preds$g0.A)

Thank you, and I'm happy to provide more context if it would be helpful. Sorry if this is an easy fix.


osofr commented Apr 21, 2018

Hm, this is a good one. Let me think about it for a second. Definitely possible; it's just a question of what the easiest hack is within the existing functionality.


ck37 commented Apr 21, 2018

Thanks, appreciate it. Maybe there is a hack: fit the estimator manually outside of stremr, add the predictions to the data.table as a new column (replicating the predicted values across all time points), then configure stremr to use that column as the treatment propensity score (manually creating it or integrating it into the fitPropensity object)?
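A minimal base-R sketch of the replication part of that hack (the data and column names here, e.g. `g0.A`, are hypothetical toy stand-ins, not stremr output):

```r
# Sketch: replicate a baseline (t == 0) propensity prediction across all
# time points by merging on subject ID.

# Long-format data: one row per (ID, t)
dt_long <- data.frame(ID = rep(1:3, each = 4), t = rep(0:3, times = 3))

# Baseline propensity predictions, one row per ID (fit on t == 0 only)
pi0 <- data.frame(ID = 1:3, g0.A = c(0.10, 0.08, 0.12))

# Many-to-one merge on ID replicates each subject's baseline
# prediction across every one of that subject's time points
dt_long <- merge(dt_long, pi0, by = "ID")
```

The resulting `g0.A` column is constant within subject and could then be wired in as the treatment propensity score.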


ck37 commented Apr 21, 2018

I guess that, longer term, fitPropensity() could take a subset_TRT formula to identify the subset of observations used for estimation.

Or perhaps, if I used an sl3 estimator, I could manually subset the data within a wrapper, or include a separate data-processing step in the pipeline that subsets to t == 0? I would also need to handle the replication of predicted values, possibly in a custom prediction wrapper.


osofr commented Apr 21, 2018

But do you really want to replicate the actual predicted values across all time points, or rather use the model fit from t == 0 to obtain predictions of propensity scores across all the other time points? I was under the impression it was the latter, whereas you seem to suggest it's the former?

Can you please confirm that this is exactly what you are trying to accomplish: fit the model A(0) ~ L(0) and use that model fit to obtain the predictions pi_0 = P(A(0)|L(0)) (a \hat should be added to pi_0 to emphasize these are fits, not true probabilities). Then multiply all time-varying weights (e.g., censoring) across all time points t = 0, ..., K by this additional weight 1/pi_0? What is the purpose of that? Not sure I follow. So this additional weight only changes by observation, but it is time-invariant, right?

On the other hand, you may want to use the above model fit to obtain predictions pi_t = P(A(t)|L(t)) or P(A(0)|L(t)), which would imply that the weights are also time-varying. It seems to me this is not what you are trying to do.


ck37 commented Apr 21, 2018

My interpretation of the article is that, for the ITT parameter, the treatment propensity score (and associated weights) is time-invariant and based on exposure status at t == 0. So yes, that would be what you have in paragraph 2. I could definitely be misinterpreting it, though; here are some key excerpts from the article:

We fit pooled logistic marginal structural models (MSMs) to estimate discrete time hazards of CVD comparing ART regimens with and without abacavir using IPW estimation to account for baseline confounding and time-varying selection bias in intention-to-treat and per-protocol analyses, analogous to a randomized trial. The data were structured to allow the exposure, outcome, right-censoring, and time-dependent covariates to change every 30 days after ART initiation. Propensity score logistic models predicted exposure at baseline and censoring over time as a result of death, health plan disenrollment, switching exposure groups, or end of study.

The first approach was intention-to-treat, which focused on the comparison of the effect of ART initiation with or without abacavir, regardless of how long subjects remained on the initial regimen. For these analyses, switching exposure groups at any time was ignored, thus maximizing follow-up and statistical power. This approach estimates the legacy effect of a 1-time decision to initiate regimens with or without abacavir.

What do you think?


osofr commented Apr 21, 2018

If the above is correct, then the easiest thing to do by far is to evaluate this subject-specific weight 1/pi_0 outside stremr. Then create a dataset with this additional weight column, structured as (ID, t, weight), and pass this weight dataset as an additional argument to the relevant estimation routine (I know survNPMSM can handle it, but I would have to double-check that TMLE can as well).

Here is the code that basically shows how additional weights would be integrated in the weights dataset:
https://github.com/osofr/stremr/blob/master/R/main_estimation.R#L11-L27

Another, much more robust option is to manually incorporate the additional weight 1/pi_0 into your weights dataset and then continue to use that new, adjusted weights dataset within stremr. This is how you could do it manually:

  1. Treat stremr as if it only has time-varying censoring and has absolutely no knowledge of your baseline time-invariant treatment A(0).
  2. Fit the propensity scores and obtain the weights dataset (getIPWeights) for your intervention of interest, based on time-varying censoring alone (for now the weights dataset excludes the baseline treatment propensity score, but we will add it shortly).
  3. Estimate the additional weight 1/pi_0 outside stremr using whatever sl3 learner you want. Alternatively, run stremr again (separately from steps 1 & 2), with input data that includes only a single time point, i.e., (A(0), L(0)). Then fit the propensity score A(0) ~ L(0) on that data and extract the weights dataset, which now contains (ID, t = 0, 1/pi_0).
  4. Create an additional weights dataset with rows replicated across all time points. One way to do that is to exclude t from the weights data.table in step 3 and then do a simple merge of the weights data.table from step 2 with the weights data.table from step 3. That way (if the merge is done right) the additional column 1/pi_0 is simply replicated across all time points.
  5. Over-write the cumulative weight column, multiplying it by 1/pi_0, as is basically done here: https://github.com/osofr/stremr/blob/545642c0326b14f191b034d7b55018c9b80424bc/R/main_estimation.R#L23
  6. Continue using this modified weights data.table throughout stremr. Make sure to pass this weights dataset to all TMLE routines! Otherwise TMLE will create its own weights dataset, which will obviously ignore 1/pi_0.
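Steps 3–5 above can be sketched in base R like this (toy data; the column names `cum.IPAW` and `w.trt0` are illustrative assumptions, not necessarily the exact stremr column names):

```r
# Step 2 output (toy): cumulative censoring weights per (ID, t)
wts_cens <- data.frame(ID = rep(1:2, each = 3), t = rep(0:2, 2),
                       cum.IPAW = c(1.0, 1.1, 1.3, 1.0, 1.2, 1.5))

# Step 3 output (toy): baseline treatment weight 1/pi_0 per ID,
# with the time column already dropped
wts_trt0 <- data.frame(ID = 1:2, w.trt0 = c(1 / 0.10, 1 / 0.08))

# Step 4: merging on ID replicates each subject's 1/pi_0
# across every time point
wts <- merge(wts_cens, wts_trt0, by = "ID")

# Step 5: fold 1/pi_0 into the cumulative weight column
wts$cum.IPAW <- wts$cum.IPAW * wts$w.trt0
```

The merged `wts` could then stand in for the adjusted weights dataset passed to the estimation routines.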

Hope this helps, let me know when I can close the issue or if you have any questions.


ck37 commented Apr 21, 2018

Thanks a bunch, ok I will give this a try and report back.


osofr commented Apr 21, 2018

Scratch that.

I believe what you originally did would accomplish exactly the same effect as the weight-merging procedure I described above. It would be nice to confirm it, though I don't think it's necessary.

Basically, I think what you were doing is fine. Your initial propensity score (for t = 0) would be evaluated as P(A(0)|L(0)) * P(C(0)=1|L(0)). Your propensity scores for t > 0 would be evaluated as P(C(t)=1|L(t)).

So the cumulative weight at t = 1 would be (automatically) evaluated as
[1 / {P(A(0)|L(0)) * P(C(0)=1|L(0))}] x [1 / P(C(1)=1|L(1))]

I believe this corresponds exactly to the intervention that you are after: intervene on baseline treatment only, and intervene on the censoring process across all time points. Might also want to double-confirm it with @romainkp
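As a concrete numeric check of that factorization (toy probabilities for a single subject, not stremr output):

```r
# Toy fitted probabilities for one subject
pA0 <- 0.10  # P(A(0) | L(0)), fit at baseline only
pC0 <- 0.95  # P(C(0) = 1 | L(0))
pC1 <- 0.90  # P(C(1) = 1 | L(1))

# Cumulative weight at t = 1:
# [1 / {P(A(0)|L(0)) * P(C(0)=1|L(0))}] x [1 / P(C(1)=1|L(1))]
w1 <- (1 / (pA0 * pC0)) * (1 / pC1)
```

The baseline treatment factor 1/pA0 enters once and is carried forward, while each later time point contributes only its censoring factor.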
