
Fit treatment propensity based only on initial time point #40

Open
ck37 opened this issue Apr 21, 2018 · 8 comments


ck37 commented Apr 21, 2018

Hello,

I'm trying to conduct an intent-to-treat analysis and am having difficulty fitting the treatment propensity score using only t == 0 (baseline) and then using that estimated baseline treatment propensity across all time points with time-varying censoring.

It looks like when I stratify and only estimate it at t == 0, the treatment propensity score for all remaining time points (1 to 170) may be set to 1 by default. Any recommendations or hints on how to apply the t == 0 propensity score to all future time points? Censoring, on the other hand, I do want estimated at each time point (or in a pooled manner). Also, FYI: at t == 0, 8.7% of the sample is treated.

This is using scrambled data from Romain and attempting to replicate Marcus, J. L., Neugebauer, R. S., Leyden, W. A., Chao, C. R., Xu, L., Quesenberry Jr, C. P., ... & Silverberg, M. J. (2016). Use of abacavir and risk of cardiovascular disease among HIV-infected individuals. JAIDS Journal of Acquired Immune Deficiency Syndromes, 71(4), 413-419.

Here is my rough code:

vars = list(
  id = "ID",
  time = "intnum",
  treatment = "exposure",
  censoring = "censor",
  outcome = "outcome",
  covars_time_varying =
    c("cd4", "I.cd4", "vl", "I.vl", "egfr.low", "I.egfr.low", "diabetes", "htn.meds",
      "llt.meds")
  # covars_baseline will be defined after factors_to_indicators conversion.
)

odata =
  importData(dt,
             ID = vars$id,
             t = vars$time,
             covars = c(vars$covars_baseline, vars$covars_time_varying),
             CENS = vars$censoring,
             TRT = vars$treatment,
             OUTCOME = vars$outcome)

# Try simpler propensity score specification to get a better distribution.
(gform_TRT = paste(vars$treatment, "~", paste(c("eversmoke", "everdrug", "sex_m", "art.naive"), collapse = " + ")))

# For ITT parameter only estimate treatment propensity at baseline.
# (uses rlang::list2 and !! so that we can substitute the treatment variable into the list name)
(stratify_TRT = list2(!!vars$treatment := paste0(vars$time, " == 0L")))

(gform_CENS = paste(vars$censoring, "~", paste(c(vars$covars_baseline, vars$covars_time_varying), collapse = " + ")))

odata = fitPropensity(odata,
                      gform_TRT = gform_TRT,
                      gform_CENS = gform_CENS,
                      stratify_TRT = stratify_TRT)

# Excessive right skewing: 1st quartile to max are all 1.0
# Min is 6.7%.
# Perhaps because we need to examine only t = 0?
summary(odata$g_preds$g0.A)
qplot(odata$g_preds$g0.A)

Thank you, and I'm happy to provide more context if it would be helpful. Sorry if this is an easy fix.


osofr commented Apr 21, 2018

Hm, this is a good one. Let me think about it for a second. Definitely possible; it's just a question of what the easiest hack is within the existing functionality.


ck37 commented Apr 21, 2018

Thanks, appreciate it. Maybe there is a hack: fit the estimator manually outside of stremr, add the predictions to the data.table as a new column (replicating the predicted values across all time points), then configure stremr to use that column as the treatment propensity score (manually creating it or integrating it into the fitPropensity object)?
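A minimal base-R sketch of the replication part of that hack (the data and column names here, e.g. `g0.A`, are hypothetical toy stand-ins, not stremr output):

```r
# Sketch: replicate a baseline (t == 0) propensity prediction across all
# time points by merging on subject ID.

# Long-format data: one row per (ID, t)
dt_long <- data.frame(ID = rep(1:3, each = 4), t = rep(0:3, times = 3))

# Baseline propensity predictions, one row per ID (fit on t == 0 only)
pi0 <- data.frame(ID = 1:3, g0.A = c(0.10, 0.08, 0.12))

# Many-to-one merge on ID replicates each subject's baseline
# prediction across every one of that subject's time points
dt_long <- merge(dt_long, pi0, by = "ID")
```

The resulting `g0.A` column is constant within subject and could then be wired in as the treatment propensity score.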


ck37 commented Apr 21, 2018

I guess that, longer term, fitPropensity() could take a subset_TRT formula to identify the subset of observations used for estimation.

Or perhaps, if I used an sl3 estimator, I could manually subset the data within a wrapper, or include a separate data-processing step in the pipeline that subsets to t == 0? I would also need to handle the replication of predicted values, possibly in a custom prediction wrapper.


osofr commented Apr 21, 2018

But do you really want to replicate the actual predicted values across all time points, or rather use the model fit from t == 0 to obtain predictions of propensity scores across all the other time points? I was under the impression it was the latter, whereas you seem to suggest it's the former?

Can you please confirm that this is exactly what you are trying to accomplish: fit the model A(0) ~ L(0) and use that model fit to obtain the predictions pi_0 = P(A(0)|L(0)) (a \hat should be added to pi_0 to emphasize these are fits, not true probabilities). Then multiply all time-varying weights (e.g., censoring) across all time points t = 0, ..., K by this additional weight 1/pi_0? What is the purpose of that? Not sure I follow. So this additional weight only changes by observation, but it is time-invariant, right?

On the other hand, you may want to use the above model fit to obtain predictions pi_t = P(A(t)|L(t)) or P(A(0)|L(t)), which would imply that the weights are also time-varying. It seems to me this is not what you are trying to do.


ck37 commented Apr 21, 2018

My interpretation of the article is that, for the ITT parameter, the treatment propensity score (and associated weights) is time-invariant and based on exposure status at t == 0. So yes, that would be what you have in paragraph 2. I could definitely be misinterpreting it, though; here are some key excerpts from the article:

We fit pooled logistic marginal structural models (MSMs) to estimate discrete time hazards of CVD comparing ART regimens with and without abacavir using IPW estimation to account for baseline confounding and time-varying selection bias in intention-to-treat and per-protocol analyses, analogous to a randomized trial. The data were structured to allow the exposure, outcome, right-censoring, and time-dependent covariates to change every 30 days after ART initiation. Propensity score logistic models predicted exposure at baseline and censoring over time as a result of death, health plan disenrollment, switching exposure groups, or end of study.

The first approach was intention-to-treat, which focused on the comparison of the effect of ART initiation with or without abacavir, regardless of how long subjects remained on the initial regimen. For these analyses, switching exposure groups at any time was ignored, thus maximizing follow-up and statistical power. This approach estimates the legacy effect of a 1-time decision to initiate regimens with or without abacavir.

What do you think?


osofr commented Apr 21, 2018

If the above is correct, then the easiest thing to do by far is to evaluate this subject-specific weight 1/pi_0 outside stremr. Then create a dataset with this additional weight column, structured as (ID, t, weight), and pass this weight dataset as an additional argument to the relevant estimation routine (I know survNPMSM can handle it, but I would have to double-check that TMLE can as well).

Here is the code that basically shows how additional weights would be integrated in the weights dataset:
https://github.com/osofr/stremr/blob/master/R/main_estimation.R#L11-L27

Another, much more robust option is to manually incorporate the additional weight 1/pi_0 into your weights dataset and then continue to use that new, adjusted weights dataset within stremr. This is how you could do it manually:

  1. Treat stremr as if it only has time-varying censoring and has absolutely no knowledge of your baseline time-invariant treatment A(0).
  2. Fit the propensity scores and obtain the weights dataset (getIPWeights) for your intervention of interest, based on time-varying censoring alone (for now the weights dataset excludes the baseline treatment propensity score, but we will add it shortly).
  3. Estimate the additional weight 1/pi_0 outside stremr using whatever sl3 learner you want. Alternatively, run stremr again (separately from steps 1 & 2), with input data that includes only a single time point, i.e., (A(0), L(0)). Then fit the propensity score A(0) ~ L(0) on that data and extract the weights dataset, which now contains (ID, t = 0, 1/pi_0).
  4. Create an additional weights dataset with rows replicated across all time points. One way to do that is to exclude t from the weights data.table in step 3 and then do a simple merge of the weights data.table from step 2 with the weights data.table from step 3. That way (if the merge is done right) the additional column 1/pi_0 is simply replicated across all time points.
  5. Over-write the cumulative weight column, multiplying it by 1/pi_0, as is basically done here: https://github.com/osofr/stremr/blob/545642c0326b14f191b034d7b55018c9b80424bc/R/main_estimation.R#L23
  6. Continue using this modified weights data.table throughout stremr. Make sure to pass this weights dataset to all TMLE routines! Otherwise TMLE will create its own weights dataset, which will obviously ignore 1/pi_0.
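Steps 3–5 above can be sketched in base R like this (toy data; the column names `cum.IPAW` and `w.trt0` are illustrative assumptions, not necessarily the exact stremr column names):

```r
# Step 2 output (toy): cumulative censoring weights per (ID, t)
wts_cens <- data.frame(ID = rep(1:2, each = 3), t = rep(0:2, 2),
                       cum.IPAW = c(1.0, 1.1, 1.3, 1.0, 1.2, 1.5))

# Step 3 output (toy): baseline treatment weight 1/pi_0 per ID,
# with the time column already dropped
wts_trt0 <- data.frame(ID = 1:2, w.trt0 = c(1 / 0.10, 1 / 0.08))

# Step 4: merging on ID replicates each subject's 1/pi_0
# across every time point
wts <- merge(wts_cens, wts_trt0, by = "ID")

# Step 5: fold 1/pi_0 into the cumulative weight column
wts$cum.IPAW <- wts$cum.IPAW * wts$w.trt0
```

The merged `wts` could then stand in for the adjusted weights dataset passed to the estimation routines.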

Hope this helps, let me know when I can close the issue or if you have any questions.


ck37 commented Apr 21, 2018

Thanks a bunch, ok I will give this a try and report back.


osofr commented Apr 21, 2018

Scratch that.

I believe what you originally did would accomplish exactly the same effect as the weight-merging procedure I described above. It would be nice to confirm it, though I don't think it's necessary.

Basically, I think what you were doing is fine. Your initial propensity score (for t = 0) would be evaluated as P(A(0)|L(0)) * P(C(0)=1|L(0)). Your propensity scores for t > 0 would be evaluated as P(C(t)=1|L(t)).

So the cumulative weight at t = 1 would be (automatically) evaluated as
[1 / {P(A(0)|L(0)) * P(C(0)=1|L(0))}] x [1 / P(C(1)=1|L(1))]

I believe this corresponds exactly to the intervention that you are after: intervene on baseline treatment only, and intervene on the censoring process across all time points. Might also want to double-confirm it with @romainkp
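As a concrete numeric check of that factorization (toy probabilities for a single subject, not stremr output):

```r
# Toy fitted probabilities for one subject
pA0 <- 0.10  # P(A(0) | L(0)), fit at baseline only
pC0 <- 0.95  # P(C(0) = 1 | L(0))
pC1 <- 0.90  # P(C(1) = 1 | L(1))

# Cumulative weight at t = 1:
# [1 / {P(A(0)|L(0)) * P(C(0)=1|L(0))}] x [1 / P(C(1)=1|L(1))]
w1 <- (1 / (pA0 * pC0)) * (1 / pC1)
```

The baseline treatment factor 1/pA0 enters once and is carried forward, while each later time point contributes only its censoring factor.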
