Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nested batch effects in MAST #97

Open
LuckyMD opened this issue Aug 1, 2018 · 14 comments
Open

Nested batch effects in MAST #97

LuckyMD opened this issue Aug 1, 2018 · 14 comments

Comments

@LuckyMD
Copy link

LuckyMD commented Aug 1, 2018

Hi!

I was wondering if you might be able to help me with an issue I've been having. I am trying to run MAST with a nested batch effect.

I have data from an experiment with 6 mice in 2 conditions (3 mice each), spread over 6 plates (condition & control spread half-half per plate). Now I'd like to run MAST on normalized, but not batch-corrected data and include the batch as a covariate.

First I ran ~condition + sample + chip + ngeneson, where sample is just the mouse. I got expected genes out of this, although down-regulated when I expected them to be up-regulated. Then, I noticed that sample is just a higher resolution of condition (each mouse is either condition or control). So that would lead to a non-full-rank design matrix and should have output wrong results I assume.

Running just ~condition + chip + ngeneson however gives me far fewer DE genes, and not the ones I would expect. I assume this has to do with the variability between mice making the background noise quite high.

I was wondering if I can add the mouse covariate so that it is fit separately per condition to account for variation between mice without interfering with the condition covariate.

Thanks for any help you can provide!

@gfinak
Copy link
Member

gfinak commented Aug 1, 2018

I don't understand your design from your description. Can you clarify, please? Perhaps post your design matrix?

@LuckyMD
Copy link
Author

LuckyMD commented Aug 1, 2018

Conceptually I set up my design matrix using the variables:

Condition Sample Chip ngeneson
TRUE. A C1. 1030
FALSE. B C1 2025
FALSE. B. C2. 2583
TRUE. A. C2. 2353
TRUE. C. C3. 1843
FALSE. D. C3. 1283
......

I assume this would have been transformed into 0s and 1s automatically and an offset would have been added. What I mean to say is that in this toy example Condition==TRUE includes samples A and C and Condition==FALSE includes samples B and D. Thus, if I include both condition and sample in the design matrix this should not be full rank as condition can be inferred from sample.

However, I did run this setup... and I recovered differentially expressed genes over the condition covariate. This puzzles me a little, as I would assume the variance in condition would have already been accounted for in the sample covariate.

Now, when I run without the sample covariate, I recover very different DEGs over condition.

Is there any way to account for sample variance within condition in the model?

@gfinak
Copy link
Member

gfinak commented Aug 1, 2018

Maybe use a random effect model with method='glmer' but with only three samples and three chips you're not going to get great estimates of the variability.
Something like: ~Condition+ngeneson+(1|Sample)+(1|Chip).
And center your ngeneson variable.
@amcdavid any thoughts?
Also, as a warning there's issue #98 that's related to estimation with random effects that we need to look into still.

@LuckyMD
Copy link
Author

LuckyMD commented Aug 1, 2018

I just used three in the example above. I actually have 8 chips and 8 samples split over 2 conditions.

Does centering the ngeneson variable make a big difference? That would be something I could alter.

I'm not very familiar with mixed effect models. What exactly would I be modeling using (1|Sample) + (1|Chip)?

Do you have any idea why I would get an output at all with the above design matrix (including sample and condition)? How can MAST fit a model where it could assign the variance to either the sample or condition covariate?

@gfinak
Copy link
Member

gfinak commented Aug 1, 2018

  • The above models a random intercept for each chip and each sample.
  • Centering does make a difference. Coefficients are interpreted as expected values when ngeneson is 0, which is meaningless and outside any data you actually observe. So centering ngeneson gives the interpretation of coefficients as the expected value when ngeneson is set to the mean.
  • Sample and condition are not entirely aliased (you have more samples than conditions).

@LuckyMD
Copy link
Author

LuckyMD commented Dec 5, 2018

Hi again!
I recently tried your suggested solution of using the following zlm command:
zlmCond_astro <- zlm(formula = ~condition + (1|sample) + (1|chip) + ngeneson, method='glmer', ebayes=FALSE, sca=sca_astro_filt, parallel=FALSE)

However, I am running into an error regarding a fixef() function. Could you decipher the following error message for me?

Error in fixef(object@fitC) : could not find function "fixef"

@gfinak
Copy link
Member

gfinak commented Dec 6, 2018 via email

@LuckyMD
Copy link
Author

LuckyMD commented Dec 6, 2018

Ah, thanks. I wasn't aware of other dependencies. I assumed I had accidentally renamed a fixed() function and was looking for my error ;). Mea culpa.

@amcdavid
Copy link
Member

amcdavid commented Dec 6, 2018

I think we have good reasons to have lme4 be only Suggested. Can we import its methods anyways? Putting a call to if(require(lme4)) seems like bad behavior since it modifies the search path (and I think is worthy of a warning from R CMD check

@fbrundu
Copy link

fbrundu commented Jul 31, 2020

Hi all, I post here since the topic is the same and this issue is still open, but I can open a new issue if necessary.
I would like to compare two conditions with nested batches, but with a simpler design matrix (I have no chip covariate).
The composition of each condition is similar to:

condition  batch     
Case       Sample1   
Case       Sample2
Case       Sample3
Case       Sample4
Control    Sample5
Control    Sample6
Control    Sample7
Control    Sample8

In such case, with no chip covariate, does it make sense to also model the batch or would it lead to losing the condition signal, i.e.:

~ condition + ngeneson + batch

?
At the moment I'm using MAST without the batch covariate

~ condition + ngeneson

since I thought introducing the batch covariate would lead to the wrong results in this case. Is this justified, or I should be able to use it instead?

Thanks

@gfinak
Copy link
Member

gfinak commented Jul 31, 2020

I assume you have multiple cells per sample? If so I think you could include a random batch effect.

@fbrundu
Copy link

fbrundu commented Aug 3, 2020

Thanks @gfinak, yes I have multiple cells per sample. I'll try with the batch covariate.
I would like to learn more about why the batch covariate doesn't remove the condition. In a previous message, you indicate that "Sample and condition are not entirely aliased (you have more samples than conditions)." Do you have at hand some documentation I can read to understand better how the MAST's model works in this case? Thanks!

@fbrundu
Copy link

fbrundu commented Aug 10, 2020

I tested using the batch covariate. I noticed that, although some genes related to the condition (positive controls) are recovered, most of the others are not. And the (log-fold-change, FDR) of the recovered genes is smaller (LFC) and less significant (FDR) with respect to the analysis without the batch covariate.

Is it correct to state that in this case, due to conditions partially confounded with batch (4 unique batches per condition), the statistical power of the test is reduced when adding the batch covariate? Keeping however in mind that adding the batch covariate is the more appropriate approach... Thanks.

@noobCoding
Copy link

Hi @fbrundu, I had the same problem when using MAST with the batch covariate. Has the problem with batch covariate been solved, somehow?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants