Nested batch effects in MAST #97

LuckyMD · 2018-08-01T13:57:55Z

Hi!

I was wondering if you might be able to help me with an issue I've been having. I am trying to run MAST with a nested batch effect.

I have data from an experiment with 6 mice in 2 conditions (3 mice each), spread over 6 plates (condition & control spread half-half per plate). Now I'd like to run MAST on normalized, but not batch-corrected data and include the batch as a covariate.

First I ran ~condition + sample + chip + ngeneson, where sample is just the mouse. I got expected genes out of this, although down-regulated when I expected them to be up-regulated. Then, I noticed that sample is just a higher resolution of condition (each mouse is either condition or control). So that would lead to a non-full-rank design matrix and should have output wrong results I assume.

Running just ~condition + chip + ngeneson however gives me far fewer DE genes, and not the ones I would expect. I assume this has to do with the variability between mice making the background noise quite high.

I was wondering if I can add the mouse covariate so that it is fit separately per condition to account for variation between mice without interfering with the condition covariate.

Thanks for any help you can provide!

The text was updated successfully, but these errors were encountered:

gfinak · 2018-08-01T15:56:41Z

I don't understand your design from your description. Can you clarify, please? Perhaps post your design matrix?

LuckyMD · 2018-08-01T21:14:41Z

Conceptually I set up my design matrix using the variables:

Condition Sample Chip ngeneson
TRUE. A C1. 1030
FALSE. B C1 2025
FALSE. B. C2. 2583
TRUE. A. C2. 2353
TRUE. C. C3. 1843
FALSE. D. C3. 1283
......

I assume this would have been transformed into 0s and 1s automatically and an offset would have been added. What I mean to say is that in this toy example Condition==TRUE includes samples A and C and Condition==FALSE includes samples B and D. Thus, if I include both condition and sample in the design matrix this should not be full rank as condition can be inferred from sample.

However, I did run this setup... and I recovered differentially expressed genes over the condition covariate. This puzzles me a little, as I would assume the variance in condition would have already been accounted for in the sample covariate.

Now, when I run without the sample covariate, I recover very different DEGs over condition.

Is there any way to account for sample variance within condition in the model?

gfinak · 2018-08-01T21:24:15Z

Maybe use a random effect model with method='glmer' but with only three samples and three chips you're not going to get great estimates of the variability.
Something like: ~Condition+ngeneson+(1|Sample)+(1|Chip).
And center your ngeneson variable.
@amcdavid any thoughts?
Also, as a warning there's issue #98 that's related to estimation with random effects that we need to look into still.

LuckyMD · 2018-08-01T21:46:07Z

I just used three in the example above. I actually have 8 chips and 8 samples split over 2 conditions.

Does centering the ngeneson variable make a big difference? That would be something I could alter.

I'm not very familiar with mixed effect models. What exactly would I be modeling using (1|Sample) + (1|Chip)?

Do you have any idea why I would get an output at all with the above design matrix (including sample and condition)? How can MAST fit a model where it could assign the variance to either the sample or condition covariate?

gfinak · 2018-08-01T21:53:49Z

The above models a random intercept for each chip and each sample.
Centering does make a difference. Coefficients are interpreted as expected values when ngeneson is 0, which is meaningless and outside any data you actually observe. So centering ngeneson gives the interpretation of coefficients as the expected value when ngeneson is set to the mean.
Sample and condition are not entirely aliased (you have more samples than conditions).

LuckyMD · 2018-12-05T17:52:00Z

Hi again!
I recently tried your suggested solution of using the following zlm command:
zlmCond_astro <- zlm(formula = ~condition + (1|sample) + (1|chip) + ngeneson, method='glmer', ebayes=FALSE, sca=sca_astro_filt, parallel=FALSE)

However, I am running into an error regarding a fixef() function. Could you decipher the following error message for me?

Error in fixef(object@fitC) : could not find function "fixef"

gfinak · 2018-12-06T02:39:09Z

Maybe load the lme4 package first? Greg Finak

…

On Wed, Dec 5, 2018, 09:52 MalteDLuecken ***@***.*** wrote: Hi again! I recently tried your suggested solution of using the following zlm command: zlmCond_astro <- zlm(formula = ~condition + (1|sample) + (1|chip) + ngeneson, method='glmer', ebayes=FALSE, sca=sca_astro_filt, parallel=FALSE) However, I am running into an error regarding a fixef() function. Could you decipher the following error message for me? Error in ***@***.***) : could not find function "fixef" — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#97 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/ABUkeQmB4GTJ4nEpX6cQSeR68YQW_NCjks5u2AfAgaJpZM4VqjPH> .

LuckyMD · 2018-12-06T09:23:31Z

Ah, thanks. I wasn't aware of other dependencies. I assumed I had accidentally renamed a fixed() function and was looking for my error ;). Mea culpa.

amcdavid · 2018-12-06T13:25:22Z

I think we have good reasons to have lme4 be only Suggested. Can we import its methods anyways? Putting a call to if(require(lme4)) seems like bad behavior since it modifies the search path (and I think is worthy of a warning from R CMD check

fbrundu · 2020-07-31T19:28:39Z

Hi all, I post here since the topic is the same and this issue is still open, but I can open a new issue if necessary.
I would like to compare two conditions with nested batches, but with a simpler design matrix (I have no chip covariate).
The composition of each condition is similar to:

condition  batch     
Case       Sample1   
Case       Sample2
Case       Sample3
Case       Sample4
Control    Sample5
Control    Sample6
Control    Sample7
Control    Sample8

In such case, with no chip covariate, does it make sense to also model the batch or would it lead to losing the condition signal, i.e.:

~ condition + ngeneson + batch

?
At the moment I'm using MAST without the batch covariate

~ condition + ngeneson

since I thought introducing the batch covariate would lead to the wrong results in this case. Is this justified, or I should be able to use it instead?

Thanks

gfinak · 2020-07-31T23:42:32Z

I assume you have multiple cells per sample? If so I think you could include a random batch effect.

fbrundu · 2020-08-03T01:57:39Z

Thanks @gfinak, yes I have multiple cells per sample. I'll try with the batch covariate.
I would like to learn more about why the batch covariate doesn't remove the condition. In a previous message, you indicate that "Sample and condition are not entirely aliased (you have more samples than conditions)." Do you have at hand some documentation I can read to understand better how the MAST's model works in this case? Thanks!

fbrundu · 2020-08-10T19:23:15Z

I tested using the batch covariate. I noticed that, although some genes related to the condition (positive controls) are recovered, most of the others are not. And the (log-fold-change, FDR) of the recovered genes is smaller (LFC) and less significant (FDR) with respect to the analysis without the batch covariate.

Is it correct to state that in this case, due to conditions partially confounded with batch (4 unique batches per condition), the statistical power of the test is reduced when adding the batch covariate? Keeping however in mind that adding the batch covariate is the more appropriate approach... Thanks.

noobCoding · 2021-03-18T06:44:59Z

Hi @fbrundu, I had the same problem when using MAST with the batch covariate. Has the problem with batch covariate been solved, somehow?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Nested batch effects in MAST #97

Nested batch effects in MAST #97

LuckyMD commented Aug 1, 2018 •

edited

Loading

gfinak commented Aug 1, 2018

LuckyMD commented Aug 1, 2018

gfinak commented Aug 1, 2018

LuckyMD commented Aug 1, 2018

gfinak commented Aug 1, 2018

LuckyMD commented Dec 5, 2018

gfinak commented Dec 6, 2018 via email

LuckyMD commented Dec 6, 2018

amcdavid commented Dec 6, 2018

fbrundu commented Jul 31, 2020

gfinak commented Jul 31, 2020

fbrundu commented Aug 3, 2020

fbrundu commented Aug 10, 2020

noobCoding commented Mar 18, 2021

Nested batch effects in MAST #97

Nested batch effects in MAST #97

Comments

LuckyMD commented Aug 1, 2018 • edited Loading

gfinak commented Aug 1, 2018

LuckyMD commented Aug 1, 2018

gfinak commented Aug 1, 2018

LuckyMD commented Aug 1, 2018

gfinak commented Aug 1, 2018

LuckyMD commented Dec 5, 2018

gfinak commented Dec 6, 2018 via email

LuckyMD commented Dec 6, 2018

amcdavid commented Dec 6, 2018

fbrundu commented Jul 31, 2020

gfinak commented Jul 31, 2020

fbrundu commented Aug 3, 2020

fbrundu commented Aug 10, 2020

noobCoding commented Mar 18, 2021

LuckyMD commented Aug 1, 2018 •

edited

Loading