Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Different result from available results #546

Open
rfeng2023 opened this issue Mar 10, 2023 · 2 comments
Open

Different result from available results #546

rfeng2023 opened this issue Mar 10, 2023 · 2 comments

Comments

@rfeng2023
Copy link
Contributor

I am reproducing the MASH pipeline, and I found the result of mashr is different from the available results (produced by Hao),
the code in this step is

library(mashr)
dat = readRDS('/mnt/vast/hpc/csg/rf2872/Work/MASH_csg.q/output/Ast_Exc_Inh_Mic_OPC_Oli.rds')
set.seed(1)
random.subset = sample(1:nrow(dat$random.b), min(6000, nrow(dat$random.b)))
random.subset = mash_set_data(dat$random.b[random.subset,], dat$random.s[random.subset,], alpha=1, zero_Bhat_Shat_reset = 1E3)
vhatprior = mash_estimate_corr_em(random.subset, readRDS('/mnt/vast/hpc/csg/rf2872/Work/MASH_csg.q/MASH_6_celltypes2/Ast_Exc_Inh_Mic_OPC_Oli.EZ.prior.rds'), max_iter = 6)
vhat = vhatprior$V
saveRDS(vhat, '/mnt/vast/hpc/csg/rf2872/Work/MASH_csg.q/MASH_6_celltypes/Ast_Exc_Inh_Mic_OPC_Oli.EZ.V_mle.rds')

Here is the result of my job,
image

and below is the result of Hao's job
image

The random subset may cause a tiny difference from others. But there are some of the values reversed. Is that normal?

@gaow
Copy link
Contributor

gaow commented Mar 14, 2023

@rfeng2023 The result should be identical if you set seed ... did you use the same container? It could make a diffeerence if major R version changes also changed behavior of seed such that set.seed(1) is not the same between different versions R or related libraries?

To solve the problem: since you are worried of the small differences particularly the sign differences, my suggestion is to rerun but using a much larger random.b sample and see how it works ? for example, take 4 random SNPs per gene so you get many such SNPs.

@gaow
Copy link
Contributor

gaow commented Mar 14, 2023

To solve the problem: since you are worried of the small differences particularly the sign differences, my suggestion is to rerun but using a much larger random.b sample and see how it works ? for example, take 4 random SNPs per gene so you get many such SNPs.

We can hold on to this. Because picking these SNPs should be built int othe updated pipeline with fine-mapping CS in mind. We can revisit at that point. But it is good for you (and important) that you keep documenting these problems. @rfeng2023

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants