RareBurdenTest

Rare variant burden test was implemented based on cyvcf2 and Julia GLM. cyvcf2 is a very fast VCF parser, binding on C version of htslib, both memory and CPU efficiency. cyvcf2 was used to compute the rare burden for each individual based on different variant masks, MAFs and weights, the outputs are piped to Julia for fast association test. Julia's GLM is about 300X faster than the R glm version, and the Julia's statistical ecosystem makes it very convenient to implement other type of statistical testing. Therefore, by the piping design structure, it's very easy to implement other type of test by using Julia as plugin. Current version supports epacts's b.burden and b.collapse tests, detail about the testing please check here.

Features

Group rare variants by epacts like grouping file
Support weigh variant by specifying weight in grouping file, OR based on MAF. 1/(N * MAF * (1-MAF))^0.5, Madsen and Browning (2009).
In a single run, generate scores for multiple MAF with no extra cost for parsing VCF again and again.
Support glm test, firth logistic test and SPAtest(Optimized for small number and unbalanced case/control data).

Variant scoring

Please see VCFCountScore4GeneMask/VCFCountScore4GeneMask.py

Association of variant score with phenotype

Please see Julia/RareRegressionAuto.R.SPA.jl

Benchmarks Benchmarked with epacts, it has the same results if we use R version of GLM (MIGEN chr22, LOFTEEHC, 25k cases and 25k controls, MAF=0.01).

The only dot difference is because of the inclusion of one variant, epacts included a variant with REF allele frequency < MAF. I don't think is good as we usually do annotation based on a REF -> ALT mutation, in my version of rare burden test, I only included ALF allele frequency <= MAF.

As the convergency termination condition is different between Julia's GLM with R's GLM, for some cases of very small effective number, and non significant association, the estimation between them are different. Basically, Julia's GLM will have smaller estimation for beta and beta's standard error. But for the true association and converged estimation, both of the estimation are identical.

Name		Name	Last commit message	Last commit date
Latest commit History 61 Commits
EpactsPlugins		EpactsPlugins
Julia		Julia
RareNetworkBurdenRPython		RareNetworkBurdenRPython
RareSPATest		RareSPATest
Test		Test
VCFCountScore4GeneMask		VCFCountScore4GeneMask
python_epacts_compare		python_epacts_compare
src/main/java/rareburden/rareburden		src/main/java/rareburden/rareburden
target		target
README.md		README.md
nbactions.xml		nbactions.xml
pom.xml		pom.xml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

RareBurdenTest

About

Releases

Packages

Languages

wavefancy/RareBurdenTest

Folders and files

Latest commit

History

Repository files navigation

RareBurdenTest

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages