Commit

Use simpler prior by default
gaow committed Feb 14, 2024
1 parent 612111f commit 36688ea
Showing 3 changed files with 4 additions and 8 deletions.
2 changes: 1 addition & 1 deletion code/cis_analysis/cis_workhorse.ipynb
@@ -1014,7 +1014,7 @@
"parameter: pip_cutoff = 0.05\n",
"parameter: coverage = [0.95, 0.7, 0.5]\n",
"# prior can be either of [\"mixture_normal\", \"mixture_normal_per_scale\"]\n",
-"parameter: prior = \"mixture_normal_per_scale\"\n",
+"parameter: prior = \"mixture_normal\"\n",
"parameter: max_SNP_EM = 100\n",
"# Max scale is such that 2^max_scale being the number of phenotypes in the transformed space. Default to 2^10 = 1024. Don't change it unless you know what you are doing. Max_scale should be at least larger than 5.\n",
"parameter: max_scale = 10\n",
1 change: 0 additions & 1 deletion pipeline/command_spliter.ipynb

This file was deleted.

9 changes: 3 additions & 6 deletions website/nature_protocol/output_markdown.md
@@ -99,8 +99,6 @@ Quality control and normalization are performed on output from the leafcutter an
We use a gene coordinate annotation pipeline based on [`pyqtl`, as demonstrated here](https://github.com/broadinstitute/gtex-pipeline/blob/master/qtl/src/eqtl_prepare_expression.py). This adds genomic coordinate annotations to gene-level molecular phenotype files generated in `gct` format and converts them to `bed` format for downstreams analysis.


A collection of methods for the imputation of missing omics data values are included in our pipelinle. Imputation is optional of eQTL analysis, but necessary for other QTLs. We use `flashier`, a Empirical Bayes Matrix Factorization model, to impute missing values. Other imputation methods include missForest, XGBoost, k-nearest neighbors, soft impute, mean imputation, and last observed data.

We include a collection of workflows to format molecular phenotype data. These include workflows to separate phenotypes by chromosome, by user-provided regions, a workflow to subset bam files and a workflow to extract samples from phenotype files.

##### B. Covariate Data Preprocessing
@@ -306,12 +304,11 @@ Timing <1 min
```


##### ii. Phenotype Imputation
Timing X min

```
sos run xqtl-pipeline/pipeline/phenotype_imputation.ipynb flash \
--container oras://ghcr.io/cumc/omics_imputation_apptainer:latest
sos run phenotype_imputation.ipynb EBMF \
--container .containers/factor_analysis.sif \
```

