Use simpler prior by default

cumc · Feb 14, 2024 · 36688ea
1 parent 612111f

Showing 3 changed files with 4 additions and 8 deletions.
diff --git a/code/cis_analysis/cis_workhorse.ipynb b/code/cis_analysis/cis_workhorse.ipynb
@@ -1014,7 +1014,7 @@
  "parameter: pip_cutoff = 0.05\n",
  "parameter: coverage = [0.95, 0.7, 0.5]\n",
  "# prior can be either of [\"mixture_normal\", \"mixture_normal_per_scale\"]\n",
- "parameter: prior = \"mixture_normal_per_scale\"\n",
+ "parameter: prior = \"mixture_normal\"\n",
  "parameter: max_SNP_EM = 100\n",
  "# Max scale is such that 2^max_scale being the number of phenotypes in the transformed space. Default to 2^10 = 1024. Don't change it unless you know what you are doing. Max_scale should be at least larger than 5.\n",
  "parameter: max_scale = 10\n",

diff --git a/pipeline/command_spliter.ipynb b/pipeline/command_spliter.ipynb
diff --git a/website/nature_protocol/output_markdown.md b/website/nature_protocol/output_markdown.md
@@ -99,8 +99,6 @@ Quality control and normalization are performed on output from the leafcutter an
 We use a gene coordinate annotation pipeline based on [`pyqtl`, as demonstrated here](https://github.com/broadinstitute/gtex-pipeline/blob/master/qtl/src/eqtl_prepare_expression.py). This adds genomic coordinate annotations to gene-level molecular phenotype files generated in `gct` format and converts them to `bed` format for downstreams analysis.
 
 
-A collection of methods for the imputation of missing omics data values are included in our pipelinle. Imputation is optional of eQTL analysis, but necessary for other QTLs. We use `flashier`, a Empirical Bayes Matrix Factorization model, to impute missing values. Other imputation methods include missForest, XGBoost, k-nearest neighbors, soft impute, mean imputation, and last observed data.
-
 We include a collection of workflows to format molecular phenotype data. These include workflows to separate phenotypes by chromosome, by user-provided regions, a workflow to subset bam files and a workflow to extract samples from phenotype files.
 
 ##### B. Covariate Data Preprocessing
@@ -306,12 +304,11 @@ Timing <1 min
 ```
 
 
-##### ii. Phenotype Imputation
-Timing X min
 
 ```
-sos run xqtl-pipeline/pipeline/phenotype_imputation.ipynb flash \
- --container oras://ghcr.io/cumc/omics_imputation_apptainer:latest
+sos run phenotype_imputation.ipynb EBMF \
+ --container .containers/factor_analysis.sif \
+
 ```