monarch-initiative/monarch_pheval

monarch-pheval

PhEval Project Configuration-Base

Quick Experiment Description

This experiment uses several PhEval runners that cover different variant and gene prioritisation algorithms (VGPAs) and versions. We also used two releases of the Exomiser phenotype database, 2309 and 2402, and ran additional experiments in which a calculated Semantic Similarity profile is used as a mapping table.

Important: All files needed for the experiment will be downloaded using the Makefile strategy. Any file links are for inspection purposes and pipeline explanation only. Remember that the make pheval goal executes the entire pipeline without needing additional steps.

  • Exomiser
    • 13.3.0 (2309 default)
    • 13.3.0 (2309 HPxMP Semsim Ingest)
    • 14.0.0 (2402 default)
  • Phen2Gene
    • 1.2.3
  • Gado
    • 1.0.1

Experiment Data Preparation

Exomiser requires a phenotype database, which is downloaded from the Monarch Data URL. The first two Exomiser experiments used the default database versions without any changes:

  • Exomiser 13.3.0 + Phenotype 2309
  • Exomiser 14.0.0 + Phenotype 2402

For the remaining Exomiser experiment, we truncated the HP_MP_MAPPINGS table of the 2309 phenotype database and ingested a completely new set of values into it from a Semantic Similarity table.

Semantic Similarity Calculation

To calculate a Semantic Similarity profile that can be used as an Exomiser phenotype mapping table, we used the OAK library.

Command Executed for Semantic Similarity calculation:

External required files for this calculation:

runoak -i semsimian:sqlite:data/ontology/phenio-monarch.db \
similarity -p i \
--set1-file data/tmp/hp_terms.txt \
--set2-file data/tmp/mp_terms.txt \
--min-jaccard-similarity 0.4 \
--information-content-file \
data/tmp/phenio_monarch_hp_ic.tsv \
-O csv \
-o profiles/phenio-monarch-hp-mp.0.4.semsimian.tsv

Set file 1 contains all HP terms and set file 2 contains all MP terms.

The original HP_MP_MAPPINGS table has 13,492,225 records, whereas the ingested SEMSIM file has 1,522,836. Note that the SEMSIM calculation was run with --min-jaccard-similarity 0.4. The HP_HP_MAPPINGS and HP_ZP_MAPPINGS tables kept their original data.
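To illustrate why a Jaccard threshold shrinks the mapping table, here is a minimal sketch of threshold filtering over term pairs. It assumes that Jaccard similarity is computed over each term's ancestor set and that the threshold is inclusive; the toy term IDs and ancestor sets are made up, and this is not the OAK or semsimian implementation.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |a & b| / |a | b|; 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def filter_pairs(
    set1: dict[str, set[str]],
    set2: dict[str, set[str]],
    min_jaccard: float = 0.4,
) -> list[tuple[str, str, float]]:
    """Score every pair across the two sets and keep those at or above the threshold."""
    kept = []
    for term1, ancestors1 in set1.items():
        for term2, ancestors2 in set2.items():
            score = jaccard(ancestors1, ancestors2)
            if score >= min_jaccard:  # assumption: the threshold is inclusive
                kept.append((term1, term2, score))
    return kept


# Toy ancestor sets (hypothetical IDs, not real HP/MP terms).
hp_ancestors = {
    "HP:toy1": {"root", "x", "y"},
    "HP:toy2": {"root", "z"},
}
mp_ancestors = {
    "MP:toy1": {"root", "x", "w"},
}

kept = filter_pairs(hp_ancestors, mp_ancestors, min_jaccard=0.4)
print(kept)
```

In this toy data, only the pair sharing two of four ancestors passes the 0.4 cut-off; the pair sharing one of four is dropped, which is the same effect that reduces the record count described above.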

The information content file passed via --information-content-file was calculated beforehand, also with the OAK library, using the command below.

Note: We calculated the IC scores for HP and MP separately and then merged them into one file.

External required files for this calculation:

runoak -i data/ontology/phenio-monarch.db \
-g data/tmp/gene_phenotype.10090.tsv \
-G hpoa_g2p information-content \
-p i i^MP: \
-o data/tmp/phenio_monarch_mp_ic.tsv
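The merge of the separately calculated HP and MP IC files is not shown in this README. A minimal sketch of one way to do it follows, assuming each file is a TSV with the term ID in the first column and the IC score in the second, with an optional header line; the function name, file names, and toy values are hypothetical.

```python
import csv
import tempfile
from pathlib import Path


def merge_ic_files(inputs: list[Path], output: Path) -> int:
    """Merge term/IC rows from several TSV files into one file; return the row count."""
    merged: dict[str, str] = {}
    for path in inputs:
        with path.open(newline="") as handle:
            for row in csv.reader(handle, delimiter="\t"):
                if len(row) < 2:
                    continue  # skip blank or malformed lines
                term_id, ic = row[0], row[1]
                try:
                    float(ic)
                except ValueError:
                    continue  # skip a header line (non-numeric second column)
                merged[term_id] = ic  # later files win on duplicate IDs
    with output.open("w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerows(merged.items())
    return len(merged)


# Demo with toy files; the header names, IDs, and scores are made up.
with tempfile.TemporaryDirectory() as tmp:
    hp_ic = Path(tmp) / "hp_ic.tsv"
    mp_ic = Path(tmp) / "mp_ic.tsv"
    merged_ic = Path(tmp) / "merged_ic.tsv"
    hp_ic.write_text("term\tic\nHP:toy1\t3.2\n")
    mp_ic.write_text("term\tic\nMP:toy1\t1.5\n")
    n_rows = merge_ic_files([hp_ic, mp_ic], merged_ic)
    merged_text = merged_ic.read_text()

print(n_rows)
```

The sketch writes the merged file without a header; whether the downstream tool expects one should be checked against the actual IC files.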

SEMSIM SQL File uploaded to Monarch Cloud

In this experiment, the SEMSIM file used for ingestion (generated with the steps described above) was converted to SQL format and uploaded to Monarch Cloud beforehand. Unless you want to reproduce the SEMSIM calculation, you do not need to run those steps yourself.

It can be downloaded here: phenio-monarch-hp-mp.0.4.semsimian.sql

This file is referenced in the preprocessing section of the pheval-config file:

- tool: exomiser
  id: exomiser-phenio-hpmp-ingest-13.3.0
  preprocessing:
    - phenio-monarch-hp-mp.0.4.semsimian.sql

Dependency Graph

graph TD;
    A[monarch-pheval]---->C[pheval.exomiser];
    A[monarch-pheval]---->C1[pheval.phen2gene];
    A[monarch-pheval]---->C2[pheval.gado];
    C--Depends-->P[pheval];
    C1--Depends-->P[pheval];
    C2--Depends-->P[pheval];
    A--Depends-->P[pheval];
    %% S[setup];
    %% A--Depends-->S[setup];
    %% S---->D[Genotype];
    %% S---->E[Phenotype];
    %% S---->R[Runner];
    %% F[Download];
    %% G[Extract];
    %% R---->F
    %% D---->F
    %% E---->F
    %% F---->G

  • A straight line represents a mandatory dependency.

With this setup, the Makefile is generated within this repo from the pheval configuration file, and the user can then invoke the make pheval goal.
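As a rough illustration of generating a Makefile from a configuration, the sketch below renders one target per configured run plus a pheval target that depends on all of them. It uses Python's string.Template as a stand-in for the Jinja2 template suggested by the Makefile.j2 file name; the rule layout, the run- target prefix, the echo recipe, and the second run ID are hypothetical and do not reflect the repository's actual template.

```python
from string import Template

# Hypothetical rule layout; Makefile recipe lines must start with a tab.
RULE = Template('$target:\n\t@echo "running $tool as $run_id"\n')


def render_makefile(runs: list[dict[str, str]]) -> str:
    """Render a Makefile with one target per run and a top-level pheval target."""
    targets = [f"run-{run['id']}" for run in runs]
    lines = [
        ".PHONY: pheval " + " ".join(targets),
        "",
        "pheval: " + " ".join(targets),
        "",
    ]
    for run, target in zip(runs, targets):
        lines.append(RULE.substitute(target=target, tool=run["tool"], run_id=run["id"]))
    return "\n".join(lines)


# Runs as they might be read from the configuration file.
runs = [
    {"tool": "exomiser", "id": "exomiser-phenio-hpmp-ingest-13.3.0"},
    {"tool": "phen2gene", "id": "phen2gene-1.2.3"},  # hypothetical run ID
]

makefile_text = render_makefile(runs)
print(makefile_text)
```

Keeping a single pheval target that depends on the per-run targets mirrors the idea that one goal executes the entire pipeline.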

PhEval Config Structure

The corpora and configuration data were moved from PhEval to this new structure.

📦monarch-pheval
┣ 📂corpora
┃ ┣ 📂lirical
┃ ┃ ┗ 📂default
┃ ┃ ┃ ┣ 📂phenopackets
┃ ┃ ┃ ┗ 📜corpus.yml
┣ 📂resources
┃ ┣ 📜Makefile.j2
┃ ┣ 📜custom.Makefile
┃ ┣ 📜generatemakefile.sh
┃ ┗ 📜pheval-config.yaml

Configuring and Running an Experiment

Installing dependencies

poetry shell
poetry install

Generating Makefile

./resources/generatemakefile.sh

Run the experiment

make pheval

Results

[Figure: experiment results]

Acknowledgements

This cookiecutter project was developed from the monarch-project-template template and will be kept up-to-date using cruft.