How is "correlated disease" used in Monarch? #128

Open
ValWood opened this issue Jun 15, 2024 · 15 comments

Comments

@ValWood

ValWood commented Jun 15, 2024


How is "correlated disease" used in Monarch?
I can't see where this is defined.

I had assumed it was used when a variant is correlated with a disease but not known to be causal.
However, I see cases where it is used when the gene is causal (although not fully causal: it's a susceptibility), for example:

ATG16L1 and inflammatory bowel disease 10

I guess my question is whether "correlated disease" is:
A) always used for susceptibility (i.e. with other environmental conditions) or polygenic contributions,
OR
B) ever used for disease candidates (genetic correlation, i.e. via linkage disequilibrium)?

thanks,

Val


@ValWood
Author

ValWood commented Jun 15, 2024

One reason I ask is that I see susceptibilities that are listed as correlated
(e.g. POLD1)
https://monarchinitiative.org/MONDO:0012953

and susceptibilities that are listed as causal (e.g. POT1)
https://monarchinitiative.org/MONDO:0014368

@sagehrke
Member

Hi @ValWood - Thanks for submitting this question!

@kevinschaper @cmungall @monicacecilia: Can you help Val out here? Thank you!

@nlharris
Member

@kevinschaper @cmungall @monicacecilia I have a blog post ready to go about PomBase using Mondo, but can't post it until someone answers Val's question.

@amc-corey-cox
Copy link

@nlharris I'm working on this right now. I'm just digging in but I'll try to get this for you as soon as I can.

@kevinschaper
Member

@amc-corey-cox my memory is that I made a biolink model PR to add causal and correlated gene categories to match the labels shown in the old UI, which is a pretty unsatisfying answer.

@amc-corey-cox

Thanks for that information Kevin. I'll look up that PR.

@amc-corey-cox

amc-corey-cox commented Jun 24, 2024

Okay, here is the start of an explanation on how we use correlated_disease.

We have biolink:CausalGeneToDiseaseAssociation and biolink:CorrelatedGeneToDiseaseAssociation as subclasses of biolink:GeneToDiseaseAssociation. I believe this means that when there is evidence of a direct causal role of the gene in the disease, such as Mendelian inheritance, we use biolink:CausalGeneToDiseaseAssociation. Any other association that links the gene causally to a disease, such as a polygenic contribution or a susceptibility, would be biolink:CorrelatedGeneToDiseaseAssociation.

Other associations that don't necessarily imply any form of causation would be simply biolink:GeneToDiseaseAssociation.

This is my current hypothesis of the explanation. I want to see if I can find these in the actual ingests to see what evidence we're using to create these edges in order to validate the above.

The biolink model also has these descriptions:
biolink:GeneToDiseaseAssociation: gene in which variation is correlated with the disease, may be protective or causative or associative, or as a model
biolink:CausalGeneToDiseaseAssociation: gene in which variation is shown to cause the disease.
biolink:CorrelatedGeneToDiseaseAssociation: gene in which variation is shown to correlate with the disease.

Does this seem reasonable or is there something obviously wrong that I've done?

@amc-corey-cox

Okay, I think I have validation of the above. The monarch-ingest documentation discusses the Correlated and Causal gene-to-disease association terms here:
https://monarch-initiative.github.io/monarch-ingest/Sources/hpoa/#gene-to-disease

The associations are derived from these fields:
MENDELIAN: biolink:causes
POLYGENIC: biolink:contributes_to
UNKNOWN: biolink:gene_associated_with_condition

This appears to mesh with my statements above. So, the final answer to this question: we intend 'correlated disease' to be used when a gene-to-disease association indicates some contribution to causing the disease, but not a strict Mendelian association, for which we use the term 'causal'. It is possible that we've made a mistake in how these are derived; if so, please bring this to our attention. However, based on what we are seeing with ATG16L1, I believe this is correct.
Further, in answer to the question about "genetic correlation, i.e. via linkage disequilibrium", I believe we intend to use biolink:GeneToDiseaseAssociation for these broader correlations. Again, please let us know if this appears to be inconsistent.
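To make the HPOA mapping quoted above concrete, here is a minimal Python sketch of how a gene-to-disease row's association type could be turned into a Biolink predicate plus an association category. The three predicates come from the monarch-ingest HPOA documentation linked above. Pairing each predicate with an association category follows the hypothesis in this comment and is an assumption, not something the ingest docs state; the names `GeneDiseaseEdge`, `HPOA_ASSOCIATION_MAP` and `classify` are hypothetical. This is not the actual monarch-ingest code.

```python
from typing import NamedTuple


class GeneDiseaseEdge(NamedTuple):
    """Predicate and (hypothesized) association category for one HPOA row."""
    predicate: str
    category: str


# Predicates are taken from the monarch-ingest HPOA docs.
# ASSUMPTION: the category paired with each predicate follows the hypothesis
# discussed in this thread, not a confirmed ingest rule.
HPOA_ASSOCIATION_MAP = {
    "MENDELIAN": GeneDiseaseEdge(
        "biolink:causes", "biolink:CausalGeneToDiseaseAssociation"
    ),
    "POLYGENIC": GeneDiseaseEdge(
        "biolink:contributes_to", "biolink:CorrelatedGeneToDiseaseAssociation"
    ),
    "UNKNOWN": GeneDiseaseEdge(
        "biolink:gene_associated_with_condition", "biolink:GeneToDiseaseAssociation"
    ),
}


def classify(association_type: str) -> GeneDiseaseEdge:
    """Return the predicate and category for an HPOA association type value."""
    key = association_type.strip().upper()  # tolerate stray whitespace / case
    try:
        return HPOA_ASSOCIATION_MAP[key]
    except KeyError:
        # Fail loudly instead of silently defaulting to a weaker association.
        raise ValueError(f"unexpected association type: {association_type!r}") from None
```

Under these assumptions, `classify("POLYGENIC")` gives `biolink:contributes_to` with `biolink:CorrelatedGeneToDiseaseAssociation`, the combination that would be labeled "correlated". Raising on unexpected values, rather than falling back to a generic association, makes any mislabeled source rows easier to spot.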

@ValWood
Author

ValWood commented Jun 25, 2024

This makes sense, so contributes_to should mean polygenic (although I think many causal genes are classed as correlated;
I can provide a partial list).

The POLD1 problem above would be resolved by adding the terms for the germ-line mutation diseases
monarch-initiative/mondo#7845
(as the current term does not differentiate between germ-line and sporadic)

There are quite a lot of inconsistencies. For example,
colorectal cancer, susceptibility to, 12 (MONDO:0014038)
is_a
hereditary neoplastic syndrome,
but its gene association is contributes_to,
even though this is a single-gene inherited disorder.

Some of the issues are probably caused by conflating a heritable causal gene which increases susceptibility
with a susceptibility that is presumed to increase incrementally by variants in multiple genes.

==

It also seems strange for correlated genes to have definitions of the form:
Any type 2 diabetes mellitus in which the cause of the disease is a mutation in the TBC1D4 gene.
because for polygenic disorders, the gene isn't causal?

@ValWood
Author

ValWood commented Jun 25, 2024

It would also be useful to have precise definitions on the Monarch website so that we could link to them.
tks
v

@ValWood
Author

ValWood commented Jun 25, 2024

I guess for this it is OK
colorectal cancer, susceptibility to, 12 (MONDO:0014038)
because for any cancer subsequent changes are required....

@amc-corey-cox

This is great feedback @ValWood. Unfortunately, if the data we're ingesting has these marked inconsistently we will as well. However, we should also make sure we're ingesting them correctly. I'll discuss with my team how we should move forward with this.

@ValWood
Author

ValWood commented Jun 25, 2024

It is probably not a huge issue but it would be useful to be precise about the meaning of the qualifiers. I still don't fully understand.

My main issue is that genes flagged as "contributes_to" are described as "causal" for a disease in the ontology definitions. That seems misleading, and seems to be a Mondo issue rather than an ingest issue.

I was chatting with the PomBase team about this in our group meeting, and we wondered why you need both a qualifier AND "susceptibility to" in the term label. We wondered why the information could not be captured in the ontology rather than with a qualifier (because people frequently ignore qualifiers).

@monicacecilia
Contributor

@sabrinatoro 👀 👆

@sabrinatoro

I think the main problem here is with the "susceptibility" terms.
These "susceptibility" terms come from OMIM and are therefore added into Mondo. However, the data we get from the different sources more often relates to a disease, not necessarily to a "disease susceptibility".

It is therefore true that we have two different ways to represent "susceptibility" concepts and their causal/correlated genes in Mondo/Monarch:

  1. "susceptibility to disease X" (in Mondo) - caused by a variation in gene X
  2. "disease X" - correlated with gene X (because a variation in gene X confers a susceptibility to getting the disease.)

We need to review the representation of disease susceptibility in both Mondo and Monarch. (@monicacecilia I don't know where this falls on the priority list for both these projects. Let's discuss)
