Missing phenotypes for gene AP1G1 #71
Comments
Hi @Alx-Kouris, Unfortunately, I think what you're seeing is just the age of the data on our production site and in our production graph. We're working to rebuild our stack, starting from the graph and moving up through the API and website. I can see that the data is present in the development build, but we've got a few months before we'll be showing it on the site. Pulling the development SQLite database artifact from https://data.monarchinitiative.org/monarch-kg-dev/latest/index.html I can see:
Hopefully that at least looks good. Thank you for pointing out the discrepancy and submitting an issue, and hopefully we'll at least have a beta for the new API & site to look at soon.
Thank you for the quick response @kevinschaper. However, we (I work with Alex, who asked the question here) deal with thousands of genes in a high-throughput way. Is there a way for us to download the correct data? We have been downloading from https://data.monarchinitiative.org/latest/tsv/gene_associations/index.html but if those data are not reliable, do we have another option?
Hi @chapplec, We aren't producing those nice association subset files from the new pipeline yet, but we do plan to. You can get all associations in TSV format from monarch-kg_edges.tsv within https://data.monarchinitiative.org/monarch-kg-dev/latest/monarch-kg.tar.gz, and then subset on the
The new graph is intentionally more limited in gene-to-disease associations (currently only data from OMIM) and has predicates (in Biolink; a predicate is equivalent to a relation in the OBAN model) that are more accurate and cautious, in particular with respect to claims of causation.
I know that I prefer to use delimited files for pipelines, but I'm going to go back to the SQLite database again for quick subsetting. Quickly, these are the two predicates we're using. biolink:risk_affected_by is the stronger assertion.
You can get them together with
You likely have your own preferred way to subset the TSV files as part of a pipeline, but just to show it quickly, here it is as an sqlite3 one-liner, and I'll attach the file:
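For the TSV route rather than sqlite3, a minimal Python sketch of the same kind of subsetting might look like the following. It assumes monarch-kg_edges.tsv has already been extracted from the tarball and has KGX-style `subject`, `predicate`, and `object` columns; the function name `subset_edges`, the sample rows, and `example:other_predicate` are made up for illustration, and only `biolink:risk_affected_by` and `HGNC:555` come from this thread.

```python
import csv
import io
from typing import Iterable, Iterator, Optional, TextIO


def subset_edges(
    handle: TextIO,
    predicates: Iterable[str],
    gene_ids: Optional[Iterable[str]] = None,
) -> Iterator[dict]:
    """Yield edge rows with a wanted predicate, optionally limited to given genes.

    Assumes a tab-separated file with `subject`, `predicate`, `object` columns.
    """
    wanted = set(predicates)
    genes = set(gene_ids) if gene_ids is not None else None
    # QUOTE_NONE: treat quote characters in free-text columns as plain data.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if row.get("predicate") not in wanted:
            continue
        # Keep the row if the gene appears on either side of the edge.
        if genes is not None and not ({row.get("subject"), row.get("object")} & genes):
            continue
        yield row


# Made-up sample rows standing in for monarch-kg_edges.tsv.
SAMPLE = (
    "subject\tpredicate\tobject\n"
    "MONDO:0000001\tbiolink:risk_affected_by\tHGNC:555\n"
    "MONDO:0000001\tbiolink:risk_affected_by\tHGNC:5\n"
    "MONDO:0000001\texample:other_predicate\tHGNC:555\n"
)

matches = list(
    subset_edges(io.StringIO(SAMPLE), ["biolink:risk_affected_by"], ["HGNC:555"])
)
print(len(matches))
```

For the real file, `io.StringIO(SAMPLE)` would be replaced with `open("monarch-kg_edges.tsv", newline="")`, and the gene list with whatever set of HGNC CURIEs the pipeline works on. Because rows are streamed, the whole edge file never has to fit in memory.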
Finally, we are still in an awkward position between the old system, which is becoming outdated, and the new one, which is under development and naturally still has bugs to be discovered. Unfortunately, I noticed that within the file I attached, there are 127 rows where there is an HGNC CURIE in the disease column (subject, for these associations). I created an issue for this bug, and we'll get it fixed ASAP.
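Until that bug is fixed, the affected rows can be screened out of a subset file. This is a rough sketch only, assuming the file is tab-separated and the disease CURIE sits in a `subject` column as described above; the function name and the sample rows are hypothetical.

```python
import csv
import io
from typing import Iterator, TextIO


def rows_with_gene_in_disease_column(
    handle: TextIO, column: str = "subject", prefix: str = "HGNC:"
) -> Iterator[dict]:
    """Yield rows whose disease column holds a gene (HGNC) CURIE instead."""
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # `or ""` guards against short rows where the column is missing.
        if (row.get(column) or "").startswith(prefix):
            yield row


# Made-up rows: the second one has a gene where a disease is expected.
SUSPECT_SAMPLE = (
    "subject\tpredicate\tobject\n"
    "MONDO:0000001\tbiolink:risk_affected_by\tHGNC:555\n"
    "HGNC:5\tbiolink:risk_affected_by\tHGNC:555\n"
)

suspect = list(rows_with_gene_in_disease_column(io.StringIO(SUSPECT_SAMPLE)))
print(len(suspect))
```

Dropping or setting aside whatever this yields keeps the remaining associations usable in a pipeline while the upstream fix is pending.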
Sorry, @kevinschaper, I just saw this! We'll have a look and see if we can work with what you've given us. Thanks for responding!
I can give a little bit of an update on the odd G2D associations too: we have MONDO-to-MONDO as well as HGNC-to-HGNC associations getting created as gene-to-disease. That investigation is happening in monarch-initiative/monarch-app#721. One thing that you can do with those records is look at the
Hello, I visited the Monarch website for gene AP1G1 https://monarchinitiative.org/gene/HGNC:555#phenotype and I see only some EFO phenotypes being reported.
I would expect to find related HPO phenotypes, based on this page: https://hpo.jax.org/app/browse/gene/164.
Is this expected? Or some kind of bug?