review old ontology IDs that do not resolve to InterLex #223

Open
tgbugs opened this issue Sep 11, 2020 · 17 comments

@tgbugs
Contributor

tgbugs commented Sep 11, 2020

There are a number of identifiers in
https://github.com/SciCrunch/NIF-Ontology/blob/dev/ttl/generated/NIF-NIFSTD-mapping.ttl
that have no corresponding entry in
https://github.com/SciCrunch/NIF-Ontology/blob/dev/ttl/generated/NIFSTD-ILX-mapping.ttl
nor in
https://github.com/SciCrunch/NIF-Ontology/blob/dev/ttl/generated/NIFSTD-SCR-mapping.ttl.

@tmsincomb We need to review these and make sure that they resolve. IIRC there is already code that does this or could do this with little additional work in https://github.com/tgbugs/pyontutils/blob/master/nifstd/nifstd_tools/mapnlxilx.py.
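As a rough illustration of the check described above (not the logic in mapnlxilx.py, which is not shown in this thread), the identifiers needing review are the ones in NIF-NIFSTD-mapping that appear in neither NIFSTD-ILX-mapping nor NIFSTD-SCR-mapping. The sketch below assumes the ids have already been extracted from the three files into plain sets; all ids and the `unmapped` helper are hypothetical.

```python
# Hypothetical identifier sets; in practice these would be extracted from
# the three mapping .ttl files. None of these ids are real.
nif_nifstd = {
    "NIFSTD:example_1",
    "NIFSTD:example_2",
    "NIFSTD:example_3",
    "NIFSTD:example_4",
}
nifstd_ilx = {"NIFSTD:example_1"}  # ids that already map to InterLex
nifstd_scr = {"NIFSTD:example_4"}  # ids that already map to the registry


def unmapped(source_ids, *mapped_sets):
    """Return the ids in source_ids that appear in none of mapped_sets."""
    covered = set().union(*mapped_sets)
    return sorted(source_ids - covered)


missing = unmapped(nif_nifstd, nifstd_ilx, nifstd_scr)
print(missing)
```

Anything left in `missing` is a candidate for review.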

@tmsincomb
Contributor

@tgbugs
Looking into the 9807 entities in NIF-NIFSTD-mapping vs. NIFSTD-ILX-mapping, I found the following:

Currently In InterLex: 4938
Deprecated: 482
New; to be added to InterLex: 3949
SciGraph does not resolve: 208
SciGraph resolves with no label to use: 230

I will add the 3949 entities to InterLex to close the gap between NIF-NIFSTD and NIFSTD-ILX for entities that do not currently resolve. After that is complete I will move on to NIFSTD-SCR.

Side Note :: I noticed the "NIF" IDs are the "NIFSTD" IDs for ILX and SCR. Should it be NIFSTD-NIF-mapping instead?
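As a quick sanity check on the breakdown above, the five categories should account for all 9807 entities. A minimal sketch using only the counts reported in this comment (the dictionary keys are paraphrased labels, not field names from any tool):

```python
# Counts as reported in the comment above.
counts = {
    "currently in InterLex": 4938,
    "deprecated": 482,
    "new, to be added to InterLex": 3949,
    "SciGraph does not resolve": 208,
    "SciGraph resolves with no label": 230,
}

total = sum(counts.values())
assert total == 9807, f"categories sum to {total}, expected 9807"

# Show each category as a share of the whole.
for name, n in counts.items():
    print(f"{name}: {n} ({n / total:.1%})")
```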

@tgbugs
Contributor Author

tgbugs commented Sep 21, 2020

@tmsincomb can you get me the list of the 3949 for review before you add to interlex?

The NIF-NIFSTD mapping is fixed, those are the ids that only appear in the ontology, and there shouldn't be anything missing from those.

There are a number of NIFSTD ids that were never in the NIF namespace at all, and we aren't going to put them there since the NIF form of the ids has never existed nor been promulgated anywhere. I think they go in the NLX-ILX mapping if they go anywhere (I don't think that mapping file has ever been committed to git).

@jgrethe
Contributor

jgrethe commented Sep 21, 2020

The SCR (Registry entries) should not be in InterLex as they are in the registry (I assume that is what you mean by SCR). We would need to figure out how to handle these - e.g. if someone puts in an SCR id we could divert to a special redirection page to explain and then send them to the resolver.

@tgbugs
Contributor Author

tgbugs commented Sep 21, 2020

@jgrethe They aren't. Anything mapped in NIFSTD-SCR should not be, and is not, in InterLex. We had previously discussed the desire to have the SCR results show up in InterLex search results and take people to their registry page so that they don't get added to InterLex.

@jgrethe
Contributor

jgrethe commented Sep 21, 2020

After the 3949 are loaded, we still need to see what we are missing with regard to diseases (DO/MONDO/...), chemicals (ChEBI), and organisms (NCBITaxonomy). This is noticeable in the term matching being done via Foundry.

@tgbugs
Contributor Author

tgbugs commented Sep 21, 2020

For now we should be able to use the code for loading taxon and chebi into the ontology to load into InterLex

https://github.com/tgbugs/pyontutils/blob/master/nifstd/nifstd_tools/slimgen.py
https://github.com/tgbugs/pyontutils/blob/master/nifstd/nifstd_tools/chebi_slim.py

Obviously in the future we will want to be able to use the mechanism that allows us to load and update from the ontology files directly, but that is farther out.

@jgrethe
Contributor

jgrethe commented Sep 21, 2020

@tgbugs Re: SCR ids - OK that is what I thought. Just wanted to confirm. Added a ticket to SciCrunch-UI for this.

@tmsincomb
Contributor

@jgrethe
As a status update:
Mondo is ingested now, NCBITaxonomy has a partial ingest (need to verify), and CHEBI has not been ingested.

@tgbugs
I'm sending the entities to you now.

@jgrethe
Contributor

jgrethe commented Sep 21, 2020

The potential issue for taxon and ChEBI was that a bunch of entries were missing because the ontology didn't incorporate all of NCBITaxon or ChEBI. However, the term mapping is finding terms from these non-included areas.

@jgrethe
Contributor

jgrethe commented Sep 21, 2020

@tgbugs Within InterLex should we set default ID to Mondo now for disease?

@tgbugs
Contributor Author

tgbugs commented Sep 21, 2020

Yes, we should flip disease over to the MONDO ids now.

@tgbugs
Contributor Author

tgbugs commented Sep 21, 2020

@tmsincomb can you cross reference what you are seeing now against the lists that we came up with in #124?

@tgbugs
Contributor Author

tgbugs commented Sep 21, 2020

@tmsincomb

In general I think that you need to run somewhat deeper term matching on these. I have found terms that are already in InterLex and correspond to entries in this list, where the ontology versions were not pulled in when NeuroLex was originally loaded (re: #124 again). This time we have to deal with them.

From the list of 3949 one issue is that there are 176 that are institutions. Those should not be loaded into InterLex and in theory should already be in SCR. You can filter them via the comment field. I'm guessing you just aren't including the subClassOf section for these terms in the report you sent, and that you would include them in the load into InterLex?

Also, there are duplicates that we need to check over. For example, the cell types matched via `[o for o in j if o['comment'] == 'cell']` need to be manually reviewed against existing terms. Fortunately there are only 154 of them. Compare
https://scicrunch.org/scicrunch/interlex/view/ilx_0110028?searchTerm=rod%20cell
and

    {
        "label": "Retina rod",
        "type": "term",
        "synonyms": [
            "Retinal Rod Cell",
            "rod cell",
            "Retina rod"
        ],
        "definition": "One of the two photoreceptor cell types of the vertebrate retina. In rods the photopigment is in stacks of membranous disks separate from the outer cell membrane. Rods are more sensitive to light than cones, but rod mediated vision has less spatial and temporal resolution than cone vision.",
        "comment": "cell",
        "existing_ids": [
            {
                "iri": "http://uri.neuinfo.org/nif/nifstd/sao1458938856",
                "curie": "SAO:1458938856"
            }
        ]
    },

There are some others that seem to be coming from NIF-Organism, which is a mess (#70), but I think it is ok to pull those into InterLex and we will just deal with the cleanup later.
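To make the filtering described above concrete, here is a minimal sketch that splits a report by its comment field and flags possible duplicates by comparing normalized labels and synonyms. The "Retina rod" record is abridged from the example above; the institution record, the `"institution"` comment value, the `ilx_existing_example` key and its names, and all helper functions are assumptions for illustration only.

```python
report = [
    {   # abridged from the record quoted above
        "label": "Retina rod",
        "synonyms": ["Retinal Rod Cell", "rod cell", "Retina rod"],
        "comment": "cell",
    },
    {   # hypothetical record; the real comment value may differ
        "label": "Example University",
        "synonyms": [],
        "comment": "institution",
    },
]

# Hypothetical index of terms already in InterLex: id -> normalized names.
existing = {"ilx_existing_example": {"rod cell"}}


def by_comment(records, value):
    """Select the records whose comment field equals value."""
    return [r for r in records if r.get("comment") == value]


def normalize(name):
    """Lowercase and collapse whitespace so names compare loosely."""
    return " ".join(name.lower().split())


def names(record):
    """All normalized names (label plus synonyms) for one record."""
    return {normalize(n) for n in [record["label"], *record.get("synonyms", [])]}


def possible_duplicates(records, existing_names):
    """Pair each record with any existing term sharing at least one name."""
    hits = []
    for r in records:
        for ilx_id, known in existing_names.items():
            shared = names(r) & known
            if shared:
                hits.append((r["label"], ilx_id, sorted(shared)))
    return hits


# Institutions are dropped from the load; cells go to manual review.
to_load = [r for r in report if r.get("comment") != "institution"]
review = possible_duplicates(by_comment(report, "cell"), existing)
print(review)
```

Exact name matching only catches the easy cases; anything it flags still needs a manual look, and terms it misses may still be duplicates.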

@jgrethe
Contributor

jgrethe commented Sep 21, 2020

For NIF-Organism - shouldn't these be in NCBI taxonomy (as #70 mentions, many should be deprecated in favor of NCBI taxonomy)? Perhaps we could do the full NCBI taxon load and then match to NIF-Organism to add information (ids, annotations, etc.)?

@memartone

memartone commented Sep 21, 2020 via email

@tgbugs
Contributor Author

tgbugs commented Sep 21, 2020

They should be deprecated and replaced by NCBITaxon, but we need to make sure that the old ids resolve so that people can find the new ones.

@jgrethe
Contributor

jgrethe commented Sep 21, 2020

So then perhaps an NCBI taxon import followed by reconciliation with NIF-Organism ids (and any other associated ids, dbxrefs, etc.).
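One way to picture the import-then-reconcile step is sketched below: after NCBITaxon is loaded, each old NIF-Organism id is matched to a taxon by label so that the old id can still resolve and point at its replacement. The NIF-Organism ids, the label-only matching rule, and the `reconcile` helper are assumptions for illustration; a real reconciliation would likely also use synonyms and dbxrefs.

```python
# A small slice of NCBITaxon: curie -> scientific name.
ncbitaxon = {
    "NCBITaxon:10090": "Mus musculus",
    "NCBITaxon:10116": "Rattus norvegicus",
}

# Hypothetical NIF-Organism entries: old id -> label.
nif_organism = {
    "NIFORG:example_1": "mus musculus",
    "NIFORG:example_2": "Lab mouse strain X",
}


def reconcile(old_terms, taxon_labels):
    """Map old ids to taxon curies by case-insensitive label match.

    Returns (replaced_by, unmatched): a dict of old id -> new curie, and
    the list of old ids that still need manual review.
    """
    by_label = {label.lower(): curie for curie, label in taxon_labels.items()}
    replaced_by, unmatched = {}, []
    for old_id, label in old_terms.items():
        new_id = by_label.get(label.strip().lower())
        if new_id is not None:
            replaced_by[old_id] = new_id
        else:
            unmatched.append(old_id)
    return replaced_by, unmatched


replaced_by, unmatched = reconcile(nif_organism, ncbitaxon)
print(replaced_by)
print(unmatched)
```

Keeping the old id on the deprecated record, with a pointer to the new one, is what lets people who arrive with a NIF-Organism id find the NCBITaxon term.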
