Induced subtree call fails on pruned taxa #84

snacktavish · 2019-11-07T17:39:18Z

E.g.

 ~$ curl -X POST https://api.opentreeoflife.org/v3/tree_of_life/induced_subtree -H "content-type:application/json" -d '{"ott_ids":[292466, 267845, 3581696, 792843]}'
{
    "message": "[/v3/tree_of_life/induced_subtree] Error: node_id 'ott792843' was not found!",
    "unknown": {
        "ott3581696": "pruned_ott_id",
        "ott792843": "pruned_ott_id"
    }
}

This is a bit of a pain when getting trees for large sets of taxa (e.g. all famililes), as you need to parse the error message and rerun the call.
I think it would be nice to return the pruned subtree for the taxa that are in the tree, as well as the pruned taxa message. (cc @LunaSare )

The text was updated successfully, but these errors were encountered:

bredelings · 2020-11-04T21:14:04Z

Looking at this again.

Have you all (@snacktavish and @LunaSare) implemented the two-step lookup -- i.e. if it fails, remove unknown ids and try again?

bredelings · 2020-11-04T21:33:45Z

I am going to streamline and condense the error-handling logic, which is messy and implemented in multiple places.

This might make it possible to provide a list of reasons for name-lookup failure like ["pruned","unvalid"], so that we could ignore nodes where failure occurs for the listed reasons, and get the induced subtree of the other nodes.

I'm not sure if "broken" fits in this categorization, but it seems like this issue is only about pruning taxa with reason "pruned_ott_id".

bredelings · 2020-11-04T21:35:21Z

I am not sure we want to change the output to make reasons clearer, since people could be depending on the current reasons, which consist of only "pruned_ott_id" and "invalid_ott_id".

snacktavish · 2020-11-04T23:30:47Z

https://github.com/OpenTreeOfLife/python-opentree/blob/ejm-test-cov/opentree/ot_object.py#L367

this is the logic in python-opentree for catching and continuing

bredelings · 2020-11-11T17:12:00Z

Our current failure logic for looking up names in a set of names looks as follows. It is implemented in find_nodes_for_id_strings, which is a wrapper around find_node_by_id_string.

weird name -> FAIL "unknown_id"
mcraottXottY name -> any problem -> FAIL "unknown_id"
ottX name:
- number X too big? -> FAIL "unknown_id"
- number X not in ott? -> FAIL "invalid_ott_id"
- taxon X pruned? -> FAIL "pruned_ott_id"
- taxon X broken AND fail_broken=true -> FAIL "broken"
- taxon X broken and fail_broken = false -> SUCCESS!
- taxon X found -> SUCCESS!

I have refactored this on branch refactor_finding_ids. (This uses ADTs = algebraic data types, which are std::variant in C++).

Some questions include:

"unknown_id" is not a good response for errors In names like mrcaottXottY where (for example) taxon X is pruned. We now compute the lookup status of X and Y separately. Should we return something more informative?
We could probably filter some of the FAIL cases. I think this basically means that we add another outcome FILTER "reason". Probably this means that we pass back an extra dict like "filtered": {"name1":"reason1","name2":"reason2"} or something like that.

Comments? @mtholder @snacktavish @kcranston @jimallman

bredelings · 2020-11-11T17:54:45Z

snacktavish · 2020-11-11T18:43:10Z

Oooh very interesting! Lets discuss on the call tomorrow.

bredelings · 2020-11-12T21:30:07Z

Based on discussion with Emily, Mark, and Karen, current decision is that we will leave the filtering to be implemented by the caller in (say) python. Doing this inside the call makes this "too smooth", giving the impression that you can ignore name-lookup issues. The user might then realize that names are being dropped only when by chance all names are dropped.

While we could try and improve errors for looking up mrcaottXottY names, almost all of the issues come from looking up ottX names.

We might want a better way of determining the status of a collection of ids than calling induced_subtree and throwing away the tree. There is the secret nodes_info (vs node_info) call. Mark told Emily about it, so the secret's out.

All the "pruned_ott_id" nodes should be in the taxonomy.

Users might possibly want a bit more info: what flags resulted in pruning this node from synth? was this invalid ott id ever valid?

Mark pointed out that we also modify OTT to handle extinct things better, so whether a node is in "OTT" is a bit tricky. (We flag some things as extinct after OTT creation. Also there is the extinct taxon "bumping" procedure.)

snacktavish changed the title ~~Induced subtree call fails on prunec taxa~~ Induced subtree call fails on pruned taxa Nov 7, 2019

bredelings mentioned this issue Dec 2, 2020

Refactor finding ids #105

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Induced subtree call fails on pruned taxa #84

Induced subtree call fails on pruned taxa #84

snacktavish commented Nov 7, 2019 •

edited

Loading

bredelings commented Nov 4, 2020

bredelings commented Nov 4, 2020 •

edited

Loading

bredelings commented Nov 4, 2020

snacktavish commented Nov 4, 2020

bredelings commented Nov 11, 2020 •

edited

Loading

bredelings commented Nov 11, 2020

snacktavish commented Nov 11, 2020

bredelings commented Nov 12, 2020

Induced subtree call fails on pruned taxa #84

Induced subtree call fails on pruned taxa #84

Comments

snacktavish commented Nov 7, 2019 • edited Loading

bredelings commented Nov 4, 2020

bredelings commented Nov 4, 2020 • edited Loading

bredelings commented Nov 4, 2020

snacktavish commented Nov 4, 2020

bredelings commented Nov 11, 2020 • edited Loading

bredelings commented Nov 11, 2020

snacktavish commented Nov 11, 2020

bredelings commented Nov 12, 2020

snacktavish commented Nov 7, 2019 •

edited

Loading

bredelings commented Nov 4, 2020 •

edited

Loading

bredelings commented Nov 11, 2020 •

edited

Loading