Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Induced subtree call fails on pruned taxa #84

Open
snacktavish opened this issue Nov 7, 2019 · 8 comments
Open

Induced subtree call fails on pruned taxa #84

snacktavish opened this issue Nov 7, 2019 · 8 comments

Comments

@snacktavish
Copy link
Member

snacktavish commented Nov 7, 2019

E.g.

 ~$ curl -X POST https://api.opentreeoflife.org/v3/tree_of_life/induced_subtree -H "content-type:application/json" -d '{"ott_ids":[292466, 267845, 3581696, 792843]}'
{
    "message": "[/v3/tree_of_life/induced_subtree] Error: node_id 'ott792843' was not found!",
    "unknown": {
        "ott3581696": "pruned_ott_id",
        "ott792843": "pruned_ott_id"
    }
}

This is a bit of a pain when getting trees for large sets of taxa (e.g. all famililes), as you need to parse the error message and rerun the call.
I think it would be nice to return the pruned subtree for the taxa that are in the tree, as well as the pruned taxa message. (cc @LunaSare )

@snacktavish snacktavish changed the title Induced subtree call fails on prunec taxa Induced subtree call fails on pruned taxa Nov 7, 2019
@bredelings
Copy link
Contributor

Looking at this again.

Have you all (@snacktavish and @LunaSare) implemented the two-step lookup -- i.e. if it fails, remove unknown ids and try again?

@bredelings
Copy link
Contributor

bredelings commented Nov 4, 2020

I am going to streamline and condense the error-handling logic, which is messy and implemented in multiple places.

This might make it possible to provide a list of reasons for name-lookup failure like ["pruned","unvalid"], so that we could ignore nodes where failure occurs for the listed reasons, and get the induced subtree of the other nodes.

I'm not sure if "broken" fits in this categorization, but it seems like this issue is only about pruning taxa with reason "pruned_ott_id".

@bredelings
Copy link
Contributor

I am not sure we want to change the output to make reasons clearer, since people could be depending on the current reasons, which consist of only "pruned_ott_id" and "invalid_ott_id".

@snacktavish
Copy link
Member Author

https://github.com/OpenTreeOfLife/python-opentree/blob/ejm-test-cov/opentree/ot_object.py#L367

this is the logic in python-opentree for catching and continuing

@bredelings
Copy link
Contributor

bredelings commented Nov 11, 2020

Our current failure logic for looking up names in a set of names looks as follows. It is implemented in find_nodes_for_id_strings, which is a wrapper around find_node_by_id_string.

  • weird name -> FAIL "unknown_id"
  • mcraottXottY name -> any problem -> FAIL "unknown_id"
  • ottX name:
    • number X too big? -> FAIL "unknown_id"
    • number X not in ott? -> FAIL "invalid_ott_id"
    • taxon X pruned? -> FAIL "pruned_ott_id"
    • taxon X broken AND fail_broken=true -> FAIL "broken"
    • taxon X broken and fail_broken = false -> SUCCESS!
    • taxon X found -> SUCCESS!

I have refactored this on branch refactor_finding_ids. (This uses ADTs = algebraic data types, which are std::variant in C++).

Some questions include:

  1. "unknown_id" is not a good response for errors In names like mrcaottXottY where (for example) taxon X is pruned. We now compute the lookup status of X and Y separately. Should we return something more informative?
  2. We could probably filter some of the FAIL cases. I think this basically means that we add another outcome FILTER "reason". Probably this means that we pass back an extra dict like "filtered": {"name1":"reason1","name2":"reason2"} or something like that.

Comments? @mtholder @snacktavish @kcranston @jimallman

@bredelings
Copy link
Contributor

diagram

@snacktavish
Copy link
Member Author

Oooh very interesting! Lets discuss on the call tomorrow.

@bredelings
Copy link
Contributor

Based on discussion with Emily, Mark, and Karen, current decision is that we will leave the filtering to be implemented by the caller in (say) python. Doing this inside the call makes this "too smooth", giving the impression that you can ignore name-lookup issues. The user might then realize that names are being dropped only when by chance all names are dropped.

While we could try and improve errors for looking up mrcaottXottY names, almost all of the issues come from looking up ottX names.

We might want a better way of determining the status of a collection of ids than calling induced_subtree and throwing away the tree. There is the secret nodes_info (vs node_info) call. Mark told Emily about it, so the secret's out.

All the "pruned_ott_id" nodes should be in the taxonomy.

Users might possibly want a bit more info: what flags resulted in pruning this node from synth? was this invalid ott id ever valid?

Mark pointed out that we also modify OTT to handle extinct things better, so whether a node is in "OTT" is a bit tricky. (We flag some things as extinct after OTT creation. Also there is the extinct taxon "bumping" procedure.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants