Ete4 #750 #751
Thanks @dengzq1234
Before I merge this PR, I'd like to understand the couple of things I mentioned before. We should also understand why the ncbiquery test fails after this change (which seems to come from your change in line 135 of tests/test_ncbiquery.py, which I imagine you had a reason to make):
$ pytest tests/test_ncbiquery.py
[...]
tests/test_ncbiquery.py ......F.
[...]
____________________________________________ Test_ncbiquery.test_merged_id _____________________________________________
self = <tests.test_ncbiquery.Test_ncbiquery testMethod=test_merged_id>
def test_merged_id(self):
ncbi = NCBITaxa(dbfile=DATABASE_PATH)
t1 = ncbi.get_lineage(649756)
> self.assertEqual(t1, [1, 131567, 2, 1783272, 1239, 186801, 3085636, 186803, 207244, 649756])
E AssertionError: Lists differ: [1, 131567, 2, 1783272, 1239, 186801, 186802, 186803, 207244, 649756] != [1, 131567, 2, 1783272, 1239, 186801, 3085636, 186803, 207244, 649756]
E
E First differing element 6:
E 186802
E 3085636
E
E - [1, 131567, 2, 1783272, 1239, 186801, 186802, 186803, 207244, 649756]
E ? ^ ^^^
E
E + [1, 131567, 2, 1783272, 1239, 186801, 3085636, 186803, 207244, 649756]
E ? ^^ + ^^
test_ncbiquery.py:135: AssertionError
def get_rank(self, taxids):
def _dirty_id_suffix(self, taxid):
    pass
I don't see this function used. Why is it defined?
This is a function I want to write to handle GTDB accessions, because I realized they sometimes write ids in both ways, for example:
GCA_000003645.1 GB_GCA_000003645.1
GCF_900113245.1 RS_GCF_900113245.1
They are equivalent, so I want a function to handle the fuzzy matching. But I will keep it in my dev repo.
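A minimal sketch of the matching described above, assuming the two forms differ only by a leading GB_ or RS_ prefix. The helper names strip_gtdb_prefix and same_genome are hypothetical and not part of ete4:

```python
def strip_gtdb_prefix(accession):
    """Return the accession without a leading GTDB source prefix.

    Hypothetical helper: assumes the prefixed form is 'GB_' or 'RS_'
    followed by the plain accession, as in the examples above.
    """
    for prefix in ('GB_', 'RS_'):
        if accession.startswith(prefix):
            return accession[len(prefix):]
    return accession  # already in the plain form


def same_genome(acc1, acc2):
    """Return True if both accessions match once prefixes are removed."""
    return strip_gtdb_prefix(acc1) == strip_gtdb_prefix(acc2)
```

With this, same_genome('GCA_000003645.1', 'GB_GCA_000003645.1') would treat the two spellings as the same entry.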
Great, thanks!
ete4/gtdb_taxonomy/gtdbquery.py
Outdated
def _get_rank(self, taxids):
    """Return dictionary converting taxids to their GTDB taxonomy rank."""
    ids = ','.join('"%s"' % v for v in set(taxids) - {None, ''})
    result = self.db.execute('SELECT taxid, rank FROM species WHERE taxid IN (%s)' % ids)
    return {tax: spname for tax, spname in result.fetchall()}

def get_rank(self, taxids):
    taxid2rank = {}
    name2ids = self._get_name_translator(taxids)
    overlap_ids = name2ids.values()
    taxids = [item for sublist in overlap_ids for item in sublist]
    """Return dictionary converting taxids to their GTDB taxonomy rank."""
    ids = ','.join('"%s"' % v for v in set(taxids) - {None, ''})
    result = self.db.execute('SELECT taxid, rank FROM species WHERE taxid IN (%s)' % ids)
    for tax, rank in result.fetchall():
        taxid2rank[list(self._get_taxid_translator([tax]).values())[0]] = rank

    return taxid2rank
I don't understand this change. Why do we have get_rank() and _get_rank()? What's their difference? Also, get_rank() would need a docstring, and we should avoid having similarly named functions, since that makes it hard to guess which one is appropriate to use.
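As an illustration of the point about docstrings, here is a rough sketch of a single documented rank lookup. It is not the code in this PR: it assumes the database is reachable as a plain sqlite3 connection with the species(taxid, rank) table used in the queries above, and the toy rows are made up. It also passes the ids as query parameters instead of formatting them into the SQL string:

```python
import sqlite3


def get_rank(db, taxids):
    """Return a dict mapping each taxid to its taxonomy rank.

    Sketch only: 'db' is assumed to be an sqlite3 connection holding
    a table species(taxid, rank).
    """
    ids = list(set(taxids) - {None, ''})  # drop empty entries
    if not ids:
        return {}
    placeholders = ','.join('?' * len(ids))  # one '?' per id
    query = 'SELECT taxid, rank FROM species WHERE taxid IN (%s)' % placeholders
    return dict(db.execute(query, ids).fetchall())


# Toy database for illustration; these rows are not real GTDB data.
db = sqlite3.connect(':memory:')
db.execute('CREATE TABLE species (taxid TEXT, rank TEXT)')
db.executemany('INSERT INTO species VALUES (?, ?)',
               [('t1', 'genus'), ('t2', 'species')])
ranks = get_rank(db, ['t1', 't2', '', None])
```

Keeping one public function with a docstring avoids having to guess between two similarly named methods.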
I changed it because I updated my NCBI taxonomy database, I wonder if it would affect
@jordibc I've already updated it with the next NCBI unit test
Looks good, thanks @dengzq1234 !
This PR is to solve #750.
It includes the following features:
- Methods of GTDBTaxa() that involve meaningless numeric ids become internal methods, such as:
  get_lineage_translator() -> _get_lineage_translator()
  get_name_translator() -> _get_name_translator()
  translate_to_names() -> _translate_to_names()
- get_rank() goes from numeric ids to string ids in GTDBTaxa().
- ignore_unclassified in both NCBITaxa() and GTDBTaxa(): when ignore_unclassified=True, annotate_tree() will ignore empty annotations of leaves.