I noticed this when parsing a list of insect taxa from GenBank that have genomic information available. When I tried to build a TaxDict object from that list, it failed.
host = ['Unclassified Trichoceridae']
resolved_host = Resolver(terms=host)
resolved_host.main()
taxonomy = ['subspecies', 'species', 'genus',
            'family', 'order', 'class', 'phylum', 'kingdom']
idents = resolved_host.retrieve('query_name')
lineages = resolved_host.retrieve('classification_path')
ranks = resolved_host.retrieve('classification_path_ranks')
print([(ranks[0][x], lineages[0][x]) for x in range(len(ranks[0]))])
[('superkingdom', 'Eukaryota'), ('', 'Opisthokonta'), ('kingdom', 'Metazoa'), ('', 'Eumetazoa'), ('', 'Bilateria'), ('', 'Protostomia'), ('', 'Ecdysozoa'), ('', 'Panarthropoda'), ('phylum', 'Arthropoda'), ('', 'Mandibulata'), ('', 'Pancrustacea'), ('superclass', 'Hexapoda'), ('class', 'Insecta'), ('', 'Dicondylia'), ('', 'Pterygota'), ('subclass', 'Neoptera'), ('infraclass', 'Endopterygota'), ('order', 'Diptera'), ('suborder', 'Nematocera'), ('infraorder', 'Psychodomorpha'), ('superfamily', 'Trichoceroidea'), ('family', 'Trichoceridae'), ('', 'Unclassified')]
In this case, the terminal lineage entry is 'Unclassified' with no assigned rank, causing _getLevel to fail on initialization of the TaxRef object:
taxdict = TaxDict(idents=idents, ranks=ranks, lineages=lineages,
                  taxonomy=taxonomy)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-12-0af07c3e75e4> in <module>()
1 taxdict = TaxDict(idents=idents, ranks=ranks, lineages=lineages,
----> 2 taxonomy=taxonomy)
/Users/jonsanders/Development/git_sw/TaxonNamesResolver/taxon_names_resolver/manip_tools.py in __init__(self, idents, ranks, lineages, taxonomy, **kwargs)
115 # create taxref
116 taxref = TaxRef(ident=idents[i], rank=ranks[i][-1],
--> 117 taxonomy=self.taxonomy)
118 # create key for ident and insert a dictionary of:
119 # lineage, taxref, cident, ident and rank
/Users/jonsanders/Development/git_sw/TaxonNamesResolver/taxon_names_resolver/manip_tools.py in __init__(self, ident, rank, taxonomy)
34 except ValueError as e:
35 print('Error in taxon ident: {}'.format(ident))
---> 36 raise e
37 super(TaxRef, self).__setattr__('counter', 0) # count ident changes
38
/Users/jonsanders/Development/git_sw/TaxonNamesResolver/taxon_names_resolver/manip_tools.py in __init__(self, ident, rank, taxonomy)
31 try:
32 super(TaxRef, self).__setattr__('level',
---> 33 self._getLevel(rank, taxonomy))
34 except ValueError as e:
35 print('Error in taxon ident: {}'.format(ident))
/Users/jonsanders/Development/git_sw/TaxonNamesResolver/taxon_names_resolver/manip_tools.py in _getLevel(self, rank, taxonomy)
54 return taxonomy.index(rank)
55 # else find its closest by using the default taxonomy
---> 56 dlevel = default_taxonomy.index(rank)
57 i = 1
58 d = dlevel + i
ValueError: '' is not in list
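The failure can be reproduced without the library at all. The snippet below is only a minimal sketch: `tail_ranks` is a hand-copied tail of the ranks printed above, and it uses the `taxonomy` list from my call as a stand-in for the lookup, since I am assuming (from the traceback) that `_getLevel` ends up calling `.index(rank)` on a plain list of rank names.

```python
# Tail of the rank list returned for 'Unclassified Trichoceridae' (copied by hand).
tail_ranks = ['superfamily', 'family', '']

# Stand-in for the list that _getLevel indexes into (assumption: a plain list of rank names).
taxonomy = ['subspecies', 'species', 'genus',
            'family', 'order', 'class', 'phylum', 'kingdom']

terminal_rank = tail_ranks[-1]  # the empty string that is passed to TaxRef

try:
    taxonomy.index(terminal_rank)
    message = None
except ValueError as err:
    # list.index raises ValueError when the value is absent
    message = str(err)

print(message)
```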
Not sure what the best way to resolve this should be.
- Could add a catch in _getLevel to make sure the query rank is present in the default taxonomy before it tries to index, otherwise return 'Unknown' or similar.
- Rather than simply passing the terminal rank to the TaxRef constructor, look for the most terminal labeled rank that is present in either the provided or default taxonomy.
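To make the two options concrete, here is a rough sketch of each as a standalone function. Neither is part of taxon_names_resolver: `safe_level` and `last_labeled_rank` are hypothetical names, and `known_ranks` stands in for whichever combination of the provided and default taxonomy the library would actually consult.

```python
def safe_level(rank, taxonomy, default=None):
    """Option 1 (hypothetical): look up a rank without raising.

    Returns the index of rank in taxonomy, or default when the rank
    is empty or otherwise missing.
    """
    try:
        return taxonomy.index(rank)
    except ValueError:
        return default


def last_labeled_rank(ranks, lineage, known_ranks):
    """Option 2 (hypothetical): find the most terminal usable entry.

    Walks the lineage from the tip towards the root and returns the
    first (rank, name) pair whose rank appears in known_ranks.
    Returns None if no entry qualifies.
    """
    known = set(known_ranks)
    for rank, name in zip(reversed(ranks), reversed(lineage)):
        if rank in known:
            return rank, name
    return None


# Example using the tail of the lineage from this report.
taxonomy = ['subspecies', 'species', 'genus',
            'family', 'order', 'class', 'phylum', 'kingdom']
ranks = ['order', 'suborder', 'infraorder', 'superfamily', 'family', '']
lineage = ['Diptera', 'Nematocera', 'Psychodomorpha',
           'Trichoceroidea', 'Trichoceridae', 'Unclassified']

print(safe_level(ranks[-1], taxonomy))              # None
print(last_labeled_rank(ranks, lineage, taxonomy))  # ('family', 'Trichoceridae')
```

The second option seems to keep more information, since the taxon would still be placed at family level rather than being marked unknown, but it changes which rank is reported for the query name.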
Any thoughts?