Dealing with multiple taxids by taking first one (iss#54) #55

Merged
merged 1 commit into from
Feb 10, 2021

nawrockie
Contributor

Removes the requirement that each accession have only one taxid from GenBank; we now take the first taxid parsed from the GenBank XML. This addresses GitHub issue #54.

Not sure if you want to merge this now or later (or never) - up to you.
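
For readers unfamiliar with the change, here is a minimal sketch of the "take the first taxid" behaviour. It is not the code from this PR: the function name `first_taxid_per_accession`, the sample accession, and the taxid values are hypothetical, and it assumes the GenBank XML uses the `GBSeq` layout in which taxids appear as `db_xref` qualifiers of the form `taxon:<id>`.

```python
import xml.etree.ElementTree as ET

# Hypothetical GenBank XML fragment: one accession whose record carries
# two taxon cross-references (the case that was previously rejected).
SAMPLE_GBXML = """<GBSet>
  <GBSeq>
    <GBSeq_primary-accession>HYPOTHETICAL01</GBSeq_primary-accession>
    <GBSeq_feature-table>
      <GBFeature>
        <GBFeature_key>source</GBFeature_key>
        <GBFeature_quals>
          <GBQualifier>
            <GBQualifier_name>db_xref</GBQualifier_name>
            <GBQualifier_value>taxon:111</GBQualifier_value>
          </GBQualifier>
        </GBFeature_quals>
      </GBFeature>
      <GBFeature>
        <GBFeature_key>source</GBFeature_key>
        <GBFeature_quals>
          <GBQualifier>
            <GBQualifier_name>db_xref</GBQualifier_name>
            <GBQualifier_value>taxon:222</GBQualifier_value>
          </GBQualifier>
        </GBFeature_quals>
      </GBFeature>
    </GBSeq_feature-table>
  </GBSeq>
</GBSet>"""


def first_taxid_per_accession(xml_text):
    """Map each accession to the first taxid found in its record.

    Records with several taxids are no longer an error; the first one
    in document order wins. Records with no taxid are skipped.
    """
    root = ET.fromstring(xml_text)
    result = {}
    for seq in root.iter("GBSeq"):
        accession = seq.findtext("GBSeq_primary-accession")
        taxids = []
        for qual in seq.iter("GBQualifier"):
            name = qual.findtext("GBQualifier_name")
            value = qual.findtext("GBQualifier_value") or ""
            if name == "db_xref" and value.startswith("taxon:"):
                taxids.append(value.split(":", 1)[1])
        if taxids:
            result[accession] = taxids[0]
    return result


if __name__ == "__main__":
    print(first_taxid_per_accession(SAMPLE_GBXML))
```

In this sketch the second taxid is silently dropped; logging the discarded values might make it easier to audit affected accessions later.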

…k, now we take first taxid parsed from GenBank xml; this is github issue #54
Member

@kalvari kalvari left a comment

It makes sense to use an umbrella tax id for each accession belonging to a specific genome assembly. For simplicity, this means that all accessions in a genome inherit the tax id associated with the high-level assembly, regardless of their actual taxonomic classification.

Two main things to keep an eye on:

  1. Increased redundancy: For new sequences imported from SEED alignments, if the sequence does not belong to a genome in the current non-redundant genome collection annotated in Rfamseq, this will over time introduce redundancy and orphan sequences, as seen in Rfamseq 12.
  2. Increased complexity for sequence export: Sequence export will need to rely on tax ids rather than genome/proteome accessions, possibly leading to redundant hits for different assembly versions of the same genome. A possible workaround is to separate SEED and FULL hits during export.

Possible features affected:

@kalvari kalvari merged commit a49f2e9 into master Feb 10, 2021
@kalvari kalvari deleted the multiple-tax-iss54 branch February 18, 2021 19:42