Error in Nextclade-Pangolin annotation #760

Closed
animesh-workplace opened this issue Mar 28, 2022 · 2 comments
Labels: docs (documentation related issues), package: nextclade, t:ask (question, request of information), t:talk (discussion of the application or the science behind it)

Comments

@animesh-workplace

hCoV-19/India/KA-CBR-1402CTD094/2022

An issue has been raised for this particular sequence in pangolin-designation.

This is supposed to be an XF lineage (Delta X BA.2 recombination), but Nextclade is calling it an XE lineage (BA.1 X BA.2). Kindly look into it.


@animesh-workplace animesh-workplace added good first issue Good for newcomers help wanted Extra attention is needed needs triage Mark for review and label assignment t:bug Type: bug, error, something isn't working labels Mar 28, 2022
@animesh-workplace
Author

image
My own analysis, showing the same result.

@corneliusroemer
Member

Thanks for raising this.

The software does exactly what it's supposed to. Let me explain.

We find the nearest neighbour in the reference tree for the sample. The pango lineage reported is then that of the nearest neighbour.
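To make the idea concrete, here is a toy sketch of nearest-neighbour lineage assignment. It is not Nextclade's actual implementation: it assumes each tree node and the query are just sets of mutations relative to the reference, and that distance is the number of mutations found in only one of the two sets. The node names, lineage labels, mutation names and the `nearest_node` helper are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TreeNode:
    name: str
    pango_lineage: str
    mutations: frozenset  # mutations relative to the reference (toy representation)


def distance(query, node):
    """Number of mutations present in exactly one of the two sets."""
    return len(query ^ node.mutations)


def nearest_node(query, nodes):
    """Return the node closest to the query (the first one wins ties)."""
    return min(nodes, key=lambda node: distance(query, node))


# Hypothetical toy tree; these are placeholders, not real lineage definitions.
TREE = [
    TreeNode("node_a", "LINEAGE_A", frozenset({"m1", "m2", "m3"})),
    TreeNode("node_b", "LINEAGE_B", frozenset({"m1", "m4", "m5", "m6"})),
]

query = frozenset({"m1", "m4", "m5", "m7"})
best = nearest_node(query, TREE)

# The reported lineage is simply the label of the nearest node,
# even when the remaining distance is not zero.
print(best.pango_lineage, distance(query, best))
```

Note that the sketch always returns some lineage, however poor the fit, which is why the remaining distance matters when interpreting the call.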

If there are lots of reversions and labeled mutations (i.e. the P circle is red, as in this case), it means that there's no good fit in the tree. So you can't trust the Pango lineage reported - it is likely not correct, as in this case.

There's no pango lineage for this sequence (yet) so there's also no right way to call it.

pangolin is very conservative; it would probably give this sequence a label of None. We just report the likeliest lineage, which can be wrong but may still be informative. In this case, the sequence is correctly classified as a recombinant, though the exact Pango lineage is wrong.

I'll think about whether it might be worth adding parentheses around Pango lineage calls when the results are not trustworthy.
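One way this could look, purely as a hypothetical sketch: count the reversions and labeled mutations relative to the nearest node and wrap the lineage in parentheses when there are too many. The set-based representation, the function names and the threshold of 3 are assumptions made for illustration; this is not how Nextclade scores private mutations.

```python
def summarize_private_mutations(query, nearest, labeled):
    """Count differences between a query and its nearest tree node.

    All arguments are sets of mutation names (toy representation).
    """
    reversions = nearest - query          # node mutations the query lacks
    extra = query - nearest               # query mutations the node lacks
    return {
        "reversions": len(reversions),
        "labeled": len(extra & labeled),  # extras known from other lineages
        "unlabeled": len(extra - labeled),
    }


def display_lineage(lineage, summary, max_suspicious=3):
    """Wrap the lineage in parentheses when the placement looks unreliable.

    max_suspicious is an arbitrary illustrative threshold.
    """
    suspicious = summary["reversions"] + summary["labeled"]
    return f"({lineage})" if suspicious > max_suspicious else lineage


# Hypothetical data: placeholders, not real mutations or lineages.
nearest = {"m1", "m2", "m3", "m4", "m5"}
labeled = {"d1", "d2", "d3"}  # mutations typical of some other lineage
query = {"m1", "m2", "d1", "d2", "d3", "n1"}

summary = summarize_private_mutations(query, nearest, labeled)
print(summary, display_lineage("LINEAGE_X", summary))
```

A query that differs from its nearest node by only a few unlabeled mutations would keep its lineage without parentheses under this scheme.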

You can read more about caveats here: https://docs.nextstrain.org/projects/nextclade/en/latest/user/algorithm/nextclade-pango.html

It's worth noting that we don't claim to be 100% accurate. 98% accuracy means we expect to be wrong in 1 sequence out of 50. This sequence is very unusual, so it's not surprising Nextclade is wrong.
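The arithmetic behind "1 in 50" can be checked with a few lines. This is a back-of-the-envelope sketch: the batch size of 1000 is an arbitrary example, and the calculation assumes errors are spread evenly across sequences, which is unlikely in practice because unusual sequences like this one are more likely to be miscalled.

```python
from fractions import Fraction

accuracy = Fraction(98, 100)
error_rate = 1 - accuracy            # fraction of calls expected to be wrong
one_in = 1 / error_rate              # "wrong in 1 sequence out of N"


def expected_wrong(n_sequences):
    """Expected number of wrong lineage calls in a batch (evenly spread errors assumed)."""
    return n_sequences * error_rate


print(f"error rate: {error_rate}, i.e. 1 in {one_in}")
print(f"expected wrong calls in 1000 sequences: {expected_wrong(1000)}")
```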

Also this:

Recombinants: Recombinant Pango lineages are now included in the reference tree. Each recombinant is attached to the root node so as not to spawn false internal nodes in the tree that would attract bad sequences. As long as recombinants do not qualify for a Nextstrain clade, they will receive the place holder clade name recombinant. Pango lineages are provided if present. Beware that new unnamed recombinants with similar donors but slightly different breakpoint will attach to existing recombinants in the reference tree and thus get a wrong Pango lineage. A number of reversions and labeled mutations is a sign that you may have a similar but different recombinant.
https://github.com/nextstrain/nextclade_data/blob/master/CHANGELOG.md#sars-cov-2
