Unexpected assignment of (potential) recombinants #54

MarieLataretu · 2024-02-20T11:04:50Z

Hi there,

First, thanks for your work and the latest updates!

We stumbled across a few samples from the last few months that pangolin assigns to a top-level lineage, namely BA.2 or XBB.1.
The Nextclade clade assignment resolves to recombinant; the Nextclade_pango assignment is XDD or XCT.1. Since XDD and XCT.1 were not part of pangolin-data version 1.23.1, it's not surprising that pangolin does not assign these lineages.

However, we'd expect that pangolin would assign a (new) recombinant with the latest data release.
I did a little test series:

sample	pangolin-data 1.23.1	pangolin-data 1.24	pangolin-data 1.25	pangolin-data 1.25.1	nextclade2 2024-01-15	nextclade3 2024-01-16	nextclade3 2024-02-16
82	BA.2	BA.2	JN.1.1	JN.1.1	XDD	XDD	XDS
84	BA.2	JN.1.1	JN.1.1	JN.1.1	XDD	XDD	XDS
85	XBB.1	JN.1.1	JN.1.1	JN.1.1	XDD	XDD	XDS
63	XBB.1	BA.2	BA.2	BA.2	XCT.1	XCT.1	XCT.1
30	XBB.1	XCT.1	XCT.1	XCT.1	XCT.1	XCT.1	XCT.1
51	BA.2	XDD	JN.1.1	JN.1.1	XDD	XDD	XDD

(Tool versions: pangolin v4.3, nextclade3 v3.2.1, nextclade2 v2.14.0)

I'm now wondering whether this is a problem in pangolin or whether we are seeing an undesignated lineage. I read that Nextclade is not perfect at assigning recombinants. However, it is (more) consistent across the dataset versions.
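To make this kind of comparison repeatable, the calls can be grouped per sample and per tool to see which ones change between dataset versions. The following is only a minimal sketch, assuming the table above is kept as tab-separated text; the function name `distinct_calls` and the idea of selecting columns by header prefix are illustrative and not part of pangolin or Nextclade.

```python
import csv
import io

# The test-series table from this issue, as tab-separated text.
TABLE = (
    "sample\tpangolin-data 1.23.1\tpangolin-data 1.24\tpangolin-data 1.25\t"
    "pangolin-data 1.25.1\tnextclade2 2024-01-15\tnextclade3 2024-01-16\t"
    "nextclade3 2024-02-16\n"
    "82\tBA.2\tBA.2\tJN.1.1\tJN.1.1\tXDD\tXDD\tXDS\n"
    "84\tBA.2\tJN.1.1\tJN.1.1\tJN.1.1\tXDD\tXDD\tXDS\n"
    "85\tXBB.1\tJN.1.1\tJN.1.1\tJN.1.1\tXDD\tXDD\tXDS\n"
    "63\tXBB.1\tBA.2\tBA.2\tBA.2\tXCT.1\tXCT.1\tXCT.1\n"
    "30\tXBB.1\tXCT.1\tXCT.1\tXCT.1\tXCT.1\tXCT.1\tXCT.1\n"
    "51\tBA.2\tXDD\tJN.1.1\tJN.1.1\tXDD\tXDD\tXDD\n"
)


def distinct_calls(tsv_text, column_prefix):
    """Map each sample to the sorted distinct lineages found in the
    columns whose header starts with column_prefix."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    calls = {}
    for row in reader:
        values = {v for k, v in row.items() if k.startswith(column_prefix)}
        calls[row["sample"]] = sorted(values)
    return calls


if __name__ == "__main__":
    for tool in ("pangolin-data", "nextclade"):
        for sample, lineages in distinct_calls(TABLE, tool).items():
            status = "stable" if len(lineages) == 1 else "changes"
            print(tool, sample, status, ", ".join(lineages))
```

A sample with more than one distinct call for a tool is one whose assignment changed across the tested versions of that tool.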

I'm happy for any input or feedback! 🙂

Best
Marie

AngieHinrichs · 2024-02-20T22:06:13Z

Hi Marie -- without looking at the sequences, I can't say for sure what's going on. Are they in GISAID? If not, are you able to upload them to https://usher.bio/ (select the full tree of 16M sequences including GISAID and increase sample size to >= 500) in order to see which sequences they most closely resemble, and what mutations make your sequences different?

Unlike nextclade, pangolin doesn't have a general 'recombinant' category; it can only assign Pango lineages. Some things that may lead to flip-flopping assignments in successive releases are a high number of N or other ambiguous bases, or a mix of mutations associated with different lineages, whether that's due to a new recombinant, mixed infection or contamination in sequencing.

If the sequences are in GISAID, there are some very keen volunteers such as @aviczhl2, @JosetteSchoenma and @FedeGueli who search for new potential recombinants and may have already taken a look.

FedeGueli · 2024-02-20T22:46:58Z

Recombinants have been tracked by @aviczhl2, @JosetteSchoenma and @Over-There-Is, so I don't think anything has gone under the radar. But I suggest checking whether any EPI_ISL of this putative lineage is present in sars-cov-2-variants/lineage-proposals#957 (comment), either via a simple query with the GitHub search tool or, more specifically, by looking for it in this .tsv: https://github.com/sars-cov-2-variants/lineage-proposals/blob/main/recombinants.tsv

If I can get a list of the IDs, I could search for them on my own and then update here.
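A lookup like this could also be scripted against a locally downloaded copy of the .tsv. The sketch below is hedged: it makes no assumption about the column layout of recombinants.tsv and simply scans each line for EPI_ISL accessions; whether the file lists accessions in exactly this form is itself an assumption, and `find_known_ids` is a hypothetical helper name.

```python
import re

# Matches GISAID-style accessions such as EPI_ISL_18599826.
EPI_ISL_PATTERN = re.compile(r"EPI_ISL_\d+")


def find_known_ids(tsv_text, query_ids):
    """Return {accession: [1-based line numbers]} for every queried
    accession that appears anywhere in the given text."""
    wanted = set(query_ids)
    hits = {}
    for line_number, line in enumerate(tsv_text.splitlines(), start=1):
        for found in EPI_ISL_PATTERN.findall(line):
            if found in wanted:
                hits.setdefault(found, []).append(line_number)
    return hits


# Usage with a locally saved copy (the path is hypothetical):
# with open("recombinants.tsv", encoding="utf-8") as handle:
#     print(find_known_ids(handle.read(), ["EPI_ISL_18599826"]))
```

Matching whole accessions with a regular expression avoids false hits where one ID is a prefix of another.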

JosetteSchoenma · 2024-02-20T23:00:10Z

IMO, the best way to know if a batch of samples includes recombinants (if you are not used to recognizing them in Nextclade), is to look through GitHub issues and run the mentioned GISAID queries.
Which of course takes time!

Nextclade and Pangolin will always be a bit behind and sometimes inaccurate.

But if you have a list with EPI_ISL numbers or if you could tell me which country and dates you're interested in, one of us will probably be happy to have a look.

xz-keg · 2024-02-20T23:50:21Z

There are hundreds of different undesignated recombinants.
Most of them are registered in sars-cov-2-variants/lineage-proposals#991
and https://github.com/sars-cov-2-variants/lineage-proposals/blob/main/recombinants.tsv
If you see new ones, you are welcome to register them in that repo too.

MarieLataretu · 2024-02-21T15:50:17Z

Hi all, thanks for all the feedback!

Unfortunately, only one sequence is on GISAID - I can keep you posted on that (best case, next week, I'd say).
EPI_ISL_18599826 is the 4th sample (63 in the table).

> Some things that may lead to flip-flopping assignments in successive releases are a high number of N or other ambiguous bases, or a mix of mutations associated with different lineages, whether that's due to a new recombinant, mixed infection or contamination in sequencing.

The N content is decent (below 3.9 %), and ambiguous bases are masked.

I checked the mapping and it does not look like a mixed infection.

Nextclade's qc.privateMutations.status ranges from good, to mediocre, to bad - not sure if this is a good proxy for a mix of mutations from different lineages 🤔

I threw the samples in https://usher.bio/ (full tree, sample size to 1000). Here is a screenshot of the overview:

For pangolin-data 1.25.1, only one sample differs (JN.1.1 vs XDD; was XDD with 1.24)

JosetteSchoenma · 2024-02-21T16:29:25Z

The first 3 are linked to this singlet that @aviczhl2 found. You would have to put them all together in Nextclade to see if they match.

EPI_ISL_18715763
sars-cov-2-variants/lineage-proposals#991 (comment)

JosetteSchoenma · 2024-02-21T17:05:21Z

The 4th one, called BA.2, is linked to a pretty clean XCT.1 with only a reversion of C7051T. EPI_ISL_18599826

JosetteSchoenma · 2024-02-21T17:09:13Z

The 5th is linked to a completely normal XCT.1 from Austria. EPI_ISL_18385324

JosetteSchoenma · 2024-02-21T17:13:23Z

The 6th is linked to a completely normal looking XDD from France. You could check yours for mutations C6541T, A7842G, T15756A and A26275G to confirm it is an XDD.
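A check like this can be scripted. The sketch below assumes the sample's nucleotide substitutions are available as a comma-separated string (for example from the `substitutions` column of Nextclade's TSV output; treat that column name as an assumption to verify against your own file). The helper name `missing_markers` is hypothetical, and the example string is made up for illustration, not taken from a real sample.

```python
# The four mutations suggested above for confirming XDD.
XDD_MARKERS = ("C6541T", "A7842G", "T15756A", "A26275G")


def missing_markers(substitutions, markers=XDD_MARKERS):
    """Return the markers that are absent from a comma-separated
    substitution string such as "C241T,C3037T"."""
    present = {s.strip() for s in substitutions.split(",") if s.strip()}
    return [m for m in markers if m not in present]


# Hypothetical substitution string, for illustration only.
example = "C241T,C6541T,A7842G,T15756A,A26275G"
print(missing_markers(example))  # -> []
```

An empty list means all four markers are present; any returned entries are the ones to inspect more closely.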

AngieHinrichs · 2024-02-21T18:04:20Z

Thanks for the insights @JosetteSchoenma. @MarieLataretu you can see a lot more detail about the neighboring sequences, and what mutations separate your sequences from those sequences, if you click on the 'view in Nextstrain' links.

MarieLataretu · 2024-02-22T12:44:21Z

> The 6th is linked to a completely normal looking XDD from France. You could check yours for mutations C6541T, A7842G, T15756A and A26275G to confirm it is an XDD.

I checked the four mutations (in the Nextclade output), and all 4 are present!

The subtree in Nextstrain does not show any mutations:

Do I interpret it correctly that it's indeed an XDD (most probably)?

JosetteSchoenma · 2024-02-22T12:49:01Z

> Do I interpret it correctly that it's indeed an XDD (most probably)?

Yes, very likely an XDD.

MarieLataretu · 2024-02-22T13:26:55Z

> The 4th one, called BA.2 is linked to a pretty clean XCT.1 with only a reversion of C7051T. EPI_ISL_18599826

Oh shoot, I overlooked that one sample is already on GISAID! 🙈

The 4th sample (63 in the table) is exactly EPI_ISL_18599826!

MarieLataretu · 2024-02-22T16:40:59Z

> The first 3 are linked to this singlet that @aviczhl2 found. You would have to put them all together in Nextclade to see if they match.
>
> EPI_ISL_18715763 sars-cov-2-variants/lineage-proposals#991 (comment)

They are linked, but the 3 sequences have 4 additional mutations in the ORF1ab compared to EPI_ISL_18715763:

AngieHinrichs · 2024-02-22T18:54:48Z

@MarieLataretu I would like to look into why your sixth sample (51) is not classified as XDD by recent versions of pangolin-data. Can you share the sequence (email: angie at soe dot ucsc dot edu), or if that's not allowed, update this issue with its EPI_ISL ID when it is in GISAID? Thanks!

xz-keg · 2024-02-22T19:35:59Z

> > The first 3 are linked to this singlet that @aviczhl2 found. You would have to put them all together in Nextclade to see if they match.
> > EPI_ISL_18715763 sars-cov-2-variants/lineage-proposals#991 (comment)
>
> They are linked, but the 3 sequences have 4 additional mutations in the ORF1ab compared to EPI_ISL_18715763:

This looks like an independent new HV.1/JN.1 recombinant with a breakpoint similar to that of EPI_ISL_18715763 (which is a JG.3/JN.1 recombinant). The "additional mutations" basically revert the JG.3-defining mutations and add the HV.1-defining ones.

AngieHinrichs · 2024-02-23T21:24:51Z

Thanks @MarieLataretu for sharing the sample 51 sequence. It turns out that one missing mutation (or reversion to reference relative to XDD) is causing it to be placed just short of XDD in the pangolin-data 1.25.1 minimized tree.

In the minimized tree, the final node on the path to XDD has these mutations:

C6541T, G11727A, C18894T, T22926C, A26275G, C26529G, T26681C, T26833C, C29625T

Sample 51 has all of those except T22926C. If it had an N at 22926, usher would impute a C because of all the other matches, but it has the reference allele T at 22926. So usher splits that node, creating a new node with all mutations except T22926C and moving the original node (labeled XDD) to become a child of the new node, carrying only T22926C. Sample 51 also becomes a child of the new node, a sibling of XDD, so it misses the assignment. In short, missing a single mutation at the final node can cause a missed assignment, unfortunately.

In the full tree, there are some XDD sequences that share the mutation G5155A with sample 51, so sample 51 is placed in XDD on that branch, with one private mutation (T21810C) and multiple reversions to reference (T21711C, C22926T, G26610A):

https://nextstrain.org/fetch/hgwdev.gi.ucsc.edu/~angie/pangolin-data-54.json?branchLabel=nuc%20mutations&label=id:node_6955286

How strong is the read-level evidence for sample 51 having the reference allele instead of the expected XDD mutations at reference positions 21711, 22926 and 26610? If the coverage is very low there, it would be better from the usher point of view to have N instead of reference allele.
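As a rough illustration of preferring N over a reference call at poorly covered positions, a consensus could be masked by depth before classification. This is a minimal sketch under stated assumptions: per-position depths are already available as a list aligned to the consensus, and the threshold of 10 reads is an arbitrary placeholder, not a recommendation from pangolin or usher. Most consensus-calling pipelines have their own option for this, which would normally be the better place to set it.

```python
def mask_low_coverage(sequence, depths, min_depth=10):
    """Replace bases whose read depth is below min_depth with N.

    sequence: consensus string; depths: per-position read depths of
    the same length. min_depth=10 is an arbitrary example value.
    """
    if len(sequence) != len(depths):
        raise ValueError("sequence and depths must have the same length")
    return "".join(
        base if depth >= min_depth else "N"
        for base, depth in zip(sequence, depths)
    )


print(mask_low_coverage("ACGT", [30, 2, 15, 0]))  # ANGN
```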

I can make the matching a little less stringent in the next release of pangolin-data by adding a pseudo-lineage label "XDD_dropout" in the full tree, a couple of nodes upstream of XDD. When minimizing the full tree to make the next release of pangolin-data, the "_dropout" will be truncated, so there will be a second "XDD" label a bit upstream of where XDD really starts, and that will assign XDD a bit more broadly (hopefully not too broadly).
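The label handling described above amounts to stripping a suffix during minimization. The snippet below is only an illustration of that idea; it is not the actual pangolin-data build code, and the function name `minimized_label` is hypothetical.

```python
DROPOUT_SUFFIX = "_dropout"


def minimized_label(label):
    """Drop a trailing "_dropout" so the pseudo-lineage label maps
    back to the real lineage name; other labels pass through."""
    if label.endswith(DROPOUT_SUFFIX):
        return label[: -len(DROPOUT_SUFFIX)]
    return label


print(minimized_label("XDD_dropout"))  # XDD
print(minimized_label("XDD"))          # XDD
```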

MarieLataretu · 2024-02-27T09:29:59Z

Thanks for the insight, @AngieHinrichs !
I'll check the mentioned positions in detail and get back to you. (It might take some time, because I'm travelling atm)

MarieLataretu changed the title ~~of recombinants~~ Unexpected assigment of (potential) recombinants Feb 20, 2024
