Unexpected assignment of (potential) recombinants #54
Hi Marie -- without looking at the sequences, I can't say for sure what's going on. Are they in GISAID? If not, are you able to upload them to https://usher.bio/ (select the full tree of 16M sequences including GISAID and increase sample size to >= 500) in order to see which sequences they most closely resemble, and what mutations make your sequences different? Unlike nextclade, pangolin doesn't have a general 'recombinant' category; it can only assign Pango lineages. Some things that may lead to flip-flopping assignments in successive releases are a high number of N or other ambiguous bases, or a mix of mutations associated with different lineages, whether that's due to a new recombinant, mixed infection or contamination in sequencing. If the sequences are in GISAID, there are some very keen volunteers such as @aviczhl2, @JosetteSchoenma and @FedeGueli who search for new potential recombinants and may have already taken a look.
Recombinants have been tracked by @aviczhl2, @josettshoenma and @Over-There-Is; I don't think anything went under the radar. But I can suggest checking whether any EPI_ISL of this putative lineage is present in sars-cov-2-variants/lineage-proposals#957 (comment), either via a simple query with the GitHub search tool or, more specifically, by looking for them in this .tsv: https://github.com/sars-cov-2-variants/lineage-proposals/blob/main/recombinants.tsv If I can get a list of the IDs, I could search for them on my own and then post an update here.
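The lookup suggested above can be sketched as a plain text search for accession IDs. This is only an illustration: the column layout of recombinants.tsv is not assumed, the function name `find_epi_isl_hits` is hypothetical, and the input below is a toy string, not the real file.

```python
import re

# Matches GISAID accession IDs such as EPI_ISL_18715763.
EPI_ISL_RE = re.compile(r"\bEPI_ISL_\d+\b")


def find_epi_isl_hits(table_text, wanted_ids):
    """Return {accession: [1-based line numbers]} for wanted IDs found in the text."""
    wanted = set(wanted_ids)
    hits = {}
    for line_no, line in enumerate(table_text.splitlines(), start=1):
        for accession in EPI_ISL_RE.findall(line):
            if accession in wanted:
                hits.setdefault(accession, []).append(line_no)
    return hits


# Toy input only; the real file's columns are unknown to this sketch.
toy_table = (
    "label\tids\n"
    "example_a\tEPI_ISL_18715763\n"
    "example_b\tEPI_ISL_18599826\n"
)
print(find_epi_isl_hits(toy_table, ["EPI_ISL_18599826", "EPI_ISL_18385324"]))
```

Matching whole accessions rather than substrings avoids a short ID accidentally matching inside a longer one.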
IMO, the best way to know if a batch of samples includes recombinants (if you are not used to recognizing them in Nextclade) is to look through GitHub issues and run the mentioned GISAID queries. Nextclade and Pangolin will always be a bit behind and sometimes inaccurate. But if you have a list with EPI_ISL numbers or if you could tell me which country and dates you're interested in, one of us will probably be happy to have a look.
There are hundreds of different undesignated recombinants.
Hi all, thanks for all the feedback! Unfortunately, only one sequence is on GISAID - I can keep you posted on that (best case, next week, I'd say).
The N content is decent (below 3.9%), and ambiguous bases are masked. I checked the mapping and it does not look like a mixed infection. Nextclade's I threw the samples in https://usher.bio/ (full tree, sample size to 1000). Here is a screenshot of the overview: For pangolin-data 1.25.1, only one sample differs (JN.1.1 vs XDD; was XDD with 1.24)
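For reference, an N-content figure like the one quoted above can be computed directly from a consensus sequence. A minimal sketch, assuming the sequence is available as a plain string; the function name `n_content_percent` is hypothetical and not part of pangolin or Nextclade.

```python
def n_content_percent(sequence):
    """Percentage of positions called as N in a consensus sequence string."""
    # Drop line breaks and other whitespace, then normalise case.
    bases = "".join(sequence.split()).upper()
    if not bases:
        raise ValueError("empty sequence")
    return 100 * bases.count("N") / len(bases)


# Toy sequence: 2 of 10 positions are N.
print(n_content_percent("ACGTNNACGT"))
```

Other IUPAC ambiguity codes (R, Y, and so on) are not counted here; a stricter check would count every non-ACGT character.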
The first 3 are linked to this singlet that @aviczhl2 found. You would have to put them all together in Nextclade to see if they match. EPI_ISL_18715763
The 4th one, called BA.2, is linked to a pretty clean XCT.1 with only a reversion of C7051T. EPI_ISL_18599826
The 5th is linked to a completely normal XCT.1 from Austria. EPI_ISL_18385324
The 6th is linked to a completely normal-looking XDD from France. You could check yours for mutations C6541T, A7842G, T15756A and A26275G to confirm it is an XDD.
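A check like the one suggested above can be sketched in a few lines. The sketch assumes the sample has already been aligned to the reference so that string index `pos - 1` corresponds to reference position `pos` (no insertions); the function names are hypothetical.

```python
import re

# Nucleotide substitution labels such as C6541T: ref base, 1-based position, alt base.
MUTATION_RE = re.compile(r"^([ACGT])(\d+)([ACGT])$")


def parse_mutation(label):
    match = MUTATION_RE.match(label)
    if match is None:
        raise ValueError(f"not a nucleotide substitution: {label!r}")
    ref, pos, alt = match.groups()
    return ref, int(pos), alt


def check_mutations(aligned_seq, labels):
    """Report, per mutation label, what the reference-aligned sample carries."""
    report = {}
    for label in labels:
        ref, pos, alt = parse_mutation(label)
        if pos < 1 or pos > len(aligned_seq):
            raise ValueError(f"{label}: position outside the aligned sequence")
        base = aligned_seq[pos - 1].upper()
        if base == alt:
            report[label] = "present"
        elif base == ref:
            report[label] = "reference"
        elif base in "ACGT":
            report[label] = "other"
        else:
            report[label] = "ambiguous"  # N, gap, or an IUPAC ambiguity code
    return report
```

With a full-length aligned sequence in a variable (say `aligned`, hypothetical), the call would be `check_mutations(aligned, ["C6541T", "A7842G", "T15756A", "A26275G"])`; all four reported as "present" would support XDD.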
Thanks for the insights @JosetteSchoenma. @MarieLataretu you can see a lot more detail about the neighboring sequences, and what mutations separate your sequences from those sequences, if you click on the 'view in Nextstrain' links.
Oh shoot, I overlooked that one sample is already on GISAID! 🙈 The 4th sample (63 in the table) is exactly EPI_ISL_18599826!
They are linked, but the 3 sequences have 4 additional mutations in ORF1ab compared to EPI_ISL_18715763:
@MarieLataretu I would like to look into why your sixth sample (51) is not classified as XDD by recent versions of pangolin-data. Can you share the sequence (email: angie at soe dot ucsc dot edu), or if that's not allowed, update this issue with its EPI_ISL ID when it is in GISAID? Thanks!
This looks like an independent new HV.1/JN.1 recombinant with a similar breakpoint to 18715763 (which is a JG.3/JN.1 recombinant). The "additional mutations" basically revert the JG.3-defining mutations and add the HV.1-defining ones.
Thanks @MarieLataretu for sharing the sample 51 sequence. It turns out that one missing mutation (or reversion to reference relative to XDD) is causing it to be placed just short of XDD in the pangolin-data 1.25.1 minimized tree.

In the minimized tree, the final node on the path to XDD has these mutations: C6541T, G11727A, C18894T, T22926C, A26275G, C26529G, T26681C, T26833C, C29625T. Sample 51 has all of those except for T22926C. If it had an N at 22926, then usher would impute a C because of all the other matches, but it has the reference allele T at 22926. So usher splits that node up, creating a new node with all mutations except T22926C, and moving the original node (labeled XDD) to become a child of the new node with only T22926C. Sample 51 also becomes a child of the new node, a sibling of XDD, so it misses the assignment. That's the long way of saying that missing a single mutation at the final node can cause a missed assignment, unfortunately.

In the full tree, there are some XDD sequences that share the mutation G5155A with sample 51, so sample 51 is placed in XDD on that branch, with one private mutation (T21810C) and multiple reversions to reference (T21711C, C22926T, G26610A).

How strong is the read-level evidence for sample 51 having the reference allele instead of the expected XDD mutations at reference positions 21711, 22926 and 26610? If the coverage is very low there, it would be better from the usher point of view to have N instead of the reference allele.

I can make the matching a little less stringent in the next release of pangolin-data by adding a pseudo-lineage label "XDD_dropout" in the full tree, a couple of nodes upstream of XDD. When minimizing the full tree to make the next release of pangolin-data, the "_dropout" will be truncated, so there will be a second "XDD" label a bit upstream of where XDD really starts, and that will assign XDD a bit more broadly (hopefully not too broadly).
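The difference between an N and a reference allele at 22926 can be illustrated with a toy comparison. This is not usher's algorithm (usher does parsimony placement over the whole tree); it is a simplified sketch that only asks which mutations of a single node a sample definitely contradicts. The function names and the sample dictionary are hypothetical; the mutation list is the one quoted above.

```python
def parse_mutation(label):
    """Split a label like 'T22926C' into (ref, 1-based position, alt)."""
    return label[0], int(label[1:-1]), label[-1]


def conflicting_mutations(node_mutations, sample_bases):
    """List node mutations contradicted by a definite, non-matching base call.

    sample_bases maps reference position -> called base. An 'N' (or a
    position missing from the dict) is unknown and so never conflicts.
    """
    conflicts = []
    for label in node_mutations:
        _ref, pos, alt = parse_mutation(label)
        base = sample_bases.get(pos, "N").upper()
        if base != "N" and base != alt:
            conflicts.append(label)
    return conflicts


# Final node on the path to XDD in the minimized tree, as listed in the comment above.
XDD_FINAL_NODE = ["C6541T", "G11727A", "C18894T", "T22926C", "A26275G",
                  "C26529G", "T26681C", "T26833C", "C29625T"]

# Hypothetical sample that carries every mutation of the node...
sample = {parse_mutation(m)[1]: parse_mutation(m)[2] for m in XDD_FINAL_NODE}

sample[22926] = "T"  # ...but has the reference allele at 22926
print(conflicting_mutations(XDD_FINAL_NODE, sample))  # ['T22926C']

sample[22926] = "N"  # masked instead: nothing contradicts the node
print(conflicting_mutations(XDD_FINAL_NODE, sample))  # []
```

A non-empty conflict list corresponds to the case where the node has to be split and the sample ends up as a sibling of the labeled node; an empty list corresponds to the case where the missing base can be imputed.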
Thanks for the insight, @AngieHinrichs!
Hi there,
First, thanks for your work and the latest updates!
We stumbled across a few samples from the last few months that pangolin assigns to a top-level lineage, namely BA.2 or XBB.1.
The Nextclade clade assignment resolves to recombinant; the Nextclade_pango assignment is XDD or XCT.1. Since XDD and XCT.1 were not part of pangolin-data version 1.23.1, it's not surprising that pangolin does not assign these lineages. However, we'd expect that pangolin would assign a (new) recombinant with the latest data release.
I did a little test series:
(Tool versions: pangolin v4.3, nextclade3 v3.2.1, nextclade2 v2.14.0)
I'm now wondering whether this is a problem in pangolin or whether we are seeing an undesignated lineage. I read that Nextclade is not perfect at assigning recombinants. However, it is (more) consistent across the dataset versions.
I'm happy for any input or feedback! 🙂
Best
Marie