2 small South African clusters of Omicron/Delta recombinants with interesting Spike mutations (8 sequences) #844

Closed
JosieLikesCats opened this issue Jul 14, 2022 · 48 comments
Assignees
Labels
designated; monitor (currently too small, watch for future developments); recombinant (recombinant proposal)
Milestone

Comments

@JosieLikesCats

Hi everyone, I'm just opening this issue to highlight that there are several sequences with an unusual mutation pattern in our most recent upload from South Africa, which will potentially represent two new lineages if more sequences are detected. The teams in our genomic surveillance network (NGS-SA) as well as our public health institute (NICD) are closely monitoring the sequences and cases in the country. These new constellations have been detected only in a small proportion of recent data, and our cases remain low.

I know these do not yet meet requirements for designation, as there are only N=4 and N=3 (2 available on GISAID, last 1 will be released tomorrow) sequences for each constellation, but we thought they would probably be of interest and picked up here/on Twitter eventually. For now, please see below for some details and the major mutation profiles for the two groups of sequences.

N=4 constellation 1
Earliest sequence: 28 June 2022
Most recent sequence: 29 June 2022
Circulating: Gauteng, South Africa
Nextclade assigns 21M but flags lots of private mutations (mainly 21J), pango assigns Unassigned/B.1.1.529

Genomes
EPI_ISL_13830378
EPI_ISL_13830377
EPI_ISL_13830376
EPI_ISL_13830375

N=3 constellation 2
Earliest sequence: 13 June 2022
Most recent sequence: 24 June 2022
Circulating: Limpopo, South Africa
Nextclade assigns 21J but flags lots of private mutations (mainly 21K/21L), pango assigns XD

Genomes
EPI_ISL_13830379
EPI_ISL_13830380

Evidence
constellation1_defining_aa_changes.xlsx
constellation2_defining_aa_changes.xlsx

Spike mutations in constellation 1 only, relative to Omicron: R21G, F486P, P621S, A706V
Spike mutations in constellation 2 only, relative to Omicron: S477D
Shared mutations relative to Omicron BA.4/5: L18F, T19R, W152L, E156del, F157del, R158G, F186L, G446D, T1117I
Notably both clusters have a second silent nt change in L452R not present in BA.4/5.
There are some significant differences outside spike (see attached mutation profiles).
The sites 213, 371, 373, 375, 376, 408, and 764 are not reliably covered by the data, so they cannot be confirmed yet.
UShER tree (including 7th sequence to be uploaded): https://nextstrain.org/fetch/genome.ucsc.edu/trash/ct/singleSubtreeAuspice_genome_1ef08_5f410.json (in a previous Usher tree they clustered near XD).

@corneliusroemer
Contributor

Thanks @JosieLikesCats for flagging these at such an early stage, they are indeed very interesting sequences.

I'm sorry, I can't really provide much insight yet, but I would very much appreciate it if you could try to answer some questions that come to mind:
Do you happen to have raw reads for these - this would be very helpful to validate that the sequences are real and not for example due to coinfection. Are there other possible explanations for these? Do you know how closely related the individuals are who were sequenced?

Where in Limpopo and Gauteng are these from? Google maps says ~400km but could the Limpopo ones also be from a place closer to Gauteng province?

These sequences should be run through sc2rf to check for recombination - first impression is that it's very recombinanty. But also that both sequences seem to have some things in common. Very speculatively, these could represent different results of intra host recombination - but one needs to look at this in more detail.

It may be worth splitting this issue up and separating the proposals, but for now, to keep things simple, I think it's fine to keep them both together.

Here are screenshots from Nextclade to save everyone some time:
image
image

@silcn

silcn commented Jul 14, 2022

Just spotted these too, went to check and found this issue. These look like some weird combination of three different lineages: AY.45, a divergent BA.4/5, and BA.2 (seen near the 3' end where BA.4/5 mutations are absent from the Omicron-derived sections). Both have lots of apparent breakpoints but the 3' end of constellation 2 is particularly messy and seems to switch back and forth almost every mutation - the N protein alone goes Omicron/Delta/Omicron/Delta.

If these are real I wouldn't be surprised if they are new emergences from the Omicron source, especially given the location.

@JosetteSchoenma

JosetteSchoenma commented Jul 14, 2022

Hi, I have run them through Sc2rf. It looks mostly like a Delta/BA.5 recombinant to me, but with several breakpoints and 11 shared private mutations between the 2 clusters.
The bottom graph is the best one, with green being BA.2 and red being Delta mutations. It starts with some example BA.4 and BA.5 samples. I included the first graph, where I selected the BA.1 and BA.2 clade, because it shows 6 South African samples, where the bottom one somehow loses 2 and shows just 4.
By the way, the C21614T is a mutation these samples share with Gamma.
image

@corneliusroemer
Contributor

The small cluster (no 2) very much looks like a 21J and BA.4/5 recombinant to me - but with an unusually high number of 5 breakpoints. The ranges are as follows:

0-2790: BA.4/5 [length ~ 3-4k]
2791-4180: BA.4/5 -> 21J (BP1)
4181-21846: 21J [length 18-19k]
21847-21986: 21J -> BA.2 (BP2)
21987-24912: BA.4/5 [length 3k]
24913-24999: BA.4/5 -> 21J (BP3)
25000-28310: 21J [length 3-4k]
28271-28880: 21J -> BA.2 (BP4)
28881-28916: BA.4/5 [length 40-1.2k]
28917-29401: BA.4/5 -> 21J (BP5)
29402-end     : 21J [length 500-1k]
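The ranges above can be written down as a small interval table, which makes it easy to look up which parent a position is attributed to and to sanity-check the list. This is only a minimal sketch: the tuple layout and the helper names `labels_at` and `find_overlaps` are illustrative assumptions, and the coordinates are copied as listed, including the 28271-28880 range that overlaps the one before it.

```python
# Segments copied as listed above: (start, end, label).
# An end of None means "to the end of the genome".
SEGMENTS = [
    (0, 2790, "BA.4/5"),
    (2791, 4180, "BP1"),
    (4181, 21846, "21J"),
    (21847, 21986, "BP2"),
    (21987, 24912, "BA.4/5"),
    (24913, 24999, "BP3"),
    (25000, 28310, "21J"),
    (28271, 28880, "BP4"),
    (28881, 28916, "BA.4/5"),
    (28917, 29401, "BP5"),
    (29402, None, "21J"),
]


def labels_at(pos):
    """Return every label whose range contains pos (more than one if ranges overlap)."""
    return [
        label
        for start, end, label in SEGMENTS
        if start <= pos and (end is None or pos <= end)
    ]


def find_overlaps(segments):
    """Return pairs of consecutive segments where the second starts before the first ends."""
    overlaps = []
    for first, second in zip(segments, segments[1:]):
        if first[1] is not None and second[0] <= first[1]:
            overlaps.append((first, second))
    return overlaps


print(labels_at(23018))         # position of S:486
print(find_overlaps(SEGMENTS))  # flags the 25000-28310 / 28271-28880 pair
```

Positions that fall in a BP range are only known to lie somewhere between the two flanking parents, so a lookup there should be read as "undetermined" rather than as a call.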

@JosetteSchoenma

Here is the Usher tree:
Screenshot_20220715-001558_Twitter.jpg

@silcn

silcn commented Jul 14, 2022

Here is a spreadsheet comparing the mutations to AY.45, BA.2, BA.4 and BA.5: pango844.xlsx
Mutations are included if they appear in at least one of the sequences.

Constellation 1 has at least 6 breakpoints; constellation 2 has at least 5 as @corneliusroemer says. Could potentially be even more, as there are a couple of places where a single mutation or reversion could have been gained convergently or through recombination.

I'm also unconvinced that BA.4/5 are involved - I think the Omicron parent may just be a divergent BA.2. Looking at all of the locations where BA.2 and BA.5 differ, referring to the constellations as C1 and C2 for short:
12160: C1 looks like BA.2, C2 is Delta
21765-21770 (S:69/70): both Delta
22917 (S:452): both look like BA.5
23018 (S:486): C1 is Omicron-derived but has different nucleotide from BA.2 and BA.5; C2 looks like BA.2
23040 (S:493): both look like BA.5
26529 (M:3): both Delta
26858, 27259, 27382-4: C1 could be Delta or BA.5, C2 is Delta
27889: C1 looks like BA.2, C2 is Delta

If BA.5 is involved and not BA.2, we have to believe that the silent mutations at 12160 and 27889 both reverted, as well as S:F486V in C2. In my humble opinion, this seems much less likely than S:L452R and S:R493Q arising independently on top of BA.2 - after all, we've seen that in BA.2.77 too.

The evidence against BA.4 is even stronger, e.g. neither constellation has the deletion in nsp1.
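The BA.2-versus-BA.5 comparison above boils down to counting how many extra events each hypothesis needs. Here is a rough sketch of that tally in Python. The `SITES` table is transcribed from the list above, the function name `conflicts` is made up for illustration, and it treats every Delta-compatible call as explained by the Delta parent. It is a simple count, not a likelihood, and it does not weigh how plausible each individual event is.

```python
# Per-site calls transcribed from the comparison above.
# Each value is the set of parents the site is compatible with; "other" marks
# the C1 nucleotide at 23018 that matches neither BA.2 nor BA.5.
SITES = {
    "12160": {"C1": {"BA.2"}, "C2": {"Delta"}},
    "21765-21770": {"C1": {"Delta"}, "C2": {"Delta"}},
    "22917": {"C1": {"BA.5"}, "C2": {"BA.5"}},
    "23018": {"C1": {"other"}, "C2": {"BA.2"}},
    "23040": {"C1": {"BA.5"}, "C2": {"BA.5"}},
    "26529": {"C1": {"Delta"}, "C2": {"Delta"}},
    "26858/27259/27382-4": {"C1": {"Delta", "BA.5"}, "C2": {"Delta"}},
    "27889": {"C1": {"BA.2"}, "C2": {"Delta"}},
}


def conflicts(omicron_parent, rival):
    """List (site, constellation) pairs that look like `rival` and cannot be
    explained by either `omicron_parent` or the Delta parent."""
    out = []
    for site, calls in SITES.items():
        for name, compatible in calls.items():
            if rival in compatible and compatible.isdisjoint({omicron_parent, "Delta"}):
                out.append((site, name))
    return out


# If BA.5 is the Omicron parent: sites that would have to revert to the BA.2 state.
print(conflicts("BA.5", "BA.2"))
# If BA.2 is the Omicron parent: sites that would have to gain the BA.5 state.
print(sorted({site for site, _ in conflicts("BA.2", "BA.5")}))
```

Under the BA.5 hypothesis this lists 12160 and 27889 in C1 plus 23018 in C2; under the BA.2 hypothesis it lists only 22917 and 23040, which are shared by both constellations.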

@AngieHinrichs
Member

Thanks @JosetteSchoenma -- here is a link to the UShER view with a permanently saved .json file that won't be deleted in a couple days, and with branches labeled by reversions/back-mutations:
https://nextstrain.org/fetch/hgwdev.gi.ucsc.edu/~angie/pango-designation-844.json?branchLabel=back-mutations&c=pango_lineage_usher&label=nuc%20mutations:T19955C

Since this is apparently a recombinant, UShER is not as useful as it might be otherwise. The phylogenetic tree assumes a steady accumulation of mutations, but recombinants violate that assumption. UShER places a recombinant sequence on the branch of the tree where it has the fewest differences from existing sequences, which usually corresponds to one of the parent lineages -- but there are reversions/back-mutations for the portions of the genome contributed by the other parent. (In fact, a long branch with multiple reversions is the signal that our RIPPLES tool uses to look for potential recombinants in the big tree... we should run that again one of these days!)

These sequences are placed on a branch of BA.5 that is already riddled with various reversions that are probably mostly sequencing artifacts -- but the long branch makes it pretty clear that these sequences are different from the others, and that placement is just the best that usher could do given the circumstances, not necessarily an indication that the sequences on that subtree are closely related.
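To make the "long branch with multiple reversions" signal concrete, here is a toy sketch that counts back-mutations on a branch given the mutations accumulated on the path from the root. It is not how UShER or RIPPLES is implemented; the function names and the example mutations are hypothetical and only illustrate the idea.

```python
import re

MUT = re.compile(r"^([ACGT])(\d+)([ACGT])$")


def parse(mut):
    """Split a substitution such as 'C8595T' into (old_base, position, new_base)."""
    m = MUT.match(mut)
    if m is None:
        raise ValueError(f"not a nucleotide substitution: {mut!r}")
    return m.group(1), int(m.group(2)), m.group(3)


def find_reversions(ancestral_path, branch):
    """Return branch mutations that restore the base a position had before
    it was mutated on the path from the root."""
    state = {}  # position -> (original base, current base)
    for mut in ancestral_path:
        old, pos, new = parse(mut)
        original = state.get(pos, (old, old))[0]
        state[pos] = (original, new)
    reversions = []
    for mut in branch:
        old, pos, new = parse(mut)
        if pos in state and state[pos][1] == old and state[pos][0] == new:
            reversions.append(mut)
    return reversions


# Hypothetical toy data, not taken from the tree linked above.
path_from_root = ["C100T", "G200A", "A300G"]
long_branch = ["T100C", "A200G", "C400T"]
print(find_reversions(path_from_root, long_branch))
```

A branch where a large share of the mutations are reversions of the parent lineage's defining mutations is what one would expect where the other parent contributed that stretch of the genome.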

@Sinickle

Sinickle commented Jul 14, 2022

I agree with @silcn's analysis and reasoning.

One small thought to add - Regarding the silent mutation C22916A at S:452.
If the Omicron contributor is BA.2, then that mutation would create S:L452M, which we have seen on other successful BA.2 lineages. Possibly this was an intermediate step, before gaining S:M452R.

@UnusualTimes

Could this be an XT (or similar) + BA.5 recombination?

@FedeGueli
Contributor

@silcn there is also a small branch of BA.5 with the wild-type 27889C, proposed in #797. Maybe it was acquired via recombination or it is ancestral, who knows.

@FedeGueli
Contributor

FedeGueli commented Jul 15, 2022

Possible CvSp query to catch both constellation 1 and 2:

https://cov-spectrum.org/explore/World/AllSamples/AllTimes/variants?aaMutations=M%3A146H&nucMutations=8595T%2C15026T&

Edit: it catches 7/7 sequences on cov-spectrum

based on the great list by @silcn

@JosieLikesCats
Author

JosieLikesCats commented Jul 15, 2022

Thanks for all the analysis so far, very interesting to read all the comments! The final sequence has been uploaded to GISAID, so it should be available soon - I'll edit this comment once it's released. EDIT: sequence now released, EPI_ISL_13843609

@corneliusroemer to answer some of your questions:
I'm currently analysing the raw data more closely, and will update the sequences if any of the sites turn out to be incorrect/poorly supported. I've also got some other people helping me look at them, and we can potentially share the read counts per site etc. once we're done. We do currently have the samples in the pipeline to be resequenced to confirm some of the sites as well.

The sequences are from Johannesburg (Gauteng) and Polokwane (Limpopo), which are two of the main cities in each province. We have also had school and university holidays recently, and so there has likely been increased travel between provinces.

I see recombination is being looked at quite closely by everyone, so I'll just add that we have some NGS-SA team members also taking a look with a variety of tools; we'll update accordingly if we find anything interesting.

Thanks for adding the screenshots! I had considered two separate issues but thought for now since there are so few it made sense to keep it together. Happy to split these in future if needed.

@FedeGueli
Contributor

NGS-SA report on these sequences:
https://t.co/smHLpdRsF7

@FedeGueli
Contributor

I can confirm this query catches 7 out of 7 sequences of this new variant:
https://cov-spectrum.org/explore/World/AllSamples/AllTimes/variants?aaMutations=M%3A146H&nucMutations=8595T%2C15026T&

@ryhisner

ryhisner commented Jul 19, 2022

There's a new sequence from Gauteng uploaded today clearly related to the others, though it's 29.7% NNN's—EPI_ISL_13913050

It has S:F486P and S:P621S, so it's part of constellation 1, but spike residues 1-340 and 670-1044 are blank according to NextClade. Unlike the other four sequences from constellation 1, this one has S:T572I. Collection date 2022-7-4.

Here's the Usher tree: https://nextstrain.org/fetch/genome.ucsc.edu/trash/ct/subtreeAuspice1_genome_2267d_6f9c40.json?c=pango_lineage&label=nuc%20mutations:C7124T,C8595T,C8986T,G9053T,C15026T,G18163A,C21614T,C21618G,T21762C,G22017T,T22118C,G22899A,C22916A,T22917G,G23040A,C24912T,C25413T,G26959A,C27259A

@AngieHinrichs
Member

@ryhisner thanks for adding the EPI_ISL ID there, was just about to ask, and sorry the UShER web interface is using an old fasta-reading library that truncates names at the space character... "hCoV-19/South" is a pretty useless label.

@AngieHinrichs
Member

EPI_ISL_13913050 (SouthAfrica/CERI-KRISP-K045132/2022) is the first of these from outside NICD, and it is more similar to C1 than it looks in that UShER tree view -- it has Ns at 4456, 5869, 10198, 12163, 21623, and 23679, and it has C10954T and C28531T like NICD-N47701 and NICD-N47705, so I would expect UShER to place it at the end of that branch, instead of splitting it in the middle. Looking into why it didn't.

Meanwhile here's a sc2rf view so you can see how CERI-KRISP-K045132 looks pretty much like the NICD C1 sequences, but with more Ns, and without reversions at 22686, 22688 and 22786 (common casualties of amplicon dropout I think):

image

@corneliusroemer changed the title from "N=7 sequences with unusual mutations" to "2 small South African clusters of Omicron/Delta recombinants with interesting Spike mutations (8 sequences)" on Jul 20, 2022
@thomasppeacock

Although these clusters don't fulfill the minimum number of sequences, I think the highly unusual potential pattern of recombination might mean that, if sequences keep appearing, they should get assigned, as there will be justification for being able to refer to them by an unambiguous designation. Going to put a monitoring tag on this for now (hope that's okay @chrisruis @InfrPopGen!).

@thomasppeacock thomasppeacock added the monitor currently too small, watch for future developments label Jul 24, 2022
@corneliusroemer
Contributor

Agree that it's worth seeing if any new sequences in this cluster appear and, if they do, designating them.

The minimum number of sequences is not a hard limit, we can make exceptions if there are good reasons (there are here).

@corneliusroemer corneliusroemer added the recombinant recombinant proposal label Jul 25, 2022
@JosieLikesCats
Author

JosieLikesCats commented Jul 28, 2022

Hi everyone, just a heads-up that one more sequence from constellation 2 will be released in the next couple of hours (N46078, EPI_ISL_14112354). Also from Limpopo, with collection date of 30 May. We haven't yet detected any more recent samples but are monitoring closely.

@silcn

silcn commented Jul 29, 2022

The new constellation 2 sequence is missing M:R146H and so will not be picked up by @FedeGueli's cov-spectrum query. Here is a query that will pick up everything once cov-spectrum is updated with the new sequence:
https://cov-spectrum.org/explore/South%20Africa/AllSamples/AllTimes/variants?variantQuery=%5B5-of%3A+8595T%2C+15026T%2C+21614T%2C+22118C%2C+22899A%2C+22916A%2C+24912T%2C+26959A%5D&
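For anyone who wants to apply the same "at least 5 of these 8" rule to their own mutation lists, or rebuild the link with a different threshold, here is a small sketch. The function names are made up for illustration, the URL layout is simply copied from the link above rather than from any documented API, and I am assuming `5-of` means "at least 5".

```python
from urllib.parse import quote, urlencode

MUTATIONS = [
    "8595T", "15026T", "21614T", "22118C",
    "22899A", "22916A", "24912T", "26959A",
]


def matches_n_of(sample_mutations, wanted, n):
    """True if the sample carries at least n of the wanted mutations."""
    return len(set(sample_mutations) & set(wanted)) >= n


def cov_spectrum_url(country, wanted, n):
    """Rebuild a query URL in the same shape as the link above."""
    variant_query = f"[{n}-of: {', '.join(wanted)}]"
    path = f"{quote(country)}/AllSamples/AllTimes/variants"
    query = urlencode({"variantQuery": variant_query})
    return f"https://cov-spectrum.org/explore/{path}?{query}&"


print(cov_spectrum_url("South Africa", MUTATIONS, 5))

# Hypothetical sample with dropout at three of the eight sites: still matches.
print(matches_n_of(MUTATIONS[:5], MUTATIONS, 5))
```

Using a threshold below 8 is what lets the query tolerate sequences with missing coverage at a few of the sites, at the cost of being slightly less specific.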

@silcn

silcn commented Aug 23, 2022

2 more sequences from Constellation 1: EPI_ISL_14585888, EPI_ISL_14585891
Both from Western Cape, South Africa, sampled 2022-08-08.

@corneliusroemer
Contributor

Here's the Usher tree with the two new sequences - I think it'd be worth designating the bigger cluster of the two as it seems to continue circulating and has a very intriguing spike profile.
image

https://nextstrain.org/fetch/genome.ucsc.edu/trash/ct/subtreeAuspice1_genome_17d5d_648160.json?c=Nextstrain_clade_usher&label=nuc%20mutations:C12747T,C12880T,T23018C,T23019C,C23423T,C26681T,C27575G,A29510C

@corneliusroemer corneliusroemer added the accepted A proposal for a new lineage has been accepted and will be designated. label Aug 24, 2022
@aq-sun aq-sun added the monitor currently too small, watch for future developments label Aug 26, 2022
@corneliusroemer
Contributor

corneliusroemer commented Aug 26, 2022 via email

@aq-sun

aq-sun commented Aug 26, 2022

I'll leave the first cluster as XAY then!

@FedeGueli
Contributor

FedeGueli commented Aug 30, 2022

One more sequence has popped up, I think:

EPI_ISL_14728611

Gauteng

I think it is XAY (constellation 1): S:A706V, F486P, P621S, R21G.

@aq-sun

aq-sun commented Aug 31, 2022

Added this to XAY.

@InfrPopGen InfrPopGen removed their assignment Sep 4, 2022
@FedeGueli
Contributor

Gisaid query M_R146H, Spike_F186L for XAY

@AngieHinrichs
Member

Gisaid query M_R146H, Spike_F186L for XAY

@FedeGueli I think that query covers both C1 (designated XAY) and C2 (monitored).

Extending your query for XAY: M_R146H, Spike_F186L, Spike_F486P

for C2: M_R146H, Spike_F186L, N_D63G
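The same split can be applied to a downloaded list of substitutions per sequence. Below is a minimal sketch that uses the GISAID-style names from the queries above; the function name `classify` and the returned labels are just illustrative.

```python
# Markers taken from the two queries above.
SHARED = {"M_R146H", "Spike_F186L"}
XAY_MARKER = "Spike_F486P"
C2_MARKER = "N_D63G"


def classify(substitutions):
    """Assign a sequence to XAY (C1), C2, or neither, based on marker substitutions."""
    subs = set(substitutions)
    if not SHARED <= subs:
        return "neither"
    has_xay = XAY_MARKER in subs
    has_c2 = C2_MARKER in subs
    if has_xay and not has_c2:
        return "XAY (C1)"
    if has_c2 and not has_xay:
        return "C2"
    return "ambiguous"


print(classify(["M_R146H", "Spike_F186L", "Spike_F486P"]))
print(classify(["M_R146H", "Spike_F186L", "N_D63G"]))
```

Like the queries themselves, this will miss sequences with dropout at a marker site, such as the constellation 2 sequence lacking M:R146H mentioned earlier in the thread.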

@FedeGueli
Contributor

Thanks @AngieHinrichs! Yes, I usually do the S:F486P and then the other. Good to have them separately, thank you very much.

@InfrPopGen InfrPopGen assigned InfrPopGen and unassigned aq-sun Sep 14, 2022
InfrPopGen added a commit that referenced this issue Sep 14, 2022
Added new recombinant lineage XBA from #844 with 4 new sequence designations, and 0 updated designations
@InfrPopGen
Contributor

Lineage XBA has been designated for constellation 2, with four example sequences. The lineage alias is given as an interim AY.45/BA.2 recombinant, with one breakpoint, because that at least gives pipelines what they expect when reading the json.

@corneliusroemer corneliusroemer modified the milestones: XAY, XAY XBA Sep 14, 2022
@FedeGueli
Contributor

A new XAY sequence was just uploaded: an elderly man in Cape Town, collected on 31/08/22.

EPI_ISL_14975893

@corneliusroemer
Contributor

corneliusroemer commented Sep 28, 2022

The first international XAY has just appeared in Denmark:

hCoV-19/Denmark/DCGC-585187/2022|EPI_ISL_15155569|2022-09-23

Travel information is not available, but we know this is a reinfection with last infection in January, possibly BA.2 given this was Denmark.

Additional host information: n_infections=2,last_infection_date=2022-01-24

image

https://nextstrain.org/fetch/genome.ucsc.edu/trash/ct/subtreeAuspice1_genome_16aa0_445360.json?c=pango_lineage_usher&label=nuc%20mutations:G15451A,C16466T,C19220T,A22688G,A22786C,T25584C,T26270C,G26577C,A26709G,T26767C,G29742T

@JosetteSchoenma

There is another XAY from Denmark, placed at the same position on the Usher tree as the one @corneliusroemer mentioned. 2nd of October 2022. EPI_ISL_15284246.

@FedeGueli
Contributor

Two more XAY sequences from South Africa have been uploaded, collected on 29/8 and 14/9, both from Gauteng and from baseline surveillance.
EPI_ISL_15259248, EPI_ISL_15259343

@FedeGueli
Contributor

I think I have found a better query for XAY: Spike_P621S, Spike_F186L. It actually finds 27 sequences from 3 countries, while GISAID Pangolin calls 23 viruses XAY and our old manual query finds a lot fewer.

@AngieHinrichs

@AngieHinrichs
Member

Thanks @FedeGueli, that helped me find a couple new CA sequences that were being excluded from the tree but should be added tomorrow!

https://nextstrain.org/fetch/hgwdev.gi.ucsc.edu/~angie/XAY.2022-10-13.json?branchLabel=Spike%20mutations&c=pango_lineage_usher&label=nuc%20mutations:G4184A,C4321T,G10447A,C12747T,C12880T,T23018C,T23019C,C23423T,C26681T,C27575G,A29510C,T29742G&s=hCoV-19/USA/CA-FG-296207/2022%7CEPI_ISL_15324975%7C2022-09-30,hCoV-19/USA/CA-FG-296179/2022%7CEPI_ISL_15324972%7C2022-09-30

Also missing but hopefully added tomorrow: SouthAfrica/SU-NHLS_5859/2022|EPI_ISL_14975893|2022-08-31

@FedeGueli
Contributor

FedeGueli commented Nov 9, 2022

Found a sequence that Nextclade sees as XBA but that Usher puts outside every branch, starting directly from the B.1.1.529 root, but it is mixed between Delta and Omicron.

It is from Belgium and sampled recently: EPI_ISL_15537619

@corneliusroemer @thomasppeacock @AngieHinrichs @JosieLikesCats @JosetteSchoenma @c19850727 @silcn @shay671

https://nextstrain.org/fetch/genome.ucsc.edu/trash/ct/subtreeAuspice1_genome_274cf_c03220.json?c=pango_lineage_usher&s=hCoV-19/Belgium/UZB_548202210266/2022%7CEPI_ISL_15537619.%7C2022-10-24

Schermata 2022-11-09 alle 20 49 17
Schermata 2022-11-09 alle 20 51 14

@AngieHinrichs
Member

Command-line nextclade places it with XBA as the closest match... but with 21 reversions relative to the XBA placement, as well as 7 mutations associated with other clades, and 28 additional mutations. It's excluded from the UShER tree because it's Omicron-ish but so divergent from its nextclade placement. My guess is contamination, but that's just my guess based on looking at nextclade numbers; someone looking at the raw data might see something else.

@oobb45729

oobb45729 commented Nov 9, 2022

It looks like another recombinant strain that is related to XAY/XBA to me. The reversions can be easily explained by different breakpoints. This one is an XBA-like with a Delta-like S2. The S2 part (P681R+V736I+T859N+D950N) looks pretty real. T859N is one of the most notable convergent mutations in the late Delta era.

@oobb45729

This one may also give some hints about how XAY/XBA evolved. It has L452M, which fits the L452M->R theory.

@oobb45729

oobb45729 commented Nov 10, 2022

No, this one is unrelated to XAY/XBA. Orf1b:M115I and C25413T from AY.45 are missing. It is another Omicron/Delta recombinant that is strikingly similar to XBA.

@oobb45729

oobb45729 commented Nov 10, 2022

A breakpoint between S:EFR156G and S:V213G like XAY/XBA, which is also close to the XAW and XBC breakpoints.
Also like XAY/XBA/XBC, another breakpoint somewhere in nsp1-nsp3 to get BA.2's orf1a:S135R.
And like XBA, a breakpoint between N:D63G and N:R203K.
S:L18F+T19R+R21G like XAY and, more intriguingly, Orf7a:T61S like XAY but with A27574T, not XAY's C27575G (neither mutation is common so far).
