XBB.1 Sublineage with S:E180V, S:K478R, S:S486P, ORF9b:I5T, ORF9b:N55S, ORF1a:L3829F, ORF1b:D1746Y (42 seq) #1723

ryhisner · 2023-03-04T01:14:12Z

Description
Sub-lineage of: XBB.1
Earliest sequence: 2023-1-23, USA, New York (EPI_ISL_16835403)
Most recent sequence: 2023-2-24, India, Maharashtra (EPI_ISL_17073064); Singapore, with travel from India (EPI_ISL_17030043); Denmark (EPI_ISL_17048705)
Countries circulating: Primarily in India. Has been sequenced in India (23), USA (7, at least five with international travel history), Singapore (6, all with travel from India), England (2), Denmark (1), Germany (1), Ireland (1), Italy (1).
Number of Sequences: 42
GISAID Query: T12730A, T28297C, A28447G
CovSpectrum Query: T12730A, T28297C, A28447G
Substitutions on top of XBB.1:
Spike: E180V, K478R, S486P
ORF9b: I5T, N55S
ORF1a: L3829F (NSP6_L260F)
ORF1b: D1746Y (NSP14_D222Y)
Nucleotide: C11750T, C11956T, T12730A, A14856G, G18703T, A22101T, A22995G, T23018C, T28297C, A28447G, C29386T

USHER Tree
The Usher tree looks as if it has two very separate branches, but this is an artifact from the very low spike coverage in most of the Indian sequences here. The branches in the lower section of the tree consist almost entirely of artifactual reversions. Similarly, all the sequences that appear to lack S:E180V merely lack coverage there and therefore almost certainly possess it.
https://nextstrain.org/fetch/raw.githubusercontent.com/ryhisner/jsons/main/XBB.1_Lineage_20_seq_Tree_subtreeAuspice1_genome_1cd4f_28ede0.json

Evidence
This saltation lineage has already spread quite widely across the globe, but of the non-Indian sequences with adequate metadata about travel history, almost all indicate international travel, mostly from India. One USA sequence lists travel history from Ethiopia, two list India, and the rest do not specify a country (but were sequenced by Ginkgo Bioworks, which only sequences incoming international travelers). All six sequences from Singapore have travel history in India. Sequencing in India has been rather sparse of late, so this lineage may comprise a substantial fraction of infections there, particularly given that it was first sequenced on January 23.

S:K478R has been present in a few smaller lineages (CM.4.1, BA.2.38.3) and regularly appears in scattered sequences. ORF1a:L3829F is of course found in all BQ* sequences, but it is also one of the most convergent ORF1a mutations found in chronic-infection sequences. ORF9b:I5T (T28297C) is in XBB.1.9 and has been posited to be the reason XBB.1.9 lineages seem to grow somewhat faster than XBB.1.5. ORF9b has been implicated in immune evasion, primarily interferon suppression I think, so it's possible ORF9b:N55S could confer some further resistance to immunity. Both of these ORF9b mutations are synonymous in N.

Genomes

EPI_ISL_16835403, EPI_ISL_16940118, EPI_ISL_17012463, EPI_ISL_17012465, EPI_ISL_17012469, EPI_ISL_17016337, EPI_ISL_17016347, EPI_ISL_17020434, EPI_ISL_17024073, EPI_ISL_17029900, EPI_ISL_17029986, EPI_ISL_17030006, EPI_ISL_17030031, EPI_ISL_17030043, EPI_ISL_17032330, EPI_ISL_17048705, EPI_ISL_17066648, EPI_ISL_17066668, EPI_ISL_17073035-17073036, EPI_ISL_17073038-17073041, EPI_ISL_17073047, EPI_ISL_17073050, EPI_ISL_17073054, EPI_ISL_17073059, EPI_ISL_17073061-17073064, EPI_ISL_17076689, EPI_ISL_17078570-17078572, EPI_ISL_17078574-17078577, EPI_ISL_17078591, EPI_ISL_17084712

AnonymousUserUse · 2023-03-04T15:05:07Z

CovSpectrum Query: Nextcladepangolineage:

CoV-Spectrum query missing

Both of these ORF9b mutations are synonymous in N.

I cannot understand this sentence. What do you mean with N here?

FedeGueli · 2023-03-04T16:17:03Z

CovSpectrum Query: Nextcladepangolineage:

CoV-Spectrum query missing

Both of these ORF9b mutations are synonymous in N.

I cannot understand this sentence. What do you mean with N here?

CovSpectrum: T12730A, T28297C, A28447G

ORF9b is just an alternate reading frame within ORF9a, which is the gene for the N (nucleocapsid) protein.

FedeGueli · 2023-03-04T16:20:14Z

@ryhisner I'm seeing a lot of S:478R, mainly from SA, Russia, and in XBB.1.5.
It was a defining mutation of BH.1, which, together with BJ.1 and BA.2.10.4, was a main actor in the first era of heavily mutated BA.2 lineages from the Indian region, an era later won by BA.2.75 and its recombinant XBB.

FedeGueli · 2023-03-04T16:31:29Z

@corneliusroemer @thomaspeacock @InfrPopGen @AngieHinrichs I suggest a very fast designation of this one so it can be monitored as soon as possible. (I have already added it to internal charts, and its growth is in the top range, comparable to both XBB.1.9.1 and XBB.1.9.2 at the same number of seqs.) From its profile, I bet it will compete with the other leading XBB.1+486P spikes.

FedeGueli · 2023-03-04T16:50:04Z

We didn't pay much attention to XBB.1.9's early advantage, but it later proved to be real, so I want to highlight that the signal is present here too, and clearly also against XBB.1.9:

https://cov-spectrum.org/explore/World/AllSamples/Past2M/variants?nextcladePangoLineage=XBB.1.9.1*&nucMutations1=T12730A%2CT28297C%2CA28447G&analysisMode=CompareToBaseline&
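For anyone who wants to repeat this comparison with a different baseline or mutation set, here is a minimal sketch that rebuilds a link of the same shape. The query parameter names (`nextcladePangoLineage`, `nucMutations1`, `analysisMode`) are simply copied from the link above, not taken from any CoV-Spectrum documentation, and the function name is hypothetical.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Path copied from the link in this thread (World, all samples, past 2 months).
BASE_URL = "https://cov-spectrum.org/explore/World/AllSamples/Past2M/variants"


def compare_to_baseline_url(baseline_lineage, nuc_mutations):
    """Build a compare-to-baseline link (hypothetical helper).

    Parameter names are assumptions taken from the link posted above.
    """
    params = {
        "nextcladePangoLineage": baseline_lineage,
        "nucMutations1": ",".join(nuc_mutations),
        "analysisMode": "CompareToBaseline",
    }
    # urlencode percent-encodes the commas (%2C) and the asterisk (%2A).
    return BASE_URL + "?" + urlencode(params)


url = compare_to_baseline_url("XBB.1.9.1*", ["T12730A", "T28297C", "A28447G"])
# Round-trip the query string to confirm nothing was lost in encoding.
parsed = parse_qs(urlsplit(url).query)
print(url)
```

The only difference from the hand-written link is that the asterisk is percent-encoded, which decodes to the same value.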

ryhisner · 2023-03-04T17:43:42Z

@AnonymousUserUse, ORF9b overlaps with N (nucleocapsid) in the SARS-CoV-2 genome, but they are out of frame with respect to each other, meaning that a nucleotide mutation that results in an amino acid (AA) substitution in ORF9b does not always cause an AA substitution in N. Nucleotide mutations that cause an AA substitution are called non-synonymous. Those that do not cause an AA change are called synonymous. Everything below is a layman's simplification, some of which may not be precisely correct but which I think gets the basic picture right.

For example, nucleotide T28297 is the third nucleotide of codon N:N8, which has the nucleotide sequence AAT. T28297C changes the sequence for this AA to AAC. However, both AAT and AAC code for the same amino acid: asparagine (symbolized by N). So T28297C is synonymous in N. In ORF9b, T28297 is the 2nd nucleotide of the 5th codon, ORF9b:I5, whose nucleotides are ATC. T28297C changes this from ATC to ACC, which results in a change in amino acid from isoleucine (I) to threonine (T).

You can see how N and ORF9b overlap in the diagram below, which I pasted together using screenshots from NextClade. The N gene spans nucleotides 28274-29533 while ORF9b stretches from 28284-28577. The RNA-dependent RNA polymerase (RDRP), which basically makes copies of each viral gene by creating a complementary RNA strand, runs along the genome, beginning at the 3' end (the far right side in the diagram below). Each of the genes pictured (except ORF1b) has its own code (called a transcription regulatory sequence, or TRS) near its 5' end (left side in diagram) that the RDRP can recognize as a signal to stop, latch onto the RNA, and begin scanning the other direction. When it reaches a start codon (the nucleotide sequence ATG), it starts creating the complementary RNA strand. When it reaches a stop codon (TAA, TAG, or TGA), it stops copying.
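The codon arithmetic in the T28297C example can be sketched in a few lines of Python. This is only an illustration: it uses the gene start positions quoted above (N at 28274, ORF9b at 28284), the two codons quoted above (AAT and ATC), and the standard genetic code. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Gene start positions as quoted in the comment above.
GENE_START = {"N": 28274, "ORF9b": 28284}


def locate(position, gene):
    """Return (codon number, position within codon), both 1-based."""
    offset = position - GENE_START[gene]
    return offset // 3 + 1, offset % 3 + 1


def effect(codon, pos_in_codon, new_base):
    """Return (old amino acid, new amino acid) for a single-base change."""
    mutated = codon[: pos_in_codon - 1] + new_base + codon[pos_in_codon:]
    return CODON_TABLE[codon], CODON_TABLE[mutated]


# T28297C in the N frame: codon 8, third position, AAT -> AAC.
n_codon, n_pos = locate(28297, "N")
print("N", n_codon, n_pos, effect("AAT", n_pos, "C"))

# T28297C in the ORF9b frame: codon 5, second position, ATC -> ACC.
b_codon, b_pos = locate(28297, "ORF9b")
print("ORF9b", b_codon, b_pos, effect("ATC", b_pos, "C"))
```

The same `locate` call on A28447G gives codon 55, second position in ORF9b (matching ORF9b:N55S) and a third-position hit in codon 58 of N, which is why both ORF9b changes can be silent in N.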

HynnSpylor · 2023-03-04T17:49:38Z

Great proposal with an amazing growth rate. I agree that XBB.1+S:F486P+X (any other important mutation) should also be monitored carefully.
Several days ago I noticed two other possible sublineages (#1704 #1712) but missed this one.

xz-keg · 2023-03-04T17:56:11Z

orf1a:L3829F again, it seems that this mutation occurs independently in many chronic seqs.
#405
#764
#770
#871
#1052, BS.1
#1266, BA.5.2.42
#1724

It seems that this mutation is convergent among chronic long branches.

AnonymousUserUse · 2023-03-04T19:53:31Z

@ryhisner
Thanks a lot for the detailed explanation!
What is the range of ORF1a and ORF10? I have often heard of that, but cannot find an answer for the exact range of these two genes.

AngieHinrichs · 2023-03-04T20:57:32Z

What is the range of ORF1a and ORF10? I have often heard of that, but cannot find an answer for the exact range of these two genes.

The NCBI RefSeq https://www.ncbi.nlm.nih.gov/nuccore/NC_045512.2 includes gene annotations at the nucleotide coding level and the protein level (ORF1a and ORF1ab are each split into several small proteins), so if you search for ORF1a and ORF10 on that page you can find their ranges in the reference genome and some other info about them.

The RefSeq annotations for NC_045512.2 include only N (a.k.a. ORF9 or nucleocapsid), they don't divide it into ORF9a and ORF9b.

Nextstrain's annotations include ORF9b: https://github.com/nextstrain/ncov/blob/master/defaults/annotation.gff Beware, those annotations also artificially split ORF1ab into separate ORF1a (which is real) and ORF1b (which is not real) in order to avoid having to account for ribosomal slippage in ORF1ab when translating nucleotide changes to protein changes.

FedeGueli · 2023-03-04T20:59:56Z

orf1a 1-4401
orf10 29558-end of genome (3' end)

here what u need: https://codon2nucleotide.theo.io/

AnonymousUserUse · 2023-03-04T22:05:02Z

Summary:

Gene	Range of codon	Range of nucleotide	Used in Nextstrain	Used in GISAID	Real or not
ORF1ab	1-7098	266-21555	No	Yes	Real
ORF1a	1-4401	266-13468	Yes	No	Real
ORF1b	1-2696	13468-21555	Yes	No	Not real
S	1-1274	21563-25384	Yes	Yes	Real
ORF3a	1-276	25393-26220	Yes	Yes	Real
E	1-76	26245-26472	Yes	Yes	Real
M	1-223	26523-27191	Yes	Yes	Real
ORF6	1-62	27202-27387	Yes	Yes	Real
ORF7a	1-122	27394-27759	Yes	Yes	Real
ORF7b	1-44	27756-27887	Yes	Yes	Real
ORF8	1-122	27894-28259	Yes	Yes	Real
N	1-420	28274-29533	Yes	Yes	Real
ORF9b	1-98	28284-28577	Yes	No	Real
ORF10	1-39	29558-29674	No	Yes	Real

CoV-Spectrum also uses the annotation from Nextstrain.
https://codon2nucleotide.theo.io/ shows the annotation from GISAID.

Is this correct?
Thanks all. And I apologize for going off-topic.

------edit 2023/3/5------
Range of M and ORF9b has been corrected
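The codon and nucleotide ranges in the summary table can be cross-checked mechanically, since a gene with n codons should span exactly 3n nucleotides. Below is a minimal sketch using the values from the table as posted. ORF1ab is left out because ribosomal slippage means its codon count is not a simple function of its nucleotide length. The function names are hypothetical.

```python
# gene: (number of codons, first nucleotide, last nucleotide), from the table above
GENES = {
    "ORF1a": (4401, 266, 13468),
    "ORF1b": (2696, 13468, 21555),
    "S": (1274, 21563, 25384),
    "ORF3a": (276, 25393, 26220),
    "E": (76, 26245, 26472),
    "M": (223, 26523, 27191),
    "ORF6": (62, 27202, 27387),
    "ORF7a": (122, 27394, 27759),
    "ORF7b": (44, 27756, 27887),
    "ORF8": (122, 27894, 28259),
    "N": (420, 28274, 29533),
    "ORF9b": (98, 28284, 28577),
    "ORF10": (39, 29558, 29674),
}


def inconsistent_rows():
    """List genes whose nucleotide span is not exactly 3 x codon count."""
    return [
        name
        for name, (codons, start, end) in GENES.items()
        if end - start + 1 != 3 * codons
    ]


def codon_to_nucleotides(gene, codon):
    """Return the (first, last) genome positions of a 1-based codon."""
    codons, start, _ = GENES[gene]
    if not 1 <= codon <= codons:
        raise ValueError(f"{gene} has no codon {codon}")
    first = start + 3 * (codon - 1)
    return first, first + 2


print(inconsistent_rows())
print(codon_to_nucleotides("S", 486))
print(codon_to_nucleotides("ORF1a", 3829))
```

The last two calls tie back to this proposal: S codon 486 starts at 23018 (T23018C, S:S486P) and ORF1a codon 3829 starts at 11750 (C11750T, ORF1a:L3829F).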

FedeGueli · 2023-03-05T00:15:24Z

@InfrPopGen @corneliusroemer @thomasppeacock @AngieHinrichs
To better contextualize recommended lineages, @alurqu and I tried adding them to collection 24 to preview how they will rank in the global competition.

In the case of this proposed issue it is worth flagging it even if low numbers should make us take this growth advantage with a big big grain of salt.
https://cov-spectrum.org/collections/24

Mike-Honey · 2023-03-05T05:03:45Z

Is this S:S486P or S:F486P? I'm guessing F?

c19850727 · 2023-03-05T06:05:37Z

@Mike-Honey it depends on how you choose your reference. if it's relative to ancestral then it's F486P, but I think @ryhisner is using XBB.1 as the reference here.

NkRMnZr · 2023-03-05T06:08:10Z

Is this S:S486P or S:F486P? I'm guessing F?

Those use different references: relative to XBB.1, or tracing back as far as BM.1.1.1, codon 486 of the S protein is S; relative to the root, that codon is F.

Added new lineage XBB.1.16 from #1723 with 3 new sequence designations, and 0 updated

InfrPopGen · 2023-03-05T12:31:58Z

Thanks for submitting. We've added lineage XBB.1.16 with 3 newly designated sequences, and 0 updated. Defining mutations A22101T (S:E180V), T28297C, C29386T (following C11750T (ORF1a:L3829F), G18703T (ORF1b:D1746Y), A22995G (S:K478R), T23018C (S:S486P), A14856G, A28447G).

FedeGueli · 2023-03-05T13:41:12Z

Thank you @InfrPopGen for your sunday work!

corneliusroemer · 2023-03-07T12:44:08Z

I added extra sequences to make inference more robust - it was only 3 designations thus far.

The lineage seems to be on a shared branch with XBB.1.12, defined by 11956T. Do you agree? The presence of many basal sequences and a clean tree on that branch makes me think that this looks like a plausible sequence of events.

I was struck by some XBB.1.9 having the same mutation (11956T) - is that a defining mutation or was it pulled in to the tree via preferential sampling by @ryhisner? Maybe there's some dropout samples in the designated sequences causing >10% of sequences to miss that mutation. In that case, XBB.1.9 may also be on that branch.

Edit: I looked into 11956T in XBB.1.9 (sampling homogeneously from across XBB.1.9). It looks like 11956T pops up in XBB.1.9.1 only. Strange. So real homoplasy? Or could it be that this is an artefact in some way? Investigation welcome :)

That part of the tree is unfortunately messed up on Usher. I have a gut feeling that low coverage/bad qc sequences screw up the tree there. Maybe it would be possible to make 2 Usher trees: 1. One with a high bar for quality, maybe including only known labs of good quality, use that for the macro-structure. 2. Place lower quality sequences but make sure this doesn't cause flip-flopping. Maybe lower quality sequences need to be scored differently in the parsimony cost function - or flip-flopping needs to be penalized so that it gets optimized away. @AngieHinrichs I try very hard to make the Nextclade reference tree as close to what we consider to be the real tree in macro structure as possible. Maybe some sort of hybrid could be possible - macro constraint tree using human curation (pango lineages, manual constraint tree, overwriting artefacts etc) then letting usher fill in the gaps below there - and potentially suggest where the macro tree may be wrong.

AngieHinrichs · 2023-03-08T00:56:14Z

That part of the tree is unfortunately messed up on Usher.

Yes, XBB.1.9 needs a little fixup. If you label w/back-mutations you can see a couple there. XBB.1.9.1 (XBB.1.9 > C11956T > S:S486P (T23018C)) has ~200 sequences with N:T362I (C29358T); then there are 5 sequences with N:T362I (C29358T) and S:S486P (T23018C) but without 11956T (so reversion T11956C) -- and those have pulled in sequences that have S:S486P (T23018C) but neither 11956T nor N:T362I (C29358T), including the XBB.1.9.2 branch, doh.

I think I can fix this by temporarily removing those 5 sequences and reoptimizing. But it would be even better to prevent these situations, where a few wayward sequences can pull a large branch in by adding reversions. I'm wondering if there is a way to prevent that from happening in matOptimize, or maybe to make a utility that recognizes the pattern and puts those post-reversion branches where they would go without reversions (unless that would be a clear loss parsimony-wise).

I'm not sure that excluding all sequences from certain labs is the right way to go about it (although I agree some labs seem to cause more trouble than others). Even labs that produce some frustrating sequences also produce some OK sequences, and sometimes they cover an undersampled part of the world. On the other hand, any lab that produces tons of sequences will produce some bad ones that cause trouble like this even if overall quality is good.

FedeGueli · 2023-03-08T11:55:56Z

@corneliusroemer C11956T is highly homoplasic, as I pointed out to you back in the XBB.1.9.1/2 issue. I don't think it has missed a single lineage since the start of the pandemic (I'm exaggerating to make the point clear: it has popped up everywhere, every time).

https://cov-spectrum.org/explore/World/AllSamples/AllTimes/variants?nucMutations=C11956T&aaMutations1=S%3A153I%2CS%3A1258Q%2CN%3A151L&

xz-keg · 2023-03-08T12:37:48Z

I'm wondering if there is a way to prevent that from happening in matOptimize, or maybe to make a utility that recognizes the pattern and puts those post-reversion branches where they would go without reversions (unless that would be a clear loss parsimony-wise).

Is there a way to do that? For example, adding punishment for reversions? (For example, giving 2 instead of 1 for reversion mutations in parsimony score)

AngieHinrichs · 2023-03-08T19:12:47Z

Is there a way to do that? For example, adding punishment for reversions? (For example, giving 2 instead of 1 for reversion mutations in parsimony score)

Yes, I think giving 2 instead of 1 would get rid of most reversions -- but occasionally there is a real one, e.g. in BA.2 we saw several genuine reversions of A23040G (S:Q493R), and any time there is a recombinant, reversions may be helpful to place it on or near one of its parent lineages.

Adding a small fractional penalty for reversions would probably be nice, but it would mess up the simple integer parsimony scoring that is very fast. I have seen an alignment scoring scheme that uses larger integers like 100 instead of 1 for a match, so certain nucleotide matches/changes could be given slightly higher or lower scores/penalties calibrated to the species being aligned (Chiaromonte 2002), and gap extension could be penalized significantly less than gap initiation.

I think there are tie-breaking situations in which reversions are supposed to be favored less. Really I don't understand matOptimize well enough to know exactly what to ask for. I need to compile some good examples and test cases for @yceh and hope he has some time. :)
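To make the reversion-penalty idea concrete, here is a toy sketch of how a heavier reversion cost changes which placement wins. It is not how matOptimize scores trees. The two candidate placements are a simplified reading of the XBB.1.9 example above, the integer costs (100 per mutation, in the spirit of the larger-integer scheme mentioned) are arbitrary, and the function name is hypothetical.

```python
import re

MUTATION = re.compile(r"^([ACGT])(\d+)([ACGT])$")

# Reference bases at the three sites discussed above
# (inferred from the mutation names C11956T, T23018C, C29358T).
REFERENCE = {11956: "C", 23018: "T", 29358: "C"}


def branch_cost(mutations, forward_cost=100, reversion_cost=100):
    """Sum integer costs, charging reversion_cost when a mutation
    restores the reference base and forward_cost otherwise."""
    total = 0
    for name in mutations:
        match = MUTATION.match(name)
        if match is None:
            raise ValueError(f"unrecognized mutation: {name}")
        _, position, derived = match.groups()
        is_reversion = REFERENCE.get(int(position)) == derived
        total += reversion_cost if is_reversion else forward_cost
    return total


# Placement A: attach under XBB.1.9.1 and explain the missing 11956T
# with one reversion.
with_reversion = ["T11956C"]
# Placement B: attach under XBB.1.9 and gain the other two mutations
# independently, with no reversion.
without_reversion = ["T23018C", "C29358T"]

for penalty in (100, 200, 250):
    a = branch_cost(with_reversion, reversion_cost=penalty)
    b = branch_cost(without_reversion, reversion_cost=penalty)
    print(penalty, a, b)
```

With equal costs the reversion placement is cheaper (100 vs 200), at double cost the two tie, and only above that does the reversion-free placement win. This is consistent with the point that a penalty of 2 would remove most, but not necessarily all, spurious reversions.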

xz-keg · 2023-03-10T01:10:37Z

Is there a way to do that? For example, adding punishment for reversions? (For example, giving 2 instead of 1 for reversion mutations in parsimony score)

Yes, I think giving 2 instead of 1 would get rid of most reversions -- but occasionally there is a real one, e.g. in BA.2 we saw several genuine reversions of A23040G (S:Q493R), and any time there is a recombinant, reversions may be helpful to place it on or near one of its parent lineages.

I guess that with a reversion parsimony score of 2-3, real reversions like those in BA.2 will still be detected, as they usually come together with other groups of mutations that keep the reversion placement optimal even at a cost of 2-3.

False reversions, meanwhile, will be largely reduced under a score of 2-3, which makes branch-specific labels to counter the remaining false reversions more applicable.

thomasppeacock added recommended Recommended for designation by pango team member XBB proposed sublineage of XBB labels Mar 4, 2023

InfrPopGen self-assigned this Mar 5, 2023

InfrPopGen added a commit that referenced this issue Mar 5, 2023

Merge pull request #1728 from InfrPopGen/master

d6dd885

Added new lineage XBB.1.16 from #1723 with 3 new sequence designations, and 0 updated

InfrPopGen added the designated label Mar 5, 2023

InfrPopGen added this to the XBB.1.16 milestone Mar 5, 2023

InfrPopGen closed this as completed Mar 5, 2023

xz-keg mentioned this issue Mar 5, 2023

Lineages with orf1a:L3829F #1729

Closed

HynnSpylor mentioned this issue Mar 13, 2023

Potential XBB.1 (or XBB.1.5?) sublineage with S:Y200C (A22161G), S:F486P(T23018C) and Orf9b:I5T(T28297C) (85seqs) #1704

Closed
