New Lineages Update: 20 new lineages active through 2023-02-15 #193
jmcbroome
commented
Feb 16, 2023
Lineage Name | Parent Lineage | Size | Exponential Growth Coefficient CI | Earliest Appearance | Latest Appearance | Regions | Nucleotide Changes | View On Cov-Spectrum | View On Taxonium (Public Samples Only) | Download Open Sequence FASTA | EPI ISLs | Amino Acid Changes | Nucleotide Reversions |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
XBB.1.9.2.1 | XBB.1.9.2 | 111 | [0.46792743 0.7430019 ] | 2023-01-02 | 2023-02-08 | Austria, Australia, and Germany | C28928T,G23401T | View On Cov-Spectrum | View On Taxonium (Public Samples Only) | No Data Available | Get EPI ISLs | S:Q613H,N:L219F | No Reversions |
XBB.1.5.2.1 | XBB.1.5.2 | 59 | [0.36211221 0.73847185] | 2022-12-28 | 2023-02-01 | USA | A22002T,C5221T | View On Cov-Spectrum | View On Taxonium (Public Samples Only) | No Data Available | Get EPI ISLs | S:K147I | No Reversions |
XBB.3.1.1 | XBB.3.1 | 17 | [0.35314583 0.91739771] | 2022-12-04 | 2023-02-04 | Denmark and Netherlands | C10263T,C9042T | View On Cov-Spectrum | View On Taxonium (Public Samples Only) | Download Example Sequence FASTA (LAPIS) | Get EPI ISLs | ORF1ab:S2926F,ORF1ab:A3333V | No Reversions |
XBM.1 | XBM | 219 | [0.35292579 0.46126096] | 2022-11-20 | 2023-02-07 | Canada | T18429C,C12534T | View On Cov-Spectrum | View On Taxonium (Public Samples Only) | Download Example Sequence FASTA (LAPIS) | Get EPI ISLs | ORF1ab:T4090I | No Reversions |
XBK.1 | XBK | 180 | [0.28522462 0.38813739] | 2022-07-27 | 2023-02-08 | Slovenia, Germany, and Italy | C25046T | View On Cov-Spectrum | View On Taxonium (Public Samples Only) | Download Example Sequence FASTA (LAPIS) | Get EPI ISLs | S:P1162S | No Reversions |
XAY.2.3 | XAY.2 | 82 | [0.28382549 0.43708748] | 2022-11-25 | 2023-02-08 | Denmark | A22034G,C657T | View On Cov-Spectrum | View On Taxonium (Public Samples Only) | No Data Available | Get EPI ISLs | ORF1ab:A131V,S:R158G | No Reversions |
XBF.3.1 | XBF.3 | 90 | [0.28305407 0.42545791] | 2022-11-02 | 2023-02-04 | Netherlands, Australia, and England | C842T | View On Cov-Spectrum | View On Taxonium (Public Samples Only) | No Data Available | Get EPI ISLs | ORF1ab:P193S | No Reversions |
XBB.1.4.2 | XBB.1.4 | 40 | [0.23443399 0.55073977] | 2022-10-16 | 2023-02-06 | Italy, Thailand, and Austria | G1148T,T26160C | View On Cov-Spectrum | View On Taxonium (Public Samples Only) | Download Example Sequence FASTA (LAPIS) | Get EPI ISLs | ORF1ab:G295C | No Reversions |
XBB.1.5.11 | XBB.1.5 | 860 | [0.23090608 0.27395253] | 2022-11-09 | 2023-02-07 | USA, England, and Canada | T17124C | View On Cov-Spectrum | View On Taxonium (Public Samples Only) | Download Example Sequence FASTA (LAPIS) | Get EPI ISLs | S:V252G | T22317G |
XBB.1.9.3 | XBB.1.9 | 62 | [0.19420051 0.7738821 ] | 2022-12-07 | 2023-02-04 | Netherlands, Spain, and England | G18169T,G19480A | View On Cov-Spectrum | View On Taxonium (Public Samples Only) | Download Example Sequence FASTA (LAPIS) | Get EPI ISLs | ORF1ab:G5969C,ORF1ab:G6406S | No Reversions |
XBB.6.1.1 | XBB.6.1 | 48 | [0.18384719 0.40605432] | 2022-11-27 | 2023-02-03 | USA | C19895T | View On Cov-Spectrum | View On Taxonium (Public Samples Only) | Download Example Sequence FASTA (LAPIS) | Get EPI ISLs | ORF1ab:A6544V | No Reversions |
XBB.1.15 | XBB.1 | 3385 | [0.16053001 0.17860341] | 2022-02-16 | 2023-02-08 | USA, Guatemala, and Mexico | G27915T,C1884T | View On Cov-Spectrum | View On Taxonium (Public Samples Only) | Download Example Sequence FASTA (LAPIS) | Get EPI ISLs | ORF1ab:A540V,ORF8:G8* | No Reversions |
XBB.1.5.7.1 | XBB.1.5.7 | 114 | [0.12235608 0.27905442] | 2022-12-01 | 2023-02-04 | USA, Germany, and Mexico | | View On Cov-Spectrum | View On Taxonium (Public Samples Only) | No Data Available | Get EPI ISLs | ORF1ab:F4649V,S:N417K | T14209G,T22813G |
XBC.1.3.1 | XBC.1.3 | 60 | [0.10071042 0.21145987] | 2022-10-05 | 2023-02-07 | Australia | C6145T,G15743A | View On Cov-Spectrum | View On Taxonium (Public Samples Only) | No Data Available | Get EPI ISLs | ORF1ab:S5160N | No Reversions |
XBB.1.4.1.1 | XBB.1.4.1 | 275 | [0.08893083 0.16888427] | 2022-10-30 | 2023-02-03 | Sweden and Denmark | C12741T,C14922T | View On Cov-Spectrum | View On Taxonium (Public Samples Only) | Download Example Sequence FASTA (LAPIS) | Get EPI ISLs | ORF1ab:T4159I | No Reversions |
XBB.2.2.1 | XBB.2.2 | 93 | [0.04227181 0.27881254] | 2022-10-18 | 2023-02-04 | Spain, Germany, and England | G27870T | View On Cov-Spectrum | View On Taxonium (Public Samples Only) | Download Example Sequence FASTA (LAPIS) | Get EPI ISLs | ORF7b:E39* | No Reversions |
XBF.4 | XBF | 80 | [0.03033164 0.20510676] | 2022-12-01 | 2023-02-07 | England, Luxembourg, and Iceland | G625T,C1514T,C4252T | View On Cov-Spectrum | View On Taxonium (Public Samples Only) | Download Example Sequence FASTA (LAPIS) | Get EPI ISLs | ORF1ab:H417Y,ORF1ab:K120N | No Reversions |
XBB.2.5 | XBB.2 | 201 | [-0.03259708 0.05205796] | 2022-10-31 | 2023-02-03 | USA, England, and India | G23401T | View On Cov-Spectrum | View On Taxonium (Public Samples Only) | Download Example Sequence FASTA (LAPIS) | Get EPI ISLs | S:Q613H | No Reversions |
miscBA.5.2CJ.1.1 | miscBA.5.2CJ.1 | 114 | [-0.03782871 0.07828514] | 2022-10-27 | 2023-02-07 | Japan and England | G14829T | View On Cov-Spectrum | View On Taxonium (Public Samples Only) | No Data Available | Get EPI ISLs | ORF1ab:M4855I | No Reversions |
XBB.1.9.1.1 | XBB.1.9.1 | 73 | [-0.04567397 0.36771319] | 2023-01-02 | 2023-02-08 | Germany, USA, and France | G16741T | View On Cov-Spectrum | View On Taxonium (Public Samples Only) | Download Example Sequence FASTA (LAPIS) | Get EPI ISLs | ORF1ab:V5493F | No Reversions |
Immediate note: a lot of these look fairly good, though XBB.1.5.7.1 is probably spurious: it's defined by two reversion mutations. |
Definitely -- the first reversion (T22813G) is a common dropout-reversion and the second reversion (T14209G) reverts the defining mutation of XBB.1.5.7! |
I see its cell is empty in the "Nucleotide changes" column, which I assume means "no nucleotide changes that aren't reversions" -- that might be a good reason to not propose a lineage. |
I get empty text from the link Get EPI ISLs for XBB.1.9.2.1. The headers look OK (aside from content-length: 0):
-- just no EPI_ISLs. |
Oh, maybe cov-spectrum doesn't yet have XBB.1.9.2. It was designated on Feb. 3, and added to the tree as of the 2023-02-25 build, but there hasn't yet been a pangolin data release that would include it (I'm working on that...). Or @corneliusroemer does CoV-Spectrum use nextclade and does that have XBB.1.9.2? [Edit: yep, if I alter the URL to have XBB.1.9 instead of XBB.1.9.2 then it returns plenty of IDs. I'm not suggesting that you do that! Just pointing it out as a workaround that humans looking into these can do.] |
Anyway, the XBB.1.9.2.1 proposal is good and it already has a pango-designation issue (as of 9 hours ago): cov-lineages#1664 -- except that doesn't include S:Q613H as written, and Fede Gueli pointed out the S:Q613H so good call there. 🙂 |
This looks really great. If there is a set of commonly seen reversion mutations, should we maintain a list and perhaps rule out auto-lineage suggestions based on these mutations? |
I'll review these carefully over the weekend. @FedeGueli, @Sinickle, @ryhisner if you'd like to have a look at the proposed lineages, it'd be great to have some extra eyes! |
After seeing XBB.1.5.7.1 in this test, I added a filter to block proposals that have empty mutation sets with respect to their parents or are defined by reversions only. I'd appreciate additional eyes on this, of course, and if you want to designate any of the proposals here, feel free, but this is still intended as a test PR. I'm planning on opening one directly to the pango-designation repo this weekend, so that with a quick bit of review it can be merged and the update will be complete in a single button press! |
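For readers following along, the filter described above could look roughly like the sketch below. It is not the Autolin implementation: the `Proposal` class and the function names are hypothetical, and the two example rows are copied from the table in this PR.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Proposal:
    """Hypothetical container for one row of the proposal table."""
    name: str
    parent: str
    nucleotide_changes: list = field(default_factory=list)  # non-reversion changes
    reversions: list = field(default_factory=list)          # reversion mutations


def rejection_reason(p: Proposal) -> Optional[str]:
    """Return why a proposal should be blocked, or None if it passes."""
    if p.nucleotide_changes:
        return None
    if p.reversions:
        return "defined by reversions only"
    return "empty mutation set relative to parent"


def filter_proposals(proposals):
    """Split proposals into kept ones and (proposal, reason) rejections."""
    kept, rejected = [], []
    for p in proposals:
        reason = rejection_reason(p)
        if reason is None:
            kept.append(p)
        else:
            rejected.append((p, reason))
    return kept, rejected


proposals = [
    Proposal("XBB.1.5.7.1", "XBB.1.5.7", [], ["T14209G", "T22813G"]),
    Proposal("XBB.1.5.11", "XBB.1.5", ["T17124C"], ["T22317G"]),
]
kept, rejected = filter_proposals(proposals)
for p, reason in rejected:
    print(f"filtered {p.name}: {reason}")
```

Keeping the rejected proposals together with a reason, rather than silently dropping them, makes it cheap to report which branches were filtered.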
Awesome! I realized after I asked for a filter that it would also be useful to be alerted that there's a faulty-looking branch like that... sometimes by removing a few problematic sequences and re-optimizing, I can get the sequences placed on a better branch so at least the reversion on the parent lineage-defining mutation is not necessary. Sorry to keep asking for things, but would it be possible to call out branches that were filtered for that reason? |
I could add a logging file to the pipeline that does it with minimal effort- I don't think it fits really with the actual pull request, though. Tracking that kind of branch is a good idea in general, though- could be a small project in setting up an automated system that scans daily builds for reversion-heavy paths and emails/posts issues when they're detected. This idea is related to some thoughts I've had around tracking saltations that might be due to Molnupiravir treatment and similar, actually. |
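A minimal sketch of the "reversion-heavy path" scan, assuming each path is available as an ordered list of nucleotide mutation strings from root to tip (for example, extracted from a daily build). The function names and the threshold of 2 are hypothetical, and the example path is illustrative rather than taken from a real tree.

```python
import re

# Mutation strings look like "G14209T": original base, position, new base.
MUTATION = re.compile(r"^([ACGT])(\d+)([ACGT])$")


def find_reversions(path):
    """Return mutations on a root-to-tip path that restore a site's first-seen base."""
    first_base = {}  # position -> base the site had before its first change on the path
    reversions = []
    for mut in path:
        match = MUTATION.match(mut)
        if match is None:
            raise ValueError(f"unrecognised mutation string: {mut!r}")
        old, pos, new = match.group(1), int(match.group(2)), match.group(3)
        first_base.setdefault(pos, old)
        if new == first_base[pos]:
            reversions.append(mut)
    return reversions


def is_reversion_heavy(path, threshold=2):
    """Flag a path carrying at least `threshold` reversions (threshold is an assumption)."""
    return len(find_reversions(path)) >= threshold


# Illustrative path: a change at 14209 that is later undone.
example_path = ["G14209T", "C28928T", "T14209G"]
print(find_reversions(example_path))
```

A daily job could run this over every candidate branch and write the flagged paths to a log file or post them as an issue.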
That would be great, thanks! |
Very willing to do so. How should I do it? By commenting directly down here? First I'll leave my thoughts here. Now I'll finish the work on the XBB.1.5 spike issue and then come back here. |
XBB.3.1.1: I think it is just the sequencing intensity of Denmark versus the other countries that makes it grow faster? Or does your model account for that already? To be noted: this lineage additionally has the Orf6:61 mutations reverted. Maybe its advantage could come from there. |
XBK.1, defined by S:P1162S: I compared it to the main branches and sub-branches of XBK on CoV-Spectrum. While it is likely less transmissible than the big branch with C2701T, which should be designated (it is not UK or DK), it has a relevant advantage versus the other branches. But in my estimation its advantage comes from deeper in the tree, starting after C14694A (Orf1b:D409E), not immediately after S:P1162S. |
XAY.2.3 with ORF1a:A131V is faster than the recently designated XAY.2.1 and XAY.2.2 in the upper branch of the tree. Designation deserved. |
XBF.3.1 is slower than XBF.3. I would not designate it (although it is a significant branch of XBF.3, it is mainly a Netherlands sublineage, with visible clustering after C1204T that could explain why your model flagged it): https://cov-spectrum.org/explore/World/AllSamples/Past2M/variants?aaMutations=orf1a%3Av274I&nextcladePangoLineage=XBF*&aaMutations1=ORF1a%3AP193S&nextcladePangoLineage1=xbf*&analysisMode=CompareToBaseline& |
XBB.1.4.2: clean on Usher, but I think the interesting part is just the Austrian branch with S:G75S, which is recent and very fast vs. the parent. I tested the S:E1188D part but it doesn't show any sign of advantage: |
XBB.1.5.11 has the reversion of the defining mutation of XBB.1 (S:G252V) as its defining mutation. Very unlikely to me. @AngieHinrichs |
XBB.1.9.3: although slower than its top-growing sibling lineages .1 and .2, it seems clearly faster than the other XBB.1.9 branches. |
I'll try to finish the second half of the lineages later. |
XBB.6.1.1: I don't see a valid reason to designate it beyond the fact that it is the only big branch of XBB.6.1. |
XBB.1.15: OK, 3000+ sequences, but why designate it? It could stay XBB.1 without big thoughts. |
XBC.1.3.1 doesn't seem to be competitive with XBC.1 but does seem to have some advantage vs. XBC.1.3 (which starts after 25614T). To me its potential fitness becomes more evident just after acquiring T6447C (ORF1a:V2061A), though that is still a very slight advantage versus the grandparent lineage XBC.1. If I had proposed it, I would not advocate designating it at this point. |
XBB.1.4.1 |
Thanks @FedeGueli! I independently reviewed the proposed lineages with comments (and Usher links) in this sheet: https://docs.google.com/spreadsheets/d/1vEKbYtX7HDfFtm20t-bGJ6Qog-l_mPVHF4Q3ZIkRNk8/edit?usp=sharing I've got similar thoughts to yours - the type of lineages proposed here are very different than the ones from #191. I preferred the ones from the previous PR. Not sure what the difference in settings was, maybe the inclusion of some sort of growth estimate in the new PR? Many of the proposed lineages are very small (<100) and encompass a large portion (>50%) of the parent lineage. I didn't designate these for now as they wouldn't be designated under the ordinary procedure through issues or systematic designation. Maybe Autolin is onto something - we can review again in a month and see if these branches exploded. A few specific points:
For various reasons, the PR can't be used/merged as is:
See further comments in #194. I've designated 7 lineages based on this PR (see https://github.com/cov-lineages/pango-designation/commits/master). Reviewing took quite some time: a similar amount (or more) compared to reviewing designations proposed by the community, since issues often contain reasoning and are less likely to be artefacts. These proposals are currently less efficient from a designation perspective than manual systematic designations, like last weekend's various BA.5/BQs. A few thoughts on time savers are in #194. Prebuilt Usher trees viewable with a link would be a big time saver. |
Sorry to hear you're disappointed in the quality of these lineages and the necessary time to review. The primary difference between this and the last pull request is the series of designations made in the meantime: many of the higher quality lineages it picked up were designated by you in advance of the release, were incorporated into the public builds in the days between the 9th and the 14th, and otherwise overlapped with the proposed designations. This PR represents more edge cases and leftovers after the release, and I had to relax some parameters to get more than a handful of designations to examine. I appreciate the detailed feedback! I will go over it this week. To briefly address your points about why it can't be used/merged:
|
Question - if I read the paper correctly, then I believe the growth advantage of auto-lineages doesn't take into consideration the growth advantage of the parent lineage, right? I think this will result in many unimportant mutations on the fastest growing variants to be given their own designation. This makes sense if you are just trying to describe the things that are currently growing, but less sense if you're trying to describe the ways that things are meaningfully changing. For the piece on reversions -- |
@Sinickle The growth modeling of the autolineages does not consider the parent or any contextual information, no, but it is only used to filter and prioritize the output, not to generate designations itself. The foundation of my approach is about the agnostic representation of genotypes through lineages, allowing researchers to communicate about the genetic diversity of SARS-CoV-2 without inherent assumptions as to what mutations may or may not be important. I do incorporate weighting schema optionally into the pipeline, but don't actually apply mutation-level weighting for this particular output, though I do apply additional weight to underrepresented countries to encourage the designation of international lineages. Essentially, this method is attempting to describe the full breadth of diversity of active SARS-CoV-2 virions: the first of your two statements. Identifying what lineages are important a priori ("trying to describe the way things are meaningfully changing") requires significant assumptions about the behavior of the viral genome that can easily be violated by epistatic effects or simply fitness effects more complex than reduced antibody binding. Navigating the competing philosophies of lineage designation, between representation of what is there and prediction of what is important, has been a serious challenge in developing this work, given the wide diversity of opinions among the community as to the viability and importance of each of these functions. My stance has generally been that creating new names is relatively cheap and that we can identify what lineages are important or different after they are given names. Even if a new lineage doesn't appear epidemiologically distinct on its face, the genetic distinction confers the possibility for altered fitness as environments and context change.
I hope this serves as sufficient explanation to you and @FedeGueli and others who might question why this method would designate a sublineage that doesn't have immediate and obvious behavioral differences from the parent lineage or any mutations that we would consider interesting a priori. |
Hi @jmcbroome, I didn't ask any question. My only role here (and I couldn't do more than that) is to compare the old and new methods to enhance performance. |
@FedeGueli I meant in reply to your comment "XBB.1.15 ok 3000+ sequences but why to designate? it could stay XBB.1 withouth big thoughts." above. You seemed implicitly confused about why I might be designating lineages that don't have an obvious growth advantage, and my last comment attempted to address that question. I do appreciate your feedback, though. RE: countries with denser sequencing leading to more growth: I already control for this. I use growth stratified by country, and compute it as the percentage of all samples collected in each week. Being 5% of 100 samples from Japan is the same as being 5% of 10000 samples from England with respect to the model. There could still be some inherent biases resulting from sequencing strategy, of course (some countries intentionally sequence outbreaks, which are more likely to be closely related to one another, instead of doing unbiased population sequencing), but variation in overall sequencing volume shouldn't impact these estimates too much. |
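To make the country-stratified normalisation concrete, here is a small sketch of the idea as described in the comment above. It is not the actual growth model: the `(country, week, lineage)` record layout and the function name are assumptions, and the counts are the illustrative 5%-of-100 versus 5%-of-10000 figures from the comment.

```python
from collections import Counter


def weekly_proportions(samples, lineage):
    """Share of `lineage` among all samples, per (country, week) stratum.

    `samples` is an iterable of (country, week, lineage) tuples; this record
    layout is an assumption made for illustration.
    """
    totals = Counter()
    hits = Counter()
    for country, week, sample_lineage in samples:
        totals[(country, week)] += 1
        if sample_lineage == lineage:
            hits[(country, week)] += 1
    return {key: hits[key] / totals[key] for key in totals}


# 5 of 100 samples in Japan and 500 of 10000 in England, same week.
samples = (
    [("Japan", "2023-W05", "XBB.1.15")] * 5
    + [("Japan", "2023-W05", "other")] * 95
    + [("England", "2023-W05", "XBB.1.15")] * 500
    + [("England", "2023-W05", "other")] * 9500
)
props = weekly_proportions(samples, "XBB.1.15")
print(props)
```

Both strata yield the same proportion, so a model fitted to these per-country weekly shares does not reward a lineage merely for circulating in a country that sequences more.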
Yep. And when I look at a public-tree taxonium query, or look at the CoV-Spectrum query sequences in the full tree in taxonium, they are spread out all over XBB.1.5 because the false reversion is not limited to that one cluster.
There is some tension between this request and the request to easily cherry-pick the lineages. If you accept only a subset of the proposed lineages, and autolin starts picking as-yet-unassigned aliases, then search-and-replaces will still be necessary in order to get correct aliases and it might be even more confusing than four-number proposed names that obviously are wrong and need alias conversion. |
A script that does the search and replace of four-number accepted lineage to new alias, including adding the alias to alias_key.json and noting it in lineage_notes.txt, would be helpful to avoid error-prone human/editor search and replace. |
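A rough sketch of what such a script might do, working on in-memory data so it can be checked without touching the repository. The function names are hypothetical, `ZZ` is a placeholder alias, and the tab-separated "Alias of" note line is an assumption about the `lineage_notes.txt` layout that should be checked against the real file.

```python
import re


def realias(four_number_name, new_alias, alias_key):
    """Register new_alias for the three-number parent and return the short name."""
    parts = four_number_name.split(".")
    if len(parts) != 5 or not all(p.isdigit() for p in parts[1:]):
        raise ValueError(f"expected a four-number lineage name: {four_number_name!r}")
    if new_alias in alias_key:
        raise ValueError(f"alias already in use: {new_alias}")
    alias_key[new_alias] = ".".join(parts[:-1])  # e.g. ZZ -> XBB.1.5.2
    return f"{new_alias}.{parts[-1]}"


def replace_lineage(text, old_name, new_name):
    """Replace old_name as a whole lineage token (XBB.1.5.2.11 is left alone)."""
    pattern = re.compile(rf"(?<![\w.]){re.escape(old_name)}(?!\d)")
    return pattern.sub(new_name, text)


def note_line(new_name, old_name, description):
    """Assumed lineage_notes.txt layout: name, tab, 'Alias of <full name>, ...'."""
    return f"{new_name}\tAlias of {old_name}, {description}"


alias_key = {}  # in practice, loaded from alias_key.json with json.load
short = realias("XBB.1.5.2.1", "ZZ", alias_key)
text = "XBB.1.5.2.1,sampleA\nXBB.1.5.2.11,sampleB\n"
print(replace_lineage(text, "XBB.1.5.2.1", short))
print(note_line(short, "XBB.1.5.2.1", "description goes here"))
```

Refusing an alias that is already present in `alias_key` guards against the cherry-picking problem mentioned above, where accepting only a subset of proposals shifts which aliases are still free.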