
New Lineages Update: 20 new lineages active through 2023-02-15 #193

Open: jmcbroome wants to merge 3 commits into master

jmcbroome
Owner

| Lineage Name | Parent Lineage | Size | Exponential Growth Coefficient CI | Earliest Appearance | Latest Appearance | Regions | Nucleotide Changes | View On Cov-Spectrum | View On Taxonium (Public Samples Only) | Download Open Sequence FASTA | EPI ISLs | Amino Acid Changes | Nucleotide Reversions |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| XBB.1.9.2.1 | XBB.1.9.2 | 111 | [0.46792743, 0.7430019] | 2023-01-02 | 2023-02-08 | Austria, Australia, and Germany | C28928T,G23401T | View On Cov-Spectrum | View On Taxonium (Public Samples Only) | No Data Available | Get EPI ISLs | S:Q613H,N:L219F | No Reversions |
| XBB.1.5.2.1 | XBB.1.5.2 | 59 | [0.36211221, 0.73847185] | 2022-12-28 | 2023-02-01 | USA | A22002T,C5221T | View On Cov-Spectrum | View On Taxonium (Public Samples Only) | No Data Available | Get EPI ISLs | S:K147I | No Reversions |
| XBB.3.1.1 | XBB.3.1 | 17 | [0.35314583, 0.91739771] | 2022-12-04 | 2023-02-04 | Denmark and Netherlands | C10263T,C9042T | View On Cov-Spectrum | View On Taxonium (Public Samples Only) | Download Example Sequence FASTA (LAPIS) | Get EPI ISLs | ORF1ab:S2926F,ORF1ab:A3333V | No Reversions |
| XBM.1 | XBM | 219 | [0.35292579, 0.46126096] | 2022-11-20 | 2023-02-07 | Canada | T18429C,C12534T | View On Cov-Spectrum | View On Taxonium (Public Samples Only) | Download Example Sequence FASTA (LAPIS) | Get EPI ISLs | ORF1ab:T4090I | No Reversions |
| XBK.1 | XBK | 180 | [0.28522462, 0.38813739] | 2022-07-27 | 2023-02-08 | Slovenia, Germany, and Italy | C25046T | View On Cov-Spectrum | View On Taxonium (Public Samples Only) | Download Example Sequence FASTA (LAPIS) | Get EPI ISLs | S:P1162S | No Reversions |
| XAY.2.3 | XAY.2 | 82 | [0.28382549, 0.43708748] | 2022-11-25 | 2023-02-08 | Denmark | A22034G,C657T | View On Cov-Spectrum | View On Taxonium (Public Samples Only) | No Data Available | Get EPI ISLs | ORF1ab:A131V,S:R158G | No Reversions |
| XBF.3.1 | XBF.3 | 90 | [0.28305407, 0.42545791] | 2022-11-02 | 2023-02-04 | Netherlands, Australia, and England | C842T | View On Cov-Spectrum | View On Taxonium (Public Samples Only) | No Data Available | Get EPI ISLs | ORF1ab:P193S | No Reversions |
| XBB.1.4.2 | XBB.1.4 | 40 | [0.23443399, 0.55073977] | 2022-10-16 | 2023-02-06 | Italy, Thailand, and Austria | G1148T,T26160C | View On Cov-Spectrum | View On Taxonium (Public Samples Only) | Download Example Sequence FASTA (LAPIS) | Get EPI ISLs | ORF1ab:G295C | No Reversions |
| XBB.1.5.11 | XBB.1.5 | 860 | [0.23090608, 0.27395253] | 2022-11-09 | 2023-02-07 | USA, England, and Canada | T17124C | View On Cov-Spectrum | View On Taxonium (Public Samples Only) | Download Example Sequence FASTA (LAPIS) | Get EPI ISLs | S:V252G | T22317G |
| XBB.1.9.3 | XBB.1.9 | 62 | [0.19420051, 0.7738821] | 2022-12-07 | 2023-02-04 | Netherlands, Spain, and England | G18169T,G19480A | View On Cov-Spectrum | View On Taxonium (Public Samples Only) | Download Example Sequence FASTA (LAPIS) | Get EPI ISLs | ORF1ab:G5969C,ORF1ab:G6406S | No Reversions |
| XBB.6.1.1 | XBB.6.1 | 48 | [0.18384719, 0.40605432] | 2022-11-27 | 2023-02-03 | USA | C19895T | View On Cov-Spectrum | View On Taxonium (Public Samples Only) | Download Example Sequence FASTA (LAPIS) | Get EPI ISLs | ORF1ab:A6544V | No Reversions |
| XBB.1.15 | XBB.1 | 3385 | [0.16053001, 0.17860341] | 2022-02-16 | 2023-02-08 | USA, Guatemala, and Mexico | G27915T,C1884T | View On Cov-Spectrum | View On Taxonium (Public Samples Only) | Download Example Sequence FASTA (LAPIS) | Get EPI ISLs | ORF1ab:A540V,ORF8:G8* | No Reversions |
| XBB.1.5.7.1 | XBB.1.5.7 | 114 | [0.12235608, 0.27905442] | 2022-12-01 | 2023-02-04 | USA, Germany, and Mexico |  | View On Cov-Spectrum | View On Taxonium (Public Samples Only) | No Data Available | Get EPI ISLs | ORF1ab:F4649V,S:N417K | T14209G,T22813G |
| XBC.1.3.1 | XBC.1.3 | 60 | [0.10071042, 0.21145987] | 2022-10-05 | 2023-02-07 | Australia | C6145T,G15743A | View On Cov-Spectrum | View On Taxonium (Public Samples Only) | No Data Available | Get EPI ISLs | ORF1ab:S5160N | No Reversions |
| XBB.1.4.1.1 | XBB.1.4.1 | 275 | [0.08893083, 0.16888427] | 2022-10-30 | 2023-02-03 | Sweden and Denmark | C12741T,C14922T | View On Cov-Spectrum | View On Taxonium (Public Samples Only) | Download Example Sequence FASTA (LAPIS) | Get EPI ISLs | ORF1ab:T4159I | No Reversions |
| XBB.2.2.1 | XBB.2.2 | 93 | [0.04227181, 0.27881254] | 2022-10-18 | 2023-02-04 | Spain, Germany, and England | G27870T | View On Cov-Spectrum | View On Taxonium (Public Samples Only) | Download Example Sequence FASTA (LAPIS) | Get EPI ISLs | ORF7b:E39* | No Reversions |
| XBF.4 | XBF | 80 | [0.03033164, 0.20510676] | 2022-12-01 | 2023-02-07 | England, Luxembourg, and Iceland | G625T,C1514T,C4252T | View On Cov-Spectrum | View On Taxonium (Public Samples Only) | Download Example Sequence FASTA (LAPIS) | Get EPI ISLs | ORF1ab:H417Y,ORF1ab:K120N | No Reversions |
| XBB.2.5 | XBB.2 | 201 | [-0.03259708, 0.05205796] | 2022-10-31 | 2023-02-03 | USA, England, and India | G23401T | View On Cov-Spectrum | View On Taxonium (Public Samples Only) | Download Example Sequence FASTA (LAPIS) | Get EPI ISLs | S:Q613H | No Reversions |
| miscBA.5.2CJ.1.1 | miscBA.5.2CJ.1 | 114 | [-0.03782871, 0.07828514] | 2022-10-27 | 2023-02-07 | Japan and England | G14829T | View On Cov-Spectrum | View On Taxonium (Public Samples Only) | No Data Available | Get EPI ISLs | ORF1ab:M4855I | No Reversions |
| XBB.1.9.1.1 | XBB.1.9.1 | 73 | [-0.04567397, 0.36771319] | 2023-01-02 | 2023-02-08 | Germany, USA, and France | G16741T | View On Cov-Spectrum | View On Taxonium (Public Samples Only) | Download Example Sequence FASTA (LAPIS) | Get EPI ISLs | ORF1ab:V5493F | No Reversions |

@jmcbroome
Owner Author

Immediate note: a lot of these look fairly good, though XBB.1.5.7.1 is probably spurious; it's defined by two reversion mutations.

@AngieHinrichs

> though XBB.1.5.7.1 is probably spurious; it's defined by two reversion mutations.

Definitely -- the first reversion (T22813G) is a common dropout-reversion and the second reversion (T14209G) reverts the defining mutation of XBB.1.5.7!

@AngieHinrichs

I see its cell is empty in the "Nucleotide changes" column, which I assume means "no nucleotide changes that aren't reversions" -- that might be a good reason to not propose a lineage.

@AngieHinrichs

I get empty text from the "Get EPI ISLs" link for XBB.1.9.2.1. The headers look OK (aside from content-length: 0):

HTTP/2 200 
server: nginx/1.18.0 (Ubuntu)
date: Fri, 17 Feb 2023 02:40:41 GMT
content-type: text/plain
content-length: 0
vary: Origin
vary: Access-Control-Request-Method
vary: Access-Control-Request-Headers
lapis-data-version: 1676538737
content-disposition: inline
cache-control: no-store

-- just no EPI_ISLs.

@AngieHinrichs

AngieHinrichs commented Feb 17, 2023

Oh, maybe CoV-Spectrum doesn't have XBB.1.9.2 yet. It was designated on Feb. 3 and added to the tree as of the 2023-02-25 build, but there hasn't yet been a pangolin data release that would include it (I'm working on that...). Or, @corneliusroemer, does CoV-Spectrum use Nextclade, and does that have XBB.1.9.2?

[Edit: yep, if I alter the URL to have XBB.1.9 instead of XBB.1.9.2 then it returns plenty of IDs. I'm not suggesting that you do that! Just pointing it out as a workaround that humans looking into these can do.]

@AngieHinrichs

Anyway, the XBB.1.9.2.1 proposal is good, and it already has a pango-designation issue (as of 9 hours ago): cov-lineages#1664. The issue as written doesn't include S:Q613H, but Fede Gueli pointed out S:Q613H, so good call there. 🙂

@aineniamh

aineniamh commented Feb 17, 2023

This looks really great! If there is a set of commonly seen reversion mutations, should we maintain a list and perhaps rule out auto-lineage suggestions based on them?
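A minimal Python sketch of the blocklist idea. The blocklist contents and the function names are hypothetical; T22813G is listed only because it is named earlier in this thread as a common dropout-reversion. Blocklisted mutations are discounted rather than rejecting every proposal that carries one, so a branch with other genuine changes can still be proposed.

```python
# Hypothetical, community-curated blocklist of reversions that usually come from
# sequencing dropout rather than real evolution. T22813G is included only because
# it is named in this thread as a common dropout-reversion.
KNOWN_ARTEFACT_REVERSIONS = frozenset({"T22813G"})


def usable_mutations(defining, blocklist=KNOWN_ARTEFACT_REVERSIONS):
    """Return the defining mutations that are not on the blocklist, in order."""
    return [m for m in defining if m not in blocklist]


def passes_blocklist(defining, blocklist=KNOWN_ARTEFACT_REVERSIONS):
    """Keep a proposal only if something other than blocklisted mutations defines it."""
    return bool(usable_mutations(defining, blocklist))
```

Note that a branch defined by T14209G and T22813G (the XBB.1.5.7.1 case) would still pass this check, because T14209G is not a dropout-reversion; catching it needs a comparison against the parent's own defining mutations.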

@corneliusroemer

I'll review these carefully over the weekend. @FedeGueli, @Sinickle, @ryhisner if you'd like to have a look at the proposed lineages, it'd be great to have some extra eyes!

@jmcbroome
Owner Author

I did add a filter to block proposals that have empty mutation sets with respect to their parents/are defined by reversions only after seeing XBB.1.5.7.1 in this test.

I'd appreciate additional eyes on this, of course, and if you want to designate any of the proposals here, feel free. However, this is still intended as a test PR: I'm planning to open one directly against the pango-designation repo this weekend, so that after a quick review it can be merged and the update completed with a single button press!
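The reversions-only filter mentioned above could look roughly like this Python sketch. It is not autolin's code: it assumes nucleotide mutations are written as `G14209T` (reference base, position, alternate base) and that the mutations on the path from the root to the parent node are available in order. A mutation counts as a reversion if it undoes the most recent ancestral mutation at the same position.

```python
import re

# Nucleotide substitution such as "G14209T": reference base, position, alternate base.
MUTATION_RE = re.compile(r"^([ACGT])(\d+)([ACGT])$")


def parse_mutation(mutation):
    match = MUTATION_RE.match(mutation)
    if match is None:
        raise ValueError(f"unrecognised mutation: {mutation!r}")
    ref, pos, alt = match.groups()
    return ref, int(pos), alt


def is_reversion(mutation, ancestral_path):
    """True if `mutation` undoes the most recent ancestral mutation at its position.

    `ancestral_path` lists the mutations from the root to the parent node, in order.
    """
    ref, pos, alt = parse_mutation(mutation)
    for earlier in reversed(ancestral_path):
        e_ref, e_pos, e_alt = parse_mutation(earlier)
        if e_pos == pos:
            return e_alt == ref and e_ref == alt
    return False


def has_non_reversion_change(defining, ancestral_path):
    """Keep a proposal only if at least one defining mutation is not a reversion."""
    return any(not is_reversion(m, ancestral_path) for m in defining)
```

With a (hypothetical) ancestral path containing `G22813T` and `G14209T`, a branch defined only by `T14209G` and `T22813G` is rejected, while one that also carries `C28928T` is kept.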

@AngieHinrichs

> I did add a filter to block proposals that have empty mutation sets with respect to their parents/are defined by reversions only after seeing XBB.1.5.7.1 in this test.

Awesome! I realized after I asked for a filter that it would also be useful to be alerted when there's a faulty-looking branch like that. Sometimes, by removing a few problematic sequences and re-optimizing, I can get the sequences placed on a better branch, so that at least the reversion of the parent's lineage-defining mutation is not necessary. Sorry to keep asking for things, but would it be possible to call out branches that were filtered for that reason?

@jmcbroome
Owner Author

jmcbroome commented Feb 17, 2023

I could add a logging file to the pipeline that does it with minimal effort, though I don't think it really fits in the actual pull request. Tracking that kind of branch is a good idea in general: it could be a small project to set up an automated system that scans daily builds for reversion-heavy paths and emails or posts issues when they're detected. This idea is actually related to some thoughts I've had about tracking saltations that might be due to molnupiravir treatment and similar.
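A minimal Python sketch of such a log: one tab-separated row per filtered branch, plus a simple "reversion-heavy" test that a daily scan could apply. The column names, the record layout, and the 0.5 threshold are hypothetical choices, not part of the pipeline.

```python
import csv
import io


def format_rejection_log(rejections):
    """Render rejected branches as TSV: node, parent lineage, reason, reversions."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["node", "parent_lineage", "reason", "reversions"])
    for r in rejections:
        writer.writerow([r["node"], r["parent"], r["reason"], ",".join(r["reversions"])])
    return buffer.getvalue()


def is_reversion_heavy(n_reversions, n_mutations, threshold=0.5):
    """Flag a branch when reversions make up at least `threshold` of its mutations."""
    if n_mutations == 0:
        return False
    return n_reversions / n_mutations >= threshold
```

Writing the log as plain TSV keeps it easy to open in a spreadsheet or to grep when deciding which branches to re-optimize by hand.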

@AngieHinrichs

> I could add a logging file to the pipeline that does it with minimal effort

That would be great, thanks!

@FedeGueli

FedeGueli commented Feb 17, 2023

> I'll review these carefully over the weekend. @FedeGueli, @Sinickle, @ryhisner if you'd like to have a look at the proposed lineages, it'd be great to have some extra eyes!

Very willing to. How should I do it? By commenting directly down here?

First, here are my thoughts on the ones I already looked at in the last few weeks:
1. XBB.1.9.2.1: already proposed; OK.
2. I noticed XBB.1.5 + S:K147I rising steeply too, so the second one is clearly OK as well.
BUT its defining mutations are two: S:T284I and S:K147I.
[Screenshot, 2023-02-18 00:53:04]
(I think it is already bigger today than shown here.) Please also note that it gained S:E619K, very similar to S:E619Q, which was designated in one of the first sublineages of BQ.1.
EDITED: it is already XBB.1.5.2; I think it may be better to start it from S:K147I than to designate a new sublineage.
3. XBM.1: I already looked at it and commented a bit, I think in one of the XBM + 455F issues. It is clearly the fittest branch of XBM, so designating it could make sense. The tree is clear.
4. miscBA.5.2CJ.1.1 was the one I proposed, named as a third recombinant with CJ.1 or something similar. I closed it after we concluded there was no way to be sure it was not just XBK, even though no hypothetical ancestor lacking the defining mutations of both was ever sampled. To me it is OK to designate.

Now I'll finish the work on the XBB.1.5 spike issue and then come back here.

@FedeGueli

FedeGueli commented Feb 18, 2023

XBB.3.1.1: I think it may just be Denmark's sequencing intensity relative to other countries that makes it appear to grow faster. Or does your model already account for that? Of note, this lineage additionally has the ORF6:61 mutation reverted; maybe its advantage comes from there.
https://nextstrain.org/fetch/genome.ucsc.edu/trash/ct/subtreeAuspice1_genome_3716a_967d0.json?c=country&label=id:node_8000678

@FedeGueli

FedeGueli commented Feb 18, 2023

XBK.1, defined by S:P1162S: I compared it to the main branches and sub-branches of XBK on CoV-Spectrum. While it is likely less transmissible than the big branch with C2701T, which should be designated (it is not UK- or DK-driven), it has a relevant advantage over the other branches. In my estimation, though, its advantage arises deeper in the tree, starting after C14694A (ORF1b:D409E), not immediately after S:P1162S.

https://cov-spectrum.org/explore/World/AllSamples/Past2M/variants?nucMutations=C2701T&nextcladePangoLineage=XBK*&aaMutations1=S%3A1162S&nucMutations1=C14694A&nextcladePangoLineage1=XBK*&analysisMode=CompareToBaseline&

https://nextstrain.org/fetch/genome.ucsc.edu/trash/ct/subtreeAuspice1_genome_3b372_9afe0.json?c=country&label=id:node_8300141

@FedeGueli

XBF.3.1 is slower than XBF.3. I would not designate it: although it is a significant branch of XBF.3, it is mainly a Netherlands sublineage, with visible clustering after C1204T, which could explain why your model flagged it.

https://cov-spectrum.org/explore/World/AllSamples/Past2M/variants?aaMutations=orf1a%3Av274I&nextcladePangoLineage=XBF*&aaMutations1=ORF1a%3AP193S&nextcladePangoLineage1=xbf*&analysisMode=CompareToBaseline&
https://nextstrain.org/fetch/genome.ucsc.edu/trash/ct/subtreeAuspice1_genome_12ec0_a8a20.json?label=id:node_8294552

@FedeGueli

FedeGueli commented Feb 18, 2023

XBB.1.5.11 is defined by a reversion of S:G252V, a defining mutation of XBB.1. Very unlikely to me. @AngieHinrichs

@FedeGueli

XBB.1.9.3: although slower than its top-growing siblings XBB.1.9.1 and XBB.1.9.2, it seems clearly faster than the other XBB.1.9 branches.
Interestingly, it gained a further spike mutation, S:L212S, which surprisingly seems to slow it down (the opposite of what @corneliusroemer and I saw in BA.2 in the first quarter of 2022).
https://cov-spectrum.org/explore/World/AllSamples/Past2M/variants?nucMutations=C11758T&nextcladePangoLineage=XBB.1.9*&nucMutations1=G18169T%2CG19480A&nextcladePangoLineage1=XBB.1.9*&analysisMode=CompareToBaseline&

https://nextstrain.org/fetch/genome.ucsc.edu/trash/ct/subtreeAuspice1_genome_1a253_b5d60.json?c=userOrOld&label=id:node_8017574

@FedeGueli

FedeGueli commented Feb 18, 2023

I'll try to finish the second half of the lineages later.
@AngieHinrichs @corneliusroemer my personal thought, as a basic variant seeker (so not speaking to anything bioinformatic): the overall quality of the lineages picked up is good. I suggest excluding reversions from the game, maybe just for the first phase.

@FedeGueli

XBB.6.1.1: I don't see a valid reason to designate it beyond the fact that it is the only big branch of XBB.6.1.
No growth advantage.
https://cov-spectrum.org/explore/World/AllSamples/Past2M/variants?nextcladePangoLineage=XBB.6.1*&nucMutations1=C19895T&nextcladePangoLineage1=XBB.6.1*&analysisMode=CompareToBaseline&
The tree is clear but not very interesting:
https://nextstrain.org/fetch/genome.ucsc.edu/trash/ct/subtreeAuspice1_genome_30cc7_d3620.json?c=userOrOld&label=id:node_7998784

@FedeGueli

FedeGueli commented Feb 18, 2023

XBB.1.15: OK, 3000+ sequences, but why designate it? It could stay XBB.1 without much thought.
(The only note is that it has notably high prevalence in Ecuador.)
https://nextstrain.org/fetch/genome.ucsc.edu/trash/ct/subtreeAuspice1_genome_3c9b0_e09e0.json?label=id:node_8013121
https://cov-spectrum.org/explore/World/AllSamples/Past2M/variants?nextcladePangoLineage=XBB.1&nucMutations1=G27915T%2CC1884T&nextcladePangoLineage1=XBB.1*&analysisMode=CompareToBaseline&

@FedeGueli

XBC.1.3.1: it doesn't seem competitive with XBC.1, but it does appear to have some advantage over XBC.1.3 (which starts after 25614T).

To me, its potential fitness becomes more evident only after it acquires T6447C (ORF1a:V2061A), and even then it has only a very slight advantage over the grandparent lineage XBC.1. If I had proposed it, I would not advocate designating it at this point.
https://nextstrain.org/fetch/genome.ucsc.edu/trash/ct/subtreeAuspice1_genome_351b9_e22b0.json?label=id:node_8651060
https://cov-spectrum.org/explore/World/AllSamples/Past3M/variants?nextcladePangoLineage=XBC.1*&nucMutations1=C6145T%2CG15743A%2C6447C&analysisMode=CompareToBaseline&

@FedeGueli

XBB.1.4.1
Here too I have doubts. OK, it is the biggest branch of XBB.1.4, but as usually happens, that does not mean much in a lineage that is not growing fast.
Compared with the other branches, it has no clear advantage over them, and often a disadvantage.
Also, taking into account a Danish effect that artificially boosts some small branches, I don't think this deserves a .1 or that it is helpful to designate it.
https://cov-spectrum.org/explore/World/AllSamples/Past3M/variants?nucMutations=C16260T&nextcladePangoLineage=XBB.1.4.1*&aaMutations1=ORF1a%3AT4159I&nextcladePangoLineage1=XBB.1.4.1*&analysisMode=CompareToBaseline&
https://nextstrain.org/fetch/genome.ucsc.edu/trash/ct/subtreeAuspice1_genome_377df_e51c0.json?label=id:node_8060493

corneliusroemer added a commit to cov-lineages/pango-designation that referenced this pull request Feb 19, 2023
@corneliusroemer

Thanks @FedeGueli! I independently reviewed the proposed lineages with comments (and Usher links) in this sheet: https://docs.google.com/spreadsheets/d/1vEKbYtX7HDfFtm20t-bGJ6Qog-l_mPVHF4Q3ZIkRNk8/edit?usp=sharing

I've got similar thoughts to yours: the lineages proposed here are of a very different type from those in #191. I preferred the ones from the previous PR. I'm not sure what the difference in settings was; maybe the inclusion of some sort of growth estimate in the new PR?

Many of the proposed lineages are very small (<100) and encompass a large portion (>50%) of the parent lineage. I didn't designate these for now as they wouldn't be designated under the ordinary procedure through issues or systematic designation. Maybe Autolin is onto something - we can review again in a month and see if these branches exploded.
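The two thresholds in the paragraph above (fewer than 100 sequences, more than 50% of the parent) translate directly into a pre-review flag. A minimal Python sketch; the function name is hypothetical, and the parent size would have to come from the tree because the proposal table above lists only the child's size.

```python
def looks_like_whole_parent(child_size, parent_size, max_size=100, max_fraction=0.5):
    """Flag small proposals that cover most of their parent lineage.

    Defaults mirror the thresholds mentioned in review: fewer than 100
    sequences and more than 50% of the parent lineage.
    """
    if parent_size <= 0:
        raise ValueError("parent_size must be positive")
    return child_size < max_size and child_size / parent_size > max_fraction
```

Flagged proposals could be deferred and re-checked after a month rather than dropped, in line with the suggestion to see whether these branches grow.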

A few specific points:

  • As @FedeGueli noted, the proposed XBB.1.5.11 is defined by a reversion of S:G252V, a defining mutation of XBB.1. Likely an artefact.
  • The proposed XAY.2.3 relies on S:R158G, but that mutation is apparently defining of XAY.2 in general; it is just often not sequenced correctly in Denmark. It's very homoplasic on the XAY.2 subtree, hence should not be used as a defining mutation. If you leave out the Spike mutation, it's not clear why it should be designated over the other branches of XAY.2.
  • Two proposed lineages have issues attached to them, the ones designated as XBQ and XBK.1

For various reasons, the PR can't be used/merged as is:

  • Strain names are not in the required format (they contain dates, EPI_ISLs, and the | separator)
  • One big commit rather than one commit per lineage makes it hard or impossible to cherry-pick the ones we want to designate
  • Lineage note entries are not in the right place (below the parent lineage); they are at the very end of the notes
  • Lineages whose parent already has 3 levels don't get an alias (which they should), and there's also no entry in alias_key.json creating that alias (e.g. XBB.1.9.2.1 should be called EG.1 and get an appropriate entry in alias_key.json)
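For the strain-name point, a minimal Python sketch that keeps only the part of an UShER-style label before the first `|` and emits one `name,lineage` row per unique strain. The example label is made up, and the two-column row format is an assumption about lineages.csv that should be checked against the repository.

```python
def to_designation_name(usher_name):
    """Keep only the strain name from a 'name|accession|date'-style label."""
    return usher_name.split("|", 1)[0].strip()


def designation_rows(usher_names, lineage):
    """Build deduplicated 'strain,lineage' rows, preserving input order."""
    seen = set()
    rows = []
    for raw in usher_names:
        name = to_designation_name(raw)
        if name and name not in seen:
            seen.add(name)
            rows.append(f"{name},{lineage}")
    return rows
```

For example, a hypothetical label `England/ABC-123/2023|EPI_ISL_1|2023-01-10` becomes `England/ABC-123/2023`.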

See further comments in #194

I've designated 7 lineages based on this PR, see https://github.com/cov-lineages/pango-designation/commits/master

Reviewing took quite some time: as much as (or more than) reviewing designations proposed by the community, since issues often contain reasoning and are less likely to be artefacts. From a designation perspective, these proposals are currently less efficient than manual systematic designations, like last weekend's various BA.5/BQ designations. A few thoughts on time savers are in #194. Prebuilt UShER trees viewable via a link would be a big time saver.

@jmcbroome
Owner Author

jmcbroome commented Feb 19, 2023

Sorry to hear you're disappointed in the quality of these lineages and the time needed to review them. The primary difference between this and the last pull request is the series of designations made in the meantime: many of the higher-quality lineages it picked up were designated by you before the release, were incorporated into the public builds between the 9th and the 14th, and otherwise overlapped with the proposed designations. This PR represents more edge cases and leftovers after the release, and I had to relax some parameters to get more than a handful of designations to examine.

I appreciate the detailed feedback! I will go over it this week.

To briefly address your points about why it can't be used/merged:

  1. I used the sample names as they appear in the UShER tree and associated metadata file; this was an oversight. I can apply some processing to extract the appropriate subsection.
  2. I can return to the issue output. I had originally moved away from issue output as part of the goal was to prevent the maintainers from having to select/update the notes and lineages.csv files manually. I could potentially open multiple parallel pull requests, but this could lead to merge conflicts. Your suggestion to split each into separate commits is also potentially viable, of course.
  3. I wasn't aware the notes were sorted in this way.
  4. I've been using your pango_aliasor tool to sort out lineage compression without worrying about the details of its implementation. On closer examination, it appears it can only handle compressions predefined in the JSON you mention. I can look into updating a local copy of the JSON, but it will take a bit of time to sort out the implementation. Do you programmatically identify what names are available when doing novel third-level compression? If so, do you have that code available publicly? Or perhaps pango_aliasor can do that, and I'm misapplying it somehow?
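On the question of picking a new third-level alias programmatically, one possible approach is sketched below in Python. It is not pango_aliasor's API and not necessarily how aliases are actually chosen: it assumes new aliases are assigned sequentially after the highest existing two-letter key, that first letters in a skip set (assumed here to be I, O, and X, with X reserved for recombinants) are never used, and that a new key maps to the lineage being compressed. Check all three assumptions against the Pango rules before use.

```python
import string
from itertools import product

# Assumption: first letters that are not used for new two-letter aliases.
ASSUMED_SKIPPED_FIRST_LETTERS = frozenset("IOX")


def candidate_aliases(skipped_first=ASSUMED_SKIPPED_FIRST_LETTERS):
    """Yield two-letter aliases in alphabetical order, skipping reserved first letters."""
    firsts = [c for c in string.ascii_uppercase if c not in skipped_first]
    for first, second in product(firsts, string.ascii_uppercase):
        yield first + second


def next_free_alias(alias_key, skipped_first=ASSUMED_SKIPPED_FIRST_LETTERS):
    """Return the first candidate after the highest two-letter alias already in use."""
    used = {k for k in alias_key if len(k) == 2 and k[0] not in skipped_first}
    latest = max(used, default="")
    for candidate in candidate_aliases(skipped_first):
        if candidate > latest and candidate not in used:
            return candidate
    raise RuntimeError("two-letter alias space exhausted")


def assign_alias(alias_key, full_parent_name):
    """Add a new alias for `full_parent_name` to `alias_key` (in place) and return it."""
    alias = next_free_alias(alias_key)
    alias_key[alias] = full_parent_name
    return alias
```

Because aliases are taken strictly in sequence here, proposing them before review interacts badly with cherry-picking a subset of lineages, which is one reason to assign them only at merge time.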

@Sinickle

Question - if I read the paper correctly, then I believe the growth advantage of auto-lineages doesn't take into consideration the growth advantage of the parent lineage, right?

I think this will result in many unimportant mutations on the fastest-growing variants being given their own designations. This makes sense if you are just trying to describe the things that are currently growing, but less sense if you're trying to describe the ways that things are meaningfully changing.

For the piece on reversions --
At least to me, before I believe a reversion is real, I want to see it accompanied by some other unique mutations, and see that when those unique mutations are present the reversion is much more likely to be present, and that this is true in multiple countries.
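The criterion above can be made concrete. A minimal Python sketch, assuming each sample is given as a country plus its set of mutations; the ratio, sample-count, and country-count thresholds are hypothetical defaults. For each country it compares how often the reversion appears with the accompanying unique mutation versus without it, and counts the countries where the difference holds.

```python
from collections import defaultdict


def supporting_countries(samples, reversion, marker, min_ratio=2.0, min_marker_samples=5):
    """Countries where P(reversion | marker) >= min_ratio * P(reversion | no marker).

    `samples` is an iterable of (country, set_of_mutations) pairs.
    """
    # Per country: [with marker, with marker and reversion, without marker, without marker but with reversion]
    counts = defaultdict(lambda: [0, 0, 0, 0])
    for country, mutations in samples:
        c = counts[country]
        if marker in mutations:
            c[0] += 1
            c[1] += int(reversion in mutations)
        else:
            c[2] += 1
            c[3] += int(reversion in mutations)
    supported = []
    for country, (m, m_rev, n, n_rev) in sorted(counts.items()):
        if m < min_marker_samples:
            continue  # too few samples with the marker to judge
        rate_with = m_rev / m
        rate_without = n_rev / n if n else 0.0
        if rate_with > 0 and (rate_without == 0 or rate_with / rate_without >= min_ratio):
            supported.append(country)
    return supported


def reversion_looks_real(samples, reversion, marker, min_countries=2, **kwargs):
    """Accept the reversion only if the association holds in several countries."""
    return len(supporting_countries(samples, reversion, marker, **kwargs)) >= min_countries
```

A dropout-driven reversion tends to appear at a similar rate with or without the marker, so it fails the ratio test.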

@jmcbroome
Owner Author

@Sinickle No, the growth modeling of the autolineages does not consider the parent or any other contextual information, but it is only used to filter and prioritize the output, not to generate the designations themselves. The foundation of my approach is the agnostic representation of genotypes through lineages, allowing researchers to communicate about the genetic diversity of SARS-CoV-2 without inherent assumptions as to which mutations may or may not be important. The pipeline optionally supports weighting schemes, but I don't apply mutation-level weighting for this particular output, though I do apply additional weight to underrepresented countries to encourage the designation of international lineages.

Essentially, this method attempts to describe the full breadth of diversity of active SARS-CoV-2 virions: the first of your two statements. Identifying which lineages are important a priori ("trying to describe the way things are meaningfully changing") requires significant assumptions about the behavior of the viral genome that can easily be violated by epistatic effects, or simply by fitness effects more complex than reduced antibody binding.

Navigating the competing philosophies of lineage designation, between representing what is there and predicting what is important, has been a serious challenge in developing this work, given the wide diversity of opinions in the community about the viability and importance of each function. My stance has generally been that creating new names is relatively cheap and that we can identify which lineages are important or different after they are given names. Even if a new lineage doesn't appear epidemiologically distinct on its face, the genetic distinction confers the possibility of altered fitness as environments and context change.

I hope this serves as sufficient explanation to you and @FedeGueli and others who might question why this method would designate a sublineage that doesn't have immediate and obvious behavioral differences from the parent lineage or any mutations that we would consider interesting a priori.
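One simple way to give extra weight to underrepresented countries is inverse-frequency weighting. The Python sketch below is an illustration of that general technique, not the weighting autolin actually applies: each sample gets a weight proportional to `n_country ** -alpha`, rescaled so the weights average 1. With `alpha = 1`, every country contributes the same total weight; `alpha = 0` switches weighting off.

```python
from collections import Counter


def country_weights(countries, alpha=0.5):
    """Per-sample weights proportional to n_country ** -alpha, normalised to mean 1.

    `countries` lists the country of each sample; `alpha` is a hypothetical
    tuning parameter controlling how strongly rare countries are up-weighted.
    """
    counts = Counter(countries)
    raw = [counts[c] ** -alpha for c in countries]
    scale = len(raw) / sum(raw)
    return [w * scale for w in raw]
```

Intermediate values of `alpha` trade off between treating every sample equally and treating every country equally.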

@FedeGueli

Hi @jmcbroome, I wasn't asking a question. My only role here (and I couldn't do more than that) is to compare the old and new methods to help improve performance.
If I may add from my experience: the UK and Denmark are big distortions of the real growth advantage. Removing them, or down-weighting them as much as possible, would help a lot.

@jmcbroome
Owner Author

@FedeGueli I meant in reply to your comment "XBB.1.15 ok 3000+ sequences but why to designate? it could stay XBB.1 withouth big thoughts." above. You seemed implicitly confused about why I might be designating lineages that don't have an obvious growth advantage, and my last comment attempted to address that question.

I do appreciate your feedback, though. Re: countries with denser sequencing producing more apparent growth, I already control for this. I use growth stratified by country, computed from the lineage's percentage of all samples collected in each week. Being 5% of 100 samples from Japan is the same as being 5% of 10000 samples from England with respect to the model. There could still be some inherent biases resulting from sequencing strategy, of course (some countries intentionally sequence outbreaks, whose samples are more likely to be closely related to one another, instead of doing unbiased population sequencing), but variation in overall sequencing volume shouldn't impact these estimates much.
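A minimal Python sketch of country-stratified growth as described above: for each country, the lineage's share of each week's samples, a least-squares slope of log(share) against week, then a plain average across countries. The record layout, the pseudocount, and the unweighted average are assumptions; autolin's actual model and its confidence intervals are not reproduced here.

```python
import math
from collections import defaultdict


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys on xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def growth_coefficient(records, pseudocount=0.5):
    """Mean over countries of the slope of log(weekly lineage share) vs week.

    `records` is an iterable of (country, week, lineage_count, total_count).
    Shares, not raw counts, are used, so sequencing volume cancels out.
    """
    by_country = defaultdict(list)
    for country, week, k, n in records:
        if n <= 0:
            continue
        p = (k + pseudocount) / (n + 2 * pseudocount)
        if p > 0:
            by_country[country].append((week, math.log(p)))
    slopes = []
    for points in by_country.values():
        if len({w for w, _ in points}) < 2:
            continue  # need at least two distinct weeks to fit a slope
        xs, ys = zip(*points)
        slopes.append(ols_slope(xs, ys))
    if not slopes:
        raise ValueError("no country has enough weeks to fit a slope")
    return sum(slopes) / len(slopes)
```

In this sketch, a lineage that doubles its share weekly from 1 of 100 samples in Japan and from 100 of 10000 in England yields the same per-country slope, ln 2, in both countries, matching the 5%-of-100 versus 5%-of-10000 example.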

@AngieHinrichs

> XBB.1.5.11 is defined by a reversion of S:G252V, a defining mutation of XBB.1. Very unlikely to me. @AngieHinrichs

Yep. And when I look at a public-tree taxonium query, or look at the CoV-Spectrum query sequences in the full tree in taxonium, they are spread out all over XBB.1.5 because the false reversion is not limited to that one cluster.

> Lineages whose parent already has 3 levels don't get an alias (which they should), and there's also no entry in alias_key.json creating that alias (e.g. XBB.1.9.2.1 should be called EG.1 and get an appropriate entry in alias_key.json)

There is some tension between this request and the request to easily cherry-pick the lineages. If you accept only a subset of the proposed lineages, and autolin starts picking as-yet-unassigned aliases, then search-and-replaces will still be necessary in order to get correct aliases and it might be even more confusing than four-number proposed names that obviously are wrong and need alias conversion.

@AngieHinrichs

A script that does the search and replace of four-number accepted lineage to new alias, including adding the alias to alias_key.json and noting it in lineage_notes.txt, would be helpful to avoid error-prone human/editor search and replace.
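A minimal Python sketch of such a script, operating on file contents held in memory. It assumes `lineage_notes.txt` is tab-separated (`lineage<TAB>description`), that the new alias maps to the parent of the four-level name, and that notes use an "Alias of ..." prefix; all three conventions should be checked against the repository. The regular expression only matches the old name as a whole token, so `XBB.1.9.2.10` is untouched while descendants such as `XBB.1.9.2.1.3` become `EG.1.3`.

```python
import re


def rename_to_alias(old_name, alias, alias_key, texts):
    """Rename a four-level lineage (e.g. 'XBB.1.9.2.1') to '<alias>.<n>'.

    alias_key: dict loaded from alias_key.json, updated in place with
        alias -> parent of old_name (assumed mapping convention).
    texts: dict of file name -> file contents (e.g. lineages.csv, lineage_notes.txt).
    Returns (new_name, updated_texts).
    """
    if alias in alias_key:
        raise ValueError(f"alias {alias} is already in use")
    parent, index = old_name.rsplit(".", 1)
    new_name = f"{alias}.{index}"
    alias_key[alias] = parent
    # Whole-token match: not preceded by a letter, digit or dot, not followed by a digit.
    pattern = re.compile(r"(?<![A-Za-z0-9.])" + re.escape(old_name) + r"(?!\d)")
    updated = {name: pattern.sub(new_name, text) for name, text in texts.items()}
    notes = updated.get("lineage_notes.txt")
    if notes is not None:
        lines = []
        for line in notes.splitlines(keepends=True):
            if line.startswith(new_name + "\t"):
                # Record the long-form name in the renamed lineage's description.
                line = f"{new_name}\tAlias of {old_name}, " + line[len(new_name) + 1:]
            lines.append(line)
        updated["lineage_notes.txt"] = "".join(lines)
    return new_name, updated
```

Running this only after the maintainers pick which proposals to accept sidesteps the cherry-picking tension raised above, since aliases are assigned only to lineages that are actually designated.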
