New Lineages Update: 20 new lineages active through 2023-02-09 #191

Open
wants to merge 2 commits into
base: master

jmcbroome
Owner

| Lineage Name | Parent Lineage | Size | Earliest Appearance | Latest Appearance | Regions | Nucleotide Changes | View On Cov-Spectrum | View On Taxonium (Public Samples Only) | Download Open Sequence FASTA | EPI ISLs | Amino Acid Changes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| auto.CH.1.1.1.1 | CH.1.1.1 | 1564 | 2022-12-11 | 2023-02-03 | England, Wales, and Denmark | C1545T,T24991A,A19886G,T15591C,C25721T,T4402A | View On Cov-Spectrum | View On Taxonium (Public Samples Only) | Download Example Sequence FASTA (LAPIS) | Get EPI ISLs | ORF3a:A110V,ORF1ab:K6541R,ORF1ab:A427V |
| auto.BQ.1.1.18.1 | BQ.1.1.18 | 2261 | 2022-12-11 | 2023-02-01 | Japan | T1562A,C25714T,G28079T | View On Cov-Spectrum | View On Taxonium (Public Samples Only) | Download Example Sequence FASTA (LAPIS) | Get EPI ISLs | ORF1ab:C433S,ORF3a:L108F |
| auto.XBB.1.12 | XBB.1 | 3198 | 2022-12-11 | 2023-02-01 | USA, Guatemala, and Mexico | G27915T,C1884T | View On Cov-Spectrum | View On Taxonium (Public Samples Only) | Download Example Sequence FASTA (LAPIS) | Get EPI ISLs | ORF1ab:A540V,ORF8:G8* |
| auto.BA.5.2.48.1 | BA.5.2.48 | 2062 | 2022-12-11 | 2023-02-03 | China, Japan, and SouthKorea | C11824T,C28994A | View On Cov-Spectrum | View On Taxonium (Public Samples Only) | Download Example Sequence FASTA (LAPIS) | Get EPI ISLs | N:Q241K |
| auto.BQ.1.1.5.1 | BQ.1.1.5 | 2105 | 2022-12-11 | 2023-02-02 | USA | A6790G,G11828A,G20518A | View On Cov-Spectrum | View On Taxonium (Public Samples Only) | Download Example Sequence FASTA (LAPIS) | Get EPI ISLs | ORF1ab:D6752N,ORF1ab:V3855I |
| auto.BQ.1.1.33 | BQ.1.1 | 2283 | 2022-12-11 | 2023-01-30 | USA, England, and Canada | C19547T,C25821T | View On Cov-Spectrum | View On Taxonium (Public Samples Only) | Download Example Sequence FASTA (LAPIS) | Get EPI ISLs | ORF1ab:S6428L |
| auto.BA.5.2.6.1 | BA.5.2.6 | 1253 | 2022-12-11 | 2023-02-02 | Japan | C26645T,C20115T,C19602T,C1929T | View On Cov-Spectrum | View On Taxonium (Public Samples Only) | Download Example Sequence FASTA (LAPIS) | Get EPI ISLs | ORF1ab:A555V,E:I9T |
| auto.BQ.1.29 | BQ.1 | 1460 | 2022-12-11 | 2023-02-05 | USA, Germany, and Japan | C26833T,T12667C,A28389T | View On Cov-Spectrum | View On Taxonium (Public Samples Only) | Download Example Sequence FASTA (LAPIS) | Get EPI ISLs | N:Q39L,M:A104V |
| auto.BF.7.16 | BF.7 | 690 | 2022-12-11 | 2023-02-02 | Japan | T7111C,C16470T,A26219G,C1915T,A27747C,C28153T | View On Cov-Spectrum | View On Taxonium (Public Samples Only) | Download Example Sequence FASTA (LAPIS) | Get EPI ISLs | ORF8:T87I,ORF7a:R118S |
| auto.BQ.1.1.20.1 | BQ.1.1.20 | 1365 | 2022-12-11 | 2023-02-03 | Denmark | C24771T | View On Cov-Spectrum | View On Taxonium (Public Samples Only) | Download Example Sequence FASTA (LAPIS) | Get EPI ISLs | S:A1070V |
| auto.BA.5.1.13 | BA.5.1 | 1419 | 2022-12-11 | 2023-02-02 | Denmark | C11530T,C2142T,C15720T,C25046T | View On Cov-Spectrum | View On Taxonium (Public Samples Only) | Download Example Sequence FASTA (LAPIS) | Get EPI ISLs | ORF1ab:P626L,S:P1162S |
| auto.BQ.1.2.2 | BQ.1.2 | 2377 | 2022-12-11 | 2023-01-28 | Canada and USA | G26314C | View On Cov-Spectrum | View On Taxonium (Public Samples Only) | Download Example Sequence FASTA (LAPIS) | Get EPI ISLs | E:V24L |
| auto.BF.7.15.1 | BF.7.15 | 546 | 2022-12-11 | 2023-02-02 | Japan | G28077A,G12170A,A1558G | View On Cov-Spectrum | View On Taxonium (Public Samples Only) | Download Example Sequence FASTA (LAPIS) | Get EPI ISLs | ORF8:V62M,ORF1ab:I431M,ORF1ab:A3969T |
| auto.XBB.1.5.4 | XBB.1.5 | 820 | 2022-12-11 | 2023-02-02 | England, USA, and Canada | T17124C,G14209T | View On Cov-Spectrum | View On Taxonium (Public Samples Only) | Download Example Sequence FASTA (LAPIS) | Get EPI ISLs | ORF1ab:V4649F |
| auto.BN.1.3.6 | BN.1.3 | 1922 | 2022-12-11 | 2023-01-28 | Japan, USA, and SouthKorea | G27043A,G11146T,C7390A,C593T | View On Cov-Spectrum | View On Taxonium (Public Samples Only) | Download Example Sequence FASTA (LAPIS) | Get EPI ISLs | M:R174Q,ORF1ab:H110Y,ORF1ab:M3627I |
| auto.BF.5.4 | BF.5 | 5205 | 2022-12-11 | 2023-01-28 | Japan | C16954T,A8882G,C21114T,A28402G | View On Cov-Spectrum | View On Taxonium (Public Samples Only) | Download Example Sequence FASTA (LAPIS) | Get EPI ISLs | ORF1ab:I2873V |
| auto.CH.1.1.10 | CH.1.1 | 1072 | 2022-12-11 | 2023-02-03 | England, Ireland, and Germany | G3791A,C21811T | View On Cov-Spectrum | View On Taxonium (Public Samples Only) | Download Example Sequence FASTA (LAPIS) | Get EPI ISLs | ORF1ab:A1176T |
| auto.BN.1.5.2 | BN.1.5 | 871 | 2022-12-11 | 2023-01-31 | England, Wales, and USA | A10258G,G20774A | View On Cov-Spectrum | View On Taxonium (Public Samples Only) | Download Example Sequence FASTA (LAPIS) | Get EPI ISLs | ORF1ab:G6837D |
| auto.BN.1.4.2 | BN.1.4 | 708 | 2022-12-11 | 2023-01-30 | Denmark | T3347C,G16912T,G18634A,C8782T | View On Cov-Spectrum | View On Taxonium (Public Samples Only) | Download Example Sequence FASTA (LAPIS) | Get EPI ISLs | ORF1ab:F1028L,ORF1ab:V5550L,ORF1ab:V6124M |
| auto.BQ.1.1.13.1 | BQ.1.1.13 | 691 | 2022-12-11 | 2023-02-01 | England and Scotland | G13822A | View On Cov-Spectrum | View On Taxonium (Public Samples Only) | Download Example Sequence FASTA (LAPIS) | Get EPI ISLs | ORF1ab:V4520I |

@aineniamh

All looking great so far, love the informative table with the PR too!
For the actual PR to pango-designation main, the auto lineage names shouldn't be prefixed with "auto." in the lineage field of the lineage_notes.txt file. The description should include that it's auto-generated, but including a different pattern will likely break websites, pipelines and aliasor scripts without an update. We could always add an additional field to the file that describes whether it's an autolineage or a manually curated lineage.
At this point we could also consider whether we want to start hosting the non-aliased name and more metadata about the lineage in the lineage_notes.txt too.
Plans are to release a pango-designation v2.0 with the introduction of autolin, so an opportunity to make some other changes too!
Really great work though, so excited for this to be put into use!
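
The naming suggestion above could be sketched roughly as follows. This is a minimal illustration, not the pipeline's actual code: it assumes a tab-separated notes row, and the third `source` column with its `autolin`/`manual` values, along with both function names, is hypothetical.

```python
AUTO_PREFIX = "auto."  # prefix used for proposed lineages in this PR's table


def strip_auto_prefix(name: str) -> str:
    """Return the lineage name without a leading 'auto.' marker."""
    return name[len(AUTO_PREFIX):] if name.startswith(AUTO_PREFIX) else name


def notes_line(name: str, description: str) -> str:
    """Build one tab-separated row: lineage, description, source.

    The 'source' column is a hypothetical extra field; it is not part of
    the existing file.
    """
    is_auto = name.startswith(AUTO_PREFIX)
    if is_auto:
        # Record the provenance in the description instead of the name.
        description = f"{description} (auto-generated)"
    source = "autolin" if is_auto else "manual"
    return "\t".join([strip_auto_prefix(name), description, source])
```

Keeping the provenance in a separate column would leave the lineage field in the pattern that existing parsers expect.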

@corneliusroemer

> but including a different pattern [in lineage notes] will likely break websites, pipelines and aliasor scripts without an update

I don't think we currently make any real guarantees about the format of lineage_notes.txt - this is like a changelog for human consumption, but still of course good to be consistent.

> At this point we could also consider whether we want to start hosting the non-aliased name and more metadata about the lineage in the lineage_notes.txt too.

I'm not quite sure what you mean, can you give an example? I think if we want to add structured data we should start fresh with something with a strict schema, notes are free text.

Here is some structured data I derive from various sources: https://raw.githubusercontent.com/corneliusroemer/pango-sequences/main/data/pango-consensus-sequences_summary.json

These are all auto-generatable without any manual curation - I guess that pattern is more useful for us. The fewer humans involved, the less chance of inconsistency. But then some things aren't really auto-deducible. Would be curious what you are thinking of adding specifically.
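
As one possible shape for "structured data with a strict schema", here is a small sketch. The record fields (`lineage`, `parent`, `auto_generated`), the `LineageRecord` and `to_json` names, and the name pattern are all assumptions made for illustration; they are not taken from the linked summary file or from pango-designation.

```python
import json
import re
from dataclasses import asdict, dataclass

# Assumed name shape: 1-3 capital letters followed by dot-separated numbers.
NAME_RE = re.compile(r"^[A-Z]{1,3}(\.\d+)*$")


@dataclass(frozen=True)
class LineageRecord:
    """Hypothetical machine-readable record for one lineage."""

    lineage: str
    parent: str
    auto_generated: bool

    def __post_init__(self) -> None:
        # Reject anything that does not look like a lineage name, so
        # malformed rows fail at creation time rather than downstream.
        for field_name in ("lineage", "parent"):
            value = getattr(self, field_name)
            if not NAME_RE.match(value):
                raise ValueError(f"{field_name} is not a valid lineage name: {value!r}")


def to_json(record: LineageRecord) -> str:
    """Serialize with sorted keys so the output is stable across runs."""
    return json.dumps(asdict(record), sort_keys=True)
```

With this sketch, `LineageRecord("auto.XBB.1.12", "XBB.1", True)` raises a `ValueError`, whereas free-text notes would accept it silently.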

> Plans are to release a pango-designation v2.0 with the introduction of autolin, so an opportunity to make some other changes too!

If we use semver we'd only bump major version if we break backwards compatibility - what are we planning on breaking? I can't think of something off the top of my head.

@jmcbroome
Owner Author

jmcbroome commented Feb 15, 2023

Thanks Aine! To reiterate my caveats from the Slack discussion: this particular example does not include the modeling step (I wanted to iterate more quickly while checking that links are generated and displayed correctly), and the EPI_ISL download link is misformatted in this PR (it will be fixed in the real one).

With regard to names, I've been including the "auto." prefix to be explicit in all cases about what I'm generating versus what's already in the system. I can remove the prefix easily enough, but there will likely be inconsistencies and potential name collisions. We will also have to keep in mind that we may need to change the names of newly proposed lineages. For example, if we propose two sibling lineages XXX.1 and XXX.2, then reject XXX.1 and accept XXX.2, we need to rename XXX.2 to XXX.1 for consistency. That will probably be relatively rare, though, and could be handled on a case-by-case basis.
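
The sibling renaming described above might look something like this sketch. The `renumber_accepted` helper and its parameters are hypothetical, and it assumes that proposals under one parent end in an integer suffix and that accepted siblings should be renumbered consecutively from the lowest proposed number.

```python
def renumber_accepted(parent, proposed, accepted, existing=frozenset()):
    """Map each accepted proposal to a gap-free final name under `parent`.

    parent:   parent lineage name, e.g. "XXX"
    proposed: all sibling names proposed in this round, e.g. ["XXX.1", "XXX.2"]
    accepted: the subset of `proposed` that passed review
    existing: names already designated, used to detect collisions
    """
    if not proposed:
        return {}

    def suffix(name: str) -> int:
        # Final numeric component, e.g. "XXX.2" -> 2
        return int(name.rsplit(".", 1)[1])

    first = min(suffix(name) for name in proposed)
    kept = sorted((name for name in proposed if name in accepted), key=suffix)
    renamed = {old: f"{parent}.{first + i}" for i, old in enumerate(kept)}

    clashes = sorted(set(renamed.values()) & set(existing))
    if clashes:
        raise ValueError(f"final names already designated: {clashes}")
    return renamed


# Rejecting XXX.1 and accepting XXX.2 renames XXX.2 to XXX.1.
print(renumber_accepted("XXX", ["XXX.1", "XXX.2"], {"XXX.2"}))
# → {'XXX.2': 'XXX.1'}
```

Returning an explicit old-to-new mapping keeps the rename auditable, which may matter if these cases are handled one by one.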

Also, in this particular case, it claims that all of the lineages were first sampled on 2022-12-11. That's because I removed samples prior to that date from consideration for this run. I may need to move the cutoff further back, or otherwise look into having it report the earliest date more accurately.
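
One way to make the report more honest about this is to flag earliest dates that sit on the filter cutoff. The sketch below is only an illustration: `earliest_appearance` is a hypothetical helper, and it assumes the run's cutoff date (2022-12-11 in the table above) is known to the reporting step.

```python
from datetime import date


def earliest_appearance(sample_dates, cutoff):
    """Return (earliest_date, possibly_truncated).

    `possibly_truncated` is True when the earliest remaining sample falls
    on (or before) the cutoff, meaning older samples may have been
    filtered out and the true first appearance could be earlier.
    """
    earliest = min(sample_dates)
    return earliest, earliest <= cutoff


cutoff = date(2022, 12, 11)
print(earliest_appearance([date(2022, 12, 11), date(2023, 2, 3)], cutoff))
# → (datetime.date(2022, 12, 11), True)
```

A flagged row could then be shown as "on or before 2022-12-11" rather than as an exact first-sample date.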
