Add a case study on Viridian SARS-Cov-2 data? #172

Open
jeromekelleher opened this issue Oct 8, 2024 · 0 comments


A fun example to include would be the Viridian SARS-CoV-2 data, which has ~4M whole genomes.

The data is available as a tar archive of FASTAs here

It would be simplest to write a script that just does the conversion sequentially, and say that we could imagine writing a fasta2zarr program at some point. It would illustrate some nice points: that we can store whole alignments, and that the format is flexible enough to include "-" as a gap character for deletions (although, annoyingly, these may have been removed from this dataset).
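As a rough idea of what the sequential pass could look like, here is a minimal sketch using only the standard library. It walks a tar archive member by member, parses each FASTA, and encodes every sequence as a fixed-width row of small integer codes, with "-" stored like any other allele. The alphabet, the function names, and the `width` parameter are all assumptions for illustration; it also assumes every sequence is already aligned to the same length, which may not hold for the Viridian files. Writing the rows to Zarr is left out.

```python
import tarfile

# Hypothetical alphabet: the gap character "-" gets its own code.
ALPHABET = "ACGTN-"
CODE = {base: i for i, base in enumerate(ALPHABET)}
UNKNOWN = CODE["N"]  # anything outside the alphabet is stored as N


def iter_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA text lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def iter_tar_fasta(path):
    """Walk a tar archive in order, yielding records from each regular file."""
    with tarfile.open(path) as tar:
        for member in tar:
            if not member.isfile():
                continue
            raw = tar.extractfile(member)
            yield from iter_fasta(line.decode("ascii") for line in raw)


def encode(sequence, width):
    """Encode one aligned sequence as a fixed-width row of integer codes."""
    if len(sequence) != width:
        raise ValueError(f"expected {width} bases, got {len(sequence)}")
    return bytes(CODE.get(base, UNKNOWN) for base in sequence.upper())


def convert(path, width):
    """Yield (name, encoded_row) one genome at a time, never holding them all."""
    for name, sequence in iter_tar_fasta(path):
        yield name, encode(sequence, width)
```

A real fasta2zarr would append each row to a chunked array instead of yielding it, but the one-genome-at-a-time loop would stay the same, which is what keeps memory use flat across ~4M genomes.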

Preprint: https://www.biorxiv.org/content/10.1101/2024.04.29.591666v1.full.pdf

It's worth doing if it's a day or two's work, no more. I'll have a go at some point, as I'm quite well up on SARS2 data at the moment.
