[index] About
We are in the midst of a devastating pandemic. To help the global research community develop a vaccine and therapies as efficiently as possible, we are uncovering as many coronaviruses as possible.
It is critically important to catalogue all coronaviruses and their animal reservoirs, since coronaviruses can mix RNA (recombine), resulting in new viral strains and potentially new outbreaks. Since SARS-CoV-2 is a novel virus, it is of paramount importance to identify related viruses, as they are potential sources for recombination.
The NCBI SRA database contains DNA and RNA sequencing data from millions of biologically diverse samples, collected over a decade from research labs across the world. We have undertaken a comprehensive re-analysis of the tens of millions of gigabytes of data to catalogue every vertebrate virus in this data, especially rare or undiscovered coronaviruses.
Big data requires big computing. We've built a cloud architecture that allows us to access up to 22,250 CPUs with Amazon Web Services. This allows us to perform hundreds of years of computing in only a few hours and discover these coronaviruses now.
Our primary goal is to generate the coronavirus data to accelerate the global research efforts in fighting SARS-CoV-2. This means sharing all data and tools immediately.
We adhere to the Bermuda Principles set out originally by the Human Genome Project: all data is freely and publicly available within 24 hours of generation. If there is a way CoV sequence data can assist your research, please reach out and we can work towards advancing COVID-19-related applications.