Polaris In BaseSpace
Mitch Bekritsky edited this page Oct 17, 2018 · 7 revisions
- Introduction
- Getting access to the data in BaseSpace
- Data pipeline
- HowTos
- Multi-sample VCF
- Single-sample BAM
- Metadata
- Data access methods
- Tools and Services
The 150 WGS samples from the Polaris 1 Diversity Cohort are available in BaseSpace, including variant calls and metadata.
- Log in to the Frankfurt instance of BaseSpace Sequence Hub: https://euc1.sh.basespace.illumina.com
- Open this link to get access to the shared Polaris 1 Diversity Cohort project: https://euc1.sh.basespace.illumina.com/s/ec3X5yWG3QEP
The Polaris 1 Diversity Cohort BSSH project consists of 150 samples. Each sample independently went through Whole Genome Sequencing alignment and variant calling; the samples were then processed together by the GVCF Genotyper app and finally by Hail ingestion (apps available on request).
- FASTQ files are individually stored as BSSH BioSamples
- BAM files are stored in each sample's Whole Genome Sequencing appResult, accessible from the list of analyses
- The multi-sample VCF is stored as merged.vcf.gz in the GVCF Genotyper appResult
- The Hail VDS (which contains the same multi-sample variants and metrics as the GVCF Genotyper appResult above, converted to a format that Hail can query) is stored in the Hail ingest appResult.
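Once merged.vcf.gz has been downloaded from the GVCF Genotyper appResult, a quick sanity check is to list the samples it contains. The following is a minimal sketch, assuming the file is a standard gzip- or bgzip-compressed VCF whose `#CHROM` header line lists the nine fixed columns followed by one column per sample. The function name `read_sample_names` and the local file path are illustrative, not part of BaseSpace.

```python
import gzip

# Fixed columns that precede the per-sample columns on the VCF #CHROM line.
FIXED_COLUMNS = ["#CHROM", "POS", "ID", "REF", "ALT",
                 "QUAL", "FILTER", "INFO", "FORMAT"]


def read_sample_names(vcf_path):
    """Return the sample names from the #CHROM header line of a gzipped VCF.

    Hypothetical helper: assumes a locally downloaded, gzip-compatible file.
    """
    with gzip.open(vcf_path, "rt") as handle:
        for line in handle:
            if line.startswith("#CHROM"):
                columns = line.rstrip("\n").split("\t")
                # Everything after the fixed columns is a sample name.
                return columns[len(FIXED_COLUMNS):]
            if not line.startswith("#"):
                break  # reached variant records without seeing the header
    raise ValueError("no #CHROM header line found in " + vcf_path)
```

Calling `read_sample_names("merged.vcf.gz")` on the downloaded file returns a list whose length can be compared against the 150 samples in the cohort.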
Sample metrics are:
- individually stored in each Whole Genome Sequencing appResult
- aggregated in the metadata.csv and metadata.header.csv files of the GVCF Genotyper appResult
- included in the Hail VDS data structure
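The aggregated metrics can be loaded locally after downloading the two CSV files from the GVCF Genotyper appResult. The layout of these files is not described on this page, so the sketch below rests on an assumption: metadata.header.csv holds a single row of column names, and metadata.csv holds one row per sample with no header row. The function name `load_metadata` is hypothetical; adjust the parsing if the actual files differ.

```python
import csv


def load_metadata(header_path, data_path):
    """Combine a header-only CSV and a data-only CSV into a list of dicts.

    Assumption (not confirmed by the source): the header file contains one
    row of column names and the data file contains one row per sample.
    """
    with open(header_path, newline="") as handle:
        column_names = next(csv.reader(handle))
    with open(data_path, newline="") as handle:
        # Skip blank lines; pair each value with its column name.
        return [dict(zip(column_names, row))
                for row in csv.reader(handle) if row]
```

For example, `load_metadata("metadata.header.csv", "metadata.csv")` returns one dictionary per sample, keyed by whatever column names the header file provides.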
The FASTQ and BAM files are also available from ENA.