
Polaris In BaseSpace

Mitch Bekritsky edited this page Oct 17, 2018 · 7 revisions


Introduction

The 150 whole-genome sequencing (WGS) samples from the Polaris 1 Diversity Cohort are available in BaseSpace, together with their variant calls and metadata.

Getting access to the data in BaseSpace

  1. Log in to the Frankfurt instance of BaseSpace Sequence Hub: https://euc1.sh.basespace.illumina.com
  2. Open this link to get access to the shared Polaris 1 Diversity Cohort project: https://euc1.sh.basespace.illumina.com/s/ec3X5yWG3QEP

Data pipeline

The Polaris 1 Diversity Cohort BaseSpace Sequence Hub (BSSH) project consists of 150 samples. Each sample went independently through Whole Genome Sequencing alignment and variant calling. The samples were then processed together by the GVCF Genotyper app and finally by Hail ingestion (apps available on request).

  • FASTQ files are individually stored as BSSH BioSamples
  • BAM files are stored in each sample's Whole Genome Sequencing appResult, accessible from the list of analyses
  • The multi-sample VCF is stored as merged.vcf.gz in the GVCF Genotyper appResult
  • The Hail VDS is stored in the Hail ingest appResult. It contains the same multi-sample variants and metrics as the GVCF Genotyper appResult above, converted to a format that Hail can query.
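Once merged.vcf.gz has been downloaded from the GVCF Genotyper appResult, a quick sanity check is to list the samples in its header and count non-reference genotype calls per sample. The sketch below is a minimal illustration using only the Python standard library. It assumes the file is a local BGZF/gzip-compressed VCF, and the function names `read_vcf_samples` and `count_nonref_calls` are hypothetical helpers, not part of BaseSpace or Hail. For real analyses a dedicated VCF library or tool is more appropriate.

```python
import gzip
from collections import Counter


def read_vcf_samples(path):
    """Return the sample IDs listed on the #CHROM header line of a gzipped VCF."""
    # gzip.open also reads BGZF files, since BGZF is a series of gzip members.
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("#CHROM"):
                # The first 9 columns are fixed (CHROM..FORMAT); the rest are samples.
                return line.rstrip("\n").split("\t")[9:]
            if not line.startswith("#"):
                break
    raise ValueError(f"no #CHROM header line found in {path}")


def count_nonref_calls(path):
    """Count, per sample, the records whose GT has at least one non-reference allele."""
    counts = Counter()
    samples = None
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("##"):
                continue  # skip meta-information lines
            fields = line.rstrip("\n").split("\t")
            if line.startswith("#CHROM"):
                samples = fields[9:]
                continue
            if samples is None:
                raise ValueError("data line found before the #CHROM header")
            fmt_keys = fields[8].split(":")
            if "GT" not in fmt_keys:
                continue  # no genotype field in this record
            gt_index = fmt_keys.index("GT")
            for sample, value in zip(samples, fields[9:]):
                parts = value.split(":")
                if gt_index >= len(parts):
                    continue
                # Treat phased ("|") and unphased ("/") genotypes the same way.
                alleles = parts[gt_index].replace("|", "/").split("/")
                if any(a not in ("0", ".") for a in alleles):
                    counts[sample] += 1
    return counts
```

For the Polaris 1 Diversity Cohort file, `read_vcf_samples("merged.vcf.gz")` would be expected to return 150 sample IDs, one per sample in the project.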

Sample metrics are:

The FASTQ and BAM files are also available from the European Nucleotide Archive (ENA).