The Polaris project provides:
- Population sequencing resources on high-throughput Illumina sequencing platforms
- Variant calls from multiple technologies, validated by population genetics and Mendelian methods
Further details of the sequencing resources, input data sources, genotyping methods and validation methods can be found in the project wiki.
Our latest truth set of Structural Variants (SVs) is v2.1. Please check our release-notes/v2.1 for details.
To download the SV truth set (genome version hg38), run:
wget https://s3-us-west-1.amazonaws.com/illumina-polaris-v2.1-sv-truthset/all_merge.vcf.gz
wget https://s3-us-west-1.amazonaws.com/illumina-polaris-v2.1-sv-truthset/all_merge.vcf.gz.tbi
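Once both files are downloaded, a quick sanity check is to tally the records by SV type. The function below is a minimal sketch using only `gzip`, `awk`, and `sort`. It assumes that each record stores its type as an `SVTYPE=...` key in the INFO column, which is the usual VCF convention but is not confirmed here for this truth set. The name `count_svtypes` is illustrative, not part of Polaris.

```shell
# Tally records per SV type in a gzip- or bgzip-compressed VCF.
# Assumption: the type is stored as SVTYPE=... in the INFO column (column 8).
# Records without an SVTYPE key are counted as UNKNOWN.
count_svtypes() {
    gzip -dc "$1" | awk -F'\t' '
        /^#/ { next }                      # skip header lines
        {
            type = "UNKNOWN"
            n = split($8, fields, ";")     # INFO entries are ";"-separated
            for (i = 1; i <= n; i++) {
                if (fields[i] ~ /^SVTYPE=/) { type = substr(fields[i], 8) }
            }
            counts[type]++
        }
        END { for (t in counts) print t "\t" counts[t] }
    ' | sort
}
```

For example, `count_svtypes all_merge.vcf.gz` prints one tab-separated line per SV type with its record count.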
Population cohorts with unrestricted access sequenced as part of Polaris are available through BaseSpace, the European Nucleotide Archive (ENA), and the Sequence Read Archive.
Additional cohorts are available through the EGA or dbGaP with restricted access subject to approval through a Data Access Committee. No variant calls are ever reported in Polaris for restricted access cohorts.
Further information on the sequencing resources described below can be found in the [project wiki][0.3].
All HiSeq™ X PCR-Free data was generated by Illumina Laboratory Services (ILS) with a target whole genome coverage of 30X.
There are currently four unrestricted access cohorts available in Polaris:
- Diversity Cohort (BaseSpace, ENA, SRA) — 150 samples selected to represent a diversity of populations
- Kids Cohort (BaseSpace, ENA, SRA) — 50 children whose parents were sequenced as part of the Diversity cohort
- PGx Cohort (BaseSpace, ENA, SRA) — 70 samples with orthogonally validated genotypes for 28 genes relevant for PGx (Pratt et al., 2016)
- PGx 10X© Cohort (ENA, SRA) — the same 70 samples from the PGx cohort, prepared with the 10X Genomics Chromium Controller™ and sequenced on the HiSeq™ 4000
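Because the Kids Cohort children have both parents in the Diversity Cohort, the two cohorts together form trios, and genotype calls in trios can be checked for Mendelian consistency. The sketch below illustrates the basic idea for a single site. It is not the project's validation method. The function name `mendel_consistent` and the genotype string format (`0/1`, `1|1`) are assumptions for illustration, and missing genotypes such as `./.` get no special handling.

```shell
# Succeed (exit status 0) if a child's diploid genotype can be formed by taking
# one allele from the mother and one from the father; fail (status 1) otherwise.
# Arguments: child, mother, father genotypes, e.g. 0/1 0/0 1/1.
mendel_consistent() {
    awk -v c="$1" -v m="$2" -v f="$3" 'BEGIN {
        # Split on "/" (unphased) or "|" (phased) allele separators.
        split(c, ca, "[/|]"); split(m, ma, "[/|]"); split(f, fa, "[/|]")
        ok = 0
        for (i = 1; i <= 2; i++) {
            for (j = 1; j <= 2; j++) {
                # Try both assignments of the child alleles to the parents.
                if (ca[1] == ma[i] && ca[2] == fa[j]) ok = 1
                if (ca[2] == ma[i] && ca[1] == fa[j]) ok = 1
            }
        }
        exit (ok ? 0 : 1)
    }'
}
```

For example, `mendel_consistent 0/1 0/0 1/1` succeeds, while `mendel_consistent 1/1 0/0 0/1` fails because the mother carries no copy of allele 1.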
There is also a restricted access repeat expansion cohort available through EGA.
- Parents & grandparents
  - ENA — pending
  - BaseSpace — pending
- Children
  - dbGaP — pending
- Platinum Genomes pedigree
- NIST Ashkenazi Jewish trio
When citing the repeat expansion cohort, please refer to the ExpansionHunter paper where it was originally described:
Dolzhenko, Egor, et al. "Detection of long repeat expansions from PCR-free whole-genome sequence data." Genome Research 27.11 (2017): 1895-1903.
Please open an issue to provide feedback or ask questions.
- Eberle, et al (2017) A reference data set of 5.4 million phased human variants validated by genetic inheritance from sequencing a three-generation 17-member pedigree. Genome Res. 27:157-164. doi:10.1101/gr.210500.116
- English, et al (2015) Assessing structural variation in a personal genome-towards a human reference diploid genome. BMC Genomics. 16:286. doi:10.1186/s12864-015-1479-3
- Kehr, et al (2017) Diversity in non-repetitive human sequences not found in the reference genome. Nat Genet. 49(4):588-593. doi:10.1038/ng.3801
- Pratt, et al (2016) Characterization of 137 Genomic DNA Reference Materials for 28 Pharmacogenetic Genes: A GeT-RM Collaborative Project. J Mol Diagn. 18(1):109-23. doi:10.1016/j.jmoldx.2015.08.005
- Sedlazeck, et al (2018) Accurate detection of complex structural variations using single-molecule sequencing. Nat Methods. 15:461-468.