Pisces VCF Specifications
tamsen edited this page Nov 12, 2016
The following specification document is valid for both somatic VCF and gVCF formatted files.
SDS ID | Specification |
---|---|
VCF-1 | The application shall write a header section at the top of the VCF file with the following header lines. These lines shall have the format “##{key}={value}”. The keys and their descriptions are given below: |
Key | Description |
---|---|
fileformat | VCF format version, always “VCFv4.1”. |
fileDate | Date in YYYYMMDD format. |
Source | Application name and version, e.g. “CallSomaticVariants 1.0.0.0” |
CallSomaticVariants_cmdline | Command line used to run the program, including all arguments. |
Reference | File name of the reference genome FASTA file. |
INFO | Description of INFO fields used in the file. There is one INFO header line for each field in the file. |
FILTER | Description of FILTER fields used in the file. There is one FILTER header line for each field in the file. |
FORMAT | Description of FORMAT fields used in the file. There is one FORMAT header line for each field in the file. |
contig | List of processed chromosomes and their lengths, with one contig header line per chromosome. Format is “##contig=<ID={chrName},length={length}>” |
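
As an illustration of VCF-1, the Python sketch below assembles the `##{key}={value}` meta-information lines. The helper name and its parameters are hypothetical and do not come from Pisces. The lowercase `source` and `reference` keys follow the VCF 4.1 examples, while the table above capitalizes them, so treat the exact key case as an assumption. INFO, FILTER and FORMAT lines belong to VCF-2 and VCF-3 and are omitted here.

```python
from datetime import date

def meta_header_lines(version, cmdline, reference, contigs, today=None):
    """Build the VCF-1 meta-information lines (hypothetical helper, not Pisces code)."""
    today = today or date.today()
    lines = [
        "##fileformat=VCFv4.1",
        f"##fileDate={today.strftime('%Y%m%d')}",  # YYYYMMDD
        f"##source=CallSomaticVariants {version}",  # key case assumed lowercase
        f"##CallSomaticVariants_cmdline={cmdline}",
        f"##reference={reference}",  # key case assumed lowercase
    ]
    # One contig line per processed chromosome, given as (name, length) pairs.
    lines += [f"##contig=<ID={name},length={length}>" for name, length in contigs]
    return lines
```

Passing `today` explicitly keeps the output deterministic, which is convenient when comparing headers in tests.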
SDS ID | Specification |
---|---|
VCF-2 | The application shall write the following INFO and FORMAT lines to the VCF header if the associated configuration rule is satisfied. These lines shall have the format “##{Key}=<ID={FieldName},Number={Number},Type={Type},Description="{Description}">”. |
Key | Field name | Number | Type | Description | Configuration Rule |
---|---|---|---|---|---|
INFO | DP | 1 | Integer | Total Depth | None |
FORMAT | GT | 1 | String | Genotype | None |
FORMAT | GQ | 1 | Integer | Genotype Quality | None |
FORMAT | AD | . | Integer | Allele Depth | None |
FORMAT | DP | . | Integer | Total Depth Used For Variant Calling | None |
FORMAT | VF | . | Float | Variant Frequency. One number if 0/0 or 0/1. Two numbers for 1/2 | None |
FORMAT | NL | 1 | Integer | Applied BaseCall Noise Level | Debug mode enabled, or outputting bias files, or strand bias threshold < 1 |
FORMAT | SB | 1 | Float | StrandBias Score | Debug mode enabled, or outputting bias files, or strand bias threshold < 1 |
FORMAT | NC | 1 | Float | Fraction of bases which were uncalled or with basecall quality below the minimum threshold | Report no calls enabled |
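
The configuration rules in VCF-2 can be read as a table-driven filter over the header fields. The Python sketch below does that. `HeaderConfig` and its option names are hypothetical stand-ins for the Pisces settings, and the defaults are illustrative only. The descriptions are copied from the table, with VF shortened to "Variant Frequency" because it is unclear whether the rest of that cell is part of the header text.

```python
from dataclasses import dataclass

@dataclass
class HeaderConfig:
    # Hypothetical option names; defaults are illustrative, not Pisces defaults.
    debug: bool = False
    output_bias_files: bool = False
    strand_bias_threshold: float = 1.0
    report_no_calls: bool = False

def bias_rule(cfg):
    # Rule for NL and SB: debug mode, bias files, or strand bias threshold < 1.
    return cfg.debug or cfg.output_bias_files or cfg.strand_bias_threshold < 1

def no_call_rule(cfg):
    # Rule for NC: report no calls enabled.
    return cfg.report_no_calls

# (key, field name, Number, Type, Description, configuration rule or None)
HEADER_FIELDS = [
    ("INFO", "DP", "1", "Integer", "Total Depth", None),
    ("FORMAT", "GT", "1", "String", "Genotype", None),
    ("FORMAT", "GQ", "1", "Integer", "Genotype Quality", None),
    ("FORMAT", "AD", ".", "Integer", "Allele Depth", None),
    ("FORMAT", "DP", ".", "Integer", "Total Depth Used For Variant Calling", None),
    ("FORMAT", "VF", ".", "Float", "Variant Frequency", None),
    ("FORMAT", "NL", "1", "Integer", "Applied BaseCall Noise Level", bias_rule),
    ("FORMAT", "SB", "1", "Float", "StrandBias Score", bias_rule),
    ("FORMAT", "NC", "1", "Float",
     "Fraction of bases which were uncalled or with basecall quality below the minimum threshold",
     no_call_rule),
]

def info_format_header_lines(cfg):
    """Emit one header line per field whose configuration rule is satisfied."""
    lines = []
    for key, name, number, type_, description, rule in HEADER_FIELDS:
        if rule is None or rule(cfg):
            lines.append(f'##{key}=<ID={name},Number={number},'
                         f'Type={type_},Description="{description}">')
    return lines
```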
SDS ID | Specification |
---|---|
VCF-3 | The application shall write the following FILTER lines to the VCF header if the associated configuration rule is satisfied. FILTER lines shall have the format “##{Key}=<ID={FieldName},Description="{Description}">”. |
Key | FieldName | Description | Configuration Rule |
---|---|---|---|
FILTER | q{threshold}, e.g. “q20” | Quality below {threshold} | Minimum variant score configured > 0. |
FILTER | LowDP | Low coverage (DP tag), therefore no genotype called | Minimum coverage configured > 0. |
FILTER | SB | One of the following descriptions, depending on which rules are configured: | Strand bias threshold configured > 0, or filtering of variants on only one strand enabled. |
FILTER | SB | A) Variant strand bias too high | Strand bias threshold configured > 0. |
FILTER | SB | B) Variant support on only one strand | Filtering of variants on only one strand enabled. |
FILTER | SB | C) Variant strand bias too high or coverage on only one strand | Both rules A and B are satisfied. |
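
A minimal Python sketch of VCF-3 follows. It assumes that the SB description is chosen as C when both the strand bias threshold and single-strand filtering are configured, as A when only the threshold is, and as B when only single-strand filtering is. The function and parameter names are hypothetical.

```python
def filter_header_lines(min_variant_score=0, min_coverage=0,
                        strand_bias_threshold=0.0, filter_single_strand=False):
    """Build FILTER header lines per VCF-3 (hypothetical parameter names)."""
    lines = []
    if min_variant_score > 0:
        lines.append(f'##FILTER=<ID=q{min_variant_score},'
                     f'Description="Quality below {min_variant_score}">')
    if min_coverage > 0:
        lines.append('##FILTER=<ID=LowDP,'
                     'Description="Low coverage (DP tag), therefore no genotype called">')
    sb_threshold_on = strand_bias_threshold > 0
    # Pick one SB description; the A/B/C mapping is an assumption of this sketch.
    if sb_threshold_on and filter_single_strand:
        description = "Variant strand bias too high or coverage on only one strand"  # C
    elif sb_threshold_on:
        description = "Variant strand bias too high"  # A
    elif filter_single_strand:
        description = "Variant support on only one strand"  # B
    else:
        description = None  # no SB filter line
    if description is not None:
        lines.append(f'##FILTER=<ID=SB,Description="{description}">')
    return lines
```

At most one SB line is emitted, which matches the table listing a single SB field with alternative descriptions.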
SDS ID | Specification |
---|---|
VCF-4 | The application shall write a data section to the VCF file as a tab-delimited table below the header section. |
VCF-5 | The application shall write a column header line at the top of the data section of a VCF file. The line shall be prefixed by a single “#” and have the following tab-separated format. The SampleName is set to the input BAM file name (without extension). |
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT {SampleName}
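
The Python sketch below builds the VCF-5 column header line. The sample column is derived from the BAM path with `pathlib`. The helper name is hypothetical.

```python
from pathlib import Path

VCF_COLUMNS = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT"]

def column_header_line(bam_path):
    """Tab-delimited column header; the sample column is the BAM name without its extension."""
    sample_name = Path(bam_path).stem  # strips only the last suffix, e.g. ".bam"
    return "\t".join(VCF_COLUMNS + [sample_name])
```

`Path.stem` removes only the final suffix, so `tumor.sorted.bam` becomes `tumor.sorted`. The spec does not say whether Pisces strips more than one extension.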
SDS ID | Specification |
---|---|
VCF-6 | In the default VCF mode, after the column header line, the data section of a VCF file shall have one line per variant allele. |
VCF-7 | If gVCF mode is selected, after the column header line, the data section of a VCF file shall have one line per allele (reference or variant). |
VCF-8 | If CrushVcf mode is selected, after the column header line, the data section of a VCF file shall have one line per genomic locus. |
VCF-9 | For each data line, the application shall write the following column values to the data section of the VCF file: |
Column Name | Value |
---|---|
CHROM | Chromosome name |
POS | Reference position |
ID | Source ID for the variant, always “.”. This column is provided for downstream annotators to update as appropriate. |
REF | Reference allele |
ALT | Alternate (variant) allele |
QUAL | Variant Quality Score |
FILTER | “PASS” if no filters. Otherwise, comma-separated list of filter names, e.g. “LowDP,SB”. |
INFO | Comma-separated list of INFO name and value pairs in the format “{name}={value}”. Currently only the DP INFO field is supported, e.g. “DP=500”. |
FORMAT | Colon-separated list of field names, e.g. “GT:GQ:AD”. |
{SampleName} | Colon-separated list of FORMAT field values. |
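
The Python sketch below covers VCF-6 to VCF-9. It first groups allele calls into data lines for each output mode, then renders one line from the VCF-9 column values. `AlleleCall`, the mode strings and the use of "." as the ALT value of a reference call are assumptions made for this sketch. FILTER codes are comma-joined as VCF-9 states. The VCF 4.1 specification itself separates failed FILTER codes with semicolons.

```python
from dataclasses import dataclass

@dataclass
class AlleleCall:
    chrom: str
    pos: int
    ref: str
    alt: str  # "." marks a reference call (assumption for this sketch)

def group_for_mode(calls, mode):
    """Group allele calls into data lines for the VCF-6/7/8 modes (hypothetical names)."""
    if mode == "vcf":    # VCF-6: one line per variant allele
        return [[c] for c in calls if c.alt != "."]
    if mode == "gvcf":   # VCF-7: one line per allele, reference or variant
        return [[c] for c in calls]
    if mode == "crush":  # VCF-8: one line per genomic locus
        loci = {}
        for c in calls:  # dicts keep insertion order, so loci stay in input order
            loci.setdefault((c.chrom, c.pos), []).append(c)
        return list(loci.values())
    raise ValueError(f"unknown mode: {mode}")

def format_data_line(chrom, pos, ref, alt, qual, filters, total_depth, sample_fields):
    """Render one tab-delimited data line; sample_fields is an ordered list of (name, value)."""
    filter_col = ",".join(filters) if filters else "PASS"  # separator as stated in VCF-9
    info_col = f"DP={total_depth}"  # DP is the only INFO field
    format_col = ":".join(name for name, _ in sample_fields)
    sample_col = ":".join(str(value) for _, value in sample_fields)
    columns = [chrom, str(pos), ".", ref, alt, str(qual),
               filter_col, info_col, format_col, sample_col]
    return "\t".join(columns)
```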
SDS ID | Specification |
---|---|
VCF-10 | For each data line, the application shall write the following FORMAT fields to the data section of the VCF file if the associated configuration rule is satisfied: |
FORMAT Field Name | Field Value | Configuration Rule |
---|---|---|
GT | Genotype | None |
GQ | Genotype Quality Score | None |
AD | For a variant call, the value is “{X},{Y}”, where X is the reference allele depth and Y is the variant allele depth. For a reference call, the value is a single allele depth. | None |
DP | Total coverage depth used in variant calling | None |
VF | Variant frequency | None |
NL | Estimated basecall quality | Debug mode enabled, or outputting bias files, or strand bias threshold < 1 |
SB | Strand bias score | Debug mode enabled, or outputting bias files, or strand bias threshold < 1 |
NC | No call frequency or fraction | Report no calls enabled |
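
A minimal Python sketch of VCF-10 follows. It builds the ordered FORMAT (name, value) pairs for one data line, applying the two configuration rules and the AD rule. Two choices are interpretations rather than statements from this spec: for a reference call, AD is taken to be the reference allele depth, and the NL/SB/NC values are passed in precomputed. All names are hypothetical.

```python
def bias_fields_enabled(debug, output_bias_files, strand_bias_threshold):
    # Configuration rule shared by NL and SB in VCF-2 and VCF-10.
    return debug or output_bias_files or strand_bias_threshold < 1

def format_fields(genotype, gq, ref_depth, alt_depth, total_depth, vf, is_variant,
                  noise_level=None, sb_score=None, no_call_fraction=None,
                  show_bias_fields=False, report_no_calls=False):
    """Return ordered (name, value) FORMAT pairs for one data line (hypothetical helper)."""
    # AD: "{ref},{alt}" for a variant call; a single depth for a reference call.
    ad = f"{ref_depth},{alt_depth}" if is_variant else str(ref_depth)
    fields = [("GT", genotype), ("GQ", gq), ("AD", ad), ("DP", total_depth), ("VF", vf)]
    if show_bias_fields:
        fields += [("NL", noise_level), ("SB", sb_score)]
    if report_no_calls:
        fields.append(("NC", no_call_fraction))
    return fields
```

Returning ordered pairs keeps the FORMAT column and the sample column aligned when both are joined with ":".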