Formats

Lydia Buntrock edited this page Nov 25, 2021 · 9 revisions

Annotation I/O

VCF & BCF GFF GFF2 & GTF* GFF3 BED
reference_sequence(s) Only diff (optional)
Region of reference_sequences(s) x x x x x
Annotation data variants feature feature feature Regions (less than gff)
Is File Seekable Yes No Use Case? No Use Case? No Use Case? No Use Case?

VCF & BCF GFF GFF2 & GTF* GFF3 BED
Specific Header / Metainformation x no header no header No header
Header line x no
CHROM x seqname Reference sequence x
feature*** Name (optional)
source x x
method x type
POS x start & end start & end start & end start & end
thickStart x (optional)
thickEnd x (optional)
itemRgb x (optional)
blockCount x (optional)
blockSizes x (optional)
blockStarts x (optional)
ID x seqid
ALT x
QUAL x score score score score (optional)
strand x x x x (optional)
frame x phase phase
group x
FILTER x
INFO**** x attribute attributes**** *
FORMAT**** ** (optional) x
SAMPLES (optional) x

*The GTF is identical to GFF version 2.

***Gene, Variation, Similarity

****INFO: Arbitrary keys are permitted, although some sub-fields are reserved (albeit optional).

**** *attributes: ID, Name, Alias, Parent, Target, Gap, Derives_from, Note, Dbxref, Ontology_term

**** **FORMAT Tags: AD, ADF, ADR, DP, EC, FT, GL, GP, GQ, GT, HQ, MQ, PL, PQ, PS
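
To make the VCF column of the table above concrete, here is a minimal sketch of splitting one tab-separated VCF data line into its fixed fields, the INFO key/value pairs, and the per-sample FORMAT values. The function name `parse_vcf_line` and the example line are illustrative assumptions, not part of any library; the REF column is not listed in the table but is part of every VCF data line. Header lines, escaping, and type conversion of INFO/FORMAT values are deliberately left out.

```python
def parse_vcf_line(line):
    """Split one VCF data line into fixed columns plus per-sample values.

    Hypothetical helper for illustration only; no validation or type
    conversion of INFO/FORMAT values is attempted.
    """
    cols = line.rstrip("\n").split("\t")
    chrom, pos, vid, ref, alt, qual, flt, info = cols[:8]

    # INFO is a ';'-separated list of KEY=VALUE pairs; flag keys have no value.
    info_dict = {}
    if info != ".":
        for item in info.split(";"):
            key, sep, value = item.partition("=")
            info_dict[key] = value if sep else True

    # FORMAT (optional) names the ':'-separated sub-fields of each sample column.
    fmt_keys = cols[8].split(":") if len(cols) > 8 else []
    samples = [dict(zip(fmt_keys, s.split(":"))) for s in cols[9:]]

    return {
        "CHROM": chrom,
        "POS": int(pos),  # 1-based position
        "ID": vid,
        "REF": ref,
        "ALT": alt.split(","),
        "QUAL": None if qual == "." else float(qual),
        "FILTER": flt.split(";"),
        "INFO": info_dict,
        "FORMAT": fmt_keys,
        "SAMPLES": samples,
    }


# Illustrative data line (made up for this sketch, two samples).
example = "20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DP=14;DB\tGT:GQ:DP\t0|0:48:1\t1|0:48:8"
record = parse_vcf_line(example)
print(record["CHROM"], record["POS"], record["INFO"], record["SAMPLES"])
```

Keeping INFO and FORMAT values as strings avoids guessing types that are only declared in the VCF meta-information header.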

Input / Output:

Formats:

VCF (Variant Call Format)*, BCF, GFF (General Feature Format), GFF2 (deprecated), GTF (General Transfer Format), GFF3, BED (Browser Extensible Data), GVF

HGVS vs BED vs GVF vs VCF Format example: https://www.ncbi.nlm.nih.gov/variation/tools/reporter/docs/examples#section-1.2.3

Format overviews from the Broad Institute of MIT and Harvard and from UCSC (University of California, Santa Cruz)

*The VCF specification is no longer maintained by the 1000 Genomes Project. The group leading the management and expansion of the format is the Global Alliance for Genomics and Health (GA4GH) Large Scale Genomics Work Stream file format team, http://ga4gh.org/#/fileformats-team (source: Wikipedia).

Format specification:

VCF & BCF: https://samtools.github.io/hts-specs/VCFv4.3.pdf & https://en.wikipedia.org/wiki/Variant_Call_Format & http://vcftools.sourceforge.net/VCF-poster.pdf

Review of all GFF and GTF formats: https://github.com/NBISweden/GAAS/blob/master/annotation/knowledge/gxf.md (see especially https://github.com/NBISweden/GAAS/blob/master/annotation/knowledge/gxf.md#main-points-and-differences-between-gff-formats & https://github.com/NBISweden/GAAS/blob/master/annotation/knowledge/gxf.md#main-points-and-differences-between-gtf-formats)

GFF: https://www.ensembl.org/info/website/upload/gff.html

GFF2: http://gmod.org/wiki/GFF2

GFF3: http://gmod.org/wiki/GFF3 & https://en.wikipedia.org/wiki/General_feature_format

BED: https://m.ensembl.org/info/website/upload/bed.html
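
As a rough illustration of the GFF3 layout described in the specifications above, the following sketch parses one GFF3 feature line into its nine tab-separated columns and decodes the attributes column (`;`-separated `tag=value` pairs, with `,` separating multiple values and URL-style percent-escapes). The names `GFF3_COLUMNS` and `parse_gff3_line` are hypothetical; directives, comments, and embedded FASTA sections are not handled.

```python
from urllib.parse import unquote

# Column names follow the GFF3 column order.
GFF3_COLUMNS = (
    "seqid", "source", "type", "start", "end",
    "score", "strand", "phase", "attributes",
)


def parse_gff3_line(line):
    """Parse one GFF3 feature line (hypothetical helper, illustration only)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("expected 9 tab-separated columns, got %d" % len(fields))

    rec = dict(zip(GFF3_COLUMNS, fields))
    rec["start"] = int(rec["start"])  # 1-based, inclusive
    rec["end"] = int(rec["end"])      # 1-based, inclusive
    rec["score"] = None if rec["score"] == "." else float(rec["score"])

    # attributes: tag=value pairs separated by ';', multiple values by ','.
    attrs = {}
    if rec["attributes"] != ".":
        for pair in rec["attributes"].split(";"):
            if not pair:
                continue
            key, _, value = pair.partition("=")
            attrs[unquote(key)] = [unquote(v) for v in value.split(",")]
    rec["attributes"] = attrs
    return rec


# Illustrative feature line.
gff_example = "ctg123\t.\tgene\t1000\t9000\t.\t+\t.\tID=gene00001;Name=EDEN"
feature = parse_gff3_line(gff_example)
print(feature["seqid"], feature["start"], feature["end"], feature["attributes"])
```

Every attribute value is stored as a list so that single-valued and multi-valued tags (such as Parent or Dbxref) can be handled uniformly.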

Probably desired format conversions:

VCF to GFF: http://seqanswers.com/forums/showthread.php?t=9796&highlight=gff+vcf

GFF to BED: https://bedops.readthedocs.io/en/latest/content/reference/file-management/conversion/gff2bed.html

GTF to GFF: http://seqanswers.com/forums/showthread.php?t=8321

BAM, GFF, GTF, GVF, PSL, RepeatMasker annotation output (OUT), SAM, VCF and WIG to BED: https://bedops.readthedocs.io/en/latest/content/reference/file-management/conversion/convert2bed.html
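
The main pitfall in a GFF to BED conversion is the coordinate system: GFF uses 1-based, inclusive start and end positions, while BED uses a 0-based start and an exclusive end. The sketch below shows only that shift for a single GFF3 line, producing a six-column BED line. It is a simplified illustration under stated assumptions and does not reproduce the output of the BEDOPS tools linked above: the function name `gff_to_bed6` is hypothetical, the BED name is taken from the `ID` attribute (falling back to the feature type), and a missing score is written as `0`.

```python
def gff_to_bed6(gff_line):
    """Convert one GFF3 feature line to a BED6 line (illustrative sketch)."""
    (seqid, _source, ftype, start, end,
     score, strand, _phase, attributes) = gff_line.rstrip("\n").split("\t")

    # Assumption: use the ID attribute as BED name, else the feature type.
    name = ftype
    for pair in attributes.split(";"):
        key, _, value = pair.partition("=")
        if key == "ID":
            name = value
            break

    bed_start = int(start) - 1  # 1-based inclusive -> 0-based start
    bed_end = int(end)          # inclusive end equals the exclusive BED end
    bed_score = "0" if score == "." else score  # assumption: 0 for missing score

    return "\t".join([seqid, str(bed_start), str(bed_end), name, bed_score, strand])


bed_line = gff_to_bed6("ctg123\t.\tgene\t1000\t9000\t.\t+\t.\tID=gene00001;Name=EDEN")
print(bed_line)
```

Note that the feature length is preserved: `end - start + 1` in GFF equals `chromEnd - chromStart` in BED.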