Gff annotations #354

jameshadfield · 2019-08-15T05:47:58Z

Moves feature annotation to a subset of GFF syntax. Closes #187, as future extensions can be added by extending the GFF features we support as needed. This results in changes to the output of augur translate and augur export v2 (see below). The format of mutations is unchanged.

annotations

| where | `start` | `end` | `strand` | `seqid` | `type` |
| --- | --- | --- | --- | --- | --- |
| GFF input | 1-based | fully-closed | `"+"` or `"-"` | | |
| genbank input | 1-based | fully-closed | `"+"` or `"-"` | | |
| `augur translate` output | 1-based | fully-closed | `"+"` or `"-"` | included | included |
| `augur export v1` output | 0-based | half-open | `1` or `-1` | not included | not included |
| `augur export v2` output | 1-based | fully-closed | `"+"` or `"-"` | included | included |
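
To make the difference between the two coordinate conventions in the table concrete, here is a minimal sketch of converting one annotation between the GFF-like form (1-based, fully-closed, `"+"`/`"-"`) and the BED-like form (0-based, half-open, `1`/`-1`). This is not augur's actual code: the function names `gff_to_bed_like` and `bed_like_to_gff` are hypothetical, and the dict keys simply mirror the column names above.

```python
# Hypothetical helpers illustrating the two conventions in the table.
# These are NOT part of augur; names and dict layout are assumptions.

_STRAND_TO_INT = {"+": 1, "-": -1}
_INT_TO_STRAND = {1: "+", -1: "-"}


def gff_to_bed_like(feature):
    """1-based fully-closed, "+"/"-"  ->  0-based half-open, 1/-1."""
    if feature["strand"] not in _STRAND_TO_INT:
        raise ValueError(f"unexpected strand: {feature['strand']!r}")
    return {
        # Shifting the start down by one moves from 1-based to 0-based.
        "start": feature["start"] - 1,
        # A closed 1-based end and a half-open 0-based end are the same number.
        "end": feature["end"],
        "strand": _STRAND_TO_INT[feature["strand"]],
    }


def bed_like_to_gff(feature, seqid, feature_type):
    """0-based half-open, 1/-1  ->  1-based fully-closed, "+"/"-"."""
    if feature["strand"] not in _INT_TO_STRAND:
        raise ValueError(f"unexpected strand: {feature['strand']!r}")
    return {
        # seqid and type are not present in the BED-like form,
        # so the caller has to supply them.
        "seqid": seqid,
        "type": feature_type,
        "start": feature["start"] + 1,
        "end": feature["end"],
        "strand": _INT_TO_STRAND[feature["strand"]],
    }
```

Note that the feature length is `end - start + 1` in the GFF-like form and `end - start` in the BED-like form, so both describe the same span.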

mutations

nt & aa are both 1-based


rneher · 2019-08-15T08:00:30Z

with a few minor changes, we can also make augur use annotations like this:

KX369547.1		genome	1	10769			0
KX369547.1		gene	90	456		+	0	gene "CA";
KX369547.1		gene	456	735		+	0	gene "PRO";
KX369547.1		gene	735	960		+	0	gene "MP";
KX369547.1		gene	960	2472		+	0	gene "ENV";
KX369547.1		gene	2472	3528		+	0	gene "NS1";
KX369547.1		gene	3528	4206		+	0	gene "NS2A";
KX369547.1		gene	4206	4596		+	0	gene "NS2B";
KX369547.1		gene	4596	6447		+	0	gene "NS3";
KX369547.1		gene	6447	6828		+	0	gene "NS4A";
KX369547.1		gene	6828	6897		+	0	gene "2K";
KX369547.1		gene	6897	7650		+	0	gene "NS4B";
KX369547.1		gene	7650	10359		+	0	gene "NS5";

This already works for vcf, but it would be quite straightforward to also allow this for fasta alignments. The reference sequence could then be supplied as fasta or simply by name, and we wouldn't need to mess around with genbank files anymore. This gff/tsv is much easier to edit.
(it is also straightforward to parse such that we could dump BioGFF and just do it ourselves.)
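
As a rough illustration of how little code a hand-rolled parser for a file like the one above would need, here is a sketch using only the standard library. It is an assumption-laden example, not augur code: `parse_gff_like` and `parse_attributes` are hypothetical names, the column order is the standard nine-column GFF layout, and the attribute column is read in the `key "value";` style used in the example rather than GFF3's `key=value`. Real-world GFF files can be much messier than this handles.

```python
# Minimal sketch of a parser for the tab-separated annotation lines above.
# Hypothetical helper names; handles only the tidy subset shown in the example.

GFF_COLUMNS = [
    "seqid", "source", "type", "start", "end",
    "score", "strand", "phase", "attributes",
]


def parse_attributes(text):
    """Parse `key "value";` pairs (the style used in the example) into a dict."""
    attrs = {}
    for chunk in text.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue
        key, _, value = chunk.partition(" ")
        attrs[key] = value.strip().strip('"')
    return attrs


def parse_gff_like(text):
    """Return a list of feature dicts with 1-based, fully-closed coordinates."""
    features = []
    for line in text.splitlines():
        # Skip blank lines and comment/directive lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Pad short lines (e.g. a row without an attributes column).
        fields += [""] * (len(GFF_COLUMNS) - len(fields))
        row = dict(zip(GFF_COLUMNS, fields))
        features.append({
            "seqid": row["seqid"],
            "type": row["type"],
            "start": int(row["start"]),
            "end": int(row["end"]),
            # An empty strand column (as on the genome row) becomes None.
            "strand": row["strand"] or None,
            "attributes": parse_attributes(row["attributes"]),
        })
    return features
```

Called on the first two lines of the example, this would yield one `genome` feature with no strand and one `gene` feature whose `attributes` dict maps `gene` to `CA`.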

emmahodcroft · 2019-08-15T10:11:21Z

I'd def like to keep supporting GenBank though - at least for me that's the fastest way to get annotations for something new. GenBank's GFF export doesn't seem to capture all features. But agree that if you're working on something long-term (or sufficiently in need of editing) this would be an easier format to edit and maintain.

Definitely try out any new GFF parser on the TB GFF & others before dumping BioGFF for them - they can be much less tidy than your example! 😅

rneher and others added 7 commits August 12, 2019 14:17

schema_meta: Following GFF format here would require a few changes, including specifying strand as +/- and going to 1-based locations.

2690e0f

augur/translate: add seqid and type fields to annotation compilation in translate. It might be more sensible to move this into export in case no translations are done.

74125f5

augur.translate: use reference file name as ref id, should be replaced by the accession number

ac426eb

fix feature type for vcf

4209ecb

fixup: change "str" to "string" in schema

0d0fb3d

augur.translate: Update comment for JSON schema

33435cb

update export v1 to maintain BED-like feature annotation coordinates

f0635dd

Output from `augur translate` and `augur export v2` is GFF-like. `augur export v1` produces BED-like coordinates. See JSON schemas for details.

jameshadfield mentioned this pull request Aug 15, 2019

Move to GFF-style annotations nextstrain/auspice#770

Merged

jameshadfield merged commit f0635dd into v6 Aug 15, 2019

jameshadfield deleted the gff_annotations branch August 15, 2019 21:02

jameshadfield mentioned this pull request Aug 15, 2019

Genome annotations aren't very generic #187

Closed

ivan-aksamentov mentioned this pull request May 17, 2024

feat: use Auspice JSON as a dataset nextstrain/nextclade#1455

Merged
