[translate] Fix reference sequence translation
For JSON inputs we previously exported the root-sequence translations as the "reference", which was incorrect. Instead, we now translate the provided reference (nuc) sequence. (There is some subtlety here because the provided nuc reference sequence may in fact be the root sequence rather than an actual reference, but this is a problem with `augur ancestral`. See <#1362> for more details.)

This allows us to compare the reference translation to the root-sequence translation and thus report any AA mutations at the root node. A side effect is that we now always export an array of mutations for each gene/CDS at the root node, although this may often be empty. This brings the behaviour of JSON inputs in line with that of VCF inputs.
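The comparison described above can be sketched in a few lines of Python. This is only an illustration, not the code in `augur translate`: the helper names `translate` and `aa_mutations` are hypothetical, the gene1 coordinates (10..24, 1-based) are inferred from the translations quoted in the test in this commit, and the trailing `TAA` stop codon is an assumption (the test only shows the first 21 bases).

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(nuc, start, end):
    """Translate the forward-strand CDS at 1-based, inclusive [start, end]."""
    cds = nuc[start - 1:end]
    usable = len(cds) - len(cds) % 3  # ignore any trailing partial codon
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, usable, 3))


def aa_mutations(ref_aa, root_aa):
    """List substitutions going from the reference to the root, e.g. 'E4G'."""
    return [
        f"{a}{pos}{b}"
        for pos, (a, b) in enumerate(zip(ref_aa, root_aa), start=1)
        if a != b
    ]


# Assumed sequences: the 21 bases from the test plus an assumed TAA stop codon.
reference_nuc = "AAAAAAAAAATGCCCTGCGAGTAA"
root_nuc = "AAAAAAAAAATGCCCTGCGGGTAA"

ref_aa = translate(reference_nuc, 10, 24)
root_aa = translate(root_nuc, 10, 24)
print(ref_aa, root_aa, aa_mutations(ref_aa, root_aa))
```

When the reference and root translations are identical, `aa_mutations` returns an empty list, which mirrors the "array of mutations ... may often be empty" behaviour described above.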
1 parent 2a9f585, commit e744c86
Showing 5 changed files with 69 additions and 20 deletions.
@@ -0,0 +1,31 @@
Setup

  $ export AUGUR="${AUGUR:-$TESTDIR/../../../../bin/augur}"
  $ export SCRIPTS="$TESTDIR/../../../../scripts"
  $ export ANC_DATA="$TESTDIR/../../ancestral/data/simple-genome"
  $ export DATA="$TESTDIR/../data/simple-genome"

This is the same as the "general.t" test, but we are modifying the input data
such that the reference sequence contains "G" at pos 20 (1-based), and
include a compensating mutation G20A on the root node.
As a result, the reference translation of gene1 is MPCE* rather than MPCG*.
(Note that the compensating nuc mutation doesn't actually need to be present
in the JSON; `augur translate` just looks at the sequence attached to each node.)

  $ sed '29s/^/ "G20A",\n/' "$ANC_DATA/nt_muts.ref-seq.json" |
  > sed 's/"nuc": "AAAAAAAAAATGCCCTGCGGG/"nuc": "AAAAAAAAAATGCCCTGCGAG/' > nt_muts.json

  $ "${AUGUR}" translate \
  > --tree "$ANC_DATA/tree.nwk" \
  > --ancestral-sequences nt_muts.json \
  > --reference-sequence "$DATA/reference.gff" \
  > --output-node-data "aa_muts.json" > /dev/null

The output should be a gene1 reference of MPCE* (not MPCG*). The root-sequence
is unchanged (MPCG*). There is also a mutation E4G at the root node to compensate.

  $ python3 "$SCRIPTS/diff_jsons.py" \
  > "$DATA/aa_muts.json" \
  > "aa_muts.json" \
  > --exclude-regex-paths "root\['annotations'\]\['.+'\]\['seqid'\]"
  {'values_changed': {"root['reference']['gene1']": {'new_value': 'MPCE*', 'old_value': 'MPCG*'}}, 'iterable_item_added': {"root['nodes']['node_root']['aa_muts']['gene1'][0]": 'E4G'}}
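
The expected diff implies a particular shape for the output node data: a `reference` entry per gene, and a list of AA mutations per gene on the root node. A rough Python sketch of checking that shape follows. The JSON fragment is hand-written for illustration (it is not real `augur translate` output and uses only the keys visible in the diff paths), and `genes_missing_root_mutations` is a hypothetical helper, not part of augur.

```python
import json

# Hand-written fragment using only keys that appear in the diff output above;
# a real aa_muts.json contains more (for example an "annotations" block).
node_data = json.loads("""
{
  "reference": {"gene1": "MPCE*"},
  "nodes": {
    "node_root": {"aa_muts": {"gene1": ["E4G"]}}
  }
}
""")


def genes_missing_root_mutations(data, root_name="node_root"):
    """Return genes in 'reference' that lack a mutation list on the root node."""
    root_muts = data["nodes"][root_name].get("aa_muts", {})
    return [
        gene
        for gene in data["reference"]
        if not isinstance(root_muts.get(gene), list)
    ]


print(genes_missing_root_mutations(node_data))
```

An empty list at the root counts as present here, matching the commit message's note that the exported array may often be empty.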