Fix VCF to protein HGVS conversions of insertions near splice sites #68
Codecov Report
@@ Coverage Diff @@
## master #68 +/- ##
==========================================
- Coverage 45.67% 45.63% -0.05%
==========================================
Files 16 16
Lines 1990 2003 +13
Branches 63 64 +1
==========================================
+ Hits 909 914 +5
- Misses 1018 1025 +7
- Partials 63 64 +1
Thank you for the fix. LGTM
LGTM
@ToshimitsuArai @niyarin Thank you for your reviews! @nokara26 I marked this PR as ready for review. Would you take a look at it?
Thank you for the fix! LGTM👍
Summary
This PR fixes a problem in varity.vcf-to-hgvs/vcf-variant->protein-hgvs reported by #67.
Problem
The test case of #67 fails on the master branch, returning an incorrect value, p.R1335Qfs*954.
As ARID1A is on the :forward strand and there is a repeated sequence of GCA around the variant, the 3'-rule realigns the variant to the following one:
This can be illustrated as follows.
The correct output should be p.Q1334_R1335insQQ.
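The 3'-rule realignment described above can be sketched in a few lines. This is a minimal Python illustration, not varity's Clojure implementation: the function names `shift_insertion_3prime` and `apply_insertion` are hypothetical, and the reference sequence is a toy GCA repeat, not the actual ARID1A sequence. On the forward strand, 3' means toward increasing genomic coordinates, so the insertion slides right as long as the resulting sequence stays the same.

```python
def shift_insertion_3prime(ref_seq: str, index: int, inserted: str) -> tuple[int, str]:
    """Slide an insertion as far right (3' on the forward strand) as possible.

    `index` is the 0-based position in `ref_seq` before which `inserted` is placed.
    Hypothetical helper for illustration only.
    """
    while index < len(ref_seq) and ref_seq[index] == inserted[0]:
        # Moving one base to the right and rotating the inserted bases by one
        # produces exactly the same final sequence.
        inserted = inserted[1:] + inserted[0]
        index += 1
    return index, inserted


def apply_insertion(ref_seq: str, index: int, inserted: str) -> str:
    """Return the sequence obtained by inserting `inserted` before `index`."""
    return ref_seq[:index] + inserted + ref_seq[index:]


# Toy reference with a GCA repeat (an assumption, not real genomic sequence).
toy_ref = "TTGCAGCAGCATT"
new_index, new_inserted = shift_insertion_3prime(toy_ref, 2, "GCA")
print(new_index, new_inserted)  # 11 GCA
```

Both placements describe the same ALT sequence, which is why the realigned position, rather than the one written in the VCF, determines where the variant ends up relative to the exon boundary.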
Cause
varity.vcf-to-hgvs.protein/->protein-variant splits the REF and ALT AA sequences into three parts each:
varity/src/varity/vcf_to_hgvs/protein.clj
Lines 152 to 164 in e401a36
At L162, a genomic position, (+ pos (count alt) -1), is converted into a transcript coordinate by varity.vcf-to-hgvs.protein/protein-position to calculate the end index of the direct effect (2.) in the ALT AA sequence.
But the position (+ pos (count alt) -1), which evaluates to 26774720, is inside an intron and is thus clamped to the start position of the subsequent exon: 26773802
varity/src/varity/vcf_to_hgvs/protein.clj
Lines 105 to 110 in e401a36
This results in an incorrect split and an incorrect HGVS, p.R1335Qfs*954.
Fix
This PR fixes the coordinate conversion by shifting the feature positions in the refGene entry according to ALT.
varity/src/varity/vcf_to_hgvs/protein.clj
Lines 100 to 103 in 1edd963
Along with :exon-ranges, which is already calculated as alt-exon-ranges*, :cds-(start|end) are converted using the same fn: varity.vcf-to-hgvs.protein/alt-exon-ranges.
With these changes applied, (+ pos (count alt) -1) will no longer fall in an intron, allowing the correct conversion of positions.
Known issues
Since the current implementation relies only on positions, it will fail if a realigned variant spans an exon-intron boundary.
varity/test/varity/vcf_to_hgvs_test.clj
Lines 275 to 286 in 6f322e0
This PR does not address the issue. When such a situation is detected, it just throws an exception instead of returning an incorrect result. 6f322e0
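The idea of shifting feature positions by ALT, together with the boundary check, might look roughly like the following Python sketch. It is an assumption-laden analogue, not the actual varity.vcf-to-hgvs.protein/alt-exon-ranges: the signature, the `ValueError`, and the toy coordinates are all hypothetical.

```python
def alt_exon_ranges(
    exon_ranges: list[tuple[int, int]], pos: int, ref: str, alt: str
) -> list[tuple[int, int]]:
    """Shift 1-based inclusive exon ranges into ALT coordinates.

    Hypothetical analogue for illustration; raises ValueError when the variant
    overlaps an exon-intron boundary instead of guessing a result.
    """
    ref_end = pos + len(ref) - 1
    delta = len(alt) - len(ref)
    shifted = []
    for start, end in exon_ranges:
        if end < pos:
            shifted.append((start, end))                  # upstream: unchanged
        elif ref_end < start:
            shifted.append((start + delta, end + delta))  # downstream: shifted
        elif start <= pos and ref_end <= end:
            shifted.append((start, end + delta))          # variant inside exon
        else:
            raise ValueError("variant overlaps an exon-intron boundary")
    return shifted


# Toy exons and variant (assumed values, not ARID1A coordinates).
exons = [(100, 199), (300, 399)]
shifted = alt_exon_ranges(exons, 198, "A", "AGCAGCA")
print(shifted)  # [(100, 205), (306, 405)]

# The end of ALT now falls inside the first (lengthened) exon.
alt_end = 198 + len("AGCAGCA") - 1
print(shifted[0][0] <= alt_end <= shifted[0][1])  # True
```

In this sketch, a call such as `alt_exon_ranges(exons, 199, "AG", "AGCA")` raises `ValueError`, mirroring the choice to fail loudly rather than return an incorrect HGVS.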