-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix upstream and downstream sequence of sequence-info and delins process #81
Conversation
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## master #81 +/- ##
==========================================
+ Coverage 45.64% 46.45% +0.81%
==========================================
Files 16 16
Lines 2007 2090 +83
Branches 66 70 +4
==========================================
+ Hits 916 971 +55
- Misses 1025 1049 +24
- Partials 66 70 +4 ☔ View full report in Codecov by Sentry. |
acfbd2e
to
146ed99
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks PR. The strategy looks good to me! 🙆♂️
I pointed out some trivial points from a refactoring perspective.
Please check the comments.
src/varity/vcf_to_hgvs/protein.clj
Outdated
(defn- get-first-diff-aa-info | ||
[pos ref-seq alt-seq] | ||
(->> (map vector ref-seq alt-seq) | ||
(drop (dec pos)) | ||
(map-indexed vector) | ||
(drop-while (fn [[_ [r a]]] (= r a))) | ||
first | ||
((fn [[i [r a]]] | ||
{:ppos (+ pos i) | ||
:pref (str r) | ||
:palt (str a)})))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prettier to use varity.vcf-to-hgvs.common/diff-bases
:
(defn- get-first-diff-aa-info | |
[pos ref-seq alt-seq] | |
(->> (map vector ref-seq alt-seq) | |
(drop (dec pos)) | |
(map-indexed vector) | |
(drop-while (fn [[_ [r a]]] (= r a))) | |
first | |
((fn [[i [r a]]] | |
{:ppos (+ pos i) | |
:pref (str r) | |
:palt (str a)})))) | |
(defn- get-first-diff-aa-info | |
[pos ref-seq alt-seq] | |
(let [[ref-only alt-only offset _] (diff-bases ref-seq alt-seq (dec pos))] | |
(when (and (seq ref-only) | |
(seq alt-only)) | |
{:ppos (+ pos offset) | |
:pref (str (first ref-only)) | |
:palt (str (first alt-only))}))) |
with extending as follows:
(defn diff-bases
([s1 s2]
;; diff-bases original implementation
)
([s1 s2 offset]
(diff-bases (subs s1 offset) (subs s2 offset))))
src/varity/vcf_to_hgvs/protein.clj
Outdated
[pref palt ppos]) | ||
[del ins offset _] (diff-bases pref palt) | ||
ndel (count del) | ||
include-ter-site? (if ref-include-ter-site |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
include-ter-site? (if ref-include-ter-site | |
alt-retain-ter-site? (if ref-include-ter-site |
Do you think renaming it would make more sense?
src/varity/vcf_to_hgvs/protein.clj
Outdated
(-> (pstring/split-at seq (dec pos)) | ||
last |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use subs
:
(-> (pstring/split-at seq (dec pos)) | |
last | |
(-> (subs seq (dec pos)) |
src/varity/vcf_to_hgvs/protein.clj
Outdated
(if ref-include-ter-site | ||
:indel | ||
:frame-shift)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Try using cond
to expand nesting from L287.
src/varity/vcf_to_hgvs/protein.clj
Outdated
(some (fn [[_ e]] (when (<= e %) e)) (reverse alt-exon-ranges*)))] | ||
{:ref-exon-seq ref-exon-seq1 | ||
:ref-prot-seq (codon/amino-acid-sequence (cond-> ref-exon-seq1 | ||
(if ref-include-ter-site |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The apply-offset
function has gotten a bit too complicated. Please split it into a new function and simplify the implementation 🙏
src/varity/vcf_to_hgvs/protein.clj
Outdated
[ref alt] | ||
(or (and (= 1 (count alt)) | ||
(= (first ref) (first alt))) | ||
(and (not (= 1 (count ref) (count alt))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(and (not (= 1 (count ref) (count alt))) | |
(and (not= 1 (count ref) (count alt)) |
Equivalent?
src/varity/vcf_to_hgvs/protein.clj
Outdated
(defn- is-deletion-variant? | ||
[ref alt] | ||
(or (and (= 1 (count alt)) | ||
(= (first ref) (first alt))) | ||
(and (not (= 1 (count ref) (count alt))) | ||
(not= (first ref) (first alt))))) | ||
|
||
(defn- cds-start-upstream-to-cds-variant? | ||
[cds-start pos ref] | ||
(and (< pos cds-start) | ||
(<= cds-start (dec (+ pos (count ref)))))) | ||
|
||
(defn- cds-to-cds-end-downstream-variant? | ||
[cds-end pos ref] | ||
(and (<= pos cds-end) | ||
(< cds-end (dec (+ pos (count ref)))))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add simple tests for these functions 🙏
src/varity/vcf_to_hgvs/protein.clj
Outdated
ter-site-adjusted-alt-seq (make-ter-site-adjusted-alt-seq alt-exon-seq alt-up-exon-seq alt-down-exon-seq | ||
strand cds-start cds-end pos ref) | ||
ref-include-ter-site (include-ter-site? rg pos ref) | ||
apply-offset* (apply-offset pos ref alt alt-exon-ranges* ref-include-ter-site)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
apply-offset
and apply-offset*
become more readable when you use the partial
function.
src/varity/vcf_to_hgvs/protein.clj
Outdated
(defn- is-deletion-variant? | ||
[ref alt] | ||
(or (and (= 1 (count alt)) | ||
(= (first ref) (first alt))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a tricky condition. How about checking the length of ref
?
635a539
to
bbd85b6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you. LGTM 🙆♂️
I fixed the sequence-info and delins because error occurred when gene is transcribed from reverse strand and variant includes stop codon and upstream sequence from cds.
I found upstream and downstream sequence is not corrected when deletion is extended to outside of cds.
So I fix upstream sequence from cds-start and downstream sequence from cds-end, protein alteration type determinning process and delins process.
I executed
lein run
andlein run :slow
and I confirmed all tests are passed.