Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Repair illegal substitutions to an inversion #3

Merged
merged 7 commits into from
Sep 5, 2020

Conversation

nuggetoriniku
Copy link
Contributor

I added two repairer functions to correct illegal hgvs strings.
Could you review this?

@codecov
Copy link

codecov bot commented Aug 28, 2020

Codecov Report

Merging #3 into master will increase coverage by 0.08%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master       #3      +/-   ##
==========================================
+ Coverage   88.96%   89.04%   +0.08%     
==========================================
  Files           5        5              
  Lines        1803     1817      +14     
  Branches      127      127              
==========================================
+ Hits         1604     1618      +14     
  Misses         72       72              
  Partials      127      127              
Impacted Files Coverage Δ
src/clj_hgvs/repairer.cljc 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a920313...6a16cac. Read the comment docs.

Copy link
Member

@totakke totakke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for PR. I added some comments. And you should add test cases to clj-hgvs.core-test/repair-hgvs-str-test.

@@ -87,6 +87,12 @@
(string/replace s #"([-\d+*]+_[-\d+*]+)([A-Z]+)?>([A-Z]+)" "$1del$2ins$3")
s))

;; c.233_234TC>CT -> c.233_234delinsCT
(defn ^:no-doc substitutions->delins*
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cannot understand the necessity of this repair rule. The repair case written as the comment is covered by existing substitutions->delins.

(require '[clj-hgvs.core :as hgvs]
         '[clj-hgvs.repairer :as repairer])

(hgvs/repair-hgvs-str "c.233_234TC>CT" [repairer/substitutions->delins])
;;=> "c.233_234delTCinsCT"

If you want to omit the deletion bases, you can use :show-bases? false option.

(-> (hgvs/parse "c.233_234delTCinsCT")          
    (hgvs/format {:show-bases? false}))
;;=> "c.233_234delinsCT"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your review.
I removed substitutions->delins* function.

;; c.2210_2211CA>TG -> c.2210_2211inv
(defn ^:no-doc basis->inv
[s kind]
(let [base-pairs {\A \T \T \A \C \G \G \C}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An inversion mutation is not appeared in protein mutations.

(if (#{:genome :mitochondria :coding-dna :non-coding-dna :circular-dna :rna}
     kind)
  ...
  s)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I set the condition for exclude :protein

@@ -102,6 +108,19 @@
(string/replace s #"(del|dup|inv)\d+$" "$1")
s))

;; c.2210_2211CA>TG -> c.2210_2211inv
(defn ^:no-doc basis->inv
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename to substitutions->inv and add it to above substitution->delins in built-in-repairers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I renamed to substitutions->inv and added it in built-in-repairers.

Comment on lines 115 to 121
[_ region b a] (re-find #"(\d+_\d+)([A-Z]+)>([A-Z]+)" s)]
(if (= a
(->> b
(map base-pairs)
reverse
(apply str)))
(str "c." region "inv")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The regex does not cover all coordinate patterns. Additionally, I recommend using replace fn.

(letfn [(revcomp [s]
          (->> (reverse s)
               (map {\A \T \T \A \C \G \G \C})
               (apply str)))]
  (string/replace s
                  #"([-\d+*]+_[-\d+*]+)([A-Z]{2,})>([A-Z]{2,})"
                  (fn [[s coords del ins]]
                    (if (= del (revcomp ins))
                      (str coords "inv")
                      s))))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fixed the regex to cover all coordinate patterns, and changed process with using replace fn.

Copy link
Member

@totakke totakke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, your fix is perfect, but a test is still missing (#3 (review)).

Could you add c.2210_2211CA>TG -> c.2210_2211inv test case to just above the following line?

;; substitution->delins

Copy link
Member

@totakke totakke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I plan to merge it after @federkasten approval.

Copy link
Member

@federkasten federkasten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay in reviewing.
I have checked. No additional comments. Thanks! 👍

@totakke totakke changed the title To correct illegal hgvs strings Repair illegal substitutions to an inversion Sep 5, 2020
@totakke totakke merged commit b4d2dda into master Sep 5, 2020
@totakke totakke deleted the feature/repairer-for-illegal-hgvs branch September 5, 2020 09:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants