Clojure(Script) library for handling HGVS.
clj-hgvs provides:
- Data structure for HGVS
- HGVS text parser
- HGVS text formatter
Clojure CLI/deps.edn:
clj-hgvs {:mvn/version "0.5.0"}
Leiningen/Boot:
[clj-hgvs "0.5.0"]
To use clj-hgvs with Clojure 1.8, you must include a dependency on clojure-future-spec.
- Fix uncertain bases and amino acids format of ins and delins because HGVS nomenclature was updated.
  - DNA: ins(10) -> insN[10]
  - RNA: ins(10) -> insn[10]
  - Protein: ins10 -> insX[10]
- HGVS data structure changes from map to record (clj-hgvs.core/HGVS).
- (n)cdna is renamed to (non-)coding-dna to avoid misunderstanding.
See CHANGELOG for more information.
(require '[clj-hgvs.core :as hgvs])
;; `parse` parses an HGVS text, returning an HGVS record.
(def hgvs1 (hgvs/parse "NM_005228.3:c.2573T>G"))
hgvs1
;;=> #clj_hgvs.core.HGVS
;; {:transcript "NM_005228.3"
;; :kind :coding-dna
;; :mutation #clj_hgvs.mutation.DNASubstitution
;; {:coord #clj_hgvs.coordinate.CodingDNACoordinate
;; {:position 2573
;; :offset 0
;; :region nil}
;; :ref "T"
;; :type ">"
;; :alt "G"}}
;; `format` returns an HGVS text.
(hgvs/format hgvs1)
;;=> "NM_005228.3:c.2573T>G"
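Because the parsed value is a record, its fields can be read with ordinary keyword lookups. The following is a minimal sketch that assumes clj-hgvs is on the classpath and relies only on the field names visible in the printed record above.

```clojure
(require '[clj-hgvs.core :as hgvs])

(def hgvs1 (hgvs/parse "NM_005228.3:c.2573T>G"))

;; Top-level fields of the HGVS record.
(:transcript hgvs1)
;;=> "NM_005228.3"
(:kind hgvs1)
;;=> :coding-dna

;; Nested records also support keyword lookups, so get-in works.
(get-in hgvs1 [:mutation :coord :position])
;;=> 2573
(get-in hgvs1 [:mutation :alt])
;;=> "G"
```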
The #clj-hgvs/hgvs tagged literal is useful for easy and readable definition of HGVS data.
#clj-hgvs/hgvs "NM_005228.3:c.2573T>G"
clj-hgvs.core/format has various options for specifying HGVS styles.
(hgvs/format #clj-hgvs/hgvs "NM_005228.3:c.2307_2308insGCCAGCGTG"
{:ins-format :count})
;;=> "NM_005228.3:c.2307_2308ins(9)"
(hgvs/format #clj-hgvs/hgvs "p.Leu858Arg"
{:amino-acid-format :short})
;;=> "p.L858R"
See API reference for all formatter options.
clj-hgvs.core/== tests whether the given HGVS values are fundamentally equivalent, ignoring differences such as the transcript version or the amino acid notation.
(hgvs/== #clj-hgvs/hgvs "NM_005228:c.2361G>A"
#clj-hgvs/hgvs "NM_005228.4:c.2361G>A")
;;=> true
(hgvs/== #clj-hgvs/hgvs "p.K53Afs*9"
#clj-hgvs/hgvs "p.Lys53Alafs")
;;=> true
(hgvs/== #clj-hgvs/hgvs "p.L858R"
#clj-hgvs/hgvs "p.L858M")
;;=> false
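One possible use of this equivalence check is selecting, from a list of HGVS strings, the entries that describe the same variant as a target. The sketch below assumes clj-hgvs is on the classpath; `target` and `candidates` are hypothetical names chosen for illustration.

```clojure
(require '[clj-hgvs.core :as hgvs])

;; Hypothetical target variant, written without a transcript version.
(def target (hgvs/parse "NM_005228:c.2361G>A"))

;; Hypothetical candidate strings, e.g. collected from different sources.
(def candidates ["NM_005228.4:c.2361G>A"
                 "NM_005228.4:c.2573T>G"])

;; Keep only the candidates that are fundamentally equivalent to the target.
(def matches
  (filterv #(hgvs/== target (hgvs/parse %)) candidates))

matches
;;=> ["NM_005228.4:c.2361G>A"]
```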
plain converts HGVS data to a plain map, and restore converts the map back to HGVS data. These functions are useful for sending HGVS data through another codec.
(hgvs/plain #clj-hgvs/hgvs "NM_005228.3:c.2573T>G")
;;=> {:transcript "NM_005228.3"
;; :kind "coding-dna"
;; :mutation {:mutation "dna-substitution"
;; :coord {:coordinate "coding-dna"
;; :position 2573
;; :offset 0
;; :region nil}
;; :ref "T"
;; :type ">"
;; :alt "G"}}
(hgvs/restore *1)
;;=> #clj-hgvs/hgvs "NM_005228.3:c.2573T>G"
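As an illustration of the codec use case, the plain map can be serialized and read back before restoring it. This sketch uses EDN text as a stand-in for whatever codec is actually in use and assumes Clojure on the JVM with clj-hgvs on the classpath; `original`, `wire`, and `received` are hypothetical names.

```clojure
(require '[clojure.edn :as edn]
         '[clj-hgvs.core :as hgvs])

(def original (hgvs/parse "NM_005228.3:c.2573T>G"))

;; Sender side: convert to a plain map, then serialize it as EDN text.
(def wire (pr-str (hgvs/plain original)))

;; Receiver side: read the EDN text back into a map and restore the HGVS data.
(def received (hgvs/restore (edn/read-string wire)))

(hgvs/format received)
;;=> "NM_005228.3:c.2573T>G"
```

The same pattern should apply to other codecs such as JSON, provided the map keys are turned back into keywords before calling restore.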
repair-hgvs-str attempts to repair an invalid HGVS text.
(hgvs/repair-hgvs-str "c.123_124GC>AA")
;;=> "c.123_124delGCinsAA"
The repair rules are based on frequent mistakes in popular public-domain databases such as dbSNP and ClinVar.
You may supply custom repair rules as the second argument:
(require '[clojure.string :as string]
'[clj-hgvs.repairer :as repairer])
(defn lower-case-ext
[s kind]
(if (= kind :protein)
(string/replace s #"EXT" "ext")
s))
(def my-repairers (conj repairer/built-in-repairers
lower-case-ext))
(hgvs/repair-hgvs-str "p.*833EXT*?" my-repairers)
;;=> "p.*833ext*?"
Copyright 2017-2024 Xcoo, Inc.
Licensed under the Apache License, Version 2.0.