Tweak GENCODE loading performance #55
Codecov Report
@@            Coverage Diff             @@
##           master      #55      +/-   ##
==========================================
+ Coverage   45.35%   45.65%   +0.30%
==========================================
  Files          16       16
  Lines        1958     1969      +11
  Branches       60       60
==========================================
+ Hits          888      899      +11
  Misses       1010     1010
  Partials       60       60
@@ -193,14 +201,14 @@
        :strand strand))))

 (defn load-gencode
-  [f parse-line]
+  [f parse-line & {:keys [chunk-size] :or {chunk-size 10000}}]
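The new `chunk-size` option suggests that lines are grouped into chunks and each chunk is parsed as one parallel task. The following is only a minimal sketch of that idea, not the code in this PR: `parse-lines-in-chunks` and `parse-line-sketch` are hypothetical names, and skipping `#` comment lines is an assumption about the input.

```clojure
(require '[clojure.string :as string])

;; Hypothetical line parser standing in for the real `parse-line` argument.
;; It reads only the first five tab-separated GTF/GFF3 columns.
(defn parse-line-sketch
  [line]
  (let [[chr _ feature start end] (string/split line #"\t")]
    {:chr chr
     :feature feature
     :start (Long/parseLong start)
     :end (Long/parseLong end)}))

;; Hypothetical helper: parses `lines` in parallel, one `pmap` task per chunk
;; of `chunk-size` lines. Comment lines starting with `#` are skipped
;; (an assumption). `pmap` keeps the chunks in input order.
(defn parse-lines-in-chunks
  [lines parse-line & {:keys [chunk-size] :or {chunk-size 10000}}]
  (->> lines
       (remove #(string/starts-with? % "#"))
       (partition-all chunk-size)
       (pmap #(mapv parse-line %))  ; mapv forces the work inside each task
       (apply concat)
       doall))
```

Chunking matters because calling `pmap` on individual lines would spend more time on task coordination than on parsing; a larger `chunk-size` amortizes that overhead at the cost of coarser load balancing.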
It might be better to add this optional arg to both `load-gtf` and `load-gff`, because currently these functions are easier to use. Or reimplement `load-gencode` to pick up an appropriate function based on `f`'s extension (sorry, my first implementation is not well designed).
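The second option in the comment above, choosing a loader from the file extension, could look roughly like the sketch below. `gencode-format` and `pick-loader` are hypothetical names, and both the accepted extensions and the handling of a trailing `.gz` are assumptions rather than behavior confirmed by this PR.

```clojure
(require '[clojure.string :as string])

;; Hypothetical helper: guesses :gtf or :gff3 from the file name.
;; Stripping a trailing ".gz" first is an assumption about how files are named.
(defn gencode-format
  [f]
  (let [n (-> (str f)
              string/lower-case
              (string/replace #"\.gz$" ""))]
    (cond
      (string/ends-with? n ".gtf") :gtf
      (or (string/ends-with? n ".gff3")
          (string/ends-with? n ".gff")) :gff3
      :else (throw (ex-info "Unsupported GENCODE file extension"
                            {:file (str f)})))))

;; Hypothetical helper: looks up the loader for f's format in a map such as
;; {:gtf load-gtf, :gff3 load-gff}, so callers no longer pass `parse-line`.
(defn pick-loader
  [loaders f]
  (get loaders (gencode-format f)))
```

Passing the loaders in as a map keeps the sketch independent of the real `load-gtf` and `load-gff` signatures, which are not shown in this thread.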
Thanks. I had missed it.
BTW, I think the GTF/GFF3 loader should use cljam's implementation (https://github.com/chrovis/cljam/blob/master/src/cljam/io/gff.clj). I am willing to do a complete rewrite on that.
b449e30 to aad6305
Thank you for the quick revision. LGTM.
This PR improves GENCODE's GTF/GFF3 loading performance with parallel processing using `pmap`, etc.
Test code:
Loading time: