Needed: Speed-optimized FASTA statistics script #7

jorvis · 2013-12-29T15:47:43Z

One very common task when handed a FASTA file is to compute the following statistics:

Total sequence count
Total base count
GC content
Longest sequence
Shortest sequence
Mean sequence length
Median sequence length
N50
N90
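For reference, here is a minimal sketch of how the statistics above can be derived once the per-sequence lengths and a total G+C count are known. It assumes the usual definition of Nx: the length L such that sequences of length L or longer together contain at least x% of all bases. It also assumes GC content is G+C divided by the total base count, with N and other ambiguity codes counted in the denominator. The function names `nx` and `fasta_stats` are hypothetical, not an existing library API.

```python
import statistics


def nx(lengths, fraction):
    """Return the Nx value: the length L such that sequences of length >= L
    together contain at least `fraction` of all bases (0.5 -> N50, 0.9 -> N90)."""
    total = sum(lengths)
    threshold = total * fraction
    running = 0
    # Walk from the longest sequence down, accumulating bases until the
    # running total reaches the requested fraction of the assembly.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= threshold:
            return length
    raise ValueError("lengths must be non-empty")


def fasta_stats(lengths, gc_count):
    """Summarize per-sequence lengths plus a total G+C count.

    Assumption: GC content = (G + C) / total bases, including N's.
    """
    if not lengths:
        raise ValueError("no sequences found")
    total = sum(lengths)
    return {
        "sequence_count": len(lengths),
        "base_count": total,
        "gc_content": gc_count / total if total else 0.0,
        "longest": max(lengths),
        "shortest": min(lengths),
        "mean_length": total / len(lengths),
        # statistics.median averages the two middle values for even counts.
        "median_length": statistics.median(lengths),
        "n50": nx(lengths, 0.5),
        "n90": nx(lengths, 0.9),
    }
```

For example, `fasta_stats([2, 3, 4, 5, 6], gc_count=8)` gives an N50 of 5 and an N90 of 3. Sorting makes this O(n log n) in the number of sequences, which is usually negligible next to reading the bases themselves.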

While each of these is trivial on its own, the more interesting problem is finding the implementation that performs best. Because this will be an important component of a few other projects, speed and proper error handling are both important. Most apps assume Python, but I'm open to implementations in whatever language gives the best results here, as long as they don't open up a huge can of worms dependency-wise.
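As a possible baseline for the speed and error-handling requirements, here is a minimal pure-Python sketch of a single-pass reader. It assumes plain, uncompressed FASTA text and only keeps per-record lengths and a running G+C tally, so memory grows with the number of records rather than the number of bases. The name `scan_fasta` is hypothetical, and which malformed inputs count as errors (here: sequence data before the first header, or no records at all) is an assumption.

```python
def scan_fasta(handle):
    """Single pass over FASTA text lines, returning (lengths, gc_count).

    `handle` can be an open text file or any iterable of lines.
    Raises ValueError on sequence data before the first '>' header
    or when no records are found.
    """
    lengths = []
    gc_count = 0
    current = None  # length of the record being read; None before first header
    for line_no, line in enumerate(handle, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if current is not None:
                lengths.append(current)
            current = 0
            continue
        if current is None:
            raise ValueError(f"line {line_no}: sequence data before first '>' header")
        current += len(line)
        # str.count runs in C, which is much faster than a per-character loop.
        upper = line.upper()
        gc_count += upper.count("G") + upper.count("C")
    if current is not None:
        lengths.append(current)
    if not lengths:
        raise ValueError("no FASTA records found")
    return lengths, gc_count
```

Typical use would be `with open(path) as fh: lengths, gc = scan_fasta(fh)`. Keeping parsing separate from the summary arithmetic makes it easy to swap in a faster reader (or another language) later without touching the statistics code.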
