Releases: kingsfordgroup/sailfish
Salmon Beta v0.2.1
What is Salmon?
It's a type of fish! But, in the context of RNA-seq, it's the
successor to Sailfish (a rapid "alignment-free" method for transcript
quantification from RNA-seq reads).
Why use Salmon?
Well, Salmon is designed with a lot of what we learned from Sailfish
in mind. We're still adding to and improving Salmon, but it already
has a number of benefits over Sailfish; for example:
- Salmon can make better use of paired-end information than Sailfish. The algorithm used in Salmon considers both ends of a paired-end read simultaneously, rather than independently as Sailfish does, so it is able to exploit any increased specificity that this information provides.
- Salmon has a smaller memory footprint than Sailfish. While the quantification phase of Sailfish already has a fairly compact memory footprint, Salmon's is even smaller. Additionally, building the Salmon index (which is only required if you don't use the alignment-based mode) requires substantially less memory, and can be faster, than building the Sailfish index.
- Salmon can use (but doesn't require) pre-computed alignments. If you want, you can use Salmon much like Sailfish, by building an index and feeding it raw reads. However, if you already have alignments from your favorite, super-fast aligner (cough cough, maybe STAR), you can use them with Salmon to quantify the transcript abundances.
- Salmon is fast --- in all of our testing so far, it's actually faster than Sailfish (though the same relationship doesn't hold between the actual fish).
Further details are contained in the online documentation.
Salmon Beta v0.2.0
What is Salmon?
It's a type of fish! But, in the context of RNA-seq, it's the
successor to Sailfish (a rapid "alignment-free" method for transcript
quantification from RNA-seq reads).
Why use Salmon?
Well, Salmon is designed with a lot of what we learned from Sailfish
in mind. We're still adding to and improving Salmon, but it already
has a number of benefits over Sailfish; for example:
- Salmon can make better use of paired-end information than Sailfish. The algorithm used in Salmon considers both ends of a paired-end read simultaneously, rather than independently as Sailfish does, so it is able to exploit any increased specificity that this information provides.
- Salmon has a smaller memory footprint than Sailfish. While the quantification phase of Sailfish already has a fairly compact memory footprint, Salmon's is even smaller. Additionally, building the Salmon index (which is only required if you don't use the alignment-based mode) requires substantially less memory, and can be faster, than building the Sailfish index.
- Salmon can use (but doesn't require) pre-computed alignments. If you want, you can use Salmon much like Sailfish, by building an index and feeding it raw reads. However, if you already have alignments from your favorite, super-fast aligner (cough cough, maybe STAR), you can use them with Salmon to quantify the transcript abundances.
- Salmon is fast --- in all of our testing so far, it's actually faster than Sailfish (though the same relationship doesn't hold between the actual fish).
Further details are contained in the online documentation.
Salmon Beta Release
What is Salmon?
It's a type of fish! But, in the context of RNA-seq, it's the
successor to Sailfish (a rapid "alignment-free" method for transcript
quantification from RNA-seq reads).
Why use Salmon?
Well, Salmon is designed with a lot of what we learned from Sailfish
in mind. We're still adding to and improving Salmon, but it already
has a number of benefits over Sailfish; for example:
- Salmon can make better use of paired-end information than Sailfish. The algorithm used in Salmon considers both ends of a paired-end read simultaneously, rather than independently as Sailfish does, so it is able to exploit any increased specificity that this information provides.
- Salmon has a smaller memory footprint than Sailfish. While the quantification phase of Sailfish already has a fairly compact memory footprint, Salmon's is even smaller. Additionally, building the Salmon index (which is only required if you don't use the alignment-based mode) requires substantially less memory, and can be faster, than building the Sailfish index.
- Salmon can use (but doesn't require) pre-computed alignments. If you want, you can use Salmon much like Sailfish, by building an index and feeding it raw reads. However, if you already have alignments from your favorite, super-fast aligner (cough cough, maybe STAR), you can use them with Salmon to quantify the transcript abundances.
- Salmon is fast --- in all of our testing so far, it's actually faster than Sailfish (though the same relationship doesn't hold between the actual fish).
Further details are contained in the README.md in the download.
Version 0.6.3
- Now handles the presence of 'N' characters in the reference transcripts. Currently, these k-mers are just discarded and the effective lengths of the reference transcripts are adjusted accordingly.
- Fixed numerical instability caused by very small probabilities that would sometimes lead to NaNs in quantification.
- Interface Change: Changed the specification of read files to the `quant` phase of Sailfish. The user must now provide a library format string (the specification of which is described in the README and the manual). The format string informs Sailfish about, e.g., the relative orientation of the reads. While not all of this information is currently used, this change was made in anticipation of an upcoming feature, which will allow Sailfish to perform very rapid quantification of abundance using read alignments, if the user already has these available or needs to compute them anyway for other analyses.
- Significant speed improvements to the optimization procedure during the `quant` phase. This allows one to perform many more EM steps in substantially less time.
- Changed the default convergence criterion to be data-based rather than a fixed number of iterations. The old behavior (a fixed number of iterations) can be mimicked by setting the `-i` option with the desired number of iterations and setting the `-m` option to 0. However, the new default convergence criterion is generally recommended.
- Fixed a bug in the computation of the read counts of bias-corrected quantification estimates.
- Added a new output type, the estimated number of k-mers. This is like the estimated number of reads originating from a transcript, but is somewhat more natural for Sailfish as it reports the number of k-mers, which are the fundamental unit of coverage.
- Added initial logging support, which is part of an ongoing effort to improve error handling and messages in Sailfish (this is currently only activated under Linux, until I can figure out how to get g2log to work with g++ under OSX).
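The data-based stopping rule above can be sketched as follows. This is an illustrative toy, not Sailfish's actual implementation: the function names, the toy update, and the tolerance value are all stand-ins. The idea is simply to stop when the abundance estimates stop moving, while keeping an iteration cap that plays the role of the old fixed-iteration behavior.

```python
def run_until_converged(update, theta, tol=1e-3, max_iters=1000):
    """Iterate an EM-style update until the estimates stabilize.

    Stops when the largest relative change in any component of theta
    drops below `tol` (the data-based criterion), or after `max_iters`
    iterations (the old fixed-iteration behavior).
    """
    for _ in range(max_iters):
        new_theta = update(theta)
        rel_change = max(
            abs(n - o) / max(abs(o), 1e-12) for n, o in zip(new_theta, theta)
        )
        theta = new_theta
        if rel_change < tol:
            break
    return theta

# Toy update with fixed point (0.25, 0.75): move halfway toward it each step.
target = (0.25, 0.75)
step = lambda th: tuple((t + x) / 2 for t, x in zip(target, th))
result = run_until_converged(step, (0.5, 0.5))
```

The appeal of the data-based criterion is visible here: the loop runs exactly as long as the data demand, rather than a number of iterations guessed in advance.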
There are a number of important improvements and bug-fixes in this release, and we strongly encourage all users to upgrade to version 0.6.3 at this time. There are also some new features, which are close to completion, and we anticipate the time between this release and 0.6.4 to be much smaller than the time between 0.6.2 and 0.6.3.
Version 0.6.2
- Moved computation of k-mer equivalence classes to the index-building phase. This substantially reduces the memory usage during estimation as well as the size of several of the stored indexes. The algorithm used to compute the equivalence classes was also changed from a parallel-hashing based algorithm to a divisive partition refinement algorithm. This latter algorithm is more suitable to the per-transcript processing that happens during the indexing phase.
- Implemented reading from named pipes and input redirection. Sequencing reads can now be streamed in from a named pipe (e.g. using process substitution syntax).
- The indexing phase now uses the streaming read parser instead of the mem-mapping parser. This allows the entire pipeline to be run without using the mem-mapping parser (which may cause issues on a small number of systems). To force usage of the streaming read parser during quantification, just create a named pipe to stream in the reads. For example, if the reads are in the files `reads1.fq` and `reads2.fq`, you can create a named pipe to stream in all of the reads by passing `<(cat reads1.fq reads2.fq)` to the `-r` option. This will trigger usage of the streaming read parser instead of the mem-mapping parser.
- Implemented direction-aware k-mer counting. If the directionality (sense / anti-sense) for a set of reads is known (e.g. as the result of a direction-aware protocol), it can now be specified on the command line. Thus, there are conceptually 3 "classes" of reads: forward/sense, reverse/anti-sense, and undirected.
- The estimated number of reads originating from each transcript is now written to the output file. This may be useful for differential-expression tools which are based on read counts.
- Fixed an oversight in the bias-correction phase where only RPKM estimates (and not, e.g., TPMs) were corrected. Now all estimates are corrected during the bias-correction phase.
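The divisive partition-refinement idea from the first item above can be sketched in a few lines. This is a toy illustration under simplifying assumptions (plain Python sets over string k-mers; the function name and data layout are hypothetical), not the actual Sailfish code. Two k-mers are equivalent when they occur in exactly the same set of transcripts, so we start with one class holding every k-mer and let each transcript split every current class into the k-mers it contains and those it does not:

```python
def kmer_equivalence_classes(transcripts, k):
    # Start with a single class containing every distinct k-mer.
    kmers = {t[i:i + k] for t in transcripts for i in range(len(t) - k + 1)}
    classes = [kmers]
    # Refine transcript by transcript: each transcript partitions every
    # existing class into (k-mers inside it, k-mers outside it).
    for t in transcripts:
        in_t = {t[i:i + k] for i in range(len(t) - k + 1)}
        refined = []
        for c in classes:
            inside, outside = c & in_t, c - in_t
            refined.extend(part for part in (inside, outside) if part)
        classes = refined
    return classes

# Two overlapping transcripts: CGT occurs in both, ACG and GTA in one each,
# so three equivalence classes result.
classes = kmer_equivalence_classes(["ACGT", "CGTA"], k=3)
```

This per-transcript refinement touches one transcript at a time, which is why it fits naturally into the indexing phase, where transcripts are already being processed sequentially.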
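For readers unfamiliar with process substitution, the mechanism behind the `-r <(cat reads1.fq reads2.fq)` example above can be demonstrated with plain shell tools (the file names and contents here are stand-ins; this requires bash, not POSIX sh):

```shell
# Create two small stand-in read files.
printf 'a\nb\n' > reads1.fq
printf 'c\n'    > reads2.fq
# The shell replaces <(...) with the path of a named pipe (e.g. /dev/fd/63),
# so the consuming program sees the concatenated stream as a single "file".
cat <(cat reads1.fq reads2.fq) | wc -l
```

Any program that reads its input sequentially, as a streaming parser does, can consume such a pipe without the data ever touching disk as a combined file.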