
Releases: kingsfordgroup/sailfish

Salmon beta v0.2.1

27 Nov 16:44
Pre-release

What is Salmon?

It's a type of fish! But, in the context of RNA-seq, it's the
successor to Sailfish (a rapid "alignment-free" method for transcript
quantification from RNA-seq reads).

Why use Salmon?

Well, Salmon is designed with a lot of what we learned from Sailfish
in mind. We're still adding to and improving Salmon, but it already
has a number of benefits over Sailfish; for example:

  • Salmon can make better use of paired-end information than Sailfish. The
    algorithm used in Salmon considers both ends of a paired-end read
    simultaneously, rather than independently as Sailfish does. It is therefore
    able to exploit any increased specificity that this joint information
    provides.
  • Salmon has a smaller memory footprint than Sailfish. While the
    quantification phase of Sailfish already has a fairly compact memory
    footprint, Salmon's is even smaller. Additionally, building the Salmon index
    (which is only required if you don't use the alignment-based mode) requires
    substantially less memory, and can be faster, than building the Sailfish
    index.
  • Salmon can use (but doesn't require) pre-computed alignments. If you want,
    you can use Salmon much like Sailfish, by building an index and feeding it
    raw reads. However, if you already have reads from your favorite, super-fast
    aligner (cough cough maybe STAR), you can use them with Salmon to quantify
    the transcript abundances.
  • Salmon is fast --- in all of our testing so far, it's actually faster than
    Sailfish (though the same relationship doesn't hold between the actual fish).

Further details are contained in the online documentation.

Salmon Beta v0.2.0

24 Nov 03:40
Pre-release

Release notes for this version are identical to those of Salmon beta v0.2.1 above.

Salmon Beta Release

27 Sep 05:13
Pre-release

Release notes for this version are identical to those of Salmon beta v0.2.1 above.

Further details are contained in the README.md included in the download.

Version 0.6.3

11 Mar 14:02
Pre-release
  • Now handles the presence of 'N' characters in the reference transcripts. Currently, these k-mers are simply discarded and the effective length of each reference transcript is adjusted accordingly.
  • Fixed numerical instability caused by very small probabilities that would sometimes lead to NaNs in quantification.
  • Interface Change: Changed the specification of read files to the quant phase of Sailfish. The user must now provide a library format string (the specification of which is described in the README and the manual). The format string informs Sailfish about, e.g., the relative orientation of the reads. While not all of this information is currently used, this change was made in anticipation of an upcoming feature that will allow Sailfish to perform very rapid quantification of abundance using read alignments, if the user already has these available or needs to compute them anyway for other analyses.
  • Significant speed improvements to the optimization procedure during the quant phase. This allows one to perform many more EM steps in substantially less time.
  • Changed the default convergence criterion to be data-based rather than a fixed number of iterations. The old behavior (a fixed number of iterations) can be mimicked by setting the -i option to the desired number of iterations and setting the -m option to 0. However, the new default convergence criterion is generally recommended.
  • Fixed a bug in the computation of the read counts of bias-corrected quantification estimates.
  • Added a new output type, the estimated number of k-mers. This is like the estimated number of reads originating from a transcript, but is somewhat more natural for Sailfish as it reports the number of k-mers, which are the fundamental unit of coverage.
  • Added initial logging support, which is part of an ongoing effort to improve error handling and messages in Sailfish (this is currently only activated under Linux, until I can figure out how to get g2log to work with g++ under OSX).

There are a number of important improvements and bug-fixes in this release, and we strongly encourage all users to upgrade to version 0.6.3 at this time. There are also some new features, which are close to completion, and we anticipate the time between this release and 0.6.4 to be much smaller than the time between 0.6.2 and 0.6.3.
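The EM speed-ups and the data-based stopping rule described above can be illustrated with a minimal sketch. Everything here (the compatibility-set representation of reads, the helper name `em_abundances`, and the tolerance value) is an illustrative assumption, not Sailfish's actual data structures or code.

```python
def em_abundances(compat, n_transcripts, tol=1e-3, max_iters=1000):
    """Minimal EM sketch for transcript abundances.

    compat: list of sets; compat[r] holds the transcripts read r is
    compatible with. Returns normalized abundance estimates.
    """
    theta = [1.0 / n_transcripts] * n_transcripts  # uniform start
    for _ in range(max_iters):
        counts = [0.0] * n_transcripts
        # E-step: fractionally assign each read among its compatible
        # transcripts, in proportion to current abundances.
        for txps in compat:
            denom = sum(theta[t] for t in txps)
            for t in txps:
                counts[t] += theta[t] / denom
        # M-step: re-normalize expected counts into abundances.
        total = sum(counts)
        new_theta = [c / total for c in counts]
        # Data-based convergence: stop when abundances barely change,
        # instead of running a fixed number of iterations.
        if max(abs(a - b) for a, b in zip(new_theta, theta)) < tol:
            return new_theta
        theta = new_theta
    return theta
```

For example, with four reads over two transcripts, `em_abundances([{0}, {0, 1}, {1}, {1}], 2)` converges toward (1/3, 2/3), splitting the ambiguous read according to the evidence from the unambiguous ones.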

Version 0.6.2

06 Mar 23:25
Pre-release
  • Moved computation of k-mer equivalence classes to the index-building phase. This substantially reduces the memory usage during estimation as well as the size of several of the stored indexes. The algorithm used to compute the equivalence classes was also changed from a parallel-hashing based algorithm to a divisive partition refinement algorithm. This latter algorithm is more suitable to the per-transcript processing that happens during the indexing phase.
  • Implemented reading from named pipes and input redirection. Sequencing
    reads can now be streamed in from a named pipe (e.g. using process
    substitution syntax).
  • The indexing phase now uses the streaming read parser instead of the mem-mapping parser. This now allows the entire pipeline to be run without using the mem-mapping parser (which may cause issues on a small number of systems). To force usage of the streaming read parser during quantification, just create a named pipe to stream in the reads. For example, if the reads are in the files reads1.fq and reads2.fq, you can create a named-pipe to stream in all of the reads by passing <(cat reads1.fq reads2.fq) to the -r option. This will trigger usage of the streaming read parser instead of the mem-mapping parser.
  • Implemented direction-aware k-mer counting. If the directionality (sense /
    anti-sense) for a set of reads is known (e.g. as the result of a direction-aware
    protocol), it can now be specified on the command line. Thus, there are
    conceptually 3 "classes" of reads; forward/sense, reverse/anti-sense and
    undirected.
  • The estimated number of reads originating from each transcript is now written
    to the output file. This may be useful for differential-expression tools which are
    based on read counts.
  • Fixed an oversight in the bias-correction phase where only RPKM estimates
    (and not, e.g., TPMs) were corrected. Now all estimates are corrected during
    the bias-correction phase.
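The named-pipe mechanism above (the same one exercised by the shell idiom `<(cat reads1.fq reads2.fq)`) can be demonstrated directly. In this sketch, a plain file read stands in for Sailfish's streaming read parser; the pipe path and record contents are illustrative, and the example assumes a POSIX system (os.mkfifo is not available on Windows).

```python
import os
import tempfile
import threading

# Create a named pipe (FIFO) in a temporary directory.
pipe_path = os.path.join(tempfile.mkdtemp(), "reads.pipe")
os.mkfifo(pipe_path)

def stream_reads():
    # Writer side: concatenate two files' worth of FASTQ records
    # into the pipe, as <(cat reads1.fq reads2.fq) would.
    with open(pipe_path, "w") as pipe:
        pipe.write("@r1\nACGT\n+\nIIII\n")
        pipe.write("@r2\nTTTT\n+\nIIII\n")

writer = threading.Thread(target=stream_reads)
writer.start()

# Reader side (standing in for the streaming parser): it sees one
# ordinary, continuous stream, even though the data never hits disk.
with open(pipe_path) as pipe:
    records = pipe.read().splitlines()
writer.join()
```

Reading the pipe yields all eight lines (two four-line FASTQ records) as a single stream, which is exactly what lets the quant phase avoid the mem-mapping parser.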
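Direction-aware counting can be sketched as follows. The three direction labels mirror the three read classes above (forward/sense, reverse/anti-sense, undirected); the function, its interface, and the canonical-k-mer treatment of undirected reads are illustrative assumptions, not Sailfish's implementation.

```python
from collections import Counter

_COMP = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(_COMP)[::-1]

def count_kmers(read, k, direction):
    """Count k-mers in one read; direction is 'sense', 'antisense',
    or 'undirected'."""
    counts = Counter()
    if direction == "antisense":
        # Anti-sense reads are flipped onto the sense strand first.
        read = revcomp(read)
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k]
        if direction == "undirected":
            # Without strand information, count the canonical
            # (lexicographically smaller) form of the k-mer.
            kmer = min(kmer, revcomp(kmer))
        counts[kmer] += 1
    return counts
```

For instance, counting 2-mers of "ACGT" in sense mode gives AC, CG, and GT once each, while undirected mode collapses GT onto its canonical form AC.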