salmon v0.13.1

@rob-p rob-p released this 09 Mar 03:47
· 708 commits to master since this release

Salmon 0.13.1 release notes

Version 0.13.1 is a patch to 0.13.0. We describe the contents of the patch here and repeat the v0.13.0 release notes below for simplicity.

  • This version fixes a non-determinism bug introduced in v0.13.0 that could cause the mapping rate of orphaned mappings to fluctuate slightly between runs.

  • This version adds the --allowDovetail flag, which overrides the newly default behavior of discarding dovetail mappings of paired-end reads. When this flag is passed, salmon does not treat dovetailing mappings as discordant and keeps them for quantification.

  • The following fields have been added to meta_info.json:

    • num_dovetail_fragments : which denotes the number of fragments that have only dovetailing mappings. If the --allowDovetail flag was passed, these are counted toward quantification, otherwise they are discarded (but this number is still reported). This field only has a meaningful value in quasi-mapping mode (with or without mapping validation).
    • num_fragments_filtered_vm : which denotes the number of fragments that had a mapping to the transcriptome, but which were discarded because none of the mappings for the fragments exceeded the minimum mapping validation score. This field only has a meaningful value in conjunction with mapping validation (otherwise it is 0).
    • num_alignments_below_threshold_for_mapped_fragments_vm : which denotes the number of mappings discarded because they failed to reach the minimum mapping validation score, but for which the corresponding fragment had at least a single valid mapping. This field only has a meaningful value in conjunction with mapping validation (otherwise it is 0).
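
As a rough sketch of how these counters might be inspected after a run, the Python snippet below pulls the three new fields out of a parsed meta_info.json. The helper name summarize_filtering and the sample values are hypothetical and only serve the illustration; the field names are the ones listed above.

```python
import json

# Fields added to meta_info.json in v0.13.1 (names taken from the notes above).
NEW_FIELDS = (
    "num_dovetail_fragments",
    "num_fragments_filtered_vm",
    "num_alignments_below_threshold_for_mapped_fragments_vm",
)


def summarize_filtering(meta):
    """Return the new counters from a parsed meta_info.json dict.

    Missing keys are reported as None, so output written by an older
    salmon version does not raise an error.
    """
    return {field: meta.get(field) for field in NEW_FIELDS}


# Hypothetical example content; real values come from a salmon run.
example = json.loads(
    '{"num_dovetail_fragments": 120,'
    ' "num_fragments_filtered_vm": 45,'
    ' "num_alignments_below_threshold_for_mapped_fragments_vm": 300}'
)
summary = summarize_filtering(example)
for field, value in summary.items():
    print(f"{field}: {value}")
```

For a real run, the dict would be loaded with json.load from the meta_info.json file in the output directory (assumed here to sit under aux_info/; check your own output layout).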

Previous Salmon 0.13.0 release notes

Change to default behavior

Starting from this version of salmon, dovetailed mappings (see the Bowtie2 manual for a description) are not accepted by default when using the built-in mapping (with or without --validateMappings). Moreover, v0.13.0 has no flag to allow dovetail mappings. The --allowDovetail option was added in v0.13.1 to enable this behavior, if desired.
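
To make the notion of a dovetailed pair concrete, here is a minimal Python sketch based on a simplified reading of the Bowtie2 description: the pair dovetails when the mate mapped to the reverse strand starts upstream of the mate mapped to the forward strand. The function name and the coordinate convention (leftmost reference positions) are assumptions for illustration; this is not salmon's actual check.

```python
def is_dovetail(fw_start, rc_start):
    """Simplified dovetail test for one paired-end mapping.

    fw_start: leftmost reference position of the mate on the forward strand.
    rc_start: leftmost reference position of the mate on the reverse strand.

    In a typical inward-facing pair the forward mate starts at or before
    the reverse mate. If the reverse mate starts first, the mates have
    extended past each other, which is treated here as dovetailing.
    """
    return rc_start < fw_start


# Ordinary inward-facing pair: forward mate at 100, reverse mate at 250.
print(is_dovetail(fw_start=100, rc_start=250))
# Reverse mate begins upstream of the forward mate.
print(is_dovetail(fw_start=250, rc_start=100))
```

Under the default behavior described above, a fragment whose only mappings satisfy such a test would be discarded unless --allowDovetail is passed.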

Exotic library types (e.g. MU, MSF, MSR) are no longer supported. If you need support for such a library type, please submit a feature request describing the use-case.

Improvements and new flags

Again, there have been significant improvements to mapping validation. Through broad benchmarking across many samples, we have worked to considerably improve the algorithm and its sensitivity. We note that it is likely that mapping validation will be turned on by default in future releases, and we strongly encourage all users to make use of this feature and report their experiences with it.

Along with the default mapping validation (enabled via --validateMappings), there are two "meta" flags that enable mapping validation parameters meant to mimic configurations in which users might be interested.

  • --mimicBT2 : This flag is a "meta-flag" that sets the parameters related to mapping and mapping validation to mimic alignment using Bowtie2 (with the flags --no-discordant and --no-mixed), but using the default scoring scheme and allowing both mismatches and indels in alignments.

  • --mimicStrictBT2 : This flag is a "meta-flag" that sets the parameters related to mapping and mapping validation to mimic alignment using Bowtie2 (with the flags suggested by RSEM), but using the default scoring scheme and allowing both mismatches and indels in alignments. These settings essentially disallow indels in the resulting alignments.

In addition to these "meta-flags", a few other flags have been introduced that can alter the behavior of mapping:

  • --recoverOrphans : This flag (which should only be used in conjunction with mapping validation) performs orphan "rescue" for reads. That is, if mappings are discovered for only one end of a fragment, or if the mappings for the ends of the fragment don't fall on the same transcript, then this flag will cause salmon to look upstream or downstream of the discovered mapping (anchor) for a match for the opposite end of the given fragment. This is done by performing "infix" alignment within the maximum fragment length upstream or downstream of the anchor mapping using edlib.

  • --hardFilter : This flag (which should only be used with mapping validation) turns off soft filtering and range-factorized equivalence classes, and removes all but the equally highest scoring mappings from the equivalence class label for each fragment. While we recommend using soft filtering (the default) for quantification, this flag can produce easier-to-understand equivalence classes if that is the primary object of study.

  • --skipQuant : Related to the above, this flag will stop execution before the actual quantification algorithm is run.

  • --bandwidth : This flag (which is only meaningful in conjunction with mapping validation), sets the bandwidth parameter of the relevant calls to ksw2's alignment function. This determines how wide an area around the diagonal in the DP matrix should be calculated.

  • --maxMMPExtension : This flag (which should only be used with mapping validation) limits the length that a mappable prefix of a fragment may be extended before another search along the fragment is started. Smaller values for this flag can improve the sensitivity of mapping, but could increase run time.
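
The effect of --hardFilter on an equivalence class label can be sketched as follows. This is a minimal illustration, assuming each fragment's mappings are available as (transcript, score) pairs; the function name and the example scores are hypothetical, and the sketch does not reproduce salmon's internal data structures.

```python
def hard_filter(mappings):
    """Keep only the mappings that tie for the best score.

    mappings: list of (transcript_id, score) tuples for one fragment.
    """
    if not mappings:
        return []
    best = max(score for _, score in mappings)
    return [(tid, score) for tid, score in mappings if score == best]


# Hypothetical mappings for a single fragment.
fragment_mappings = [("tx1", 98), ("tx2", 100), ("tx3", 100), ("tx4", 71)]

# The equivalence class label is the set of transcripts that survive.
label = tuple(tid for tid, _ in hard_filter(fragment_mappings))
print(label)  # ('tx2', 'tx3')
```

With soft filtering (the default), lower-scoring mappings such as tx1 would instead stay in the label and be down-weighted rather than removed.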

The default setting for --numPreAuxModelSamples has been lowered from 1,000,000 to 5,000. This simply means that the basic models (and crucially the read alignment error model) will start being applied much earlier in the online algorithm. This has very little effect on samples with a decent number of fragments, but can considerably improve estimates (especially in alignment-based mode) for samples with only a small number of fragments.

The definition of --consensusSlack has changed. Instead of being an absolute number, it is now a fractional value (between 0 and 1) that describes the fraction of "hits" (i.e. suffix array intervals) that a mapping may miss and still be considered valid for chaining.
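
A small worked example of the fractional definition: the Python sketch below converts a slack value into the minimum number of hits a mapping would need to cover. The function name, the rounding rule (floor of the allowed misses), and the notion of a single total hit count are assumptions made for illustration; salmon's exact arithmetic may differ.

```python
import math


def min_hits_required(total_hits, consensus_slack):
    """Minimum hits a mapping must cover under a fractional slack.

    total_hits: number of hits (suffix array intervals) for the fragment.
    consensus_slack: fraction of hits that may be missed, between 0 and 1.
    """
    if not 0.0 <= consensus_slack <= 1.0:
        raise ValueError("consensus_slack must lie between 0 and 1")
    # Assumed rounding: a partial hit cannot be missed, so round down.
    allowed_misses = math.floor(consensus_slack * total_hits)
    return total_hits - allowed_misses


# With 8 hits and a slack of 0.25, up to 2 hits may be missed.
print(min_hits_required(8, 0.25))  # 6
```

A slack of 0 recovers the strict behavior in which a mapping must cover every hit.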

Improvements and changes to alevin

  • With this release, alevin will dump summary statistics of a single-cell experiment into the file alevin_meta_info.json inside the aux folder of the output directory.

  • The EquivalenceClassBuilder object now has a single-cell SCRGValue templatization, which marginally reduces the memory used by the object.

  • Salmon's --initUniform flag has been linked with alevin. If enabled on the command line (default false), it initializes the EM step with a uniform prior instead of with the unique equivalence class evidence.

  • Alevin can directly consume the bfh file format generated using --dumpBfh. This provides an independent entry point into alevin's UMI deduplication step, instead of starting from the raw FASTQ files.

  • A bug in the UMI deduplication step has been fixed. Previously, the vertices in the maximum connected components of an arborescence were not being removed.

  • The custom mode of the single-cell protocol for alevin does not need an explicit protocol-specific command-line flag. However, the full triplet of command-line options --umiLength, --barcodeLength, and --end has to be specified to enable the custom mode.

  • The maximum allowable length of a barcode and/or UMI has been set to 20 for the custom mode of a single-cell experiment.

  • A new command-line option --keepCBFraction has been added, which expects a value in the range (0, 1]. This parameter forces alevin to use the specified fraction of all the cellular barcodes observed in the input reads after sequence correction.
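
The custom-mode rules above (all three options given together, lengths of at most 20) can be sketched as a small validation routine. This is only an illustration in Python, not alevin's argument handling; the function name is hypothetical, and the accepted values for --end (5 or 3) are an assumption.

```python
MAX_CUSTOM_LENGTH = 20  # upper bound stated in the notes above


def custom_mode_enabled(umi_length=None, barcode_length=None, end=None):
    """Return True if the custom-mode triplet is fully and validly given.

    Returns False when none of the three options is set, and raises
    ValueError for partial or out-of-range settings.
    """
    given = [v is not None for v in (umi_length, barcode_length, end)]
    if not any(given):
        return False
    if not all(given):
        raise ValueError(
            "--umiLength, --barcodeLength and --end must be given together"
        )
    for name, value in (
        ("--umiLength", umi_length),
        ("--barcodeLength", barcode_length),
    ):
        if not 1 <= value <= MAX_CUSTOM_LENGTH:
            raise ValueError(f"{name} must be between 1 and {MAX_CUSTOM_LENGTH}")
    # Assumption: the barcode sits at the 5' or 3' end of the read.
    if end not in (5, 3):
        raise ValueError("--end must be 5 or 3")
    return True


print(custom_mode_enabled(umi_length=10, barcode_length=16, end=5))
```

Passing only part of the triplet, or a length above 20, raises an error in this sketch rather than silently falling back to another protocol.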

Bug fixes, deprecations and removals

  • Fixed a rare bug that could cause salmon and alevin to "hang" when many read files were provided as input and the number of records in a read file was a divisor of the mini-batch size. Thanks to @rbenel for finding a dataset that triggers this bug and reporting it in #329.

  • The --strictIntersect flag led to unnecessary complexity in the codebase and, it seems, was not really used by anyone, so it was removed to simplify and streamline the code.

  • The --useFSPD flag had been deprecated for many releases and has now been removed.