Releases: COMBINE-lab/RapMap

rapmap v0.6.0

05 Feb 04:59

This release brings the master branch and the tagged release up to date with many of the bug fixes, developments, and improvements that have been going into the develop-salmon branch in support of salmon development. Among the most important new features in this release is the ability to have rapmap apply selective-alignment to improve the sensitivity and specificity of the mappings. For more details on selective-alignment and mapping validation, refer to the release notes of salmon, specifically the options related to --validateMappings. This release also adds the ability to optionally write unmapped reads to the output SAM with the -u flag.

RapMap v0.5.0

26 Mar 18:20

This release is accompanied by further re-organization and refinement of the code. It also adds some new options and changes the SAM output format (quality scores are removed, resulting in smaller SAM files and faster output).

RapMap v0.4.0

24 Sep 21:07

This new release of RapMap is accompanied by a substantial cleanup of the underlying codebase (including a few bug fixes). The new version should better handle cases where there are almost equally-good mappings on the forward and reverse-complement strands (but where one mapping is slightly better than the other). It also introduces a few more user-facing features (see New Features below).

Important note:

The quasi-indices from previous versions of RapMap are not binary compatible with the new version (see below). Please re-build your indices before using RapMap v0.4.0.

New Features

  • New hash map for default index - The default quasiindex command now uses the sparsepp sparse hash map. While providing very similar lookup performance to the prior hash map implementation, sparsepp provides a number of benefits. Specifically, it uses substantially less memory (typically ~50% less) and, crucially, its memory usage grows gradually with the number of keys. A big problem with the previous implementation (Google's dense hash map) was that, on resize, the map would double in size and memory usage would jump by a factor of 3 (a new map of twice the size of the old one, plus the original map from which to copy the keys). This meant that even if you had enough memory to hold the final map, you might not be able to build it. Sparsepp, on the other hand, exhibits memory usage that scales almost linearly with the number of items in the map. For more details on the performance characteristics of the new default hash used in the index, please see the sparsepp benchmarks.
  • New frugal perfect hash index - The vastly improved memory usage of the new default quasi index essentially obviates the previous perfect-hash-based index. Specifically, since that perfect hash also stored the keys (to validate queries from outside the universe on which the hash was built), the size of the resulting index was similar; it simply required less memory to build. However, sparsepp achieves very similar memory usage to the previous perfect-hash-based index. Instead of removing the perfect-hash-based index entirely, the -p/--perfectHash flag now tells the quasiindex command to build a frugal perfect-hash-based index. This index uses a number of aggressive space-saving techniques which result in a much smaller memory footprint (but it is also slower to construct and has slower lookups than the default index). For large references, the new frugal perfect-hash-based index exhibits a memory reduction (over the new, reduced-memory, default index) of 40-50% (hence, it shows close to this savings over the old perfect-hash-based index as well). Also, for large references, the size of the index on disk is ~40% smaller. The cost of this substantial size reduction is that the frugal perfect-hash-based index takes 2-2.5 times longer to build, and lookups are slower. This slower lookup speed can, conceivably, reduce quasi-mapping speed a bit, but the speed hit (if there is one) is dataset dependent. This new indexing scheme should allow the construction of quasi indices on substantially larger references for a fixed RAM budget, and it also reduces the memory required to hold the index during mapping.
  • New options to the quasimap command - The following options have been introduced to the quasimap command:
    • sensitive mode - the -e / --sensitive flag will turn off some NIP-based jumping in the algorithm and will allow reads to compete for mapping using MMP-based coverage profiles. This can increase the sensitivity and specificity of difficult-to-map reads.
    • quasi coverage - the -z/--quasiCoverage option takes a number c, with 0 <= c <= 1, that allows the user to specify that a read will only be considered "mappable" if at least a fraction c of the read is covered by maximum-mappable-prefixes (MMPs). Note that the condition that the coverage must be in terms of MMPs is rather stringent, and so this parameter is not to be interpreted as the fraction of nucleotides that would be covered under an optimal alignment. Nonetheless, it allows enforcing the requirement that a single k-length hit should not be sufficient evidence of mapping, and can reduce false-positive mappings when similar but distinct sequences are present in the sample but not the reference (the --quasiCoverage option implies sensitive mode, but not vice-versa).
    • quiet flag - the -q / --quiet flag will disable all non-warning/non-error output of the quasimap command to the console.
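
As an illustration of the quasi-coverage threshold described above, here is a minimal sketch of how such a filter could be computed. It is not RapMap's implementation: the function names `quasi_coverage` and `is_mappable`, and the representation of MMPs as half-open `(start, end)` intervals on the read, are assumptions made for this example.

```python
def quasi_coverage(read_len, mmp_intervals):
    """Fraction of read positions covered by at least one interval.

    mmp_intervals: iterable of half-open (start, end) intervals on the read
    (hypothetical representation of maximum-mappable-prefix matches).
    """
    if read_len <= 0:
        raise ValueError("read_len must be positive")
    covered = 0
    last_end = 0
    for start, end in sorted(mmp_intervals):
        # Clip to the read and skip the part already counted.
        start = max(start, last_end, 0)
        end = min(end, read_len)
        if end > start:
            covered += end - start
            last_end = end
    return covered / read_len


def is_mappable(read_len, mmp_intervals, c):
    """Apply a coverage threshold c, where 0 <= c <= 1."""
    if not 0.0 <= c <= 1.0:
        raise ValueError("c must satisfy 0 <= c <= 1")
    return quasi_coverage(read_len, mmp_intervals) >= c
```

With this sketch, a 100-base read with a single 31-base hit has a coverage of 0.31, so it would pass a threshold of c = 0 but fail a threshold of c = 0.5, which matches the stated goal that a single k-length hit should not be sufficient evidence of mapping.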

Other changes

  • Removal of quality strings from the SAM output - RapMap now outputs * in place of the quality string of a read in the output SAM file. This is consistent with the SAM standard, and produces output that is considerably less verbose (faster to write and takes up less space), and which also compresses to BAM much better. If quality strings for particular reads are desired, they can always be retrieved from the corresponding read IDs and the original file (we may provide a tool for this in the future).
  • CIGAR string of unmapped reads - The CIGAR string of unmapped reads is now reported as * rather than NS (i.e. softclipping of length N, where N is the read length).
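
Retrieving quality strings from the original file, as mentioned above, could look roughly like the following. This is a sketch under stated assumptions, not a tool shipped with RapMap: the helper names `load_fastq_qualities` and `restore_quality` are hypothetical, the reads are assumed to be single-end in a 4-line-per-record FASTQ file, and the read ID in the SAM QNAME field is assumed to be the FASTQ header up to the first whitespace.

```python
import io


def load_fastq_qualities(handle):
    """Map read ID -> quality string from a 4-line-per-record FASTQ stream."""
    quals = {}
    while True:
        header = handle.readline()
        if not header.strip():
            break  # end of file (or trailing blank line)
        handle.readline()  # sequence line, not needed here
        handle.readline()  # '+' separator line
        qual = handle.readline().rstrip("\n")
        read_id = header[1:].split()[0]  # drop '@', keep text up to whitespace
        quals[read_id] = qual
    return quals


def restore_quality(sam_line, quals):
    """Return the SAM record with QUAL filled in if it was '*'."""
    fields = sam_line.rstrip("\n").split("\t")
    qual = quals.get(fields[0])
    if fields[10] != "*" or qual is None:
        return "\t".join(fields)
    # The SAM format stores QUAL reversed for reverse-strand records (flag 0x10).
    if int(fields[1]) & 0x10:
        qual = qual[::-1]
    fields[10] = qual
    return "\t".join(fields)


# Small in-memory example; the read and reference names are made up.
example_quals = load_fastq_qualities(io.StringIO("@r1 extra\nACGT\n+\nIJKL\n"))
forward = restore_quality("r1\t0\ttx1\t1\t255\t4M\t*\t0\t0\tACGT\t*", example_quals)
reverse = restore_quality("r1\t16\ttx1\t1\t255\t4M\t*\t0\t0\tACGT\t*", example_quals)
```

For paired-end data the lookup would additionally need to distinguish the two mates, which this sketch does not attempt.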

RapMap v0.3.0

19 Jun 01:51

RapMap v0.3.0 includes one major new feature and one major bug fix since v0.2.2.

New Feature

The emphf library has been replaced by BooM, which is based on the excellent BBHash minimal perfect hash library. The big benefits of this switch are that (1) generating the perfect hash now doesn't require any significant extra memory, even further reducing the memory requirements for indexing, and (2) the perfect hash can be computed much more quickly and in parallel. Use the -x flag to pass the number of threads that should be used to build the perfect hash function.

Bug Fix

Previous versions of RapMap contained a bug that could be triggered when the total transcriptome size was between 2^31 and 2^32-1 (transcriptomes of smaller or larger size were unaffected). This bug would most likely have caused RapMap to segfault. This was the result of a possible integer overflow, and has been fixed in version 0.3.0.
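
The notes above do not say which variable overflowed. One plausible reading, assumed purely for illustration here, is that a value in this range fits in an unsigned 32-bit integer but wraps to a negative number if it is ever interpreted as a signed 32-bit integer. The helper name `as_signed_32` is hypothetical; the sketch only demonstrates the arithmetic of that range.

```python
import struct


def as_signed_32(n):
    """Reinterpret the low 32 bits of n as a signed 32-bit integer."""
    return struct.unpack("<i", struct.pack("<I", n & 0xFFFFFFFF))[0]


# Just below 2^31 the value is unchanged; from 2^31 upward it becomes negative.
below = as_signed_32(2**31 - 1)
at_boundary = as_signed_32(2**31)
top_of_range = as_signed_32(2**32 - 1)
```

A negative value used as an offset or length is a common way for such an overflow to end in a segfault, which would be consistent with the symptom described.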

RapMap v0.2.2

29 May 16:36

This is mainly a bug-fix and maintenance release.

Bug Fix

  • In rare circumstances, when mappings of equal quality existed on the forward and reverse-complement strands, the algorithm would exhibit a preference for the forward mapping. This bug was due to a shadowed variable and has been resolved.

Features

  • The -c flag enforces co-linearity within chains of hits (i.e. the hits must be monotonically increasing / decreasing with respect to both the reference and query). We're adding the appropriate framework for enforcing different types of filters, so more should be forthcoming in future releases.
  • Though it has been implemented for a while, here we're documenting the existence of the perfect-hash-based quasi indexing. This replaces the dense hash map with a perfect hash (using the fantastic EMPHF library). This is enabled when building the quasi-index by passing the -p or --perfectHash flag. The tradeoff is that index construction will be slower, but the resulting index will require considerably less space (40 - 50% less) during mapping. Mapping speed is roughly equivalent regardless of whether a "normal" or perfect hash is used.
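
The co-linearity condition enforced by the -c flag can be sketched as follows. This is a minimal illustration, not RapMap's code: the function name `is_colinear`, the representation of a hit as a `(query_pos, ref_pos)` pair, and the choice of strict (rather than non-strict) monotonicity are all assumptions.

```python
def is_colinear(hits):
    """Check that a chain of hits is monotone in both query and reference.

    hits: iterable of (query_pos, ref_pos) pairs (hypothetical representation).
    After ordering by query position, the reference positions must be
    strictly increasing or strictly decreasing.
    """
    ordered = sorted(hits)
    ref = [r for _, r in ordered]
    pairs = list(zip(ref, ref[1:]))
    increasing = all(a < b for a, b in pairs)
    decreasing = all(a > b for a, b in pairs)
    return increasing or decreasing
```

For example, hits at `(0, 100), (10, 110), (20, 120)` form a co-linear chain, while `(0, 100), (10, 130), (20, 120)` do not, because the reference positions change direction along the read.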

RapMap v0.2.1

24 Mar 21:52

This version includes two bug fixes:

  • Correct SAM flags for primary and non-primary mappings of paired-end reads.
  • Correct reported sequence / quality values when the read maps to the reverse complement strand.

RapMap v0.2.0

16 Mar 20:42

Description of changes forthcoming.

RapMap v0.1.0-pre

23 Jan 17:01

This release provides a (hopefully widely-compatible) Linux binary for RapMap.