Stranded in the Best Way

Pre-release
matthewfallan released this 27 Jun 05:20
· 318 commits to main since this release

What's new in 0.19.0

New Features

Strand-aware alignment

  • Each BAM file can be split into separate files of reads originating from the plus- versus the minus-strand of the RNA using --sep-strands.
  • The option --f1r2-plus / --f1r2-minus controls which strand paired-end reads are assigned to when mate 1 aligns in the forward orientation and mate 2 in the reverse orientation (F1R2). For the Illumina library prep kits that our lab uses, such reads come from the minus strand, so --f1r2-minus is the default. Reads whose mates 1 and 2 align in the reverse and forward orientations, respectively, are assigned to the other strand.
  • Single-end reads are treated the same as mate 1: in --f1r2-minus mode, single-end reads that align in the forward orientation are considered to have come from the minus strand.
  • The option --minus-label controls the label appended to the minus strand of each reference (by default, it is the name of the reference followed by -minus).
  • In strand-aware mode, seismic align also writes a FASTA file containing every reference sequence (including minus strands) whose BAM file received a sufficient number of reads (controlled by --min-reads). This FASTA file is written into the same directory as the BAM files and align report and has the same name as the original FASTA file.
  • Currently, strand-aware alignment is available only through seismic align, not seismic wf. This limitation arises because separating strands generates new reference sequences (namely, the minus strands); if those sequences are missing from the FASTA file given to the relate step, then BAM files aligned to the minus strands cannot be processed. It is straightforward to run seismic align in strand-aware mode, then use the FASTA file it generates as input for seismic relate or seismic wf. However, switching the FASTA file automatically within seismic wf would require non-trivial re-engineering of how the pipeline works (or some other hacks).
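
The strand-assignment rule described above can be sketched in a few lines of Python. This is only an illustration of the logic, not the code seismic align uses internally: the function names read_strand and minus_ref_name and their parameters are hypothetical, and the default label "-minus" is taken from the description of --minus-label above.

```python
def read_strand(is_reverse: bool,
                is_mate2: bool = False,
                f1r2_plus: bool = False) -> str:
    """Return "plus" or "minus" for one aligned read (hypothetical helper).

    is_reverse: the read aligned in the reverse orientation.
    is_mate2:   the read is mate 2 of a pair (False for mate 1 and
                for single-end reads, which are treated like mate 1).
    f1r2_plus:  True mimics --f1r2-plus, False mimics --f1r2-minus.
    """
    # A read belongs to an F1R2 fragment if it is a forward mate 1
    # (or forward single-end read) or a reverse mate 2.
    f1r2 = is_reverse == is_mate2
    # F1R2 fragments go to the strand named by the option;
    # all other fragments go to the opposite strand.
    return "plus" if f1r2 == f1r2_plus else "minus"


def minus_ref_name(ref: str, minus_label: str = "-minus") -> str:
    """Name of the minus strand of a reference (hypothetical helper)."""
    return f"{ref}{minus_label}"
```

With the default (--f1r2-minus) behavior, read_strand(False) gives "minus" for a forward single-end read, and read_strand(False, f1r2_plus=True) gives "plus".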

Bug Fixes

  • The mechanism to release files to the output directory now keeps a backup of any existing output files until it is sure that the new files have been written. This prevents the data loss that could occur if existing output files were deleted and writing the new files then failed.
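
The general backup-then-release pattern behind this fix can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual implementation: the function name release_with_backup, the ".backup" suffix, and the assumption that staged and final paths live on the same filesystem are all hypothetical.

```python
import os
import shutil
from pathlib import Path


def release_with_backup(staged: Path, final: Path) -> None:
    """Move `staged` to `final`, keeping any existing `final` as a
    backup until the move has succeeded (hypothetical helper)."""
    backup = None
    if final.exists():
        # Set the existing output aside instead of deleting it.
        backup = final.with_name(final.name + ".backup")
        os.replace(final, backup)
    try:
        # Release the new output into place.
        os.replace(staged, final)
    except Exception:
        # Releasing failed: put the old output back, then re-raise.
        if backup is not None:
            os.replace(backup, final)
        raise
    # The new output is in place, so the backup can be discarded.
    if backup is not None:
        if backup.is_dir():
            shutil.rmtree(backup)
        else:
            backup.unlink()
```

The key design choice is ordering: the old output is removed only after the new output is confirmed to be in place, so a failure at any earlier point leaves the original files recoverable.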

Full Changelog: v0.18.2...v0.19.0