Releases: rouskinlab/seismic-rna
Cluster Clones
What's new in v0.9.4
Bugfixes
- Prior to this release, on some but not all platforms, running multiple EM clustering runs in parallel (but not in series) would cause them to have identical trajectories. I suspect (but have not demonstrated) that this bug happened because on such platforms, the entire Python state (including the random number generator) was copied to each subprocess by `concurrent.futures.ProcessPoolExecutor`. Supporting this mechanism, on the same platforms, subprocesses also log messages the same way as the main process, which suggests that the root logger is also copied. So that each clustering run has a unique trajectory even with the same Python state, each clustering run now accepts a seed for its random number generator; the main process randomizes these seeds so that each clustering run is seeded uniquely.
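The seeding scheme described above can be sketched as follows. This is a minimal illustration, not the code in seismic-rna: the names `draw_unique_seeds`, `run_clustering`, and `run_parallel` are hypothetical, and the standard-library `random` module stands in for whatever generator the clustering code actually uses.

```python
import random
from concurrent.futures import ProcessPoolExecutor


def draw_unique_seeds(n_runs: int) -> list[int]:
    """Draw n_runs distinct 32-bit seeds in the main process."""
    # Sampling without replacement guarantees that no two seeds collide.
    return random.SystemRandom().sample(range(2**32), n_runs)


def run_clustering(seed: int, n_steps: int = 5) -> list[float]:
    """Hypothetical stand-in for one EM clustering run."""
    # A private generator built from the seed does not depend on any
    # global random state that a subprocess may have inherited.
    rng = random.Random(seed)
    return [rng.random() for _ in range(n_steps)]


def run_parallel(n_runs: int) -> list[list[float]]:
    """Run several clustering stand-ins in parallel, one seed each."""
    seeds = draw_unique_seeds(n_runs)
    with ProcessPoolExecutor() as pool:
        return list(pool.map(run_clustering, seeds))


if __name__ == "__main__":
    for trajectory in run_parallel(4):
        print(trajectory)
```

Because every worker receives its own seed as an argument, the trajectories differ even if each subprocess starts from a copy of the same interpreter state.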
Internals
- Adopt the convention where all strings use double quotation marks.
- Add more detailed `__str__` methods to some custom classes.
Documentation
- Add more information to the documentation, especially to Manuals -> Workflow.
Full Changelog: v0.9.3...v0.9.4
WebApp Export
What's new in 0.9.3
New Features
- Export mutational and structural data and metadata for the web app using `seismic export web [-S sample-metadata.csv] [-R reference-metadata.csv] [samples ...]`.
Bugfixes
- BAM/CRAM files with insufficient reads are no longer returned in the list of output files from the `align` step.
Full Changelog: v0.9.2...v0.9.3
Memory Managed
What's new in v0.9.2
Bugfixes
- Serious memory leaks caused by cached instance methods of the `MutsBatch` class (since v0.9.0) have been fixed (credit to @justinaruda).
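For readers unfamiliar with this class of bug, the sketch below shows one common way a cached instance method can leak memory. It assumes the cache is a `functools.lru_cache`, which the release notes do not state; `LeakyBatch` and `SafeBatch` are hypothetical stand-ins, not the real `MutsBatch`.

```python
import functools
import gc
import weakref


class LeakyBatch:
    """Hypothetical class whose method cache keeps instances alive."""

    def __init__(self, data):
        self.data = data

    # The cache belongs to the function on the class and stores `self`
    # as part of each key, so it holds a strong reference to the instance.
    @functools.lru_cache(maxsize=None)
    def total(self):
        return sum(self.data)


class SafeBatch:
    """Hypothetical alternative that caches on the instance itself."""

    def __init__(self, data):
        self.data = data

    # The cached value lives in the instance's own __dict__, so it is
    # released together with the instance.
    @functools.cached_property
    def total(self):
        return sum(self.data)


def leaky_is_freed() -> bool:
    batch = LeakyBatch([1, 2, 3])
    batch.total()
    ref = weakref.ref(batch)
    del batch
    gc.collect()
    return ref() is None


def safe_is_freed() -> bool:
    batch = SafeBatch([1, 2, 3])
    batch.total
    ref = weakref.ref(batch)
    del batch
    gc.collect()
    return ref() is None


if __name__ == "__main__":
    print("leaky instance freed:", leaky_is_freed())
    print("safe instance freed:", safe_is_freed())
```

Whether the actual fix used per-instance caching or removed the caches altogether is not stated in these notes.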
Internals
- A new `Header` class has been introduced to handle parsing and formatting table headers for types of relationships and/or clusters, along with a test suite for it.
- Both branches for demultiplexing (credit to @heWhosShouldersBlockTheSun) have been merged into the main branch.
What's Changed
- Patched several memory leaks in the mask module. by @justinaruda in #2
- Demulti reset by @matthewfallan in #3
- Demult fixed by @matthewfallan in #4
New Contributors
- @justinaruda made their first contribution in #2
- @matthewfallan made their first contribution in #3
Full Changelog: v0.9.1...v0.9.2
Underworld of the Unsigned
What's new in 0.9.1
Bug fixes
- Fixed a bug in `seismicrna.core.batch.index`: calling `np.full(target.max(initial=-1) + 1, -1)` where `target` is a NumPy NDArray of unsigned integer type would implicitly convert -1 to an unsigned integer with the maximum value of its data type (e.g. 4,294,967,295 for a 32-bit integer) and attempt to allocate memory for an enormous array of this size. On some systems, this would cause a crash, and on others would simply waste time allocating the memory.
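The wraparound behind this bug can be reproduced in isolation. The sketch below uses an explicit `astype` cast to show the conversion rather than re-running the original expression, and `index_array_length` is a hypothetical helper showing one way to avoid the problem; it is not necessarily the fix that was applied.

```python
import numpy as np

# Casting -1 to a 32-bit unsigned integer wraps around to the largest
# value that the data type can hold.
wrapped = np.array(-1, dtype=np.int64).astype(np.uint32)
print(int(wrapped))  # 4294967295


def index_array_length(target: np.ndarray) -> int:
    """Hypothetical helper: length needed to index every value in target."""
    if target.size == 0:
        # Handle the empty case explicitly instead of using -1 as a sentinel.
        return 0
    # Convert to a Python int before adding 1 so that the arithmetic
    # cannot overflow or wrap around.
    return int(target.max()) + 1


if __name__ == "__main__":
    print(index_array_length(np.array([], dtype=np.uint32)))
    print(index_array_length(np.array([0, 3], dtype=np.uint32)))
```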
Speed by Sparsity
What's new in 0.9.0
Performance upgrades
- Mutation data is now processed and saved in a sparse format that tracks only the mutated positions. Since mutations make up only 1-5% of most datasets, the sparse format is more storage-, memory-, and time-efficient.
- Batches are now saved in Brotli-compressed pickle ("Brickle") files, which require less storage and allow more types of data to be saved than the previous Parquet and gzip-compressed CSV formats.
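To illustrate why tracking only the mutated positions saves space and time, here is a toy sketch of a sparse encoding. The actual data layout in seismic-rna is not described in these notes; the functions `to_sparse` and `count_mutations` and the integer mutation codes are assumptions made for illustration.

```python
def to_sparse(reads: list[list[int]]) -> list[dict[int, int]]:
    """Keep only the nonzero (mutated) positions of each read."""
    return [
        {pos: code for pos, code in enumerate(read) if code}
        for read in reads
    ]


def count_mutations(sparse_reads: list[dict[int, int]],
                    n_positions: int) -> list[int]:
    """Count mutations per position by visiting only stored entries."""
    counts = [0] * n_positions
    for read in sparse_reads:
        for pos in read:
            counts[pos] += 1
    return counts


if __name__ == "__main__":
    # 0 means "matches the reference"; any other code is a mutation.
    dense = [[0, 0, 1, 0], [0, 0, 0, 0], [0, 2, 1, 0]]
    sparse = to_sparse(dense)
    print(sparse)
    print(count_mutations(sparse, n_positions=4))
```

With mutations at only a few percent of positions, the loop in `count_mutations` touches a correspondingly small fraction of the entries that a dense scan would visit.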
New features
- The `table` step has been sped up via the sparse data format, and computing all fields is nearly as fast as computing one field. Thus `table` now computes all fields automatically (the option to compute select fields has been removed).
Bug fixes
- When running `align` on demultiplexed FASTQ files, one report file is now generated for each FASTQ file, rather than all FASTQ files for each sample writing to and overwriting one report file.
- When running `relate` on multiple samples that are aligned to the same set of references, every BAM/CRAM file from every sample is processed instead of only one sample BAM/CRAM file for each reference.
- When running `fold`, misformatting of the RNAstructure `Fold` command has been fixed.
Internals
- The `core` modules have been refactored into a group of subpackages, each with their own modules.
- The `all` subcommand has been moved from the `main.py` module to its own subpackage.
- The mutation calling and counting routines in the modules `seismicrna.core.bitcall` and `seismicrna.core.bitvect`, respectively, have been rewritten and replaced with `seismicrna.core.rel.pattern` and `seismicrna.core.batch.accum`.
- The unique read finding algorithm has likewise been rewritten and moved to `seismicrna.cluster.uniq`.
Full Changelog: v0.8.0...v0.9.0
Pipe-A-line
What's new in 0.8.0?
- Align now generates CRAM files with minimal headers instead of BAM files with full headers so that large FASTA files and large FASTQ files eventually require less storage space.
- Align has been re-implemented as two shell pipelines instead of as a series of separate commands glued together with Python, to make it run faster and require less storage of temporary files.
- A new function for parsing only the names of references in FASTA files (if the sequences are not needed) is based on grep and runs several times faster on large files than does the Python-based function for parsing both names and sequences.
- The ambiguous nucleotide "N" is now supported in both reference and read sequences (previously, neither).
- Unit tests have been updated to handle N in DNA and RNA sequences.
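The idea behind the name-only FASTA parser mentioned above is that header lines can be found without reading or validating any sequence. The release uses grep for this; the sketch below is a pure-Python analogue shown only to make the idea concrete, and `iter_fasta_names` is a hypothetical name.

```python
from typing import Iterable, Iterator


def iter_fasta_names(lines: Iterable[str]) -> Iterator[str]:
    """Yield reference names from FASTA lines, skipping all sequences."""
    for line in lines:
        # Header lines start with ">"; every other line is sequence data
        # and is ignored without being parsed.
        if line.startswith(">"):
            yield line[1:].strip()


if __name__ == "__main__":
    example = [">ref1\n", "ACGT\n", ">ref2\n", "NNGT\n"]
    print(list(iter_fasta_names(example)))
```

To scan a file, pass the open file object, which iterates line by line: `iter_fasta_names(open(path))`.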
Full Changelog: v0.7.1...v0.8.0
Hatch Targets
What's new in v0.7.1
- Added Hatch targets to
pyproject.toml
- Updated documentation
Full Changelog: v0.7.0...v0.7.1
Dr. Docs
What's new in 0.7.0
- Output directories are now organized as `out/sample/step/ref` instead of `out/step/sample/ref`.
- Documentation has been partially updated.
- More unit tests have been added.
- A release schedule has been added.
- The --min-mapq option has been added.
- Log messages are color coded.
Full Changelog: v0.6.2...v0.7.0
Bugfix for min_nmut_read
What's new in v0.6.2
- Fixed bug with `min_nmut_read` not being accepted by `main.run()`.
Full Changelog: v0.6.0...v0.6.2
Sliding correlations
What's new in v0.6.0
- Graph Pearson or Spearman correlations between two samples in sliding windows.
- Fix bug involving missing arguments in `struct` module.
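As a rough illustration of the sliding-window correlations introduced in this release, the sketch below computes the Pearson correlation between two equal-length profiles in each window. The window size, the step of one position, and the function names are assumptions; how seismic-rna defines its windows is not stated here.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # Raises ZeroDivisionError if either window has zero variance.
    return cov / sqrt(var_x * var_y)


def sliding_pearson(x: list[float], y: list[float],
                    window: int) -> list[float]:
    """Correlate x and y in every window of the given size, step 1."""
    return [
        pearson(x[i:i + window], y[i:i + window])
        for i in range(len(x) - window + 1)
    ]


if __name__ == "__main__":
    sample1 = [1.0, 2.0, 3.0, 4.0, 5.0]
    sample2 = [2.0, 4.0, 6.0, 8.0, 10.0]
    print(sliding_pearson(sample1, sample2, window=3))
```

A Spearman version could apply the same function to the ranks of the values within each window.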