
Releases: rouskinlab/seismic-rna

Faster Folding

01 Apr 05:13
085ea3f
Pre-release

What's new in 0.15.1

Performance Enhancements

  • seismic fold can now predict secondary structures using multiple threads with the RNAstructure command Fold-smp (instead of just Fold), which speeds up structure prediction almost linearly with respect to the number of threads. Fold-smp is used by default unless the maximum number of processes allowed for folding is 1, in which case Fold is used. See https://rna.urmc.rochester.edu/Text/Fold.html for documentation on Fold and Fold-smp.
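The selection rule above can be sketched in Python. This is a hypothetical helper, not SEISMIC-RNA's code. It assumes RNAstructure's usual `Fold <sequence file> <CT file>` argument order, and it assumes that Fold-smp reads its thread count from the OpenMP variable `OMP_NUM_THREADS`. Check both against the RNAstructure documentation linked above.

```python
import os
from typing import Dict, List, Tuple


def fold_command(seq_file: str, ct_file: str,
                 max_procs: int) -> Tuple[List[str], Dict[str, str]]:
    """Return (argv, environment) for predicting structures with RNAstructure.

    Hypothetical helper: uses Fold for a single process, else Fold-smp.
    """
    if max_procs < 1:
        raise ValueError(f"max_procs must be >= 1, but got {max_procs}")
    env = dict(os.environ)
    if max_procs == 1:
        # Only one process allowed: use the serial Fold program.
        return ["Fold", seq_file, ct_file], env
    # Several processes allowed: use Fold-smp and pass the thread count
    # through OpenMP (assumption: OMP_NUM_THREADS controls Fold-smp).
    env["OMP_NUM_THREADS"] = str(max_procs)
    return ["Fold-smp", seq_file, ct_file], env
```

The returned pair could then be passed to `subprocess.run(argv, env=env, check=True)`.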

Bug Fixes

  • Some incompatibilities with prior versions in report field names have been fixed.
  • In v0.15.0, the observer bias correction would crash if any position had 0 probability of being covered by a read (i.e. all end coordinate probabilities spanning that position were 0): the resulting division by zero produced NaN values that propagated to the objective function. This problem has been fixed by setting the NaN values to 0.
  • A new function has been implemented to avoid multiplying invalid values (and generating warnings about doing so).
  • In EM clustering, the likelihood should theoretically never decrease between iterations. If it does, then it's either due to a bug in the algorithm or to the accumulation of rounding errors during floating-point arithmetic. In all prior versions, if the likelihood decreased, the algorithm would issue a warning and continue iterating until the likelihood increased by a positive amount less than the threshold. However, this could cause the likelihood to decrease for several consecutive iterations, yielding a sub-optimal solution. In this version, EM clustering will terminate the first time the likelihood increases by less than the positive threshold, including if it decreases.
  • In all prior versions, samtools sort would use its own default choice for where to write temporary files produced during sorting. In this version, it uses an appropriate sub-directory within the user-specified temporary directory (with --temp-dir).
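The NaN fix for the observer bias correction can be illustrated with a small NumPy sketch. The function names and the layout of the end-coordinate matrix are assumptions for illustration, and the note does not say which quantity is divided by the coverage probability, so the numerator here is generic.

```python
import numpy as np


def coverage_probability(end_probs: np.ndarray) -> np.ndarray:
    """Probability that each position is covered by a read.

    end_probs[i, j] is the probability that a read starts at position i
    and ends at position j (only entries with i <= j are used).
    """
    n = end_probs.shape[0]
    cover = np.zeros(n)
    for i in range(n):
        for j in range(i, n):
            # A read spanning i..j covers every position from i to j.
            cover[i:j + 1] += end_probs[i, j]
    return cover


def divide_nan_to_zero(numer: np.ndarray, denom: np.ndarray) -> np.ndarray:
    """Divide elementwise, then replace NaN (e.g. from 0/0) with 0."""
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = numer / denom
    return np.where(np.isnan(ratio), 0.0, ratio)
```

For example, if every read spans positions 0 and 1 of a 3-position section, position 2 has coverage probability 0, and dividing at that position yields 0 instead of NaN.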

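The new stopping rule for EM clustering can be sketched as a generic loop. The `step` callback (one E-step plus M-step returning updated parameters and the log-likelihood) and the other names are hypothetical; the point is that a single comparison against the positive threshold also catches decreases, because a decrease is a negative gain.

```python
import math
from typing import Callable, Tuple, TypeVar

P = TypeVar("P")


def run_em(step: Callable[[P], Tuple[P, float]], params: P,
           threshold: float, max_iter: int = 1000) -> Tuple[P, float, int]:
    """Iterate EM until the log-likelihood gain falls below threshold."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    prev_loglike = -math.inf
    loglike = -math.inf
    iteration = 0
    for iteration in range(1, max_iter + 1):
        params, loglike = step(params)
        # Stop the first time the gain is below the threshold; a negative
        # gain (the likelihood decreased) also satisfies this test.
        if loglike - prev_loglike < threshold:
            break
        prev_loglike = loglike
    return params, loglike, iteration
```

With log-likelihoods of -100, -60, -55, and then -56, this loop stops at the fourth iteration instead of continuing to iterate after the decrease.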

Full Changelog: v0.15.0...v0.15.1

Unbias Unbounded

27 Mar 01:22
ed3ad31
Pre-release

What's new in 0.15.0

Clustering sections longer than the reads

With version 0.15, you can correctly cluster sections longer than the reads. In previous versions such as 0.14, if the section was longer than the reads, then the reads would be clustered by their location rather than by their actual mutations.

For example, I simulated 1 million reads, each 200-300 nt long, from two clusters on a 600 nt reference, and then clustered them over the full reference using --min-mut-gap 3.

With v0.14.1, the reads are clustered based on the location to which they align, resulting in fake clusters:

[Figure: Mutation rate (v0.14.1)]

[Figure: Coverage (v0.14.1)]

With v0.15.0, the reads are clustered based on their mutations alone, regardless of their alignment location:

[Figure: Mutation rate (v0.15.0)]

[Figure: Coverage (v0.15.0)]

Because the data are simulated, the ground-truth mutation rates of both clusters are known, and the clustering algorithm in v0.15.0 infers them very well:

[Figure: Cluster 1 (ground truth vs. SEISMIC-RNA)]

[Figure: Cluster 2 (ground truth vs. SEISMIC-RNA)]
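One general way to keep reads from being grouped by where they align is to score each read only on the positions it actually covers. The sketch below shows that idea as an EM E-step for a single read. It is not SEISMIC-RNA's implementation: the names are hypothetical, and the real algorithm also corrects for observer (drop-out) bias, which this omits.

```python
import math
from typing import List, Sequence


def read_log_likelihood(mutated: Sequence[bool], covered: Sequence[bool],
                        rates: Sequence[float]) -> float:
    """Log-likelihood of one read under one cluster's mutation rates.

    Only positions the read covers contribute; positions outside the read
    carry no information about its mutations.
    """
    total = 0.0
    for is_mut, is_cov, p in zip(mutated, covered, rates):
        if is_cov:
            total += math.log(p if is_mut else 1.0 - p)
    return total


def responsibilities(mutated: Sequence[bool], covered: Sequence[bool],
                     cluster_rates: Sequence[Sequence[float]],
                     proportions: Sequence[float]) -> List[float]:
    """Posterior probability that the read came from each cluster."""
    logs = [math.log(w) + read_log_likelihood(mutated, covered, rates)
            for rates, w in zip(cluster_rates, proportions)]
    top = max(logs)  # subtract the maximum for numerical stability
    weights = [math.exp(x - top) for x in logs]
    total = sum(weights)
    return [w / total for w in weights]
```

Because uncovered positions are skipped, a short read is assigned according to the mutations it carries, not according to how much of the section it fails to cover.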

Masking sections longer than the reads while correcting drop-out bias

Additionally, the algorithm that corrects drop-out bias can now handle reads that are shorter than the section. Previously, if you used --min-mut-gap in the mask step with a section longer than the reads, then the drop-out bias correction would artificially inflate the read coverage and yield somewhat inaccurate mutation rates. In v0.15, this problem has been fixed, so you can safely use --min-mut-gap with sections of any length. The coverage inflation is visible in the mask coverage from the same dataset:

[Figure: Coverage (v0.14.1)]
Note that the peak coverage erroneously exceeds 1 million, the total number of reads in the dataset.

[Figure: Coverage (v0.15.0)]
Note that the peak coverage is now less than 1 million, as it should be.
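To see how a drop-out correction can inflate coverage, consider the probability that a read passes the --min-mut-gap filter. The sketch below computes it with a small dynamic program. It assumes independent positions and interprets the gap as the minimum number of unmutated positions required between two mutations; both are assumptions, and the function name is hypothetical.

```python
from typing import Sequence


def prob_passes_gap_filter(rates: Sequence[float], min_gap: int) -> float:
    """Probability that no two mutations are separated by fewer than
    min_gap unmutated positions, given independent mutation rates."""
    if min_gap < 0:
        raise ValueError("min_gap must be >= 0")
    # dist[s] = probability of having seen s unmutated positions since the
    # last mutation (capped at min_gap); start as if no mutation occurred.
    dist = [0.0] * (min_gap + 1)
    dist[min_gap] = 1.0
    for p in rates:
        new = [0.0] * (min_gap + 1)
        for s, prob in enumerate(dist):
            if prob == 0.0:
                continue
            # Position not mutated: extend the run of unmutated positions.
            new[min(s + 1, min_gap)] += prob * (1.0 - p)
            # Position mutated: allowed only if the gap is wide enough.
            if s >= min_gap:
                new[0] += prob * p
        dist = new
    return sum(dist)
```

If each observed read were up-weighted by 1 / P(pass), computing P over a whole 600-position section instead of over the roughly 250 positions a read covers would give a smaller probability and thus a larger weight. That is one way coverage inflation like the one described above can arise.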


Full Changelog: v0.14.1...v0.15.0

Pair Programming

28 Feb 00:29
Pre-release

What's new in 0.14.1

Bug fixes

  • Since v0.9.0, there has been a bug in merging paired reads (function merge_rels) preventing a match in one read from compensating for ambiguity in the other. For instance, if at position 45, read 1 had a low-quality N while read 2 had a high-quality match, then the result would be a low-quality N, instead of the intended high-quality match. This bug did not prevent matches from compensating for blank positions, nor did it prevent non-matches from compensating for other types of relationships. This bug was found by @justinaruda and has now been patched by @matthewfallan and @justinaruda.
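Conceptually, the intended merging behavior treats each read's call at a position as a set of possible relationships and keeps only those consistent with both reads, so a confident match in one read resolves an N in the other. The sketch encodes those sets as bit flags. The flag values, the names, and the rule for contradictory calls are hypothetical, and this is not the actual merge_rels function.

```python
# Hypothetical bit flags: each bit is one possible relationship of a read
# to a reference position. These values are illustrative assumptions and
# are not claimed to match SEISMIC-RNA's actual encoding.
NO_COVERAGE = 0  # blank: the read does not cover this position
MATCH = 1
DELETION = 2
SUB_A = 4
SUB_C = 8
SUB_G = 16
SUB_T = 32
# A low-quality N could be a match or any substitution.
LOW_QUAL_N = MATCH | SUB_A | SUB_C | SUB_G | SUB_T


def merge_calls(call1: int, call2: int) -> int:
    """Merge the calls of read 1 and read 2 at one position."""
    if call1 == NO_COVERAGE:
        return call2
    if call2 == NO_COVERAGE:
        return call1
    # Keep only relationships consistent with both reads, so a confident
    # call in one read resolves ambiguity in the other.
    both = call1 & call2
    # If the reads contradict each other, keep either possibility
    # (an arbitrary choice for this sketch).
    return both if both else call1 | call2
```

With this encoding, merging a low-quality N in read 1 with a high-quality match in read 2 yields a match, which is the behavior the bug prevented.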

Full Changelog: v0.14.0...v0.14.1

Passing Lists

25 Feb 19:32
Pre-release

What's new in 0.14.0

New Features

  • Added the command +listpos to list positions from table files that meet certain criteria.
  • Added a file format for listing positions.
  • In the mask command, added --exclude-file, which accepts a file containing a list of positions and excludes them; the previous option --exclude-pos has been removed.
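The idea of listing positions that meet a criterion and then feeding that list back in for exclusion can be sketched as follows. The CSV layout (a "Position" column plus a value column), the one-position-per-line list format, and all function names are hypothetical; they are not the actual +listpos table or position-list formats.

```python
import csv
from typing import Iterable, List, Set, TextIO


def list_positions(table: TextIO, column: str, min_value: float) -> List[int]:
    """Return positions whose value in `column` is at least `min_value`.

    Assumes a CSV table with a "Position" column (hypothetical layout).
    """
    positions = []
    for row in csv.DictReader(table):
        value = row[column]
        # Skip blank cells; NaN never passes the >= comparison.
        if value and float(value) >= min_value:
            positions.append(int(row["Position"]))
    return positions


def write_position_list(positions: Iterable[int], out: TextIO) -> None:
    """Write one position per line (hypothetical list format)."""
    for pos in positions:
        out.write(f"{pos}\n")


def read_position_list(lines: TextIO) -> Set[int]:
    """Read positions to exclude, ignoring blank lines."""
    return {int(line) for line in lines if line.strip()}
```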

Full Changelog: v0.13.1...v0.14.0

Lazy Graphs

23 Feb 03:03
Pre-release

What's new in 0.13.1

Performance

  • Graphs now check whether their output files exist before computing their data, which can save a lot of time for expensive graphs (e.g. rolling AUC-ROC).
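The check-before-compute pattern described above can be sketched in a few lines. The function name and the `force` parameter are hypothetical; `force` only mirrors the spirit of a --force option.

```python
from pathlib import Path
from typing import Callable


def write_graph_lazily(output: Path, compute: Callable[[], str],
                       force: bool = False) -> bool:
    """Write a graph file, skipping the computation if the file exists.

    Returns True if the graph was computed and written.
    """
    if output.exists() and not force:
        # The expensive computation (e.g. rolling AUC-ROC) never runs.
        return False
    output.write_text(compute())
    return True
```

The key design choice is that `compute` is passed as a callable, so nothing expensive is evaluated until after the existence check.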

Full Changelog: v0.13.0...v0.13.1

Joinery

18 Jan 21:58
Pre-release

What's new in 0.13.0

New Features

  • Join sections (horizontally) from Mask or Cluster reports using new command seismic join.

Full Changelog: v0.12.2...v0.13.0

Cluster Waves

16 Jan 22:59
Pre-release

What's new in 0.12.2

New Features

  • The new command seismic +delclust deletes clusters of order higher than a maximum you specify.

Removed Features

  • For consistency with the other commands (especially with --force), +addclust and +delclust no longer make backups. This feature will probably be reimplemented in a future update that makes every command write to a temporary directory before moving all files to the output directory; but this is a low priority, since SEISMIC-RNA has worked well enough without this safety feature.

Full Changelog: v0.12.1...v0.12.2

Safer Add Clusters

14 Jan 02:54
Pre-release

What's new in 0.12.1

CLI changes

  • +clustup has been renamed to +addclust to make its purpose more obvious.

New Features

  • +addclust now makes a temporary backup of every file in a clustering directory before modifying the existing files (batches, counts, and report). If a fatal error occurs while modifying the existing files, then the original files are restored from the backup. Without this feature, the original data could be corrupted and rendered unusable.
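The backup-and-restore behavior described above follows a common pattern: copy the directory, attempt the modification, and restore the copy if anything fails. The sketch below is a hypothetical helper built on the standard library, not the +addclust code.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def modify_with_backup(directory: Path,
                       modify: Callable[[Path], None]) -> None:
    """Run modify(directory); if it fails, restore the original files."""
    with tempfile.TemporaryDirectory() as tmp:
        backup = Path(tmp) / "backup"
        # Copy every file (batches, counts, report, ...) before changing any.
        shutil.copytree(directory, backup)
        try:
            modify(directory)
        except BaseException:
            # Discard the partially modified files and restore the backup.
            shutil.rmtree(directory)
            shutil.copytree(backup, directory)
            raise
    # On success, the temporary backup is deleted when the with block exits.
```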

Full Changelog: v0.12.0...v0.12.1

Update: Cluster Update

13 Jan 10:48
Pre-release

What's new in 0.12.0

New Features

  • New command seismic +clustup ("cluster update") lets you increase the number of clusters in a clustered dataset after you have already run clustering.
    • This feature is most useful if you want to check if clustering works with a small number of clusters (e.g. 2) and then try with more clusters (without needing to rerun clustering with 2 clusters).
    • Only the new clusters run; previously run clusters are kept, saving time and energy.
    • New clusters are simply appended to the report, the batches, and the counts file, as if you had run clustering with that number of clusters initially.
    • There is no limit (beyond computer resources) to the number of clusters you can add, or to the number of times you can run seismic +clustup on the same dataset.
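The "run only the new clusters" behavior can be sketched as a function that fills in the missing numbers of clusters and leaves existing results untouched. The dictionary layout and the `run_order` callback are hypothetical stand-ins for SEISMIC-RNA's batches, counts, and report.

```python
from typing import Callable, Dict, List


def add_clusters(results: Dict[int, object], max_order: int,
                 run_order: Callable[[int], object]) -> List[int]:
    """Run clustering only for orders not already in results.

    results maps each order (number of clusters) to its output; existing
    entries are kept as-is and missing orders up to max_order are added.
    """
    new_orders = [k for k in range(1, max_order + 1) if k not in results]
    for order in new_orders:
        results[order] = run_order(order)
    return new_orders
```

Running it again with the same maximum does nothing, and running it with a larger maximum adds only the new orders, matching the "no limit" note above.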

Bug Fixes

  • Fixed a bug where NaNs were not removed properly when calculating normalized RMSDs.
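The release note does not define the normalization used for NRMSD. This sketch assumes the RMSD is divided by the range of the combined valid values, which is one common convention but an assumption here. What it illustrates is the fix itself: dropping every position where either profile is NaN before comparing.

```python
import numpy as np


def nrmsd(x, y) -> float:
    """RMSD between two profiles, normalized by their combined range,
    ignoring any position where either profile is NaN."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    valid = ~(np.isnan(x) | np.isnan(y))
    if not valid.any():
        return float("nan")
    xv, yv = x[valid], y[valid]
    rmsd = np.sqrt(np.mean((xv - yv) ** 2))
    span = max(xv.max(), yv.max()) - min(xv.min(), yv.min())
    # If the span is 0, every value is identical, so the RMSD is 0 too.
    return float(rmsd / span) if span > 0 else 0.0
```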

Full Changelog: v0.11.5...v0.12.0

v0.11.5

12 Jan 02:38
Pre-release

What's new in 0.11.5

New Features

  • Cluster reports now include two more fields to indicate the reproducibility of clustering: "NRMSD Between Runs" and "Correlation Between Runs". The nonfunctional "Variation of Information" field has been removed.
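The note does not say exactly how these reproducibility fields are computed. As an illustration only, assuming each independent run yields a per-position mutation-rate profile for a cluster and that the correlation is Pearson's, a "Correlation Between Runs"-style value could be computed like this (the function name is hypothetical):

```python
import numpy as np


def correlation_between_runs(run_a, run_b) -> float:
    """Pearson correlation between two runs' per-position values,
    using only positions that are finite in both runs."""
    a = np.asarray(run_a, dtype=float)
    b = np.asarray(run_b, dtype=float)
    valid = np.isfinite(a) & np.isfinite(b)
    if valid.sum() < 2:
        return float("nan")
    return float(np.corrcoef(a[valid], b[valid])[0, 1])
```

A value near 1 would indicate that independent runs converged to similar mutation-rate profiles.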
  • Pool now includes a safeguard to prevent accidentally overwriting Relate reports.

Full Changelog: v0.11.4...v0.11.5