Releases: rouskinlab/seismic-rna
Faster Folding
What's new in 0.15.1
Performance Enhancements
seismic fold can now predict secondary structures using multiple threads with the RNAstructure command Fold-smp (instead of just Fold), which speeds up structure prediction almost linearly with respect to the number of threads. Fold-smp is used by default unless the maximum number of processes allowed for folding is 1, in which case Fold is used. See https://rna.urmc.rochester.edu/Text/Fold.html for documentation on Fold and Fold-smp.
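The selection rule above is simple enough to state as code. This is a minimal sketch, not SEISMIC-RNA's implementation: the function name choose_fold_command and its max_procs parameter are hypothetical.

```python
def choose_fold_command(max_procs: int) -> str:
    """Return the name of the RNAstructure folding program to run.

    Hypothetical helper: Fold-smp is the multithreaded version of Fold,
    so it is chosen whenever more than one process is allowed; with a
    single process, plain Fold is used instead.
    """
    if max_procs < 1:
        raise ValueError(f"max_procs must be at least 1, got {max_procs}")
    return "Fold" if max_procs == 1 else "Fold-smp"
```

For example, choose_fold_command(1) gives "Fold" and choose_fold_command(8) gives "Fold-smp".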
Bug Fixes
- Some report field names that were incompatible with prior versions have been fixed.
- In v0.15.0, the observer bias correction would crash if any position had 0 probability of being covered by a read (i.e. all end coordinate probabilities spanning that position were 0), because dividing by zero produced NaN values that propagated into the objective function. This problem has been fixed by setting the NaN values to 0.
- A new function has been implemented to avoid multiplying invalid values (and generating warnings about doing so).
- In EM clustering, the likelihood should theoretically never decrease between iterations. If it does, then it's either due to a bug in the algorithm or to the accumulation of rounding errors during floating-point arithmetic. In all prior versions, if the likelihood decreased, the algorithm would issue a warning and continue iterating until the likelihood increased by a positive amount less than the threshold. However, this could cause the likelihood to decrease for several consecutive iterations, yielding a sub-optimal solution. In this version, EM clustering terminates the first time the likelihood increases by less than the positive threshold, which includes any decrease.
- In all prior versions, samtools sort would use its own default choice for where to write temporary files produced during sorting. In this version, it uses an appropriate sub-directory within the user-specified temporary directory (set with --temp-dir).
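The new EM stopping rule can be sketched as a loop that stops the first time the change in log-likelihood falls below the threshold, whether that change is small and positive or negative. This is a minimal illustration, not SEISMIC-RNA's clustering code: run_em and the step callable (one EM iteration returning the log-likelihood) are hypothetical.

```python
from typing import Callable, List


def run_em(step: Callable[[], float], threshold: float,
           max_iter: int = 1000) -> List[float]:
    """Repeatedly call step(), one hypothetical EM iteration that returns
    the log-likelihood, and return the log-likelihood after each iteration.

    Stops the first time the log-likelihood increases by less than
    `threshold`; a decrease is a negative change, so it also stops the loop.
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    history: List[float] = [step()]
    for _ in range(max_iter - 1):
        history.append(step())
        delta = history[-1] - history[-2]
        if delta < threshold:
            break
    return history
```

Under the previous rule, a sequence of log-likelihoods such as -100, -60, -61 would have kept iterating after the drop to -61; with this rule, iteration ends there.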
What's Changed
- 0.15.1 by @matthewfallan in #13
Full Changelog: v0.15.0...v0.15.1
Unbias Unbounded
What's new in 0.15.0
Clustering sections longer than the reads
With version 0.15, you can correctly cluster sections longer than the reads. In previous versions such as 0.14, if the section was longer than the reads, then the reads would be clustered by their location rather than by their actual mutations.
For example, I simulated 1 million 200-300 nt reads over a 600 nt reference that come from two clusters, and then clustered them over the full reference using --min-mut-gap 3.
With v0.14.1, the reads are clustered based on the location to which they align, resulting in fake clusters:
With v0.15.0, the reads are clustered based on their mutations alone, regardless of their alignment location:
Because the data are simulated, the ground truth mutation rates for both clusters are known. The clustering algorithm in v0.15.0 is able to infer the ground truth mutation rates for both clusters very well:
Cluster 1 (ground truth vs. SEISMIC-RNA)
Cluster 2 (ground truth vs. SEISMIC-RNA)
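The example above uses --min-mut-gap 3. As a minimal sketch of what such a filter does, assuming that the gap means the number of unmutated positions strictly between two mutations (SEISMIC-RNA's exact definition and implementation may differ), reads could be screened as follows. The function names are hypothetical, and each read is represented only by the positions of its mutations.

```python
from typing import Iterable, List, Sequence


def passes_min_mut_gap(mut_positions: Iterable[int], min_gap: int) -> bool:
    """Return True if every pair of adjacent mutations is separated by at
    least `min_gap` unmutated positions (assumed definition of the gap)."""
    pos = sorted(mut_positions)
    return all(b - a - 1 >= min_gap for a, b in zip(pos, pos[1:]))


def filter_reads(reads: Sequence[Sequence[int]],
                 min_gap: int) -> List[Sequence[int]]:
    """Keep only the reads whose mutations all satisfy the gap requirement."""
    return [read for read in reads if passes_min_mut_gap(read, min_gap)]
```

Because a filter like this preferentially discards reads with closely spaced mutations, the reads that remain under-represent them; this is the drop-out bias that SEISMIC-RNA corrects for.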
Masking sections longer than the reads while correcting drop-out bias
Additionally, the algorithm that corrects drop-out bias can now handle reads that are shorter than the section. Previously, if you used --min-mut-gap in the mask step with a section that was longer than the reads, then the drop-out bias correction would artificially inflate the read coverage and lead to somewhat incorrect mutation rates. In v0.15, this problem has been fixed, so you can safely use --min-mut-gap with sections of any length. The read coverage inflation can be seen in the mask coverage from the same dataset:
Coverage (v0.14.1)
Note that the peak coverage erroneously exceeds 1 million, the total number of reads in the dataset.
Coverage (v0.15.0)
Note that the peak coverage is now less than 1 million, as it should be.
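The sanity check in these notes works because a per-position read count can never exceed the total number of reads. A minimal sketch of counting raw coverage from read end coordinates, assuming each read is given as a hypothetical (end5, end3) pair of inclusive 1-indexed positions, is shown below; it does not implement the drop-out bias correction itself.

```python
from typing import List, Sequence, Tuple


def coverage(reads: Sequence[Tuple[int, int]], length: int) -> List[int]:
    """Count how many reads cover each position 1..length.

    Each read is (end5, end3) with inclusive, 1-indexed coordinates.
    A difference array makes the cost O(number of reads + length).
    """
    diff = [0] * (length + 2)
    for end5, end3 in reads:
        if not 1 <= end5 <= end3 <= length:
            raise ValueError(f"invalid read coordinates: {(end5, end3)}")
        diff[end5] += 1      # read starts covering here
        diff[end3 + 1] -= 1  # and stops covering after end3
    cov: List[int] = []
    running = 0
    for pos in range(1, length + 1):
        running += diff[pos]
        cov.append(running)
    return cov
```

For any input, max(coverage(reads, length)) is at most len(reads), so a peak above the number of reads signals a bug.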
What's Changed
- 0.15.0 by @matthewfallan in #12
Full Changelog: v0.14.1...v0.15.0
Pair Programming
What's new in 0.14.1
Bug fixes
- Since v0.9.0, there has been a bug in merging paired reads (function merge_rels) preventing a match in one read from compensating for ambiguity in the other. For instance, if at position 45, read 1 had a low-quality N while read 2 had a high-quality match, then the result would be a low-quality N, instead of the intended high-quality match. This bug did not prevent matches from compensating for blank positions, nor did it prevent non-matches from compensating for other types of relationships. This bug was found by @justinaruda and has now been patched by @matthewfallan and @justinaruda.
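The intended behavior can be illustrated with bit-flag relationship codes, where an ambiguous call such as a low-quality N is the union of every relationship it could represent. This is a minimal sketch and not SEISMIC-RNA's merge_rels: the flag values, the merge_rel function, and the rule for contradictory mates are all assumptions chosen for illustration.

```python
# Hypothetical bit flags, one per possible relationship to the reference.
BLANK = 0  # the mate does not cover this position
MATCH = 1
SUB_A, SUB_C, SUB_G, SUB_T = 16, 32, 64, 128
SUB_N = SUB_A | SUB_C | SUB_G | SUB_T
# A low-quality base call could be a match or any substitution.
LOW_QUAL_N = MATCH | SUB_N


def merge_rel(rel1: int, rel2: int) -> int:
    """Merge the relationships of two mates at one position."""
    if rel1 == BLANK:
        return rel2
    if rel2 == BLANK:
        return rel1
    # Keep only the relationships consistent with both mates, so a
    # confident call in one mate resolves an ambiguous call in the other.
    both = rel1 & rel2
    # Assumed rule: if the mates contradict each other, keep both options.
    return both if both else rel1 | rel2
```

In the example from the notes, merging a low-quality N in read 1 with a match in read 2 gives merge_rel(LOW_QUAL_N, MATCH) == MATCH, the intended high-quality match.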
Full Changelog: v0.14.0...v0.14.1
Passing Lists
What's new in 0.14.0
New Features
- Introduce command +listpos to list positions from table files that meet certain criteria.
- Introduce a file format for listing positions.
- In the mask command, introduce --exclude-file, which accepts a file containing a list of positions and excludes them; the previous option --exclude-pos has been removed.
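The exclusion step can be pictured as reading a set of positions from a file and dropping them before masking. This is a minimal sketch only: the position-list format is not described in these notes, so the CSV layout with 'Reference' and 'Position' columns, and both function names, are hypothetical.

```python
import csv
from typing import Iterable, List, Set, TextIO, Tuple


def read_position_list(handle: TextIO) -> Set[Tuple[str, int]]:
    """Read (reference, position) pairs from a CSV file.

    The 'Reference' and 'Position' column names are hypothetical.
    """
    reader = csv.DictReader(handle)
    return {(row["Reference"], int(row["Position"])) for row in reader}


def exclude_positions(ref: str, positions: Iterable[int],
                      excluded: Set[Tuple[str, int]]) -> List[int]:
    """Return the positions in `ref` that are not listed for exclusion."""
    return [pos for pos in positions if (ref, pos) not in excluded]
```

Keying the set on (reference, position) pairs lets one file list positions from several references without them colliding.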
Full Changelog: v0.13.1...v0.14.0
Lazy Graphs
What's new in 0.13.1
Performance
- Graphs now check whether their output files exist before computing their data, which can save a lot of time for expensive graphs (e.g. rolling AUC-ROC).
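The check described above amounts to testing for the output file before doing any expensive work. A minimal sketch follows; write_graph, the compute callable, and the force flag are hypothetical and not SEISMIC-RNA's API.

```python
from pathlib import Path
from typing import Callable


def write_graph(output: Path, compute: Callable[[], str],
                force: bool = False) -> bool:
    """Write a graph file only if it does not already exist (or force=True).

    Returns True if compute() was called and its result written, or False
    if an existing output file was kept and compute() was never called.
    """
    if output.exists() and not force:
        return False  # skip the expensive computation entirely
    output.write_text(compute())
    return True
```

The key design choice is checking before computing rather than after, so an expensive graph such as a rolling AUC-ROC is never recomputed just to be discarded.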
Full Changelog: v0.13.0...v0.13.1
Joinery
What's new in 0.13.0
New Features
- Join sections (horizontally) from Mask or Cluster reports using the new command seismic join.
Full Changelog: v0.12.2...v0.13.0
Cluster Waves
What's new in 0.12.2
New Features
- The new command seismic +delclust deletes clusters of order higher than a maximum you specify.
Removed Features
- For consistency with the other commands (especially with --force), +addclust and +delclust no longer make backups. This feature will probably be reimplemented in a future update that makes all commands write to a temporary directory before moving all files to the output directory; however, this is a low priority, since SEISMIC-RNA has worked well enough without this safety feature.
Full Changelog: v0.12.1...v0.12.2
Safer Add Clusters
What's new in 0.12.1
CLI changes
+clustup has been renamed to +addclust to make its purpose more obvious.
New Features
+addclust now makes a temporary backup of every file in a clustering directory before modifying the existing files (batches, counts, and report). If a fatal error occurs while modifying the existing files, then the original files are restored from the backup. Without this feature, the original data could be corrupted and rendered unusable.
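The backup-and-restore pattern described above can be sketched with the standard library alone. This is a minimal illustration, not SEISMIC-RNA's implementation: modify_with_backup and the modify callable are hypothetical, and this version backs up the entire directory rather than specific files.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def modify_with_backup(directory: Path,
                       modify: Callable[[Path], None]) -> None:
    """Run modify(directory); if it raises, restore the original contents
    of the directory from a temporary backup and re-raise the error."""
    with tempfile.TemporaryDirectory() as tmp:
        backup = Path(tmp) / "backup"
        shutil.copytree(directory, backup)
        try:
            modify(directory)
        except BaseException:
            # BaseException also covers KeyboardInterrupt. Discard the
            # partially modified files, then restore the originals.
            shutil.rmtree(directory)
            shutil.copytree(backup, directory)
            raise
    # The temporary backup is deleted automatically on exit.
```

Deleting the directory before restoring it also removes any new files the failed modification created, not just the ones it overwrote.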
Full Changelog: v0.12.0...v0.12.1
Update: Cluster Update
What's new in 0.12.0
New Features
- New command seismic +clustup ("cluster update") lets you increase the number of clusters in a clustered dataset after you have already run clustering.
  - This feature is most useful if you want to check whether clustering works with a small number of clusters (e.g. 2) and then try more clusters (without needing to rerun clustering with 2 clusters).
  - Only the new clusters run; previously run clusters are kept, saving time and energy.
  - New clusters are simply appended to the report, the batches, and the counts file, as if you had run clustering with that number of clusters initially.
  - There is no limit (beyond computer resources) to the number of clusters you can add, or to the number of times you can run seismic +clustup on the same dataset.
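The "only the new clusters run" behavior can be sketched as filling in the missing numbers of clusters while leaving existing results untouched. This is a minimal illustration: add_clusters is hypothetical, and run_order stands in for one full clustering run with a given number of clusters.

```python
from typing import Callable, Dict


def add_clusters(results: Dict[int, object], max_order: int,
                 run_order: Callable[[int], object]) -> Dict[int, object]:
    """Extend clustering results up to `max_order` clusters.

    `results` maps each number of clusters already run to its result.
    Existing entries are kept as-is; only the missing orders are computed,
    in increasing order, and appended to a copy of `results`.
    """
    merged = dict(results)  # copy so the caller's results are not mutated
    for order in range(1, max_order + 1):
        if order not in merged:
            merged[order] = run_order(order)
    return merged
```

Running this repeatedly with larger values of max_order only ever computes the orders that are new, which mirrors the claim that +clustup can be run any number of times on the same dataset.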
Bug Fixes
- Fixed bug where NaNs would not be removed properly when calculating normalized RMSDs.
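The fix concerns dropping NaNs before computing the deviation, since a single NaN would otherwise make the whole result NaN. A minimal sketch follows; nrmsd is a hypothetical function, and normalizing by the range of the remaining values is only one common convention, assumed here; SEISMIC-RNA's own definition of NRMSD may normalize differently.

```python
import math
from typing import Sequence


def nrmsd(x: Sequence[float], y: Sequence[float]) -> float:
    """Normalized root-mean-square deviation between profiles x and y.

    Positions where either value is NaN are removed pairwise first.
    The RMSD is divided by the range of the remaining values (an
    assumed normalization). Returns NaN if no valid positions remain.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    pairs = [(a, b) for a, b in zip(x, y)
             if not (math.isnan(a) or math.isnan(b))]
    if not pairs:
        return math.nan
    rmsd = math.sqrt(sum((a - b) ** 2 for a, b in pairs) / len(pairs))
    values = [v for pair in pairs for v in pair]
    spread = max(values) - min(values)
    # If every remaining value is identical, the profiles agree exactly.
    return rmsd / spread if spread > 0 else 0.0
```

Removing NaNs pairwise, rather than from each profile separately, keeps the two profiles aligned position by position.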
Full Changelog: v0.11.5...v0.12.0
v0.11.5
What's new in 0.11.5
New Features
- Cluster reports now include two more fields to indicate the reproducibility of clustering: "NRMSD Between Runs" and "Correlation Between Runs". The nonfunctional "Variation of Information" field has been removed.
- Pool now includes a safeguard to prevent accidentally overwriting Relate reports.
Full Changelog: v0.11.4...v0.11.5