v0.4.0

@zhengzhenxian released this 11 Oct 14:38

This version is a major update. The new features and benchmarks are explained in a technical note titled “Improving the performance of ClairS and ClairS-TO with new real cancer cell-line datasets and PoN”. A summary of changes:

  1. Starting from this version, ClairS provides two model types. An ssrs model is trained first on synthetic samples and then augmented with real samples (e.g., ont_r10_dorado_sup_5khz_ssrs); an ss model is trained on synthetic samples only (e.g., ont_r10_dorado_sup_5khz_ss). The ssrs model performs better and fits most usage scenarios. The ss model can be used when there is a concern that a cancer type is missing from model training. In v0.4.0, four real cancer cell-line datasets (HCC1937/BL, HCC1954/BL, H1437/BL, and H2009/BL) covering two cancer types (breast cancer and lung cancer), published by Park et al., were used for ssrs model training.
  2. Added base quality (BQ) jittering in model training to address the difference in BQ distribution between the training and calling datasets, which leads to a performance drop.
  3. Added the --indel_min_af option and changed the default minimum allelic fraction requirement for indels to 0.1 on the ONT platform.
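
To illustrate the idea behind BQ jittering in item 2, here is a minimal sketch. It is not ClairS's actual implementation: the function name `jitter_base_qualities`, the per-base uniform integer offset, and the `max_shift`, `min_q`, and `max_q` values are all assumptions made for illustration. The point is only that perturbing base qualities during training exposes the model to a wider BQ distribution than the training data alone provides.

```python
import random


def jitter_base_qualities(quals, max_shift=5, min_q=0, max_q=60, rng=None):
    """Return a copy of `quals` with a random integer offset added to each value.

    Hypothetical helper: the offset range and the clipping bounds are
    illustrative assumptions, not values taken from ClairS.
    """
    if rng is None:
        rng = random.Random()
    jittered = []
    for q in quals:
        shifted = q + rng.randint(-max_shift, max_shift)  # inclusive bounds
        jittered.append(min(max_q, max(min_q, shifted)))  # clip to valid range
    return jittered


if __name__ == "__main__":
    read_quals = [12, 25, 31, 40, 8]
    print(jitter_base_qualities(read_quals, rng=random.Random(42)))
```

Passing a seeded `random.Random` keeps the augmentation reproducible across training runs.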
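
The effect of a separate minimum allelic fraction for indels (item 3) can be sketched as a simple threshold check. This is an illustrative assumption about how such a filter behaves, not ClairS code: the `Candidate` class, the `passes_min_af` function, the inclusive comparison, and the `snv_min_af` placeholder value are all hypothetical. Only the 0.1 indel default comes from the release notes.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical candidate variant with its supporting read counts."""
    ref: str
    alt: str
    alt_count: int
    depth: int


def is_indel(c: Candidate) -> bool:
    # REF and ALT of different lengths indicate an insertion or deletion.
    return len(c.ref) != len(c.alt)


def passes_min_af(c: Candidate, snv_min_af: float = 0.05,
                  indel_min_af: float = 0.1) -> bool:
    """Keep a candidate only if its allelic fraction meets the threshold
    for its variant type. `snv_min_af` is a placeholder value."""
    if c.depth == 0:
        return False
    af = c.alt_count / c.depth
    threshold = indel_min_af if is_indel(c) else snv_min_af
    return af >= threshold


if __name__ == "__main__":
    for cand in (Candidate("A", "AT", 5, 100), Candidate("A", "AT", 10, 100)):
        print(cand, passes_min_af(cand))
```

With these assumed thresholds, an indel supported by 5 of 100 reads is dropped while one supported by 10 of 100 reads is kept, which shows why a stricter indel threshold reduces low-fraction indel candidates.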