All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- BED files for the VNTR regions in hg19/hg38 from Severus.
  - The appropriate file will be automatically selected for the appropriate genome, unless a user provides a custom BED with `--tr_bed`.
- IGV configuration supporting VCF files from `--snv` and `--sv`.
- Aligned BAM to the output, when re-alignment is required.
- Tweaked parameters for Severus to refine SV calling.
- More informative log when a BAM called with an invalid basecaller model is provided.
- Failures of processes involved in differentially modified loci and regions detection will not cause workflow to fail.
- Updated `modkit` to v0.3.3.
- Reconciled workflow `_ingress.nf` from wf-human-variation v2.4.0 and wf-template v5.2.6.
- `-resume` failing for some `snv` processes.
- Excessive memory usage for sample_probs process when using `--mod` leading to exit code 137.
- Erroneous handling of inputs for the joint report.
- Missing re-aligned XAM files in the output directory.
- `--override_basecaller_cfg` parameter allows users to provide a basecall configuration name in cases where automatic basecall model detection fails.
- `--diff_mod` option to allow users to turn off differentially modified loci (DML) and regions (DMR) analysis with DSS by setting `--diff_mod false`.
- Updated `modkit` to v0.3.0.
- Reconciled workflow `_ingress.nf` from wf-human-variation v2.3.1.
- `--snv` crashing with `--include_all_ctgs true`.
- `--snv` always referring to model in `--override_basecaller_cfg` when deciding whether to call Indels.
- Automated basecaller detection not finding a basecaller model.
- `--basecaller_cfg` as the workflow now automatically detects the basecaller model from the input data.
- Tumor-only mode for the base workflow, small variant calling with ClairS-TO, modified base aggregation and somatic SV calling.
- Parts of the documentation still referring to nanomonsv.
- If available, `basecaller_cfg` will be inferred from the `basecall_model` DS key of input read groups.
  - Providing `--basecaller_cfg` will not be required if `basecall_model` is present in the DS tag of the read groups of the input BAM.
  - `basecaller_cfg` will be ignored if a `basecall_model` is found in the input BAM.
  - The workflow will fail if the tumor and normal BAM files have not been called with the same `basecall_model`.
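The inference rule described in these entries can be sketched in Python. This is only an illustration and not the workflow's actual code: the function names `basecall_models` and `resolve_basecaller_cfg` are hypothetical, the header text is assumed to be plain SAM header lines, and the `basecall_model=<name>` layout inside the `DS` field is an assumption about how the basecaller writes its read groups. How the workflow treats a BAM with several models, or a pair where only one BAM carries a model, is not stated here, so the sketch simply rejects both cases.

```python
import re


def basecall_models(sam_header):
    """Collect basecall_model values from the DS field of @RG header lines."""
    models = set()
    for line in sam_header.splitlines():
        if not line.startswith("@RG"):
            continue
        for field in line.split("\t")[1:]:
            if field.startswith("DS:"):
                # Assumed layout: "DS:basecall_model=<name> runid=<id>"
                match = re.search(r"basecall_model=(\S+)", field)
                if match:
                    models.add(match.group(1))
    return models


def resolve_basecaller_cfg(tumor_header, normal_header, user_cfg=None):
    """Pick the model from the headers; fall back to a user-supplied value."""
    tumor = basecall_models(tumor_header)
    normal = basecall_models(normal_header)
    if tumor or normal:
        # Assumption: both BAMs must carry exactly one, identical model.
        if tumor != normal or len(tumor) != 1:
            raise ValueError("tumor and normal basecall_model values differ")
        return next(iter(tumor))  # the user-supplied value is ignored here
    if user_cfg is None:
        raise ValueError("no basecall_model found and no fallback provided")
    return user_cfg


# Illustrative headers; the model name and runid are made-up example values.
model = "dna_r10.4.1_e8.2_400bps_sup@v4.2.0"
tumor_hdr = "@HD\tVN:1.6\n@RG\tID:t1\tDS:basecall_model=%s runid=abc\tSM:tumor" % model
normal_hdr = "@HD\tVN:1.6\n@RG\tID:n1\tDS:basecall_model=%s runid=def\tSM:normal" % model
print(resolve_basecaller_cfg(tumor_hdr, normal_hdr, user_cfg="ignored"))
```

In this sketch the user-supplied value is consulted only when neither header carries a model, mirroring the "will be ignored if a `basecall_model` is found" entry.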
- Updated to Severus v1.1.
- Workflow crashing when the input BED file has overlapping intervals.
- Returning error in `annotate_sv` when `END` position is smaller than `POS`.
- Workflow not starting in Nextflow v24.04.
- Update to ClairS v0.2.0.
- The workflow now uses normal heterozygote sites to haplotag reads.
- This behaviour can be changed using `--use_normal_hets_for_phasing` and `--use_tumor_hets_for_phasing`.
- Moreover, it uses the indels for the phasing as default; this can be changed with `--use_het_indels_for_phasing`.
- The workflow now uses longphase to haplotag reads; this can be changed with `--use_longphase_haplotag`.
- The workflow now accepts a `--liquid_tumor` option, enabling presets and, where available, models specific for liquid tumors.
- Update to Clair3 v1.0.8.
  - Added `--clair3_base_err` and `--clair3_gq_bin_size` options.
- `modkit` now runs by contig.
- `modkit` bedMethyl files are now in the top-level output directory.
- A report with name `[sample name].wf-somatic-variation-report.html`, linking the individual detailed reports.
- Use `ezcharts SeqCompare` in the QC report.
- Memory usage of alignment report reduced by using histograms.
- Retry process when `clairs.py predict` crashes with error 134.
- Support for input folders of BAM files for `--bam_tumor` and `--bam_normal` (instead of only allowing single BAM files).
- ClinVar version in SnpEff container updated to version 20240307
- Update to Clair3 v1.0.7.
- Update to modkit v0.2.6.
- Improved modkit runtime by increasing the default interval size.
- Increased minimum CPU requirement for the workflow to 16.
- bedMethyl output files now follow the pattern `{{ alias }}.wf-somatic-mods.{{ type }}.bedmethyl.gz`.
- Structural variant (SV) calling is now performed with Severus (v0.1.2).
- ARM-compatible base workflow and modified base calling.
- `minimap2` alignments will be in BAM format when `--sv` is set.
- Force minimap2 to clean up memory more aggressively. Empirically this reduces peak-memory use over the course of execution.
- Workflow occasionally repeating QC analyses when resuming, even if successful.
- Alignment report script using too much memory.
- HAC models not recognised as valid.
- Spurious `bamstats` crashes with implausible alignment information.
- CRAM as supported input format.
- Reference genome and its indexes from the output directory.
- Insert classification and support for mismatching panel of control.
- Options `--qv`, `--classify_insert`, `--min_ref_support`, `--genotype_sv` and `--control_panel`, as these are no longer used by the workflow.
- Updated ClairS to v0.1.7, with the new dorado 4KHz/5KHz HAC models.
- Several performance improvements which should noticeably reduce the running time of the workflow
- `makeQCreport` allows for one retry to prevent the workflow failing on report generation
- Tumor-only mode for the base workflow and modified base aggregation.
- `--control_panel` option to provide a non-matching control panel produced with nanomonsv merge_control.
- Memory directives for every process.
- Run `ClairS` `haplotype_filter` by contig.
- `--sv` will no longer emit genotypes by default, and only save the same fields from `nanomonsv`.
  - Users can still request a genotype using `--genotype_sv`.
  - `--min_ref_support` defines the minimum number of REF-supporting reads to call a heterozygote site.
- `--snv` calling genome-wide variants when `--bed` is specified.
- `snv:makeReport` crashing when `--annotation false`.
- `--annotation_threads` removed as `snpEff` v5 uses only one thread.
- New documentation
- Updated ClairS to v0.1.6
- ClinVar annotation of SVs has been temporarily removed due to not being correctly incorporated. SnpEff annotations are still produced as part of the final SV VCF.
- rVersion retried also upon success
- Default local executor CPU and RAM limits
- List of reads supporting the SV events is now emitted in `{params.output}/{params.sample_name}/sv/txt`
- Running in `--sv` mode does not resume properly.
- `somatic_sv:report` process failing due to name collisions when running with `--annotation false`.
- Modified base calling report showing overlapping lines in the DMR plot.
- Automated annotation of SNVs, small indels and SVs.
- Option to skip germline calling and variant phasing with `--germline false`.
- Workflow can now emit germline GVCFs for tumor/normal samples with `--GVCF`.
- Option `--normal_vcf` to provide a pre-computed normal VCF file.
- Add genotyping and hybrid mode.
- The workflow saves the haplotagged CRAM files if `--germline true`.
- Updated `modkit` to v0.1.13, `ClairS` to v0.1.5, `Clair3` to v1.0.4, and added support for 4KHz and 5KHz dorado models.
- Runs `ClairS` `haplotype_filter` on the indels in addition to the SNVs.
- SV VCFs now report a single sample with Tumor/Normal formats collected, rather than two distinct samples
- This facilitates merging multiple samples thanks to unambiguous sample labelling
- The original VCFs generated by nanomonSV are now saved in `${outdir}/${sample_name}/sv/vcf`
- More consistent output naming formatting.
- When `--bed` and `--<tumor|normal>_min_coverage` are specified, the workflow will process the regions with coverage above the threshold
  - The filtering will retain the union (`bedtools merge -d 0`) of the tumor and normal filtered regions
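As a rough illustration of what retaining the union with `bedtools merge -d 0` means, the sketch below merges overlapping and book-ended intervals from two region lists in plain Python. It is not the workflow's implementation, which calls bedtools; the function name `union_regions` and the `(chrom, start, end)` tuple layout are assumptions made for the example, and chromosomes are returned in plain string sort order.

```python
from collections import defaultdict


def union_regions(tumor, normal):
    """Merge overlapping or book-ended (chrom, start, end) intervals."""
    by_chrom = defaultdict(list)
    for chrom, start, end in list(tumor) + list(normal):
        by_chrom[chrom].append((start, end))

    merged = []
    for chrom in sorted(by_chrom):
        cur_start, cur_end = None, None
        for start, end in sorted(by_chrom[chrom]):
            if cur_start is None:
                cur_start, cur_end = start, end
            elif start <= cur_end:
                # Distance 0: touching or overlapping intervals are joined.
                cur_end = max(cur_end, end)
            else:
                merged.append((chrom, cur_start, cur_end))
                cur_start, cur_end = start, end
        merged.append((chrom, cur_start, cur_end))
    return merged


# Made-up regions that passed the coverage threshold in each sample.
tumor_pass = [("chr1", 0, 100), ("chr1", 150, 200)]
normal_pass = [("chr1", 100, 120), ("chr2", 5, 10)]
for region in union_regions(tumor_pass, normal_pass):
    print(*region, sep="\t")
```

A region is therefore kept if it passes the coverage threshold in either sample, which is the union behaviour the entry describes.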
- snpEff crashing due to running out of memory.
- Error in `annotate_sv` when `END` position is smaller than `POS`
- Automated annotation of SNVs and small indels.
  - Disable with `--annotation false`
- ARM-compatible base workflow and modified base calling
- Option `--qv`, to specify the expected quality value for nanomonsv
- `Input options` and `Output options` have been combined in the `Main options` category
- Updated `DSS` to v2.38.0
- Updated `modkit` to v0.1.12
- Updated `nanomonsv` to v0.7.1
- The workflow is now mostly species agnostic (with the exception of `--annotation` and `--classify_insert`)
- Add modified base calling with `--mod`
- Updated to nanomonsv v0.6.0.
- Refine change type counts and plot; file named [SAMPLE]_spectrum.csv renamed to [SAMPLE]_changes.csv.
- Coverage plots to alignment stats report
- Improved documentation and new structure of the output sub-directories
- Variant allele frequency representation is now a scatterplot showing the relationship between normal and tumor VAF
- Created a subworkflow to call somatic SVs (`somatic_sv` in `workflows/wf-somatic-sv.nf`)
- Add nanomonsv soft filtering of SVs when providing a BED file specifying the tandem repeats with `--tr_bed`
- Add nanomonsv insert classification with `--classify_insert`, to add RepeatMasker annotation to the SVs
- Add `report_sv`, which generates a report of the SVs detected
- Enum choices are enumerated in the `--help` output
output - Enum choices are enumerated as part of the error message when a user has selected an invalid choice
- Bumped minimum required Nextflow version to 22.10.8
- Replaced `--threads` option in fastqingress with hardcoded values to remove warning about undefined `param.threads`
- Depth plot with overlapping chromosomal coverage
- Demo data
- Fix occasional crash when candidate nested tuple is found
- Configuration for running demo data in AWS
- Fast mode
- Updated to ClairS v0.1.1
- Initialised wf-somatic-variation from wf-template
- Implemented wf-somatic-snp module, that runs ClairS in a highly parallelised way (v0.1.0).
- Implemented some accessory modules to visualise the results from ClairS (mutation counts, variant allele frequency).
- Implemented report of alignment statistics
- Update reporting script to use ezcharts
- Customizable thread count for haplotype filtering stage
- Mosdepth per-base statistics are now optional, and are not shown in the report
- Workflow crashing when predicting in regions without variants
- Workflow crashing when concatenating SNP and Indel VCF files
- Workflow interrupting when an empty Indel VCF file is generated
- Extremely slow reporting of alignment statistics