Releases: PacificBiosciences/HiFi-human-WGS-WDL
v2.1.0
What's Changed
- Support input BAMs without basemods. Reads without basemods will not contribute to basemod pileups.
- Support input BAMs without rq tags. rq will not be calculated for these reads, but the reads will be aligned for downstream processes.
- Updated to TRGT v1.4.1 by @williamrowell in #173
  - Increased memory for trgt task to better support larger catalogs with lower complexity motifs.
  - Added the max_depth parameter and limited to 50 to reduce memory usage for high depth low complexity loci.
  - Reduced the number of samtools sort threads to lower total memory used for sort.
- Update to pbmm2 v1.16.99 prerelease by @williamrowell in #177
  - Fixes behavior where SA tags were not stripped from aligned input.
- Remove StarPhase outputs that are not allowed as PharmCAT input (PharmGKB/PharmCAT/issues/204)
  - NO_READS is used in StarPhase output to indicate no-calls.
  - NO_MATCH is used when StarPhase can't match the variants to a known allele.
  - This workaround removes NO_READS/NO_MATCH outputs.
- Gracefully handle low depth samples with StarPhase/PharmCAT by @williamrowell in #176
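The NO_READS/NO_MATCH workaround in the list above amounts to filtering StarPhase calls before they reach PharmCAT. A minimal sketch of that idea follows, assuming a tab-separated file with a gene column and a call column; the layout, the function name filter_calls, and the sample rows are illustrative assumptions, not the workflow's actual implementation.

```python
# Hypothetical sketch: drop StarPhase calls that PharmCAT will not accept.
# The two-column (gene, call) TSV layout is an assumption for illustration.
DISALLOWED = ("NO_READS", "NO_MATCH")


def filter_calls(lines):
    """Return only the lines whose call column has no disallowed marker."""
    kept = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Lines without a second column are passed through unchanged.
        call = fields[1] if len(fields) > 1 else ""
        if any(marker in call for marker in DISALLOWED):
            continue
        kept.append(line)
    return kept


example = [
    "CYP2D6\t*1/*4\n",
    "CYP2C19\tNO_READS\n",
    "CYP2B6\tNO_MATCH\n",
]
filtered = filter_calls(example)
```

Matching on a substring rather than the whole field is a deliberate choice here, so that a call containing a marker on only one haplotype would also be dropped; whether StarPhase ever emits such calls is not stated in these notes.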
Full Changelog: v2.0.7...v2.1.0
v2.0.7
Thanks for bearing with us through the last few releases as we addressed issues affecting specific backends and combinations of inputs. We should start to see more stability now.
What's Changed
- Testing: Added unit tests for all tasks (except read_pbsv_splits) using DNAstack/wdl-ci #157
- Fix: write_ped_phrank bug affecting singleton.wdl entrypoint in Cromwell.
Full Changelog: v2.0.6...v2.0.7
v2.0.6
This is a bugfix release.
What's Changed
- Revert SC1117 fixes that caused issues with miniwdl. by @williamrowell in #169
Full Changelog: v2.0.5...v2.0.6
v2.0.5
This patch primarily addresses an issue with provisioning write_ped_phrank on GCP and Terra.
- cleaned up documentation
- cleaned up input templates
- addressed shellcheck issues for several command calls
Full Changelog: v2.0.4...v2.0.5
v2.0.4
What's Changed
This release addresses an issue with how the workflow_version output string was generated. This was not detected by linting tools and only caused issues at runtime on Cromwell.
- v2.0.4 by @williamrowell in #156
Full Changelog: v2.0.3...v2.0.4
v2.0.3
What's Changed
- a change to write_ped_phrank (for wdlTools/DNAnexus compatibility) in v2.0.2 broke Cromwell compatibility; this has been fixed
- a change to pbsv_call (for miniwdl compatibility) in v2.0.2 broke wdlTools/DNAnexus compatibility; this has been fixed
- @informationsea fixed a typo in the stats output
- wiki submodule was removed, docs are in docs subdirectory
Full Changelog: v2.0.2...v2.0.3
v2.0.2
v1.2.1
Updated json2ped.py to handle missing sample sex input. 3d62f62
Note: The v1 branch will not receive new features.
Clone this branch with:
git clone \
--depth 1 --branch v1.2.1 \
--recursive \
https://github.com/PacificBiosciences/HiFi-human-WGS-WDL.git
Full Changelog: v1.2.0...v1.2.1
v2.0.1
- Fixed unreachable pbsv_call bug.
- Modified write_ped_phrank behavior to correctly handle missing (null) sample sex.
Full Changelog: v2.0.0...v2.0.1
v2.0.0
PacBio WGS Variant Pipeline v2.0.0
This is a major restructuring of the v1 workflow. Please read the documentation before filing issues.
Structural changes
- There are two entry-points, singleton.wdl and family.wdl.
  - singleton.wdl has a flattened input/output structure that should have better compatibility with platforms like Terra.
  - family.wdl includes joint calling tasks for small variants and structural variants.
  - The family.wdl entrypoint can be used for both single sample (singleton) and multisample (duo, trio, quad, etc.) inputs, allowing for a single workflow to be used for all analyses. The per-sample outputs will be arrays in the same order as the sample input. The singleton.wdl entrypoint will be maintained for backends that need flattened inputs and outputs.
- phenotype field has been changed from Array[String] to String, a comma-delimited string, e.g., "HP:0000118,HP:0000001"
- Static inputs like reference FASTA and BED files are now referenced through new "map" files to simplify inputs.json structure.
- Workflow inputs.json files have been greatly simplified.
- Most tasks have been moved to the wdl-common submodule for reuse.
- AWS AGC has been deprecated by AWS, and support has been removed.
- AWS HealthOmics support has been added (needs improved documentation). Added a script to deploy containers to a private ECR repo for HealthOmics.
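For the phenotype change listed above (Array[String] to a single comma-delimited String), existing v1-style inputs need a small conversion. A minimal sketch, assuming the old value is available as a Python list; the helper name phenotypes_to_string is hypothetical and not part of the workflow.

```python
# Hypothetical helper: convert a v1-style phenotype list to the v2 string form.
def phenotypes_to_string(terms):
    """Join HPO terms into one comma-delimited string with no spaces."""
    # Strip stray whitespace and skip empty entries before joining.
    cleaned = [t.strip() for t in terms if t.strip()]
    return ",".join(cleaned)


v1_phenotypes = ["HP:0000118", "HP:0000001"]
v2_phenotypes = phenotypes_to_string(v1_phenotypes)
print(v2_phenotypes)  # HP:0000118,HP:0000001
```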
New features:
- If aligned BAMs are provided as input to the workflow, alignment and phasing information will be stripped and the reads will be realigned. If the input BAM has consensus kinetics tags, these will be stripped as well.
- Sex (or more specifically, presence or absence of chrY) is inferred by relative chrY aligned depth. This will never override user-defined sex, but is used if the sex is not provided by user.
- HiPhase now jointly phases small variants (DeepVariant), structural variants (PBSV), and tandem repeats (TRGT).
- Merged TRGT VCF will be generated by the family workflow.
- Pharmacogenomics analysis with StarPhase and PharmCAT.
- Updated tertiary analysis with gnomAD v4.1 and CoLoRSdb population datasets.
- High level summary statistics (e.g., mean depth, variant counts by type, etc.) output directly by workflow in the form of workflow metadata output (e.g., miniwdl outputs.json) and a flat stats.txt TSV.
- Many QC plots have been added:
  - read length histogram
  - read quality histogram
  - aligned depth distribution and cumulative depth distribution
  - alignment MAPQ histogram
  - alignment gap compressed identity histogram
  - SNV distribution heatmap
  - small indel size histogram
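The sex-inference rule in the New features list (infer from relative chrY aligned depth, but never override a user-defined value) can be sketched as below. This is only an illustration: the notes do not say what chrY depth is measured relative to or which threshold is used, so the autosomal baseline, the 0.1 cutoff, the "MALE"/"FEMALE" labels, and the function name resolve_sex are all assumptions.

```python
# Hypothetical sketch of depth-based sex inference with user override.
# The 0.1 cutoff and the "MALE"/"FEMALE" labels are illustrative assumptions.
CHRY_RATIO_CUTOFF = 0.1


def resolve_sex(user_sex, chry_depth, autosome_depth, cutoff=CHRY_RATIO_CUTOFF):
    """Return user-defined sex if given, else infer from relative chrY depth."""
    if user_sex is not None:
        return user_sex  # user-defined sex is never overridden
    if autosome_depth <= 0:
        return None  # no usable baseline, so nothing can be inferred
    ratio = chry_depth / autosome_depth
    return "MALE" if ratio >= cutoff else "FEMALE"
```

For example, resolve_sex(None, 15.0, 30.0) infers from a ratio of 0.5, while resolve_sex("FEMALE", 15.0, 30.0) returns the user-supplied value regardless of depth.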
Tool updates
pbmm2 1.16.0
mosdepth v0.3.9
DeepVariant v1.6.1
pbsv v2.10.0
Paraphase v3.1.1
TRGT v1.2.0
HiPhase v1.4.5
HiFiCNV v1.0.1
pb-StarPhase v1.0.0
PharmCAT v2.15.4
slivar v0.3.1
CoLoRSdb v1.1.0
Thanks to: