Yes, freebayes-parallel does work, but it only parallelizes across a single node.
Ideally, and this is what we're doing here, Nextflow decides how the jobs are distributed, whether across a single node or multiple nodes.
You provide freebayes-nf with some intervals, such as the .fai of your reference, and specify a width. Each original interval is split into sub-intervals of that width, and freebayes operates on each sub-interval.
The resulting VCFs are merged, and the final VCF is decomposed and normalized using bcftools.
In practice this looks like:
nextflow run brwnj/freebayes-nf -latest -resume -profile docker \
--alignments '*.cram' \
--fasta human.fasta
--alignments
- Aligned sequences in .bam and/or .cram format. Indexes (.bai/.crai) must be present.
--fasta
- Reference FASTA. Index (.fai) must exist in same directory.
--outdir
- Base results directory for output.
- Default: './results'
--project
- File prefix for merged and annotated VCF files.
- Default: 'variants'
--width
- The genomic window size per variant calling job.
- Default: 5000000
--options
- Arguments to be passed to freebayes command in addition to those already supplied like --bam, --region, and --fasta-reference.
- Single quote these when specifying on the command line, e.g. --options '--pooled-discrete'.
- Default: '--pooled-continuous --pooled-discrete --genotype-qualities --report-genotype-likelihood-max --allele-balance-priors-off --min-alternate-fraction 0.03 --min-repeat-entropy 1 --min-alternate-count 2'
--intervals
- Picard-style intervals file to use rather than intervals defined in .fai.
- Something like Broad's interval lists works here if you want to omit masked regions.
- See https://software.broadinstitute.org/gatk/download/bundle and use the wgs_calling_regions file for your genome build.
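To make the windowing behind --width more concrete, here is a minimal sketch of splitting the contigs in a .fai into sub-intervals of a given width. It is not taken from freebayes-nf itself: the make_windows helper, the toy index, and the contig:start-end output format (0-based start, end-exclusive) are assumptions made for illustration, and the pipeline may build its regions differently.

```shell
# Hypothetical helper (not part of freebayes-nf): split every contig listed in a
# .fai index into windows of at most WIDTH bp and print one region per line.
# The contig:start-end output format is an assumption made for this sketch.
make_windows() {
    fai="$1"
    width="$2"
    awk -v w="$width" 'BEGIN { FS = "\t" }
    {
        # In a .fai file, column 1 is the contig name and column 2 its length.
        for (win_start = 0; win_start < $2; win_start += w) {
            win_end = win_start + w
            if (win_end > $2) win_end = $2   # clip the last window to the contig end
            printf "%s:%d-%d\n", $1, win_start, win_end
        }
    }' "$fai"
}

# Toy index with made-up values: two contigs of 12 bp and 5 bp.
toy_fai="$(mktemp)"
printf 'chr1\t12\t6\t60\t61\nchr2\t5\t30\t60\t61\n' > "$toy_fai"

# Window width of 5 bp; each printed region would correspond to one freebayes job.
make_windows "$toy_fai" 5
```

With the default width of 5000000, a 12 Mb contig would be cut into three windows in the same way, and each window becomes a separate job that Nextflow can schedule on whichever node is available.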