Skip to content

Cookbook

Jim Shaw edited this page Oct 18, 2024 · 6 revisions

Standard run (BAM + VCF)

devider -b sorted_indexed.bam -v snps.vcf.gz -r reference.fa -o output -t 10 --preset nanopore-r9
  • -b: sorted and indexed BAM file
  • -v: VCF file with header
  • -r: reference file (same reference as your BAM file)
  • -t: number of threads

Tip

You may need to run bgzip snps.vcf; tabix snps.vcf.gz if your vcf does not have a header.

About --preset: we offer four presets parameter choices. old-long-reads, nanopore-r9, nanopore-r10, hi-fi. See the advanced usage manual for more information. In short, use:

  • old-long-reads for ~90% identity long reads
  • nanopore-r9 for ~95% (default)
  • nanopore-r10 for ~98%
  • hi-fi for true high fidelity reads (> 99.9% accuracy).

Pipeline run

run_devider_pipeline -i reads.fq.gz -r reference.fa -o pipeline_output -t 10 --devider-options "min-cov=10 allele-output preset=nanopore-r10"

This is just a light wrapper around minimap2, LoFreq, and devider. This comes with the conda install or is available through the scripts/ folder.

  • -i: your reads
  • -r: reference file to map and call variants
  • -o: output folder
  • -t: number of threads for each pipeline step (minimap2, lofreq, then devider)
  • --devider-options: you can set options for devider by writing the long-form option and using long-form=value for options with values.

Haplotagged BAM for IGV visualizations

haplotag_bam sorted_indexed.bam -i output/ids.txt

#visualize the tagged bam file in IGV.
ls sorted_indexed.bam.tagged.bam
  • sorted_indexed.bam - your original bam file input to devider. If you used the pipeline, see output/pipeline/mapping.bam.
  • -i - the output/ids.txt file output by devider, assigning reads to each haplotype.

haplotag_bam is available in the conda install (or in scripts/). This generates a haplotagged bam file with HP:i tags for each read, allowing easy visualization in IGV.

Haplotyping only specific regions

devider ... --bed-file regions.bed
devider ... --sequence-to-phase NC_001802.1:1-1000,NC_001592.1:5000-6000

# pipeline version
run_devider_pipeline --devider-options="bed-file=regions.bed"
  • --bed-file: A >= 3 column bed file (only the first 3 matter) with contig id, start position, and end position. Each region will be haplotyped. You can specify multiple regions per contig.
  • --sequenced-to-phase: specify the region/contig to phase in contig:start-end format. Multiple regions can be separated by a comma.

Output base-level alleles and reads

devider ... -o output --allele-output --output-reads
head output/reads.fq

# obtain all reads for Haplotype:1
grep -A3 Haplotype:1 devider_pipeline_output/reads.fq | grep -v '^--' > haplotype_1_reads.fq
  • --allele-output: output base-level alleles instead of 0,1 alleles for snp_haplotypes.fasta. See the page on devider's outputs for more info.
  • --output-reads: outputs the reads in fastq format in output/reads.fq. These reads are tagged by their haplotype.