Output folder directory structure

In the output folder (specified by the -o option, or devider_output by default), the following files will be available.

  |--- snp_haplotypes.fasta <- multiple sequence alignment of SNPs
  |--- majority_vote_haplotypes.fasta <- base-level haplotypes 
  |--- ids.txt <- assignment of reads to haplotypes
  |--- hap_info.txt <- more information about haplotypes
  |--- intermediate/ <- files for debugging (not important)
  |--- pipeline_files/ <- bam + vcf files (only present if using run_devider_pipeline)

snp_haplotypes.fasta - sequences of SNPs as a multiple sequence alignment



This is a valid multiple sequence alignment in fasta format.

  1. The > line contains haplotype information delimited by commas.
  • Contig: represents the contig identifier
  • Range: indicates the coordinates that were haplotyped, e.g., 3000-6000. ALL-ALL indicates no coordinates specified.
  • Haplotype: is a haplotype identifier starting from 0.
  • Abundance: indicates the normalized depth (grouped by Contig and Range) times 100.
  • Depth: is the approximate depth of coverage; will be underestimated slightly if reads are erroneous
  2. The 0, 1, ... represent reference or alternate alleles within this haplotype. - indicates this SNP is not covered by reads within the haplotype. The base position of each SNP is indicated in the hap_info.txt file.
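The header fields described above can be split into a dictionary with a few lines of Python. This is a minimal sketch, not part of devider: the function name parse_header is hypothetical, and it assumes the contig identifier contains no commas or whitespace.

```python
def parse_header(line):
    """Split a '>' header line into a dict of its comma-delimited fields."""
    # Drop the leading '>' and anything after the first whitespace
    # (majority_vote_haplotypes.fasta appends a trailing label there).
    fields = line.lstrip(">").split()[0].split(",")
    # Each field looks like 'Key:value'; split only on the first ':'.
    record = dict(field.split(":", 1) for field in fields)
    record["Haplotype"] = int(record["Haplotype"])
    record["Abundance"] = float(record["Abundance"])
    record["Depth"] = float(record["Depth"])
    return record


header = (">Contig:OR483991.1,Range:ALL-ALL,Haplotype:0,"
          "Abundance:5.51839673547632,Depth:8.431375873903942")
info = parse_header(header)
print(info["Contig"], info["Range"], info["Haplotype"])
```

Contig and Range are kept as strings because a Range such as ALL-ALL is not numeric.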


Use --allele-output to output the actual base-level alleles instead of 0 or 1. This output can be fed into MSA visualizers.

You can also use this output to build a phylogenetic tree, but you may have to change the sequence ids, because : and , are not valid in ids for many tree-building programs.
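One way to make the ids safe for tree-building software is to replace every character outside a conservative set with an underscore. The sketch below is an illustration only: sanitize_id and rewrite_fasta are hypothetical helper names, and the set of allowed characters is an assumption that you may need to tighten for your tree builder.

```python
import re


def sanitize_id(header):
    """Replace characters other than letters, digits, '.', '_' and '-' with '_'."""
    # Keep only the id itself: no leading '>' and no trailing description.
    seq_id = header.lstrip(">").split()[0]
    return re.sub(r"[^A-Za-z0-9._-]", "_", seq_id)


def rewrite_fasta(in_path, out_path):
    """Copy a FASTA file, sanitizing every header line and leaving sequences as-is."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            if line.startswith(">"):
                dst.write(">" + sanitize_id(line) + "\n")
            else:
                dst.write(line)


print(sanitize_id(">Contig:OR483991.1,Range:ALL-ALL,Haplotype:0"))
```

For example, rewrite_fasta("devider_output/snp_haplotypes.fasta", "snp_haplotypes.renamed.fasta") would write a copy with sanitized ids, assuming the default output folder name.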

majority_vote_haplotypes.fasta - base-level consensus sequence for haplotypes

>Contig:OR483991.1,Range:ALL-ALL,Haplotype:0,Abundance:5.51839673547632,Depth:8.431375873903942 SimpleConsensus

This is a fasta file (but not an MSA) representing base-level haplotypes. The bases are obtained by taking the majority base at each position according to the alignment against the reference.

  1. The header line is the same as in snp_haplotypes.fasta.
  2. N is output if the coverage at the base is < --min-cov OR the fraction of bases supporting the majority base is < --n-fraction.
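The rule for emitting N can be written out as a small function. This is a sketch of the rule as stated above, not devider's actual implementation: the function name consensus_base is hypothetical, the threshold values in the example are illustrative rather than the tool's defaults, and how devider treats ties or gaps is not covered here.

```python
from collections import Counter


def consensus_base(bases, min_cov, n_fraction):
    """Return the majority base at one position, or 'N' when support is too weak."""
    coverage = len(bases)
    # Rule 1: too few reads cover this position.
    if coverage == 0 or coverage < min_cov:
        return "N"
    base, count = Counter(bases).most_common(1)[0]
    # Rule 2: the majority base is not supported by a large enough fraction.
    if count / coverage < n_fraction:
        return "N"
    return base


# Illustrative thresholds only (min_cov=3, n_fraction=0.6).
print(consensus_base("AAAAC", 3, 0.6))  # 4 of 5 reads support A
print(consensus_base("AC", 3, 0.6))     # coverage 2 is below min_cov
print(consensus_base("AACCG", 3, 0.6))  # best support is only 2 of 5
```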

ids.txt - assignments of reads to haplotypes

Contig:OR483991.1       Range:ALL-ALL   Haplotype:0     61440ba2-e383-ee56-9dcb-d15b0797ea01    8f58b524-0a68-aea0-447a-dd5d2d68925d    79c785fe-1e86-29ed-d496-b003898b91d6
Contig:OR483991.1       Range:ALL-ALL   Haplotype:1     41178954-b99c-02ed-164f-45d7e1b37bfd    34bf4b42-b8ef-2e30-7ae9-14e85e0a5395    9a558a06-f0c8-66c9-8033-8d60ae795ddd

This is a tab-delimited file.

  1. First column indicates contig.
  2. Second column indicates the range.
  3. Third column indicates the haplotype id.
  4. Fourth column to last column are identifiers of reads assigned to this haplotype.


A read may be assigned to multiple contigs or haplotypes (e.g., supplementary alignments across contigs).
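Because a read may appear on more than one line, it can be convenient to invert the file into a mapping from read id to every haplotype it was assigned to. The following is a minimal sketch under the column layout described above; read_assignments is a hypothetical helper, and the read names in the example are made up.

```python
from collections import defaultdict


def read_assignments(lines):
    """Map each read id to the (contig, range, haplotype) groups it appears in."""
    assignments = defaultdict(list)
    for line in lines:
        columns = line.rstrip("\n").split("\t")
        if len(columns) < 3:
            continue  # skip blank or malformed lines
        # The first three columns look like 'Key:value'; keep the value part.
        contig, coord_range, haplotype = (c.split(":", 1)[1] for c in columns[:3])
        for read_id in columns[3:]:
            if read_id:
                assignments[read_id].append((contig, coord_range, int(haplotype)))
    return assignments


# Two made-up lines in the ids.txt layout; read_b is assigned to both haplotypes.
example = [
    "Contig:OR483991.1\tRange:ALL-ALL\tHaplotype:0\tread_a\tread_b\n",
    "Contig:OR483991.1\tRange:ALL-ALL\tHaplotype:1\tread_b\tread_c\n",
]
assignments = read_assignments(example)
print(assignments["read_b"])
```

Since the function accepts any iterable of lines, an open file handle for ids.txt can be passed in directly.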


If you want to haplotag your BAM file (i.e., add HP:i tags to the BAM records) for visualization, use the script haplotag_bam included in the conda install (or in the scripts/ folder).

hap_info.txt - information about SNPs and haplotypes

Contig:OR483991.1,Range:ALL-ALL Haplotype:0     Haplotype:1     Haplotype:2     Haplotype:3     Haplotype:4     Haplotype:5
286     1:0.83  1:1.00  1:0.80  1:1.00  1:1.00  1:1.00
322     1:1.00  0:1.00  1:1.00  1:1.00  1:0.93  1:1.00
476     0:1.00  0:1.00  0:1.00  0:1.00  0:1.00  1:0.89
491     0:1.00  1:0.52  0:1.00  0:1.00  1:0.96  0:1.00
726     0:1.00  0:1.00  0:1.00  0:0.95  1:0.91  1:1.00
756     1:1.00  1:1.00  1:1.00  1:0.85  1:0.98  0:1.00

This is a tab-delimited file (a TSV). The first line is a header. Each subsequent line, e.g., 286 1:0.83 1:1.00 1:0.80 1:1.00 1:1.00 1:1.00, is interpreted as:

  1. 286 - base-level location of the first SNP.
  2. 1:0.83 - For Haplotype:0, 83% of the reads support the 1 allele -- i.e., the first alternate allele.
  3. 1:1.00 - For Haplotype:1, 100% of the reads support the 1 allele.
  4. And so forth
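The interpretation above translates directly into a small parser. This is a sketch, not part of devider: parse_hap_info is a hypothetical name, and treating any line whose first column starts with Contig: as a header (in case the file holds more than one contig or range) is an assumption.

```python
def parse_hap_info(lines):
    """Return {(group, position): {haplotype: (allele, support)}} from hap_info.txt lines."""
    table = {}
    group, haplotypes = None, []
    for line in lines:
        columns = line.rstrip("\n").split("\t")
        if columns[0].startswith("Contig:"):
            # Header line: the first column names the group, the rest the haplotypes.
            group, haplotypes = columns[0], columns[1:]
            continue
        if len(columns) < 2:
            continue  # skip blank lines
        calls = {}
        for name, cell in zip(haplotypes, columns[1:]):
            allele, support = cell.split(":", 1)
            calls[name] = (allele, float(support))
        table[(group, int(columns[0]))] = calls
    return table


# The header and two rows, shortened to three haplotypes, from the example above.
example = [
    "\t".join(["Contig:OR483991.1,Range:ALL-ALL", "Haplotype:0", "Haplotype:1", "Haplotype:2"]),
    "\t".join(["286", "1:0.83", "1:1.00", "1:0.80"]),
    "\t".join(["491", "0:1.00", "1:0.52", "0:1.00"]),
]
table = parse_hap_info(example)
print(table[("Contig:OR483991.1,Range:ALL-ALL", 286)]["Haplotype:0"])
```

Alleles are kept as strings so that values other than 0 or 1 do not need special handling.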

pipeline_files/ - folder with BAM + VCF files from run_devider_pipeline

This folder is present only if run_devider_pipeline was used. It contains the BAM files produced by mapping your input reads against the reference with minimap2 (default minimap2 parameters), and the VCF file produced by LoFreq (--B parameter used). You can rerun devider on these files:

devider -b output/pipeline_files/mapping.bam -v output/pipeline_files/lofreq.vcf.gz -r reference.fa (use some other options)