Output format
In the output folder (specified by the -o option, or devider_output by default), the following files will be available.
devider_output/
|
|--- snp_haplotypes.fasta <- multiple sequence alignment of SNPs
|
|--- majority_vote_haplotypes.fasta <- base-level haplotypes
|
|--- ids.txt <- assignment of reads to haplotypes
|
|--- hap_info.txt <- more information about haplotypes
|
|--- intermediate/ <- files for debugging (not important)
|
|--- pipeline_files/ <- bam + vcf files (only present if using run_devider_pipeline)
>Contig:OR483991.1,Range:ALL-ALL,Haplotype:0,Abundance:5.52,Depth:8.43
11000110011010000110000100011011011000011000001110101101011121000100010010110100
1110100100011010011111111010000110010---------
>Contig:OR483991.1,Range:ALL-ALL,Haplotype:1,Abundance:9.27,Depth:14.17
10010110000010010110111010111000011111000001011110110001100001000001100000100110
1011100110011011111000010000111110011101010010
snp_haplotypes.fasta is a valid multiple sequence alignment in fasta format.
- The > line contains haplotype information delimited by commas.
  - Contig: represents the contig identifier.
  - Range: indicates the coordinates that were haplotyped, e.g., 3000-6000. ALL-ALL indicates no coordinates specified.
  - Haplotype: is a haplotype identifier starting from 0.
  - Abundance: indicates the normalized depth (grouped by Contig and Range) times 100.
  - Depth: is the approximate depth of coverage; it will be underestimated slightly if reads are erroneous.
- The 0, 1, ... represent reference or alternate alleles within this haplotype. - indicates this SNP is not covered by reads within the haplotype. The base position of each SNP is indicated in the hap_info.txt file.
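The header fields described above can be pulled apart with a few lines of Python. This is only a minimal sketch: `parse_header` is a hypothetical helper (not part of devider), and it assumes the contig identifier itself contains no commas or whitespace.

```python
def parse_header(header: str) -> dict:
    """Split a '>' header line into its comma-delimited Key:Value fields."""
    # Drop the leading '>' and any trailing description after whitespace.
    fields = header.lstrip(">").split()[0].split(",")
    info = {}
    for field in fields:
        # Split on the first ':' only, so values may themselves contain ':'.
        key, value = field.split(":", 1)
        info[key] = value
    # Convert the numeric fields named in the header description.
    info["Haplotype"] = int(info["Haplotype"])
    info["Abundance"] = float(info["Abundance"])
    info["Depth"] = float(info["Depth"])
    return info


example = ">Contig:OR483991.1,Range:ALL-ALL,Haplotype:0,Abundance:5.52,Depth:8.43"
record = parse_header(example)
print(record["Contig"], record["Range"], record["Haplotype"], record["Depth"])
```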
Tip
Use --allele-output to output the actual base-level alleles instead of 0 or 1. This output can be fed into MSA visualizers.
You can also use this to build a phylogenetic tree, but you may have to change the ids because : and , are not valid in many tree-building programs.
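One way to make the ids tree-friendly is to replace the offending characters before passing the file on. The sketch below is an assumption about what a reasonable cleanup looks like, not something devider provides; `sanitize_id` and `sanitize_fasta` are hypothetical names, and replacing with underscores is an arbitrary choice.

```python
import re


def sanitize_id(header: str) -> str:
    """Replace ':' and ',' in a fasta header id with underscores."""
    # Keep only the id itself (text before the first whitespace).
    name = header.lstrip(">").split()[0]
    return re.sub(r"[:,]", "_", name)


def sanitize_fasta(lines):
    """Yield fasta lines with header ids sanitized; sequence lines pass through."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            yield ">" + sanitize_id(line)
        else:
            yield line


header = ">Contig:OR483991.1,Range:ALL-ALL,Haplotype:0,Abundance:5.52,Depth:8.43"
clean = sanitize_id(header)
print(clean)
```

Note that this drops anything after the first whitespace in the header, so keep a copy of the original file if that description matters to you.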
>Contig:OR483991.1,Range:ALL-ALL,Haplotype:0,Abundance:5.51839673547632,Depth:8.431375873903942 SimpleConsensus
....NNNNNNNNNNNNNNNNNNNNNNNNNNNNAAAATTCGGCTAAGGCCANGGGGACGTNNAAAATATCAACTAAAACATT...
majority_vote_haplotypes.fasta is a fasta file (but not an MSA) representing base-level haplotypes. The bases are obtained by taking the majority base at each position according to the alignment against the reference.
- The header line is the same as in snp_haplotypes.fasta.
- N is output if the coverage at the base is < --min-cov OR the fraction of bases supporting the majority base is < --n-fraction.
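The majority-vote rule with its two N conditions can be illustrated as follows. This is a simplified sketch of the rule as described, not devider's actual implementation: `call_base` is a hypothetical function, the threshold values passed in the example are made up, and details such as tie-breaking and indel handling are not covered.

```python
from collections import Counter


def call_base(bases: str, min_cov: int, n_fraction: float) -> str:
    """Return the majority base at one position, or 'N' if support is too weak.

    bases      -- the bases observed at this position across the haplotype's reads
    min_cov    -- stands in for the --min-cov threshold (value is an assumption)
    n_fraction -- stands in for the --n-fraction threshold (value is an assumption)
    """
    counts = Counter(bases)
    coverage = sum(counts.values())
    # Condition 1: too few reads cover this position.
    if coverage == 0 or coverage < min_cov:
        return "N"
    base, support = counts.most_common(1)[0]
    # Condition 2: the majority base is not supported by enough of the reads.
    if support / coverage < n_fraction:
        return "N"
    return base


print(call_base("AAAT", min_cov=3, n_fraction=0.6))
print(call_base("AT", min_cov=3, n_fraction=0.6))
```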
Contig:OR483991.1 Range:ALL-ALL Haplotype:0 61440ba2-e383-ee56-9dcb-d15b0797ea01 8f58b524-0a68-aea0-447a-dd5d2d68925d 79c785fe-1e86-29ed-d496-b003898b91d6
Contig:OR483991.1 Range:ALL-ALL Haplotype:1 41178954-b99c-02ed-164f-45d7e1b37bfd 34bf4b42-b8ef-2e30-7ae9-14e85e0a5395 9a558a06-f0c8-66c9-8033-8d60ae795ddd
...
ids.txt is a tab-delimited file.
- First column indicates contig.
- Second column indicates the range.
- Third column indicates the haplotype id.
- Fourth column to last column are identifiers of reads assigned to this haplotype.
Warning
A read can possibly be assigned to multiple contigs or haplotypes (e.g., supplementary alignments across contigs).
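Given the column layout above, a read-to-haplotype lookup can be built with a small parser. This is a sketch under the assumption that the first three columns always carry the Contig:, Range:, and Haplotype: prefixes shown in the example; `parse_ids` is a hypothetical helper and the read names in the demo are invented. Each read maps to a list because, as the warning notes, a read may be assigned more than once.

```python
from collections import defaultdict


def parse_ids(lines):
    """Map each read id to the (contig, range, haplotype) tuples it was assigned to."""
    assignments = defaultdict(list)
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 3:
            continue  # skip blank or malformed lines
        # Strip the 'Contig:', 'Range:' and 'Haplotype:' prefixes.
        contig = cols[0].split(":", 1)[1]
        rng = cols[1].split(":", 1)[1]
        hap = int(cols[2].split(":", 1)[1])
        for read_id in cols[3:]:
            assignments[read_id].append((contig, rng, hap))
    return assignments


# Invented read names, for illustration only.
demo = [
    "Contig:OR483991.1\tRange:ALL-ALL\tHaplotype:0\tread_a\tread_b\n",
    "Contig:OR483991.1\tRange:ALL-ALL\tHaplotype:1\tread_b\tread_c\n",
]
assignments = parse_ids(demo)
print(assignments["read_b"])
```

In practice you would pass an open file handle, e.g. `parse_ids(open("devider_output/ids.txt"))`.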
Tip
If you want to haplotag your bam file (i.e., add HP:i tags to the BAM) for visualization, use the script haplotag_bam included in the conda install (or the scripts/ folder).
Contig:OR483991.1,Range:ALL-ALL Haplotype:0 Haplotype:1 Haplotype:2 Haplotype:3 Haplotype:4 Haplotype:5
286 1:0.83 1:1.00 1:0.80 1:1.00 1:1.00 1:1.00
322 1:1.00 0:1.00 1:1.00 1:1.00 1:0.93 1:1.00
476 0:1.00 0:1.00 0:1.00 0:1.00 0:1.00 1:0.89
491 0:1.00 1:0.52 0:1.00 0:1.00 1:0.96 0:1.00
726 0:1.00 0:1.00 0:1.00 0:0.95 1:0.91 1:1.00
756 1:1.00 1:1.00 1:1.00 1:0.85 1:0.98 0:1.00
hap_info.txt is a tab-delimited file (a TSV). The first line is a header. Each subsequent line, such as 286 1:0.83 1:1.00 1:0.80 1:1.00 1:1.00 1:1.00, is interpreted as:
- 286 - base-level location of the first SNP.
- 1:0.83 - For Haplotype:0, 83% of the reads support the 1 allele, i.e., the first alternate allele.
- 1:1.00 - For Haplotype:1, 100% of the reads support the 1 allele.
- And so forth.
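The per-SNP table can be loaded into a nested dictionary for downstream use. The sketch below assumes a single header line followed by rows where every cell has the allele:fraction form shown in the example; how uncovered SNPs or multiple contig/range blocks appear in the file is not described here, so the hypothetical `parse_hap_info` simply skips rows that do not match the header width and keeps alleles as strings.

```python
def parse_hap_info(lines):
    """Return {position: {haplotype_name: (allele, fraction)}} from hap_info.txt lines."""
    rows = iter(lines)
    header = next(rows).rstrip("\n").split("\t")
    hap_names = header[1:]  # e.g. 'Haplotype:0', 'Haplotype:1', ...
    table = {}
    for line in rows:
        cols = line.rstrip("\n").split("\t")
        if len(cols) != len(header):
            continue  # skip rows that do not match the header (assumption)
        position = int(cols[0])
        calls = {}
        for name, cell in zip(hap_names, cols[1:]):
            allele, fraction = cell.split(":", 1)
            calls[name] = (allele, float(fraction))
        table[position] = calls
    return table


demo = [
    "Contig:OR483991.1,Range:ALL-ALL\tHaplotype:0\tHaplotype:1\n",
    "286\t1:0.83\t1:1.00\n",
    "322\t1:1.00\t0:1.00\n",
]
table = parse_hap_info(demo)
print(table[286]["Haplotype:0"])
```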
The pipeline_files/ folder is present if run_devider_pipeline was used. It contains BAM files from mapping your input reads against the reference with minimap2 (default minimap2 parameters). It also contains the VCF file from LoFreq (--B parameter used). You can rerun devider on these files:
devider -b output/pipeline_files/mapping.bam -v output/pipeline_files/lofreq.vcf.gz -r reference.fa (use some other options)