Advanced usage manual
devider v0.0.1 offers four presets:
- old-long-reads: ~90% accuracy (older technologies)
- nanopore-r9 (default): ~95% accuracy
- nanopore-r10: ~98% accuracy
- hi-fi: high-fidelity reads with > 99.9% accuracy
The four presets affect the following parameters (discussed further below):
- k: more accurate technologies allow a higher -k (k-mer length).
- resolution: the more accurate the technology, the lower the --resolution parameter.
- SNP downsampling (no option): SNPs are downsampled if too many are present. This depends on the accuracy of the preset.
- -k: the length of the SNP-encoded k-mers.
If -k is not set, devider chooses it automatically by counting the SNPs contained in each read and taking the 33rd percentile. devider also ensures that the chosen -k does not span more than 3/4 of the reference. This works well in general.
For the different presets, we enforce an additional constraint: the maximum value of -k is 10, 20, 35, and 100 for the four presets (in order of increasing accuracy). This avoids very long k-mers for noisy reads.
If you believe that -k is not chosen correctly, you can set -k to a specific value. This bypasses all of the automatic selection.
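The automatic selection described above can be sketched roughly as follows. This is only an illustration, not devider's actual code: the function name `choose_k`, the `num_ref_snps` argument, and the nearest-rank percentile convention are all assumptions, and the 3/4 rule is interpreted here as "k covers at most 3/4 of the SNPs on the reference", which may differ from what devider does.

```python
import math

# Per-preset upper bounds on k, taken from the text above.
PRESET_MAX_K = {
    "old-long-reads": 10,
    "nanopore-r9": 20,
    "nanopore-r10": 35,
    "hi-fi": 100,
}


def choose_k(snps_per_read, num_ref_snps, preset="nanopore-r9"):
    """Pick k from per-read SNP counts (hypothetical helper, not devider's API)."""
    if not snps_per_read:
        raise ValueError("need at least one read")
    counts = sorted(snps_per_read)
    # 33rd percentile, using the nearest-rank convention (an assumption).
    rank = max(1, math.ceil(0.33 * len(counts)))
    k = counts[rank - 1]
    # Assumed reading of the 3/4 rule: at most 3/4 of the reference's SNPs.
    k = min(k, (3 * num_ref_snps) // 4)
    # Preset cap, to avoid very long k-mers on noisy reads.
    k = min(k, PRESET_MAX_K[preset])
    return max(1, k)


if __name__ == "__main__":
    print(choose_k([5, 8, 12, 30, 40, 50], num_ref_snps=200))
    print(choose_k([50] * 10, num_ref_snps=200, preset="old-long-reads"))
```

In this sketch the preset cap only ever lowers k, so noisy presets never end up with long k-mers even when reads carry many SNPs.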
- --min-cov: minimum coverage of reported haplotypes
- --min-abund: minimum abundance of reported haplotypes
Only haplotypes with coverage > --min-cov and abundance > --min-abund are reported. devider's coverage slightly underestimates the true coverage when the reads are noisy.
Abundance is calculated as the normalized coverage times 100.
Note: the abundances across all haplotypes may not sum to 100% if low-coverage or low-abundance haplotypes are filtered.
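To make the abundance calculation and the two filters concrete, here is a small sketch. The function name `report_haplotypes` and the input dictionary are hypothetical, and it assumes that coverage is normalized by the total coverage of all haplotypes before filtering, which is consistent with the note above but not confirmed by it.

```python
def report_haplotypes(coverages, min_cov, min_abund):
    """Return {name: (coverage, abundance)} for haplotypes passing both filters.

    Hypothetical helper for illustration; not part of devider.
    """
    total = sum(coverages.values())
    reported = {}
    for name, cov in coverages.items():
        # Abundance = normalized coverage times 100.
        abund = 100.0 * cov / total
        # Both comparisons are strict, as in the text above.
        if cov > min_cov and abund > min_abund:
            reported[name] = (cov, abund)
    return reported


if __name__ == "__main__":
    haps = {"h0": 60.0, "h1": 35.0, "h2": 5.0}
    kept = report_haplotypes(haps, min_cov=10, min_abund=1)
    for name, (cov, abund) in kept.items():
        print(name, cov, abund)
    # The reported abundances need not add up to 100 once h2 is filtered out.
    print(sum(abund for _, abund in kept.values()))
```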
- --resolution: haplotypes that differ by a fraction of SNPs less than --resolution are merged.
The resolution is set to 0.02, 0.01, 0.005, and 0.001 for the four presets (in order of increasing accuracy). So for nanopore-r9 (resolution 0.01), two haplotypes that differ by fewer than 1 SNP per 100 SNPs will be merged.
If you truly believe there are very similar haplotypes, you can set --resolution to 0. However, systematic errors in long-read sequencing (e.g. methylation, homopolymer errors, context-specific errors) are inevitable, so you should be careful.
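The merge criterion can be illustrated with a short sketch. It assumes two haplotypes are given as equal-length sequences of alleles over the same SNP positions. The function name `should_merge` is hypothetical, and how devider actually compares haplotypes internally is not described here.

```python
def should_merge(hap_a, hap_b, resolution):
    """True if the fraction of differing SNPs is below `resolution`.

    Hypothetical helper; haplotypes are allele sequences over the same SNPs.
    """
    if len(hap_a) != len(hap_b) or not hap_a:
        raise ValueError("haplotypes must be non-empty and of equal length")
    n_diff = sum(a != b for a, b in zip(hap_a, hap_b))
    return n_diff / len(hap_a) < resolution


if __name__ == "__main__":
    ref = [0] * 200
    one_diff = [1] + [0] * 199    # differs at 1 of 200 SNPs (0.005)
    two_diff = [1, 1] + [0] * 198  # differs at 2 of 200 SNPs (0.01)
    print(should_merge(ref, one_diff, 0.01))
    print(should_merge(ref, two_diff, 0.01))
    print(should_merge(ref, one_diff, 0.0))
```

Because the comparison is strict, a resolution of 0 never triggers a merge in this sketch.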
- --strand-bias-fdr: SNPs are filtered out if (1) their FDR-adjusted p-value (Fisher's exact test) is < --strand-bias-fdr and (2) the odds ratio of the 2x2 table is > 1.5 or < 1/1.5.
Strand-specific systematic errors can produce false SNP calls. This is the leading cause of false SNPs for nanopore sequencing, so it is crucial that these false SNPs are filtered out.
If you have a VCF file that is already filtered, you can turn this filtering off with --strand-bias-fdr 0.
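The two-part criterion above can be sketched in plain Python. This is an illustration under several assumptions, not devider's implementation: each SNP is summarized as a 2x2 table `(a, b, c, d)` with rows for forward/reverse strand and columns for reference/alternate allele, the FDR adjustment is Benjamini-Hochberg, an odds ratio with a zero denominator is treated as infinite, and the threshold 0.005 in the example is arbitrary. All function names are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def strand_biased(tables, fdr, min_odds=1.5):
    """Flag SNPs that meet both the FDR and the odds-ratio conditions."""
    qvals = benjamini_hochberg([fisher_exact_two_sided(*t) for t in tables])
    flags = []
    for (a, b, c, d), q in zip(tables, qvals):
        if b * c == 0:
            # Assumed convention for a zero denominator.
            odds = float("inf") if a * d > 0 else 1.0
        else:
            odds = (a * d) / (b * c)
        flags.append(q < fdr and (odds > min_odds or odds < 1 / min_odds))
    return flags


if __name__ == "__main__":
    tables = [(50, 50, 48, 52), (60, 40, 95, 5)]
    print(strand_biased(tables, fdr=0.005))
    print(strand_biased(tables, fdr=0))  # a threshold of 0 flags nothing
```

Since the comparison against the threshold is strict and adjusted p-values are never negative, a threshold of 0 disables the filter in this sketch, matching the behaviour described above.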
- --mapq-cutoff: discard primary alignments with MAPQ < --mapq-cutoff.
- --supp-mapq-cutoff: discard supplementary alignments with MAPQ < --supp-mapq-cutoff.
- --dont-use-supp-aln: don't use supplementary alignments.
- --min-qual: only consider bases with base quality > --min-qual.
These parameters are straightforward. Use higher MAPQ cutoffs if you want more stringent alignments.
Note that we always filter out secondary alignments; if you want to use secondary alignments, you will have to change them to supplementary alignments manually.
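As a rough sketch of how these alignment filters interact, the snippet below works directly on SAM FLAG and MAPQ values. It assumes that alignments with MAPQ below a cutoff are dropped (which is what "higher cutoffs are more stringent" implies) and that alignments exactly at the cutoff are kept. The function names and the cutoff values in the example are hypothetical, and only the FLAG bits 0x100 (secondary) and 0x800 (supplementary) come from the SAM specification.

```python
SECONDARY = 0x100      # SAM FLAG bit: secondary alignment
SUPPLEMENTARY = 0x800  # SAM FLAG bit: supplementary alignment


def keep_alignment(flag, mapq, mapq_cutoff, supp_mapq_cutoff, use_supp=True):
    """Decide whether an alignment is used (illustrative, not devider's code)."""
    if flag & SECONDARY:
        return False  # secondary alignments are always filtered out
    if flag & SUPPLEMENTARY:
        # Corresponds to --dont-use-supp-aln and --supp-mapq-cutoff.
        return use_supp and mapq >= supp_mapq_cutoff
    # Primary alignment: corresponds to --mapq-cutoff.
    return mapq >= mapq_cutoff


def secondary_to_supplementary(flag):
    """Rewrite a secondary FLAG as supplementary, leaving other bits untouched."""
    if flag & SECONDARY:
        return (flag & ~SECONDARY) | SUPPLEMENTARY
    return flag


if __name__ == "__main__":
    # Hypothetical cutoffs chosen only for the example.
    print(keep_alignment(0, 30, mapq_cutoff=5, supp_mapq_cutoff=60))
    print(keep_alignment(SUPPLEMENTARY, 30, mapq_cutoff=5, supp_mapq_cutoff=60))
    print(hex(secondary_to_supplementary(0x110)))
```

The second function shows the kind of manual FLAG edit needed if you want secondary alignments to be considered: the 0x100 bit is cleared and the 0x800 bit is set before the file is passed to devider.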