MMULT is a scalable and efficient solution for DNA methylation analysis of large-scale whole-genome bisulfite sequencing (WGBS) data. It consists of several submodules for analyzing multiple samples within a single biological group or across multiple groups, and it is particularly efficient for large cohorts (e.g., hundreds of samples).
- BBGLM
BBGLM resolves DNA methylation dynamics using a beta-binomial generalized linear model. It uses the Stan library (https://github.com/stan-dev/cmdstan) to fit the model quickly for CpG dinucleotides at whole-genome scale. Example p-value and q-value distributions from BBGLM are shown below.
Allowed options:
-h [ --help ] Produce help message.
-m [ --methfile ] arg Methylation BED files. The BED file is
generated by `MCALL` in MOABS. Replicates in
a group are concatenated by comma `,`.
Multiple groups can be specified. For
example, `-m g1_r1.bed,g1_r2.bed -m g2_r1.bed
-m g3_r1.bed,g3_r2.bed,g3_r3.bed`.
-c [ --chrom ] arg A specific chromosome for analysis. Can be
specified multiple times for multiple
chromosomes. The size can be encoded for a
chromosome. For example, `-c chr1:248956422
-c chr2:242193529`. The size can be used to
split a chromosome for running in small
batches. Default: all chromosomes that appear
in the methylation files.
-l [ --length ] arg (=20000000) Split length of coordinates in a chromosome.
This is necessary for many replicates with
limited memory. To enable small-batch
running, size info should be specified by
`-c|--chrom`. Because the size of chr1 in
hg38 is >200 million, one tenth (20M) is a
reasonable choice. Default: 20000000.
-d [ --mindepth ] arg (=1) Minimum depth for a CpG coverage. Default: 1.
-t [ --readthreads ] arg (=10) Number of read threads. Default: 10.
-b [ --batchthreads ] arg (=5) Number of batch threads. Default: 5.
--qval arg (=0.05) Q-value threshold for DMC. Default: 0.05.
--nominaldiff arg (=0.2) Nominal methylation difference threshold for
DMC. Default: 0.2.
--maxdistdmcs arg (=300) Maximum distance between consecutive DMCs for
DMR. Default: 300.
--mindmc arg (=3) Minimum number of DMCs in a DMR. Default: 3.
-o [ --outfile ] arg Output file.
Examples:
bbglm -m g1_r1.bed,g1_r2.bed -m g2_r1.bed -m g3_r1.bed,g3_r2.bed,g3_r3.bed -o output.txt
Date: 2020/08/19
Authors: Jin Li <[email protected]>
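The `--maxdistdmcs` and `--mindmc` options describe how differentially methylated CpGs (DMCs) are grouped into regions (DMRs). The following Python sketch illustrates that grouping rule only; it is not BBGLM's implementation. The function name `merge_dmcs` is hypothetical, the distance test is assumed to be inclusive (`<=`), and the sketch ignores the direction of the methylation change.

```python
from typing import List, Tuple


def merge_dmcs(positions: List[int], max_dist: int = 300,
               min_dmc: int = 3) -> List[Tuple[int, int, int]]:
    """Group DMC positions on one chromosome into candidate DMRs.

    Returns (start, end, n_dmcs) tuples. Assumption: consecutive DMCs
    belong to the same region when their distance is <= max_dist.
    """
    regions: List[Tuple[int, int, int]] = []
    run: List[int] = []
    for pos in sorted(positions):
        # A gap larger than max_dist closes the current run.
        if run and pos - run[-1] > max_dist:
            if len(run) >= min_dmc:
                regions.append((run[0], run[-1], len(run)))
            run = []
        run.append(pos)
    # Flush the last run.
    if len(run) >= min_dmc:
        regions.append((run[0], run[-1], len(run)))
    return regions


dmcs = [100, 250, 400, 1200, 1300, 5000, 5100, 5150, 5400]
print(merge_dmcs(dmcs))  # [(100, 400, 3), (5000, 5400, 4)]
```

The pair at 1200 and 1300 is dropped because it contains fewer than `min_dmc` sites.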
- CpGCDIFEnrich
The CpGCDIFEnrich module consolidates methylation differences from individual comparisons using KL-divergence. The credible methylation difference (CDIF, https://github.com/sunnyisgalaxy/moabs) represents the methylation difference in an individual comparison between two biological conditions. For a CpG site, CpGCDIFEnrich formulates the consolidated methylation difference as the KL-divergence of the sample CDIFs from the background distribution of CDIFs among whole-genome CpGs. The image below shows the background distribution of CDIFs, one positive CpG with a high KL-divergence value, and a scatterplot of KL-divergence against the sum of CDIFs.
Allowed options:
-h [ --help ] Produce help message.
-c [ --compfile ] arg Comparison files. The comparison file is
generated by `MCOMP` in MOABS. For example,
`-c H001VsNL -c H002VsNL`.
-r [ --chrom ] arg A specific chromosome for analysis. Can be
specified multiple times for multiple
chromosomes. The size can be encoded for a
chromosome. For example, `-r chr1:248956422
-r chr2:242193529`. The size can be used to
split a chromosome for running in small
batches. Default: all chromosomes that appear
in the comparison files.
-l [ --length ] arg (=20000000) Split length of coordinates in a chromosome.
This is necessary for many replicates with
limited memory. To enable small-batch
running, size info should be specified by
`-r|--chrom`. Because the size of chr1 in
hg38 is >200 million, one tenth (20M) is a
reasonable choice. Default: 20000000.
-b [ --numbins ] arg (=100) Number of bins. Default: 100.
-t [ --numthreads ] arg (=10) Number of threads. Default: 10.
--kldthr arg (=0.67957) KL-divergence threshold for a DMC. A quarter
of nats. Default: 0.67957.
--cdifthr arg (=0.2) CDIF threshold for a DMC. Default: 0.2.
--maxzerocdif arg (=0.05) Maximum fraction of zero CDIFs for a DMC. A
CpG with both positive and negative CDIFs
will be ignored. A negative value disables
the zero-CDIF check. Default: 0.05 (5%).
--maxdistdmcs arg (=300) Maximum distance between consecutive DMCs for
a DMR. Default: 300.
--mindmc arg (=3) Minimum number of DMCs in a DMR. Default: 3.
-o [ --outfile ] arg Output file.
Examples:
cpgcdifenrich -c H001VsNL -c H002VsNL -o output.txt
Date: 2020/08/19
Authors: Jin Li <[email protected]>
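To make the KL-divergence idea concrete, the sketch below bins the CDIFs of one CpG across comparisons and measures their divergence from a genome-wide background histogram, in nats. It is a minimal illustration, not CpGCDIFEnrich's code: the helper names, the bin range of [-1, 1], the direction KL(site || background), and the `eps` guard for empty background bins are all assumptions. The example uses 10 bins for readability, while the tool's `--numbins` default is 100.

```python
import math
from typing import List, Sequence


def histogram(values: Sequence[float], num_bins: int = 100,
              lo: float = -1.0, hi: float = 1.0) -> List[float]:
    """Probability histogram of values over [lo, hi] with equal-width bins."""
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for v in values:
        idx = int((v - lo) / width)
        counts[min(max(idx, 0), num_bins - 1)] += 1  # clamp edge values
    return [c / len(values) for c in counts]


def kl_divergence(p: Sequence[float], q: Sequence[float],
                  eps: float = 1e-9) -> float:
    """KL(P || Q) in nats; eps avoids division by zero (assumption)."""
    return sum(pi * math.log(pi / max(qi, eps))
               for pi, qi in zip(p, q) if pi > 0)


# Toy data: the background is concentrated near zero.
background = [0.05] * 90 + [0.45] * 5 + [-0.45] * 5
site_shifted = [0.45] * 10   # consistently hypermethylated site
site_typical = [0.05] * 10   # site that looks like the background

bg = histogram(background, num_bins=10)
kl_shifted = kl_divergence(histogram(site_shifted, num_bins=10), bg)
kl_typical = kl_divergence(histogram(site_typical, num_bins=10), bg)

# Compare against the documented --kldthr default of 0.67957.
print(kl_shifted > 0.67957, kl_typical > 0.67957)  # True False
```

A site whose CDIFs sit in a rare part of the background distribution gets a large divergence, while a site that resembles the background stays close to zero.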
- VMCVMRNME
This module detects variably methylated CpGs (VMCs) and variably methylated regions (VMRs) among samples in a single biological group. A VMC is a CpG that shows low randomness (small normalized entropy) together with large variation. VMCs and VMRs provide a feasible way to detect subtypes in large cohorts from methylation profiles. An example subtyping solution using VMCs is shown below.
Allowed options:
-h [ --help ] Produce help message.
-m [ --methfile ] arg Methylation BED files. The BED file is
generated by `MCALL` in MOABS. Replicates are
concatenated by comma `,`. For example, `-m
r1.bed,r2.bed,r3.bed`.
-c [ --chrom ] arg A specific chromosome for analysis. Can be
specified multiple times for multiple
chromosomes. Default: all chromosomes that
appear in the methylation BED files.
-o [ --outfile ] arg Output file.
-k [ --state ] arg (=2) Number of discretization states. Default: 2.
-w [ --window ] arg (=150) Window size for genome scan. Default: 150.
-b [ --mincpg ] arg (=3) Minimum CpGs in a window. Default: 3.
-d [ --mindepth ] arg (=3) Minimum depth for a CpG coverage. Default: 3.
-t [ --numthreads ] arg (=8) Number of threads. Default: 8.
-v [ --vmrmethod ] arg (=0) VMR detection method. 0: identify VMCs first
and detect VMRs from consecutive VMCs; 1:
Genome scan method by fixed-size windows.
Default: 0.
-s [ --sd ] arg (=0.2) Standard deviation (sd) threshold for a VMC. Default: 0.2.
-n [ --nme ] arg (=0.25) NME threshold for a VMC. Default: 0.25.
-x [ --maxdistvmcs ] arg (=300) Maximum distance between consecutive VMCs for
VMR. Default: 300.
--minsample arg (=5) Minimum samples for a CpG. Default: 5.
--vmcfile arg VMC file.
--windowfile arg VMR file by genome scan.
Examples:
vmcvmrnme -m r1.bed,r2.bed,r3.bed -o output.txt
Date: 2020/05/20
Authors: Jin Li <[email protected]>
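The genome-scan method (`-v 1`) works on fixed-size windows filtered by a minimum CpG count (`--window`, `--mincpg`). The sketch below shows only that windowing step, as one plausible reading of the options. It assumes non-overlapping windows aligned to multiples of the window size and half-open coordinates; the function name `scan_windows` is hypothetical, and the per-window statistics that VMCVMRNME computes are not reproduced here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def scan_windows(positions: List[int], window: int = 150,
                 min_cpg: int = 3) -> List[Tuple[int, int, List[int]]]:
    """Bin CpG positions into fixed-size windows on one chromosome.

    Returns (start, end, cpg_positions) for each window that holds at
    least min_cpg CpGs. Windows are half-open: [start, end).
    """
    bins: Dict[int, List[int]] = defaultdict(list)
    for pos in sorted(positions):
        bins[pos // window].append(pos)
    return [(idx * window, (idx + 1) * window, cpgs)
            for idx, cpgs in sorted(bins.items())
            if len(cpgs) >= min_cpg]


cpgs = [10, 40, 120, 160, 170, 460, 470, 480, 590]
for start, end, members in scan_windows(cpgs):
    print(start, end, members)
# 0 150 [10, 40, 120]
# 450 600 [460, 470, 480, 590]
```

The window covering 150 to 300 is skipped because it holds only two CpGs.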
Installing MMULT via Bioconda is recommended because Conda installs the runtime dependencies automatically:
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
conda install MMULT
The dependencies are listed below.
Software | URL |
---|---|
boost | https://www.boost.org |
sundials | https://github.com/LLNL/sundials |
rapidjson | https://github.com/Tencent/rapidjson |
eigen | http://eigen.tuxfamily.org |
tbb-devel | https://github.com/oneapi-src/oneTBB |
Maintainer: Jin Li, [email protected]. PI: De-Qiang Sun, [email protected].