Regression Models for Localised Mutations (RM2) is a tool for evaluating differential mutation rates and processes across classes of functional sites. RM2 uses negative binomial regression to assess whether sites of interest show an enrichment or depletion of mutations compared to flanking control regions of the same length. Further, RM2 was designed to test whether subclasses of mutations, like those of specific signatures, strandedness, transcription direction, and other features, show differential patterns. This allows for quick and systematic characterization of mutational processes and can be easily extended to site-based studies like pathway analysis.
As input, RM2 requires:
- Mutation file with optional annotation columns
- Set of genomic regions containing the chr, start and end position of the sites
Using the R package devtools, run: devtools::install_github('https://github.com/reimandlab/RM2')
Clone the repository: https://github.com/reimandlab/RM2.git
Open R in the directory you cloned the package in and run install.packages("RM2", repos=NULL, type="source")
Required packages
Biostrings
BSgenome
BSgenome.Hsapiens.UCSC.hg19
GenomeInfoDb
GenomicRanges
ggplot2
MASS
parallel
reshape2
S4Vectors
Load the RM2 package
library(RM2)
Load sites
data("ctcf_chr3_4")
head(ctcf_chr3_4, 2)
# chr start end
# 1 chr4 29087 29584
# 2 chr4 236026 236407
Load mutations
data("mutations_chr3_4")
head(mutations_chr3_4, 2)
# chr start end ref alt
# 253901 chr4 128796154 128796154 G T
# 253910 chr4 132966670 132966670 C A
Add mutation annotations
muts = cbind(mutations_chr3_4, get_mut_trinuc_strand(mutations_chr3_4))
head(muts, 2)
# chr start end ref alt mut_trinuc mut_strand ref_alt
# 253901 chr4 128796154 128796154 G T ACA_A c C_A
# 253910 chr4 132966670 132966670 C A ACA_A w C_A
Run regression for total mutations
window_size = 25
results = RM2(maf = muts,
sites = ctcf_chr3_4,
window_size = window_size)
results
# mut_type pp this_coef obs_mut exp_mut exp_mut_lo exp_mut_hi fc pp_cofac this_coef_cofac n_sites_tested
# 1 total_muts__total_muts 1.288732e-10 0.5575766 273 197 170 226.025 1.380711 NA NA 7194
Visualize results
n_patients = 150
dfr = get_mutations_in_flanked_sites(muts, ctcf_chr3_4, window_size, n_patients)
plot_mutations_in_flanked_sites(dfr, window_size)
Run regression with additional mutation subclasses
mut_class_columns = c(NA, "mut_strand", "ref_alt")
results = RM2(maf = muts,
sites = ctcf_chr3_4,
window_size = window_size,
mut_class_columns = mut_class_columns)
Run regression with mutation co-factor
muts$cofactor_col = sample(c(0,1), nrow(muts), replace=T)
results = RM2(maf = muts,
sites = ctcf_chr3_4,
window_size = window_size,
cofactor_column = "cofactor_col")
Run downsampled RM2
n_iterations = 10
n_sites_sampled = 5000
results = RM2_downsample(maf = muts,
sites = ctcf_chr3_4,
window_size = window_size,
n_sites_sampled = n_sites_sampled,
n_iterations = n_iterations)