
martabadiagraset/abetadms

 
 


Overview

Analysis scripts for processing Abeta deep mutational scanning (DMS) data.

Required Software

To run the abetadms pipeline you will need the following software and associated packages:

  • R >= v3.5.2 (Biostrings, caTools, corpcor, cowplot, data.table, gdata, ggplot2, GGally, hexbin, lemon, optparse, parallel, pdist, plyr, ppcor, raster, reshape2, Rpdb, RColorBrewer)

The following packages are optional:

  • DiMSum (pipeline for pre-processing deep mutational scanning data i.e. FASTQ to counts)
  • DMS2structure (scripts used for epistasis and structure analysis of deep mutational scanning data in Schmiedel & Lehner, bioRxiv 2018)

Installation and loading

Open R and enter:

# Install
if(!require(devtools)) install.packages("devtools")
devtools::install_github("lehner-lab/abetadms")

# Load
library(abetadms)

# Help
?abetadms

Required Data

Download the DiMSum fitness estimates and required miscellaneous files from here and unzip them in your project directory (see the 'base_dir' argument), i.e. the directory where output files will be written.

Running

There are a number of options available for running the abetadms pipeline depending on user requirements.

  • Basic (default)

Default pipeline functionality uses DiMSum fitness estimates (see 'Required Data'). Neither DiMSum nor DMS2structure packages are required for this default functionality.

  • Raw read processing

Raw read processing is not handled by the abetadms pipeline. FASTQ files from paired-end sequencing of replicate deep mutational scanning (DMS) libraries before ('input') and after ('output') selection were processed using DiMSum (manuscript in prep.), an R package that wraps common biological sequence processing tools.

  • Preprocess fitness

Pipeline stage 1 ('abetadms_preprocess_fitness') reformats DiMSum files and, optionally, re-estimates the fitness of double mutants using a Bayesian framework (set 'bayesian_double_fitness = T'). The latter is computationally intensive (~30 minutes on 10 cores) and is therefore not run by default.

  • Epistasis analysis

Pipeline stage 9 ('abetadms_epistasis_analysis') performs epistasis calculations. This stage is computationally intensive (~30 minutes on 10 cores) and is therefore not run by default. Note: 'Required Data' (see above) already includes precomputed results of the epistasis analysis. To force re-execution of this stage set 'rerun_epistasis = T'. Additionally, the correct path to your local copy of the DMS2structure repository must be specified with 'DMS2structure_path = MY_LOCAL_PATH'.

  • Structure analyses

Pipeline stage 10 ('abetadms_secondary_structure_predictions') performs secondary structure predictions. Secondary structure predictions are computationally intensive and are therefore not re-run by default. Note: 'Required Data' (see above) already includes precomputed results of the structure analyses. To force re-execution set 'rerun_structure = T'. Additionally, the correct path to your local copy of the DMS2structure repository must be specified with 'DMS2structure_path = MY_LOCAL_PATH'.

Pipeline

The top-level function abetadms() is the recommended entry point to the pipeline. See section on "Required Data" above for instructions on how to obtain all required data and miscellaneous files before running the pipeline.
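As a rough sketch of how the options described under "Running" might be combined in a single call: the argument names below are the ones quoted in this README, but passing them all directly to abetadms() is an assumption (check ?abetadms for the actual signature), and both paths are placeholders.

```r
# Placeholder paths: replace with your own locations.
base_dir <- file.path(tempdir(), "abetadms_project")  # hypothetical project directory
dms2structure_path <- "MY_LOCAL_PATH"                 # local copy of DMS2structure

# Default run: uses the precomputed DiMSum fitness estimates in base_dir.
default_args <- list(base_dir = base_dir)

# Full re-run: also re-estimates double mutant fitness and recomputes the
# epistasis and secondary structure results (each computationally intensive).
rerun_args <- c(default_args, list(
  bayesian_double_fitness = TRUE,
  rerun_epistasis = TRUE,
  rerun_structure = TRUE,
  DMS2structure_path = dms2structure_path
))

# Only call the pipeline if the package is installed and the data directory exists.
if (requireNamespace("abetadms", quietly = TRUE) && dir.exists(base_dir)) {
  do.call(abetadms::abetadms, default_args)
}
```

Swapping default_args for rerun_args in the final call would request the intensive stages as well.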

Stage 1: Preprocess fitness

This stage ('abetadms_preprocess_fitness') reformats DiMSum files and, optionally, re-estimates the fitness of double mutants using a Bayesian framework (set 'bayesian_double_fitness = T'). The latter is computationally intensive (~30 minutes on 10 cores) and is therefore not run by default.

Stage 2: Quality control plots

This stage ('abetadms_quality_control') produces quality control plots of fitness estimates.

Stage 3: Combine fitness estimates

This stage ('abetadms_combine_fitness') normalises fitness estimates (based on silent mutants) and produces fitness distribution plots and position-wise fitness plots.
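To illustrate what normalising against silent mutants can look like, here is a minimal sketch that centres fitness on the mean of the silent (synonymous) variants. This is one plausible reading only; the procedure actually used by 'abetadms_combine_fitness' may differ, and the table, column names and values are invented for illustration.

```r
# Toy fitness table; variant names and values are invented.
fitness <- data.frame(
  variant   = c("silent_1", "silent_2", "mut_1", "mut_2"),
  is_silent = c(TRUE, TRUE, FALSE, FALSE),
  fitness   = c(0.10, 0.20, 0.65, -1.35)
)

# Silent variants leave the protein sequence unchanged, so their mean
# fitness is used here as the wild-type reference point.
silent_mean <- mean(fitness$fitness[fitness$is_silent])

# Shift all estimates so that silent variants are centred on zero.
fitness$fitness_norm <- fitness$fitness - silent_mean
```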

Stage 4: Calculate single and double mutant effects from AA PCA

This stage ('abetadms_aa_properties_mutant_effects') performs principal component analysis (PCA) of a curated collection of numerical indices representing various physicochemical and biochemical properties of amino acids (AA). AA property feature values represent the difference between the WT and mutant PC scores.
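The idea can be sketched with base R's prcomp(). The property matrix below is random toy data standing in for the curated indices, the helper aa_pca_effect() is hypothetical, and the sign convention (mutant minus WT) is an assumption.

```r
set.seed(1)
aa <- c("A", "R", "N", "D", "C", "Q", "E", "G", "H", "I",
        "L", "K", "M", "F", "P", "S", "T", "W", "Y", "V")

# Toy property matrix: 20 amino acids x 6 made-up numerical indices.
props <- matrix(rnorm(20 * 6), nrow = 20,
                dimnames = list(aa, paste0("index_", 1:6)))

# PCA on centred, scaled indices; pca$x holds the PC scores per amino acid.
pca <- prcomp(props, center = TRUE, scale. = TRUE)

# Feature values for a substitution: difference in PC scores between the
# mutant and WT residues (sign convention assumed, not taken from the package).
aa_pca_effect <- function(wt, mut, scores = pca$x) {
  scores[mut, ] - scores[wt, ]
}

# Effect of a Phe -> Pro substitution on each principal component.
aa_pca_effect("F", "P")
```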

Stage 5: Calculate single and double mutant effects from aggregation tool predictions

This stage ('abetadms_agg_tools_mutant_effects') calculates aggregation / disorder algorithm feature values for single and double mutant variants (similar to stage 4).

Stage 6: Single mutant heatmaps

This stage ('abetadms_single_mutant_heatmaps') produces single mutant heatmaps of fitness effects.

Stage 7: Human disease mutations

This stage ('abetadms_human_disease_mutations') tests whether human disease mutations have biased fitness estimates.

Stage 8: Dot plots showing explained variance of models to predict variant fitness

This stage ('abetadms_fitness_model_summary') produces plots of results from simple linear regression models to predict variant fitness.
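For readers unfamiliar with the quantity being plotted, the following sketch fits a simple linear regression to simulated data and extracts the explained variance (R squared). The predictor, effect size and noise level are all invented; this is not the package's own modelling code.

```r
set.seed(42)
n <- 200

# Simulated variants with one hypothetical predictor of fitness.
variants <- data.frame(feature = rnorm(n))
variants$fitness <- 0.8 * variants$feature + rnorm(n, sd = 0.5)

# Simple linear regression of fitness on the single feature.
fit <- lm(fitness ~ feature, data = variants)

# Fraction of fitness variance explained by the model.
r_squared <- summary(fit)$r.squared
```

Repeating this for each candidate feature gives one explained-variance value per model, which is the kind of summary a dot plot can display.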

Stage 9: Epistasis analysis

This stage ('abetadms_epistasis_analysis') performs epistasis calculations. This stage is computationally intensive (~30 minutes on 10 cores) and is therefore not run by default. To force re-execution of this stage set 'rerun_epistasis = T'. Additionally, the correct path to your local copy of the DMS2structure repository must be specified with 'DMS2structure_path = MY_LOCAL_PATH'.
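As a conceptual illustration only, epistasis can be thought of as the deviation of a double mutant's fitness from what its two single mutants predict. The sketch below assumes a simple additive null model on log-scale fitness; DMS2structure implements its own procedure, which this does not reproduce, and all values are invented.

```r
# Toy log-scale fitness values relative to wild type (= 0); numbers are invented.
single_fitness <- c(mut_a = -0.4, mut_b = -0.3)
double_fitness <- -0.2

# Additive null model: the double mutant is expected to equal the sum of
# its constituent single mutant effects.
expected_double <- sum(single_fitness)

# Positive values mean the double mutant is fitter than expected
# (positive epistasis); negative values mean it is less fit.
epistasis <- double_fitness - expected_double
```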

Stage 10: Secondary structure predictions

This stage ('abetadms_secondary_structure_predictions') performs secondary structure predictions and produces combined summary plots. Secondary structure predictions are computationally intensive and are therefore not re-run by default. To force re-execution of secondary structure predictions set 'rerun_structure = T'. Additionally, the correct path to your local copy of the DMS2structure repository must be specified with 'DMS2structure_path = MY_LOCAL_PATH'.

Stage 11: PWI heatmaps

This stage ('abetadms_PWI_heatmaps') plots pair-wise interaction (PWI) score heatmaps.
