The OVERLAY algorithm is now part of the PROSO Toolbox. Please go to the PROSO Toolbox repo for new features and updates.
A Computational Toolbox for Context-Specific Genome-Scale Modelling
NOTE: Please always use and refer to the QCSB release.
OVERLAY is a collection of functions for processing, interpreting, and studying cellular multi-omics data within the scope of genome-scale modelling (GEM).
What OVERLAY offers:
- Automatic implementation of protein constraints on any genome-scale metabolic model (M-model)
- System-level estimation of enzymatic rate constants
- Incorporation of gene expression data into GEMs for context-specific modelling
- Suggestions for synthetic biology strategies in biotechnology, infectious disease, cancer research, and more
More information on OVERLAY's intuition, formulation, and execution is available in our publication.
OVERLAY can be set up as follows.
- MATLAB (R2015a or later), with the following add-ons:
  - Bioinformatics Toolbox
  - Statistics and Machine Learning Toolbox
  - etc.
- COBRA Toolbox
  - Please refer to openCOBRA for details on installation and troubleshooting.
- An optimization solver. Only the following are supported:
  - Gurobi Optimizer (preferred)
  - IBM CPLEX Optimization Studio
- Clone the current repo to your PC.
- In the MATLAB command window, add OVERLAY to the path:
  >> addpath("Path-to-OVERLAY-Folder")
  >> savepath
- You're good to go.
Here we demonstrate only a simple construction of a protein-constrained model (PC-model) from the Pseudomonas aeruginosa M-model. Although a PC-model is not context-specific by itself, it is an 'upgraded' M-model and can serve important purposes in research.
Prepare the data, or find them under OVERLAY/tutorial. Make sure they are on the MATLAB path or in your working directory:
- P. aeruginosa metabolic reconstruction iSD1509 (doi: https://doi.org/10.1101/2021.04.15.439930)
- P. aeruginosa protein sequence FASTA (.faa), downloaded from the Pseudomonas Genome DB
Construct the draft PC-model from the M-model
Open MATLAB and make sure all installations are complete. Initialize the COBRA Toolbox and change the default solver to Gurobi (or IBM CPLEX):
>> initCobraToolbox(false);
>> changeCobraSolver('gurobi','all',0);
Construct the draft PC-model
Here we impose protein constraints on iSD1509 with a protein budget of 150 mg/gDW:
>> model_ori = readCbModel('iSD1509.xml');
>> [model_pc_draft,fullProtein,fullCplx,C_matrix,K_matrix,proteinMM] = pcModel(model_ori,'Pseudomonas_aeruginosa_UCBPP-PA14_109.faa',150);
This takes several minutes to complete.
The M-model has 1510 genes (including one dummy gene), 1642 metabolites, and 2023 reactions.
The resulting draft PC-model has 7519 'metabolites' (1642 true metabolites + 1510 proteins + 1250 complexes + 1558 forward enzymes + 1558 reverse enzymes + 1 proteinWC) and 12487 'reactions' (2023 true reactions + 1510 protein dilutions + 1250 complex formations + 4588 enzyme formations + 3116 enzyme dilutions). This structure does not change during tuning; only the coefficients are modified.
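As a quick sanity check, the totals above can be reproduced by summing the component counts. The short Python sketch below does only this arithmetic. Reading the 1510 proteins as one protein per gene, and the 3116 enzyme dilutions as one per forward or reverse enzyme, is an interpretation of the matching numbers rather than something pcModel documents.

```python
# Component counts quoted above for the iSD1509-based draft PC-model.
n_metabolites = 1642        # true metabolites in the M-model
n_reactions = 2023          # true reactions in the M-model
n_proteins = 1510           # assumed: one protein per gene (including the dummy gene)
n_complexes = 1250
n_enzymes = 1558            # enzymes per direction (forward and reverse counted separately)
n_enzyme_formations = 4588

pc_metabolites = (
    n_metabolites
    + n_proteins
    + n_complexes
    + 2 * n_enzymes         # forward + reverse enzymes
    + 1                     # proteinWC
)
pc_reactions = (
    n_reactions
    + n_proteins            # protein dilutions
    + n_complexes           # complex formations
    + n_enzyme_formations
    + 2 * n_enzymes         # enzyme dilutions, consistent with one per forward/reverse enzyme
)
print(pc_metabolites, pc_reactions)  # 7519 12487
```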
Tune the draft PC-model for better performance
Manually adjust protein complex stoichiometry
This step is usually done with the help of a database. For example, complex information can be extracted from the MetaCyc PA14 database to curate the draft PC-model. Users should critically assess the accuracy of each source, as almost no source is guaranteed to be completely accurate.
The ATP synthase (ATPS) complex is a large protein complex with 9 subunits. Use surfNet to inspect it in the PC-model:
>> surfNet(model_pc_draft,'cplxForm_x(193)x(197)x(195)x(198)x(200)x(199)x(196)x(194)x(192)');
You can follow the chain complex -> enzyme -> reaction to confirm that this is the ATPS complex, or go in the reverse direction to find the complexes for a given reaction.
For example, to make each ATPS complex contain two copies of subunit alpha (atpA, PA14_73260), first locate the protein and the complex in their respective lists:
>> pIdx = find(strcmp(fullProtein,'PA14_73260'));
>> cIdx = find(C_matrix(pIdx,:));
The entry to change is protein #193 in complex #178. Change its coefficient from 1 to 2 (if the protein belonged to more than one complex, cIdx would hold several indices, and you would select the ATPS one before assigning):
>> C_matrix(pIdx,cIdx) = 2;
Finish all subunit modifications before proceeding to the next step.
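The lookup-and-edit pattern above can be sketched in Python on a hypothetical 4-protein, 2-complex C_matrix (rows are proteins, columns are complexes, entries are subunit copies, as in the MATLAB commands). The gene IDs, values, and helper names are illustrative only, and indices are 0-based, unlike MATLAB.

```python
# Hypothetical toy data, NOT taken from iSD1509: rows = proteins, columns = complexes.
full_protein = ["PA14_A", "PA14_B", "PA14_C", "PA14_D"]  # illustrative gene IDs
c_matrix = [
    [1, 0],  # PA14_A: subunit of complex 0
    [1, 1],  # PA14_B: shared by complexes 0 and 1
    [1, 0],  # PA14_C: subunit of complex 0
    [0, 1],  # PA14_D: subunit of complex 1
]


def find_protein(proteins, gene_id):
    """0-based row index of a protein (cf. find(strcmp(fullProtein, id)) in MATLAB)."""
    return proteins.index(gene_id)


def complexes_of(c, p_idx):
    """0-based columns where the protein has a nonzero coefficient (cf. find(C_matrix(pIdx,:)))."""
    return [j for j, coeff in enumerate(c[p_idx]) if coeff != 0]


def set_subunit_copies(c, p_idx, c_idx, copies):
    """Set how many copies of one protein go into one complex; refuse to create new subunits."""
    if c[p_idx][c_idx] == 0:
        raise ValueError("protein is not a subunit of this complex")
    c[p_idx][c_idx] = copies


p_idx = find_protein(full_protein, "PA14_B")
hits = complexes_of(c_matrix, p_idx)
print(hits)               # [0, 1]: shared protein, so pick the target complex explicitly
set_subunit_copies(c_matrix, p_idx, 0, 2)
print(c_matrix[p_idx])    # [2, 1]
```

The helper refuses to turn a zero entry into a nonzero one, a design choice in this sketch that mirrors the rule that tuning changes coefficients but not the model structure.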
Estimate enzymatic rate constants using solvent-accessible surface area (SASA)
Now that all protein complexes (C_matrix) have been modified, their rate constants can be estimated automatically:
>> K_matrix = estimateKeffFromMW(C_matrix,K_matrix,proteinMM);
This returns an updated kinetic matrix to implement.
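estimateKeffFromMW takes C_matrix and proteinMM as inputs, which suggests that complex masses are derived from subunit stoichiometry, so editing C_matrix changes the masses the estimate starts from. The Python sketch below illustrates only that dependency, on hypothetical data: complex mass is computed as the stoichiometry-weighted sum of subunit masses, and a surface-area proxy assumes area scales as mass^(2/3) (a compact, roughly spherical shape). Neither the exponent nor the proxy is taken from OVERLAY; see the publication for the actual SASA-based formulation.

```python
# Hypothetical toy data (not from iSD1509): rows = proteins, columns = complexes.
c_matrix = [
    [1, 0],
    [1, 1],
    [1, 0],
    [0, 1],
]
protein_mm = [50.0, 30.0, 20.0, 40.0]  # subunit masses in kDa (illustrative values)


def complex_masses(c, mm):
    """Mass of each complex = sum over proteins of (copies x subunit mass), i.e. C^T * mm."""
    n_cplx = len(c[0])
    return [sum(row[j] * m for row, m in zip(c, mm)) for j in range(n_cplx)]


def relative_surface(mass, ref_mass, exponent=2.0 / 3.0):
    """Surface-area proxy relative to a reference, ASSUMING area ~ mass**exponent (compact shape)."""
    return (mass / ref_mass) ** exponent


before = complex_masses(c_matrix, protein_mm)
print(before)                  # [100.0, 70.0]
c_matrix[0][0] = 2             # e.g. two copies of subunit 0 in complex 0
after = complex_masses(c_matrix, protein_mm)
print(after)                   # [150.0, 70.0]
print(round(relative_surface(after[0], before[0]), 3))  # 1.31
```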
Update PC-model
Implement the new C_matrix and K_matrix in the PC-model:
>> model_pc = adjustStoichAndKeff(model_pc_draft,C_matrix,K_matrix);
This takes some time to complete.
What a PC-model does
A PC-model 'soft-caps' system-level activity by constraining the total amount of protein in the system.
>> FBAsol = optimizeCbModel(model_ori,'max');
>> FBAsol_pc = optimizeCbModel(model_pc,'max');
The optimal growth rate of the PC-model (FBAsol_pc.f) is lower than that of the M-model (FBAsol.f). In general, PC-FBA better resembles the organism's true exponential-phase metabolism.
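To see why a protein budget acts as a soft cap, consider a toy linear pathway in which every reaction carries the same flux v. The Python sketch below uses a generic enzyme-mass constraint, sum_i (MW_i / keff_i) * v <= P, which is a common textbook formulation and not necessarily OVERLAY's (the PC-model above instead uses protein, complex, and enzyme species with dilution reactions). All enzyme masses, keff values, and the uptake bound are hypothetical; only the 150 mg/gDW budget comes from the pcModel call above.

```python
# Toy linear pathway (hypothetical numbers): every reaction carries the same flux v.
# Units: MW in kDa (= g/mmol), keff in 1/s, flux in mmol/gDW/h, budget in g/gDW.
enzyme_mw = [50.0, 80.0, 40.0]
enzyme_keff = [10.0, 2.0, 5.0]
uptake_bound = 20.0           # M-model limit on pathway flux
protein_budget = 0.150        # 150 mg/gDW, as in the pcModel call above


def max_flux_m_model(uptake):
    """Without protein constraints, only the uptake bound limits the pathway."""
    return uptake


def max_flux_pc_model(uptake, mw, keff, budget):
    """Add the generic enzyme-mass constraint sum_i (MW_i / keff_i) * v <= budget."""
    cost_per_flux = sum(m / (k * 3600.0) for m, k in zip(mw, keff))  # g*h/mmol; 3600 s per h
    return min(uptake, budget / cost_per_flux)


v_m = max_flux_m_model(uptake_bound)
v_pc = max_flux_pc_model(uptake_bound, enzyme_mw, enzyme_keff, protein_budget)
print(v_m, round(v_pc, 2))  # 20.0 10.19
```

With a budget of 1.0 g/gDW the enzyme cap (about 68 mmol/gDW/h) exceeds the uptake bound, so both models return 20.0: the protein constraint only bites when enzyme cost, not nutrient supply, is limiting.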
These are only the most basic functions. For more examples, please refer to the Project Wiki.
OVERLAY is an ongoing project, with plans to refine and expand its scope.
- Version 1.0
  - Automated PC-model construction from M-model
  - Convex QP for expression data incorporation
  - Nonconvex QP for kinetic parameter estimation
  - Debottlenecking algorithm
  - Finishing README, wiki, license, etc.
- Version 2.0
  - Implementing more mechanistic details
  - Allowing incorporation of other omics data
  - Other approaches for kinetic parameter estimation
Please cite our recent publication.
work-in-progress
Herbert Yao - [email protected]
Queen's Computational Systems Biology Group, Department of Chemical Engineering, Queen's University at Kingston, Canada