Human 1.7
haowang-bioinfo committed Apr 19, 2021
2 parents f8c74a5 + 6891e8d commit 0608b4a
Showing 189 changed files with 22,634 additions and 21,733 deletions.
7 changes: 7 additions & 0 deletions .deprecated/README.md
@@ -0,0 +1,7 @@
# Deprecated

This directory contains old scripts, data, models, log files, etc. that are **no longer maintained or used**. They are stored here rather than deleted to maintain a historical account of repository activity in an easily accessible and searchable location.

Code in this directory is **unlikely to function as expected**, and should not be modified. If a script or dataset is to be revived or updated in some way, it should be moved from this directory to the appropriate location in the repository.


36 files renamed without changes.
2 changes: 1 addition & 1 deletion .github/pull_request_template.md
@@ -4,6 +4,6 @@
**I hereby confirm that I have:**

- [ ] Tested my code on my own computer for running the model
- [ ] Selected `devel` as a target branch
- [ ] Selected `develop` as a target branch

*Note: replace [ ] with [X] to check the box. PLEASE DELETE THIS LINE*
4 changes: 2 additions & 2 deletions .github/workflows/yaml-validation.yml
@@ -2,9 +2,9 @@ name: YAML validation

on:
push:
branches: [ devel ]
branches: [ develop ]
pull_request:
branches: [ master, devel ]
branches: [ master, develop ]

jobs:
yaml-validation:
6 changes: 3 additions & 3 deletions .standard-GEM.md
@@ -42,13 +42,13 @@ The URL can be the _doi_.

Repository workflow
-------------------
- [ ] 🟥 Git branches
- [x] 🟥 Git branches
The GEM repository must have at least two branches: _master_ and _develop_.

- [ ] 🟥 Releases
Releases must use the tag format `X.X.X` where X are numbers, according to [semantic versioning principles](https://semver.org/). The last field (“patch”) can also be used to indicate changes to the repository that do not actually change the GEM itself. The use of a `v` before the version number (`v1.0`) is [discouraged](https://semver.org/#is-v123-a-semantic-version).

- [ ] 🟨 Commits
- [x] 🟨 Commits
Commit messages can follow the style of semantic commits.


@@ -69,7 +69,7 @@ This file is provided by the template, but it is empty. It must be filled in wit
- [x] 🟥 `/code/README.md`
The repository must contain a `/code` folder. This folder must contain all the code used in generating the model. It must also include a `README.md` file that describes how the folder is organized.

- [ ] 🟥 `/data/README.md`
- [x] 🟥 `/data/README.md`
The repository must contain a `/data` folder. This folder contains the data used in generating the model. It must also include a `README.md` file that describes how the folder is organized.

- [x] 🟥 `/model`
17 changes: 4 additions & 13 deletions README.md
@@ -61,7 +61,7 @@ Detailed instructions on the installation and use of the Human-GEM model and rep

## Model Files

The model is available as `.xml`, `.xlsx`, `.txt`, `.yml`, and `.mat` in the `model/` directory. Note that only the `.yml` version is available on branches other than `master` (e.g., `devel`), to facilitate tracking of model changes.
The model is available as `.xml`, `.xlsx`, `.txt`, `.yml`, and `.mat` in the `model/` directory. Note that only the `.yml` version is available on branches other than `master` (e.g., `develop`), to facilitate tracking of model changes.


## Reaction, Metabolite, and Gene Annotations
@@ -77,9 +77,9 @@ Additional annotation information and external identifiers for Human-GEM reactio
`Human-GEM.mat` (Recommended if on `master` branch)
* Load and save using the built-in MATLAB `load()` and `save()` functions.

`Human-GEM.yml` (Recommended if on `devel` or other branches)
* Load using the `importHumanYaml.m` function (in `code/io/`)
* Save using the `writeHumanYaml.m` function (in `code/io/`)
`Human-GEM.yml` (Recommended if on `develop` or other branches)
* Load using the `importYaml.m` function (in `code/io/`)
* Save using the `exportYaml.m` function (in `code/io/`)

`Human-GEM.xml` (SBML format)
* Load using the `importModel.m` function (from [RAVEN Toolbox](https://github.com/SysBioChalmers/RAVEN))
@@ -104,12 +104,3 @@ A collection of manually curated 2D metabolic maps associated with Human-GEM are

Contributions are always welcome! Please read the [contribution guidelines](https://github.com/SysBioChalmers/Human-GEM/blob/master/.github/CONTRIBUTING.md) to get started.



## Contributors

- [Jonathan L. Robinson](https://www.chalmers.se/en/Staff/Pages/jonrob.aspx), National Bioinformatics Infrastructure Sweden (NBIS), Science for Life Laboratory (SciLifeLab), Chalmers University of Technology, Gothenburg Sweden
- [Pınar Kocabaş](https://www.chalmers.se/en/staff/Pages/kocabas.aspx), Chalmers University of Technology, Gothenburg Sweden
- [Pierre-Etienne Cholley](https://www.chalmers.se/en/staff/Pages/cholley.aspx), Chalmers University of Technology, Gothenburg Sweden
- [Avlant Nilsson](https://www.chalmers.se/en/staff/Pages/avlant-nilsson.aspx), Chalmers University of Technology, Gothenburg Sweden
- [Hao Wang](https://www.chalmers.se/en/staff/Pages/hao-wang.aspx), Chalmers University of Technology, Gothenburg Sweden
File renamed without changes.
9 changes: 2 additions & 7 deletions code/README.md
@@ -8,11 +8,6 @@ code
├── io
├── misc
├── modelCuration
│ ├── GPRs
│ │ └── EnzymeComplexes
│ ├── MetAssociation
│ ├── RxnAssociation
│ └── modelIntegration
├── qc
├── tINIT
└── test
@@ -28,9 +23,9 @@ Functions associated with input/output of files into and out of MATLAB, such as
Code used for technical purposes, typically to augment missing or inconvenient functionalities of MATLAB.

### modelCuration
Contains curation-related scripts and functions that were used to make changes to the Human-GEM model. These model curation scripts help to improve clarity of what changes were made to the model when the number of changes is too large to view practically, or when the changes were made directly to the Human-GEM `.mat` file (done before implementing the `.yml` workflow). Their only remaining purpose is for transparency and re-tracing the steps of the curation process.
Contains curation-related scripts and functions used to make changes to the Human-GEM model. These model curation scripts help to improve transparency of changes made to the model when the number of changes is too large to view practically. Their only remaining purpose is for transparency and re-tracing the steps of the curation process.

Note that all code in this directory is considered deprecated and will not be updated with later versions of Human-GEM.
Note that code in this directory is often one-time use and will not be updated with later versions of Human-GEM. Deprecated code is moved to the `.deprecated` folder in the root directory of this repository.

### qc
Functions to help with quality control (QC) of Human-GEM, such as checking for duplicate reactions or mass imbalances.
4 changes: 2 additions & 2 deletions code/annotateModel.m → code/annotateGEM.m
@@ -1,4 +1,4 @@
function annModel = annotateModel(model,annType,addMiriams,addFields,overwrite)
function annModel = annotateGEM(model,annType,addMiriams,addFields,overwrite)
% Add reaction, metabolite, and/or gene annotation to a model.
%
% Input:
@@ -35,7 +35,7 @@
%
% Usage:
%
% annModel = annotateMets(model,annType,addMiriams,addFields);
% annModel = annotateGEM(model,annType,addMiriams,addFields,overwrite);
%


159 changes: 159 additions & 0 deletions code/evalGeneEssentialityPred.m
@@ -0,0 +1,159 @@
function [metrics, modelEssential] = evalGeneEssentialityPred(model, expData, metabTasks, modelEssential)
% Compare GEM-based gene essentiality predictions with experimental data
%
% Inputs:
%
% model Genome-scale metabolic model structure.
%
% NOTE: The model should include boundary metabolites! These
% can be added using the addBoundaryMets function.
%
% expData A two-column cell array, where the first column contains
% gene names or IDs (of the same type as those used in the
% model), and the second column indicates whether the gene
% was found as essential or non-essential.
%
% The second column can be text (e.g., 'essential', 'non') or
% numeric (0 = non-essential, 1 = essential). For example:
%
% {'gene1' 'essential' } {'gene1' [1]}
% {'gene2' 'non' } {'gene2' [0]}
% {'gene3' 'conditional'} {'gene3' [1]}
% {... ... } {... ...}
%
% NOTE: If essentiality is provided as strings, only genes whose
% label starts with "non" (case-insensitive) will be treated as
% non-essential. This means that genes annotated as "essential" or
% "conditional" will both be treated as essential.
%
% NOTE: ALL genes that were tested for essentiality should be
% included in expData. If expData contains only essential
% genes, then it will be assumed that all genes in the genome
% (and thus all genes in the model) were tested.
%
% metabTasks Either the filename or structure containing the metabolic
% tasks that will be tested when using the model to predict
% gene essentiality. The task structure can be loaded using
% the RAVEN "parseTaskList" function.
%
% Genes will be predicted as essential if their deletion
% impairs ANY of the metabolic tasks in metabTasks. If the
% model can perform ALL metabolic tasks upon the deletion of
% a gene, that gene will be predicted as non-essential.
%
% modelEssential (Optional) A list of model-predicted essential genes.
% If provided, the model gene essentiality prediction
% will be skipped and the provided list used instead.
%
% Outputs:
%
% metrics A result structure with the following fields:
% sensitivity
% specificity
% accuracy
% F1 statistic
% Matthews Correlation Coefficient (MCC)
% p-value associated with a hypergeometric test of the
% true and false positives and negatives
% (2x2 contingency table).
%
% modelEssential List of model-predicted essential genes.
%

if nargin < 4
modelEssential = [];
end


%% Pre-process expData

% extract information from expData and convert essentiality to 0, 1
ex.genes = expData(:,1);
if length(ex.genes) > length(unique(ex.genes))
error('Gene names or IDs in expData are not unique. Duplicated entries must be removed.');
end
if isnumeric(expData{1,2})
ex.essentiality = double(cell2mat(expData(:,2)));
else
ex.essentiality = double(~startsWith(lower(expData(:,2)), 'non'));
end

% check that gene IDs/names in the model are the same type as in expData
if sum(ismember(ex.genes, model.genes)) < 3
error('The gene name or ID type in expData seems to differ from that used in the model.');
end

% define essential and non-essential gene lists
ex.essential = ex.genes(ex.essentiality == 1);
if all(ex.essentiality == 1)
fprintf('NOTE: All genes in expData are marked as essential; it will therefore be assumed that all genes in the genome were tested.\n\n');
ex.nonessential = setdiff(model.genes, ex.essential);
ex.genes = [ex.genes; ex.nonessential];
else
ex.nonessential = ex.genes(ex.essentiality == 0);
end


%% Run gene essentiality predictions with the model

if isempty(modelEssential)

if ischar(metabTasks)
taskStruct = parseTaskList(metabTasks);
else
taskStruct = metabTasks;
end

% first confirm that the model can perform all the tasks
taskReport = checkTasks(model,[],false,false,false,taskStruct);
if ~all(taskReport.ok)
fprintf('\nThe provided model could not perform the following tasks:\n');
fprintf('\t> %s\n', taskReport.description{~taskReport.ok});
fprintf('\n');
error('Model failed task(s) before gene deletion!');
end

% get essential genes for each task in taskStruct
[~,essentialGeneMat] = checkTasksGenes(model,[],false,false,true,taskStruct);

% essential genes are counted as those that are essential for ANY task
essentialGeneVect = any(essentialGeneMat > 0, 2);
pred.essential = model.genes(essentialGeneVect);
pred.nonessential = setdiff(model.genes, pred.essential);

else
pred.essential = modelEssential;
pred.nonessential = setdiff(model.genes, modelEssential);
end


%% Evaluate prediction performance

TP = sum(ismember(pred.essential, ex.essential)); % true positives
TN = sum(ismember(pred.nonessential, ex.nonessential)); % true negatives
FP = sum(ismember(pred.essential, ex.nonessential)); % false positives
FN = sum(ismember(pred.nonessential, ex.essential)); % false negatives

% calculate some metrics
sensitivity = TP./(TP + FN);
specificity = TN./(TN + FP);
accuracy = (TP + TN)./(TP + TN + FP + FN);
F1 = 2*TP./(2*TP + FP + FN);
MCC = ((TP.*TN) - (FP.*FN))./sqrt((TP+FP).*(TP+FN).*(TN+FP).*(TN+FN));
[~, p_hyper] = fishertest([TP, FP; FN, TN], 'tail', 'right');

% combine metrics into output structure
metrics.sensitivity = sensitivity;
metrics.specificity = specificity;
metrics.accuracy = accuracy;
metrics.F1 = F1;
metrics.MCC = MCC;
metrics.p_hyper = p_hyper;

% assign output
modelEssential = pred.essential;


end
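
The pre-processing and scoring steps of `evalGeneEssentialityPred` can be followed on a toy example. The sketch below is illustrative only: the gene names and the "predicted" lists are made up, no model or RAVEN function is called, and the Fisher test is left out so that no toolbox is needed.

```matlab
% Toy experimental data in the two-column format described in the help text.
expData = {'gene1', 'essential';
           'gene2', 'non';
           'gene3', 'conditional';
           'gene4', 'non-essential'};

% Same rule as in the function: any label not starting with "non"
% (case-insensitive) counts as essential.
isEssential     = ~startsWith(lower(expData(:,2)), 'non');
expEssential    = expData(isEssential, 1);    % gene1, gene3
expNonessential = expData(~isEssential, 1);   % gene2, gene4

% Hypothetical model predictions (assumed here, not computed from a model).
predEssential    = {'gene1'; 'gene2'};
predNonessential = {'gene3'; 'gene4'};

% Confusion-matrix counts, computed as in the function.
TP = sum(ismember(predEssential, expEssential));        % 1 (gene1)
TN = sum(ismember(predNonessential, expNonessential));  % 1 (gene4)
FP = sum(ismember(predEssential, expNonessential));     % 1 (gene2)
FN = sum(ismember(predNonessential, expEssential));     % 1 (gene3)

% Metrics, using the same formulas as the function.
sensitivity = TP/(TP + FN);
specificity = TN/(TN + FP);
accuracy    = (TP + TN)/(TP + TN + FP + FN);
F1          = 2*TP/(2*TP + FP + FN);
MCC         = (TP*TN - FP*FN)/sqrt((TP+FP)*(TP+FN)*(TN+FP)*(TN+FN));

fprintf('sensitivity = %.2f, specificity = %.2f, accuracy = %.2f\n', ...
    sensitivity, specificity, accuracy);
fprintf('F1 = %.2f, MCC = %.2f\n', F1, MCC);
```

With one gene in each cell of the contingency table, every ratio metric is 0.5 and the MCC is 0, which is what a prediction no better than chance should give.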


6 changes: 3 additions & 3 deletions code/io/exportHumanGEM.m
@@ -90,7 +90,7 @@

% Write YML format
if ismember('yml', formats)
writeHumanYaml(ihuman,fullfile(path,'model',strcat(prefix,'.yml')));
exportYaml(ihuman,fullfile(path,'model',strcat(prefix,'.yml')));
end

% Write MAT format
@@ -100,13 +100,13 @@

% Write XLSX format
if ismember('xlsx', formats)
model = annotateModel(ihuman); % add annotation data to structure
model = annotateGEM(ihuman); % add annotation data to structure
exportToExcelFormat(model,fullfile(path,'model',strcat(prefix,'.xlsx')));
end

% Write XML format
if ismember('xml', formats)
model = annotateModel(ihuman); % add annotation data to structure
model = annotateGEM(ihuman); % add annotation data to structure
model.id = regexprep(model.id,'-',''); % remove dash from model ID since it causes problems with SBML I/O
exportModel(model,fullfile(path,'model',strcat(prefix,'.xml')));
end
11 changes: 5 additions & 6 deletions code/io/writeHumanYaml.m → code/io/exportYaml.m
@@ -1,12 +1,11 @@
function writeHumanYaml(model,name)
% writeHumanYaml Write a yaml file for HumanGEM.
%
% Writes a yaml file that is similar to the cobrapy yaml structure, but
% contains some minor changes and a header file for compatibility with
function exportYaml(model,name)
% exportYaml
% Exports a yaml file matching (roughly) the cobrapy yaml structure, but
% contains some changes and a 'metadata' section for compatibility with
% the Metabolic Atlas. Adapted from RAVEN's "writeYaml" function.
%
% Usage:
% writeYaml(model,name);
% exportYaml(model,name)
%
% Input:
% model a model structure
8 changes: 4 additions & 4 deletions code/io/importHumanYaml.m → code/io/importYaml.m
@@ -1,9 +1,9 @@
function model=importHumanYaml(yamlFilename, silentMode)
% importHumanYaml
function model=importYaml(yamlFilename, silentMode)
% importYaml
% Imports a yaml file matching (roughly) the cobrapy yaml structure
%
% Input:
% yamlFile a file in yaml model structure. As defined in HumanGEM, the
% yamlFile a file in yaml model structure. As defined in Human-GEM, the
% yaml file contains 5 sections: metaData, metabolites,
% reactions, genes and compartments
% silentMode set as true to turn off notification messages (opt, default
@@ -12,7 +12,7 @@
% Output:
% model a model structure
%
% Usage: model=importYaml(yamlFilename, silentMode)
% Usage: model = importYaml(yamlFilename, silentMode)
%
% This function performs the reverse of the RAVEN function `writeYaml`
%
2 changes: 1 addition & 1 deletion code/io/increaseHumanGEMVersion.m
@@ -52,7 +52,7 @@ function increaseHumanGEMVersion(bumpType)

%Load model:
ymlFile=fullfile(modelPath,'model','Human-GEM.yml');
ihuman = importHumanYaml(ymlFile);
ihuman = importYaml(ymlFile);

%Include tag and save model:
ihuman.version = newVersion;
Binary file removed code/modelCuration/GPRs/EnzymeComplexes/CORUM.mat
Binary file not shown.
Binary file removed code/modelCuration/MetAssociation/Recon3Mets2MNX.mat
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file removed code/modelCuration/MetAssociation/uniqueMets.mat
Binary file not shown.
5 changes: 5 additions & 0 deletions code/modelCuration/README.md
@@ -0,0 +1,5 @@
# Model curation scripts

This directory contains curation-related scripts and functions used to make changes to the Human-GEM repository. These curation scripts help to improve transparency of changes made to the model when the number of changes is too large to view practically.


Binary file not shown.
Binary file removed code/modelCuration/RxnAssociation/mapRxnResults.mat
Binary file not shown.
Binary file removed code/modelCuration/RxnAssociation/mergedModel.mat
Binary file not shown.