Update with rxnMetDuplicates branch #11

Merged
merged 41 commits into from
Aug 6, 2018

Conversation

haowang-bioinfo
Member

No description provided.

pinarkocabas and others added 30 commits March 5, 2018 15:33
A 'python' subfolder is generated for saving relevant scripts and files.
1. Create 'GPRs' subfolder;
2. Move the 'python' folder under 'GPRs'.
Update the GPRs branch with the latest progress
Prepare human enzyme complex data, associate it with Ensembl ids, and add the generating script.
cleanModelGeneRules removes unnecessary parentheses, trailing ANDs/ORs, and duplicated genes in model grRules.
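For illustration only, the kind of cleanup described above can be sketched in Python. This is not the cleanModelGeneRules implementation (which is a MATLAB function in this repository); the function name `clean_flat_rule` and the restriction to flat rules (a single operator type, no meaningful nesting) are assumptions made to keep the sketch short.

```python
def clean_flat_rule(rule: str) -> str:
    """Simplify a flat gene rule that uses only 'and' or only 'or'.

    Hypothetical helper: assumes the rule has no meaningful nesting,
    so every parenthesis is redundant and can be dropped.
    """
    # Parentheses carry no information in a flat rule, so remove them.
    tokens = rule.replace("(", " ").replace(")", " ").split()
    ops = {t.lower() for t in tokens if t.lower() in ("and", "or")}
    if len(ops) > 1:
        raise ValueError("mixed and/or rules need a real parser")
    # Keeping only gene tokens also discards trailing operators.
    genes = [t for t in tokens if t.lower() not in ("and", "or")]
    # dict.fromkeys removes duplicates while keeping first-seen order.
    unique = list(dict.fromkeys(genes))
    joiner = f" {ops.pop()} " if ops else " "
    return joiner.join(unique)


cleaned = clean_flat_rule("((ENSG1 or ENSG2 or ENSG1) or )")
```

Here `cleaned` is `"ENSG1 or ENSG2"`. Rules that mix `and` and `or` are rejected rather than guessed at, since simplifying them correctly requires parsing the nesting.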
Text file containing a mapping between Ensembl (ENSG, ENSP, ENST), Entrez (NCBI), UniProt, and gene abbreviations. This mapping information was retrieved from the Ensembl database.
translateGeneRules converts the genes, grRules, and rxnGeneMat from a model to six different gene/transcript/protein ID types, where the original model gene list must correspond to one of the six ID types. The recognized gene ID types are Ensembl (ENSG, ENST, ENSP), UniProt, Entrez (NCBI), or Name (gene symbol). All trailing decimals after a gene ID will be removed, and the grRules will be cleaned/simplified to remove duplicate genes, extra parentheses, etc.
Function now runs "deep clean" by default.
When a gene ID is encountered that cannot be translated, the function now defaults to deleting that gene, rather than simply replacing it with the original geneID.
The Ensembl database gene/transcript/protein ID conversion file has now been appended with a date, and this function was updated to use the dated form of the filename.
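The translation behavior described in the commits above (strip trailing decimals, map each gene to the target ID type, delete genes that cannot be translated) can be sketched roughly as follows. This is a Python illustration, not the translateGeneRules code; the function name `translate_flat_rule`, the in-memory `id_map` dictionary, and the example IDs are all hypothetical, and only flat rules are handled.

```python
import re


def translate_flat_rule(rule: str, id_map: dict) -> str:
    """Translate gene IDs in a flat rule; drop genes with no mapping.

    Hypothetical helper: id_map stands in for the Ensembl conversion
    file, and the rule is assumed to use a single operator type.
    """
    tokens = rule.replace("(", " ").replace(")", " ").split()
    ops = {t.lower() for t in tokens if t.lower() in ("and", "or")}
    if len(ops) > 1:
        raise ValueError("mixed and/or rules need a real parser")
    translated = []
    for tok in tokens:
        if tok.lower() in ("and", "or"):
            continue
        # Remove a trailing decimal suffix, e.g. "1234.1" -> "1234".
        base = re.sub(r"\.\d+$", "", tok)
        new_id = id_map.get(base)
        # Untranslatable genes are deleted; duplicates are kept once.
        if new_id is not None and new_id not in translated:
            translated.append(new_id)
    joiner = f" {ops.pop()} " if ops else " "
    return joiner.join(translated)


# Made-up mapping for demonstration; not real Entrez/Ensembl pairs.
example_map = {"1234": "ENSG_A", "5678": "ENSG_B"}
result = translate_flat_rule("1234.1 or 5678.2 or 9999.1 or 1234.2", example_map)
```

In this example `result` is `"ENSG_A or ENSG_B"`: the unmapped gene `9999` is dropped and the two suffix versions of `1234` collapse into one entry.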
The script geneAssoc.m attempts to generate comprehensive gene associations between HMR2 and Recon3D (i.e. it relates genes from Entrez ids to Ensembl ids). The results are saved to geneAssoc.mat.

Two information sources were used. When checking the two sources, one-to-one relations were preferred. For cases of one Entrez id mapping to multiple Ensembl ids, the one from the primary assembly is kept and those from alternate assemblies are ignored. The five Entrez ids without an Ensembl association should be excluded from the grRule translation and subsequent model integration, because they are either outdated or pseudogenes.
Apply the changes in GPRs branch to modelIntegration
Convert Recon3D grRules to be compatible with HMR by replacing gene ids with Ensembl ids and merging multiple suffix versions of the same gene into one. The genes and rxnGeneMat fields are also updated.
Upload associated reaction pairs between HMR2 and Recon3D detected from the merged model of the two, in which the Recon3D model was converted into RAVEN format and comprehensively associated with HMR metabolites.
Some Recon3D mets were found to be duplicates after being associated with HMR2 mets. This script removes the duplicate mets according to met ids.
This model structure has updated met-related fields. The S matrix was also updated by merging the multiple rows of duplicate mets into one.
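As a rough illustration of collapsing duplicate metabolite rows, the Python sketch below merges rows of a stoichiometric matrix that share a met id. It assumes that merging means summing the coefficients of the duplicate rows; the commit message does not state the exact rule, and the function name `merge_duplicate_mets` and the list-of-lists matrix are hypothetical stand-ins for the MATLAB model structure.

```python
def merge_duplicate_mets(mets, S):
    """Merge rows of S that belong to the same metabolite id.

    mets: list of metabolite ids, one per row of S.
    S: list of rows, each a list of stoichiometric coefficients.
    Assumption: duplicate rows are combined by summing coefficients.
    """
    row_of = {}  # met id -> row index in the merged matrix
    merged_mets, merged_S = [], []
    for met, row in zip(mets, S):
        if met in row_of:
            target = merged_S[row_of[met]]
            for j, coeff in enumerate(row):
                target[j] += coeff
        else:
            row_of[met] = len(merged_mets)
            merged_mets.append(met)
            merged_S.append(list(row))  # copy so the input is untouched
    return merged_mets, merged_S


# Toy model: "m1" appears twice and takes part in two reactions.
new_mets, new_S = merge_duplicate_mets(
    ["m1", "m2", "m1"],
    [[-1, 0], [1, -1], [0, 1]],
)
```

After the call, `new_mets` is `["m1", "m2"]` and `new_S` is `[[-1, 1], [1, -1]]`. Other met-related fields (names, compartments, formulas) would need to be filtered with the same row indices to stay consistent with the matrix.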
In HMR2, metabolite m00196 refers to 'general protein'. Recon3D took it over directly as 'M00196'. However, Recon3D also includes another 'general protein' metabolite, 'protein_c'. Here both mets are associated with m00196 in HMR2.
To facilitate model integration, this script generates a data structure (metAssoc) with one-to-one met association between HMR2 and Recon3D from the manually curated data structure metAssocHMR2Recon3.mat.
Split the script by moving the section that generates metAssoc.mat into a separate script.
Change metNames for 'phacgly_s' and 'phacgly_c' from 'Phenylacetylglycine' to 'Phenylacetylglycine_phacgly' to avoid duplication
@haowang-bioinfo haowang-bioinfo merged commit a304512 into rxnMetDuplicates Aug 6, 2018
@haowang-bioinfo haowang-bioinfo changed the title Update Recon4 model to rxnMetDuplicates Update with rxnMetDuplicates branch Apr 24, 2019
haowang-bioinfo added a commit that referenced this pull request Mar 28, 2022
- This is to fix the issue reported in Mouse-GEM issue #11
@haowang-bioinfo haowang-bioinfo mentioned this pull request Jun 19, 2022
4 participants