Update with rxnMetDuplicates branch #11
Merged
A 'python' subfolder is generated for saving relevant scripts and files.
1. Create 'GPRs' subfolder; 2. Move the 'python' folder under 'GPRs'.
Update the GPRs branch with the latest progress
Prepare human enzyme complex data, associate it with Ensembl ids, and add the generating script.
cleanModelGeneRules removes unnecessary parentheses, trailing ANDs/ORs, and duplicated genes in model grRules.
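The kind of clean-up described for cleanModelGeneRules can be illustrated with a minimal Python sketch. This is not the MATLAB function from this PR: the helper name `clean_rule` is hypothetical, and it only de-duplicates flat rules (no nested parentheses, one operator type), returning anything more complex unchanged apart from stripping dangling operators.

```python
import re

OPERATORS = {"and", "or"}

def clean_rule(rule: str) -> str:
    """Simplify one grRule string (hypothetical helper, illustration only)."""
    # Unwrap parentheses that enclose a single gene: "(G1)" -> "G1".
    rule = re.sub(r"\(\s*([^\s()]+)\s*\)", r"\1", rule)
    tokens = rule.split()
    # Strip dangling operators from both ends: "G1 or G2 or" -> "G1 or G2".
    while tokens and tokens[-1].lower() in OPERATORS:
        tokens.pop()
    while tokens and tokens[0].lower() in OPERATORS:
        tokens.pop(0)
    # Nested rules are left alone; this sketch does not parse them.
    if any("(" in t or ")" in t for t in tokens):
        return " ".join(tokens)
    ops = {t.lower() for t in tokens if t.lower() in OPERATORS}
    # Mixed and/or without parentheses is ambiguous, so leave it alone too.
    if len(ops) > 1:
        return " ".join(tokens)
    # Drop duplicated genes while keeping first-occurrence order.
    genes = list(dict.fromkeys(t for t in tokens if t.lower() not in OPERATORS))
    op = ops.pop() if ops else "or"
    return f" {op} ".join(genes)
```

For example, `clean_rule("(G1) or G2 or G1 or")` yields `"G1 or G2"`, while a nested rule such as `"(G1 and G2) or G3"` is returned as-is.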
Text file containing a mapping between Ensembl (ENSG, ENSP, ENST), Entrez (NCBI), UniProt, and gene abbreviations. This mapping information was retrieved from the Ensembl database.
translateGeneRules converts the genes, grRules, and rxnGeneMat from a model to six different gene/transcript/protein ID types, where the original model gene list must correspond to one of the six ID types. The recognized gene ID types are Ensembl (ENSG, ENST, ENSP), UniProt, Entrez (NCBI), or Name (gene symbol). All trailing decimals after a gene ID will be removed, and the grRules will be cleaned/simplified to remove duplicate genes, extra parentheses, etc.
Function now runs "deep clean" by default.
When a gene ID is encountered that cannot be translated, the function now defaults to deleting that gene, rather than simply replacing it with the original geneID.
The Ensembl database gene/transcript/protein ID conversion file has now been appended with a date, and this function was updated to use the dated form of the filename.
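The translation behaviour described above (strip trailing decimals, translate each gene, delete genes that cannot be translated) can be sketched in Python as follows. This is an illustration under stated assumptions, not the MATLAB translateGeneRules function: the names `strip_version` and `translate_flat_rule` are hypothetical, the rule is assumed to be flat with a single operator type, and the IDs in the usage example are made up.

```python
import re

def strip_version(gene_id: str) -> str:
    """Remove a trailing decimal suffix, e.g. "1234.1" -> "1234"."""
    return re.sub(r"\.\d+$", "", gene_id)

def translate_flat_rule(rule: str, id_map: dict) -> str:
    """Translate a flat, single-operator rule (hypothetical helper).

    Genes missing from id_map are deleted rather than kept under
    their original ID, mirroring the default described above.
    """
    # Assumption: the rule uses only "and" or only "or".
    operator = "and" if re.search(r"\sand\s", rule, flags=re.I) else "or"
    parts = re.split(r"\s+(?:and|or)\s+", rule.strip(), flags=re.I)
    genes = [strip_version(g) for g in parts if g]
    translated = [id_map[g] for g in genes if g in id_map]
    # De-duplicate, since several source IDs may map to the same target.
    return f" {operator} ".join(dict.fromkeys(translated))
```

With a made-up mapping `{"1234": "ENSG_A", "5678": "ENSG_B"}`, the rule `"1234.1 or 5678.2 or 9999.1"` becomes `"ENSG_A or ENSG_B"`: the version suffixes are dropped and the untranslatable gene `9999` is removed.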
The script geneAssoc.m attempts to generate comprehensive gene associations between HMR2 and Recon3D (i.e. relate genes from Entrez ids to Ensembl ids) and saves the results to geneAssoc.mat. Two information sources were used, and one-to-one relations were preferred when checking the two sources. Where one Entrez id maps to multiple Ensembl ids, the one from the primary assembly is kept and those from alternate assemblies are ignored. The five Entrez ids without an Ensembl association should be excluded from the grRule translation and subsequent model integration, because they are either outdated or pseudogenes.
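The selection logic described for geneAssoc.m can be sketched in Python as below. It is only an illustration: the function name, the `(entrez, ensembl, is_primary)` tuple layout, and the choice to exclude ids that remain ambiguous are all assumptions, and the IDs in the usage note are made up.

```python
def resolve_associations(pairs):
    """Pick one Ensembl id per Entrez id (hypothetical helper).

    pairs: iterable of (entrez_id, ensembl_id_or_None, is_primary_assembly).
    Returns (resolved, excluded): a dict of one-to-one associations and a
    list of Entrez ids that should be left out of the translation.
    """
    candidates = {}
    for entrez, ensembl, is_primary in pairs:
        hits = candidates.setdefault(entrez, [])
        if ensembl is not None:
            hits.append((ensembl, is_primary))

    resolved, excluded = {}, []
    for entrez, hits in candidates.items():
        primary = [ens for ens, is_primary in hits if is_primary]
        if len(hits) == 1:
            # Already one-to-one.
            resolved[entrez] = hits[0][0]
        elif len(primary) == 1:
            # One-to-many: keep the primary-assembly id only.
            resolved[entrez] = primary[0]
        else:
            # No association, or still ambiguous (assumption: exclude).
            excluded.append(entrez)
    return resolved, excluded
```

Given the made-up input `[("100", "ENSG_A", True), ("200", "ENSG_B", True), ("200", "ENSG_C", False), ("300", None, False)]`, the sketch resolves `100` and `200` to `ENSG_A` and `ENSG_B` and excludes `300`.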
Apply the changes in GPRs branch to modelIntegration
Convert Recon3D grRules to be compatible with HMR by replacing gene ids with Ensembl ids and merging multiple suffix versions of the same gene into one. The genes and rxnGeneMat fields are also updated.
Upload associated reaction pairs between HMR2 and Recon3D, detected from the merged model of the two, in which the Recon3D model was converted to RAVEN format and the comprehensive association to HMR metabolites was applied.
Some Recon3D mets were found to be duplicates after associating them with HMR2 mets. This script removes the duplicate mets according to met ids.
This model structure has updated met-related fields. The S matrix was also updated by merging the multiple rows of each duplicate met into one.
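Collapsing duplicate metabolite rows in the S matrix can be sketched in Python as follows. The function name is hypothetical, and summing the coefficients of duplicate rows is an assumption about how the rows are combined; the PR itself only states that the rows are merged into one.

```python
def merge_duplicate_mets(mets, S):
    """Merge rows of S that share a met id (hypothetical helper).

    mets: list of metabolite ids, one per row of S.
    S:    stoichiometric matrix as a list of rows (lists of numbers).
    Rows with the same id are summed (assumption) and the first
    occurrence determines the position of the merged row.
    """
    row_of = {}
    merged_mets, merged_S = [], []
    for met, row in zip(mets, S):
        if met not in row_of:
            row_of[met] = len(merged_mets)
            merged_mets.append(met)
            merged_S.append(list(row))  # copy so the input is not modified
        else:
            target = merged_S[row_of[met]]
            for j, coeff in enumerate(row):
                target[j] += coeff
    return merged_mets, merged_S
```

For instance, with `mets = ["m1", "m2", "m1"]` and `S = [[-1, 0], [1, -1], [0, 1]]`, the two `m1` rows are combined, giving `["m1", "m2"]` and `[[-1, 1], [1, -1]]`.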
In HMR2, metabolite m00196 refers to 'general protein'. It was directly taken by Recon3D and adopted as 'M00196'. However, Recon3D includes another 'general protein' metabolite as 'protein_c'. Here the two mets are associated together to m00196 in HMR2.
To facilitate model integration, this script generates a data structure (metAssoc) with one-to-one met associations between HMR2 and Recon3D from the manually curated data structure metAssocHMR2Recon3.mat.
Split the script by moving the section that generates metAssoc.mat into a separate script.
Change metNames for 'phacgly_s' and 'phacgly_c' from 'Phenylacetylglycine' to 'Phenylacetylglycine_phacgly' to avoid duplication.
haowang-bioinfo changed the title from "Update Recon4 model to rxnMetDuplicates" to "Update with rxnMetDuplicates branch" on Apr 24, 2019
haowang-bioinfo added a commit that referenced this pull request on Mar 28, 2022
- This is to fix the issue reported in Mouse-GEM issue #11
No description provided.