Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: disable adding prefix M_, R_, C_ and G_ in exportModel function #360

Merged
merged 22 commits into from
Oct 2, 2024

Conversation

edkerk
Copy link
Member

@edkerk edkerk commented Aug 13, 2021

Main improvements in this PR:

Duplicate of #356, reverting the revert by #359 :) (see explanation there).

The changes made in PR are based on the discussion and consensus in #353

  • Features:
    • checkModelStruct checks if any metabolite, reaction, gene or compartment identifier starts with a digit, and provides instructions on how to proceed.
    • exportModel no longer adds M_, R_, G_, and C_ to metabolite, reaction, gene and compartment identifiers by default. The option still exists, by setting the COBRAstyle flag as true.
    • importModel no longer removes M_, R_, G_, and C_ to metabolite, reaction, gene and compartment identifiers by default. The option still exists, by setting the COBRAstyle flag as true. A warning is thrown if it appears that the model is of COBRA style but COBRAstyle was not set as true.
    • exportForGit also has new COBRAstyle flag.
    • dispEM wraps warnings text into command window.
  • Fixes:
    • readYAMLmodel correct parsing of ec.rxnEnzMat if last row is empty.
    • checkModelStruct only warns about irreversible reactions if bounds indicate reversibility, and not the other way around.

I hereby confirm that I have:

@edkerk edkerk marked this pull request as ready for review August 13, 2021 21:35
@mihai-sysbio mihai-sysbio changed the title fix: disable adding prefix 'M_' and 'R_' in exprotModel function fix: disable adding prefix 'M_' and 'R_' in exportModel function Aug 20, 2021
@edkerk edkerk mentioned this pull request Sep 3, 2021
3 tasks
@edkerk edkerk added this to the 2.6.0 milestone Sep 5, 2021
As discussed in #353, this commit introduce regular expressions for checking whether the IDs are in compliance with libSBML specification.
- Enable throwing errors when running `checkModelStruct` in `exportModel`
- Provide informative error message when reporting invalid ids
@edkerk
Copy link
Member Author

edkerk commented Sep 25, 2021

Line 118-132 should be removed, no longer required as invalid IDs would already have thrown an error.

RAVEN/io/exportModel.m

Lines 118 to 132 in a459d6c

%Convert ids to SBML-convenient format. This is to avoid the data loss when
%unsupported characters are included in ids. Here we are using part from
%convertSBMLID, originating from the COBRA Toolbox
model.rxns=regexprep(model.rxns,'([^0-9_a-zA-Z])','__${num2str($1+0)}__');
model.mets=regexprep(model.mets,'([^0-9_a-zA-Z])','__${num2str($1+0)}__');
model.comps=regexprep(model.comps,'([^0-9_a-zA-Z])','__${num2str($1+0)}__');
if isfield(model,'genes')
problemGenes=find(~cellfun('isempty',regexp(model.genes,'([^0-9_a-zA-Z])')));
originalGenes=model.genes(problemGenes);
replacedGenes=regexprep(model.genes(problemGenes),'([^0-9_a-zA-Z])','__${num2str($1+0)}__');
model.genes(problemGenes)=replacedGenes;
for i=1:numel(problemGenes)
model.grRules = regexprep(model.grRules, ['(^|\s|\()' originalGenes{i} '($|\s|\))'], ['$1' replacedGenes{i} '$2']);
end
end

@haowang-bioinfo
Copy link
Member

agree

@mihai-sysbio
Copy link
Member

no longer required as invalid IDs would already have thrown an error.

@edkerk this got me thinking - are invalid reported throwing errors for all import formats, including importExcelModel?

core/checkModelStruct.m Outdated Show resolved Hide resolved
@edkerk
Copy link
Member Author

edkerk commented Sep 27, 2021

no longer required as invalid IDs would already have thrown an error.

@edkerk this got me thinking - are invalid reported throwing errors for all import formats, including importExcelModel?

@mihai-sysbio What I meant is that exportModel will throw the error via checkModelStruct. importModel would not need to call checkModelStruct, as libSBML would have failed to load a model with illegal characters. For importExcelModel though, and a future readYaml function (which at this time does not exist in RAVEN 2, but is planned to be included) it indeed would make sense to call checkModelStruct.

@edkerk
Copy link
Member Author

edkerk commented Sep 28, 2021

Should not be merged before #364 is merged and pushed from devel to main. Release 2.5.4 has then minor changes, while release 2.6.0 (this PR) will have the significant change that I/O is no longer resulting in the same model in MATLAB as in previous versions, as it no longer uses the COBRA-style prefixes.

@mihai-sysbio
Copy link
Member

Resolving the conflict and the merging of #364 brings this PR in a merge-able state.

Release 2.5.4 has then minor changes, while release 2.6.0 (this PR) will have the significant change that I/O is no longer resulting in the same model in MATLAB as in previous versions, as it no longer uses the COBRA-style prefixes.

I guess we then expect one more PR that releases 2.5.4. before merging this one though @edkerk ?

@edkerk
Copy link
Member Author

edkerk commented Dec 6, 2021

I'm still hesitant with this, as it can break backwards compatibility with people's scripts. It changes the identifier representation (e.g. M_glc6p instead of glc6p) when changing RAVEN version. Might it not be so drastic that it should (part of a) major release?

edit: I moved my further discussion to issue #353.

@edkerk edkerk modified the milestones: 2.6.0, 2.7.0 Jan 13, 2022
@simas232
Copy link
Collaborator

I tried to test this PR with the iAL1006 model from tutorials. Due to the suggested changes in this PR we have a new problem - there are many rxns and mets that begin with a numeric char so the model is no longer compliant with SBML specifications and the exportModel function therefore fails. I am not sure how common it is for metabolite and reaction IDs to start with a number, but this is definitely problematic for GEMs that use BiGG IDs for mets and/or rxns.

@edkerk
Copy link
Member Author

edkerk commented Apr 24, 2022

I agree @simas232, this should be properly resolved, also discussed in #353. To avoid confusion, I'll change this PR to draft for now.

@edkerk edkerk marked this pull request as draft April 24, 2022 20:03
@mihai-sysbio
Copy link
Member

mihai-sysbio commented Apr 25, 2022

Good catch @simas232.

this is definitely problematic for GEMs that use BiGG IDs for mets and/or rxns.

To me, this sounds very much like a problem with the GEMs themselves not being SBML-compatible. The main question who's responsibility it is to address it.

I could imagine a version of the tutorial that starts with a Colab notebook with cobrapy that imports and exports the model, just to obtain a valid SBML (assuming cobrapy also implements this prefixing functionality). Alternatively, such a "patched" GEM could be delivered with the tutorial, as long as this is documented.

Copy link

github-actions bot commented Sep 18, 2024

This PR has been automatically tested with GH Actions. Here is the output of the tests:

 > Installation type                    Advanced (via git)
> Installing from location /home/m/actions-runner/_work/RAVEN/RAVEN
> Checking RAVEN release DEVELOPMENT
> Checking MATLAB release 2020b
> Checking system architecture glnxa64
> Set RAVEN in MATLAB path Pass
> Save MATLAB path Pass
> Make binaries executable Pass

> Add Java paths for Excel format Pass
> Checking libSBML version 5.20.2
> Checking model import and export
> Import Excel format Pass
> Export Excel format Pass
> Import SBML format Pass
> Export SBML format Pass

> Checking for LP solvers
> glpk Pass
> gurobi Pass
> scip Fail
> cobra Pass
> Set RAVEN solver glpk

> Checking BLAST+ Pass
> Checking DIAMOND Pass
> Checking HMMER Pass

> Checking function uniqueness Pass

*** checkInstallation complete ***

SCIP MEX binary not installed or not functional, some fillGapsLargeTests skipped.
SCIP MEX binary not installed or not functional, some fillGapsSmallTests skipped.
Running blastPlusTests
.
Done blastPlusTests

Running cdhitTests
.
Done cdhitTests

Running checkTasksTests
.
Done checkTasksTests

Running diamondTests
.
Done diamondTests

Running fillGapsLargeTests
.
Done fillGapsLargeTests

Running fillGapsSmallTests
.
Done fillGapsSmallTests

Running hmmerTests
.
Done hmmerTests

Running importExportTests
....
Done importExportTests

Running mafftTests
.
Done mafftTests

Running miriamTests
.
Done miriamTests

Running modelAbilitiesTests
........
Done modelAbilitiesTests

Running modelConversionTests
.
Done modelConversionTests

Running modelCurationTests
.......... ........
Done modelCurationTests

Running modelSortingTests
..
Done modelSortingTests

Running solverTests
..
Error occurred in solverTests/testSCIP and it did not run to completion.
---------
Error ID:
---------
''
--------------
Error Details:
--------------
Error using solverTests>testSCIP (line 77)
SCIP MEX binary not installed or not functional.
..
Done solverTests

Running tinitTests
..........
Done tinitTests

Failure Summary:

Name Failed Incomplete Reason(s)
=====================================================
solverTests/testSCIP X X Errored.

Note: In the case of multiple test runs, this post will be edited.

@edkerk edkerk changed the title fix: disable adding prefix 'M_' and 'R_' in exportModel function fix: disable adding prefix M_, R_, C_ and G_ in exportModel function Sep 19, 2024
@edkerk edkerk marked this pull request as ready for review September 22, 2024 13:03
@edkerk edkerk added this to the 2.10.0 milestone Sep 22, 2024
@edkerk edkerk merged commit 943fe1c into develop Oct 2, 2024
1 check passed
@edkerk edkerk deleted the fix/exprotModel branch October 2, 2024 16:49
@mihai-sysbio
Copy link
Member

This has been a long time in the works, great job @edkerk !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants