Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: gene name compliance with cobrapy #216

Merged
merged 2 commits into from
Apr 24, 2020
Merged

feat: gene name compliance with cobrapy #216

merged 2 commits into from
Apr 24, 2020

Conversation

BenjaSanchez
Copy link
Contributor

Main improvements in this PR:

This partly addresses #102: After the merge of opencobra/cobrapy#685, every component in yeast-GEM is properly parsed when loaded into cobrapy, with the exception of gene names (notebook with the 1st analysis here): this is because gene names are stored by cobratoolbox in the fbc:label field in the xml file, whereas cobrapy looks for these in the fbc:name field. So instead, I've adapted the Matlab saving wrapper so that it copies model.geneNames onto model.proteins, as that is translated to the compliant fbc:name field. By doing this we won't loose any information, as in model.proteins we had placeholders until now. The only downside is that the xml will now have duplicated info:

<fbc:geneProduct metaid="G_YPL078C" fbc:id="G_YPL078C" fbc:name="ATP4" fbc:label="ATP4"/>

But as we now over-write the protein info with the wrapper, and cobrapy only reads one of them, we should not have any issues. Also, from now on no need to redefine each time the model.proteins field @feiranl.

I hereby confirm that I have:

  • Tested my code with all requirements for running the model
  • Selected devel as a target branch (top left drop-down menu)

test if all model components are  loaded when cobrapy is used
for compliance with cobrapy
@BenjaSanchez BenjaSanchez added the enhancement new field/feature label Apr 21, 2020
@BenjaSanchez BenjaSanchez requested review from edkerk and feiranl April 21, 2020 14:39
@BenjaSanchez
Copy link
Contributor Author

@Midnighter do you foresee any potential issues with this decision? were you aware of this difference between cobrapy and cobratoolbox on how the gene names are stored in the xml file?

@Midnighter
Copy link

I was not aware of this difference. Taking a look at the FBC specification, the label field is a required attribute so practically it seems like the better choice. Since it is required, cobrapy must also set it when writing to SBML, though, doesn't it?

Can you please raise an issue on cobrapy about it?

@BenjaSanchez
Copy link
Contributor Author

BenjaSanchez commented Apr 23, 2020

@Midnighter it does indeed set it, just equal to id from what I see in other models. I believe what cobrapy does currently (storing the gene name in the name field) is the better suited choice, and therefore this issue should maybe be raised on cobratoolbox instead. But let me know if you have a different view.

@edkerk
Copy link
Member

edkerk commented Apr 23, 2020

IMHO, the proposed solution here should just be a workaround, as geneNames and proteins are not necessarily the same thing (alternative splicing can give different proteins). To solve this, either COBRA Toolbox or cobrapy should modify their code. FYI, RAVEN also stores it in label, to follow compatibility with COBRA toolbox.

@BenjaSanchez
Copy link
Contributor Author

@edkerk I agree, this is a temporal workaround to guarantee that cobrapy users can get all model fields they need. The permanent solution is to have cobratoolbox and cobrapy agree with one format, and as soon as that happens we can prescind from this change and entirely remove the proteins field (which holds no info atm). I can open an issue in our repo to remember to do that. Let me know of you think another approach would be better.

@Midnighter
Copy link

From my perspective it's a good idea to coordinate on this. I don't really care if it's an issue on the toolbox or on cobrapy.

@edkerk
Copy link
Member

edkerk commented Apr 23, 2020

Following @Midnighter's earlier comment, label is required, which means that it might be best if cobrapy adjusts to adhere to this?

@BenjaSanchez
Copy link
Contributor Author

@edkerk I have now opened opencobra/cobrapy#950 for discussion and #217 as a to-do in this repo once a solution is found. If anything else is needed let me know in a review :)

Copy link
Collaborator

@feiranl feiranl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great!

@BenjaSanchez BenjaSanchez merged commit 4722c45 into devel Apr 24, 2020
@BenjaSanchez BenjaSanchez deleted the fix/gene-names branch April 24, 2020 08:09
@BenjaSanchez BenjaSanchez mentioned this pull request Apr 24, 2020
31 tasks
BenjaSanchez added a commit that referenced this pull request May 20, 2020
PR #216 created conflicts in master
@BenjaSanchez BenjaSanchez mentioned this pull request Jun 12, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement new field/feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants