Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: log duplicated ids to the column of rxnRetired #666

Merged
merged 5 commits into from
Jun 25, 2023

Conversation

haowang-bioinfo
Copy link
Member

@haowang-bioinfo haowang-bioinfo commented Jun 23, 2023

Main improvements in this PR:

This PR improves documentation of deprecated reactions in #613 by adding their ids to corresponding rows in reactions.tsv, as a tentative implementation according to the discussion in #615

Note: all the pairs listed in #607 have been recorded except MAR00483 vs. MAR00482/MAR00449/MAR01169, because I don't know how to document this irregular pair

I hereby confirm that I have:

  • Tested my code on my own computer for running the model
  • Selected develop as a target branch
  • Any removed reactions and metabolites have been moved to the corresponding deprecated identifier lists

Copy link
Collaborator

@feiranl feiranl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Except the pair of MAR03784 and MAR03783 is not documented. We also performed this kind of documentation in the #646 and should performed in every PR with removing duplicates both for Rxns and Mets

@@ -819,7 +819,7 @@ rxns rxnKEGGID rxnBiGGID rxnEHMNID rxnHepatoNET1ID rxnREACTOMEID rxnRecon3DID rx
"MAR04232" "" "" "RE0549C" "" "" "RE0549C" "MNXR103449" "HMR_4232" "RCR10536" "" 0 "" "" "HMR_4232"
"MAR04233" "R03889" "" "" "r0645" "" "r0645" "MNXR95796" "HMR_4233" "RCR11396" "" 0 "" "RHEA:14469" "HMR_4233"
"MAR04235" "R01938" "AMCOXO" "" "r0449" "" "AMCOXO" "MNXR95807" "HMR_4235" "RCR11397" "" 0 "" "" "HMR_4235"
"MAR04243" "R02487" "GLUTCOADHm" "R02487M" "r0541" "" "GLUTCOADHm;r0541" "MNXR100293" "HMR_4243;HMR_4242" "RCR11398;RCR14356" "" 0 "" "RHEA:30847" "HMR_4243;HMR_4242"
"MAR04243" "R02487" "GLUTCOADHm" "R02487M" "r0541" "" "GLUTCOADHm;r0541" "MNXR100293" "HMR_4243;HMR_4242;MAR04242" "RCR11398;RCR14356" "" 0 "" "RHEA:30847" "HMR_4243;HMR_4242"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure about the reason this one appears not in the last column

Copy link
Member Author

@haowang-bioinfo haowang-bioinfo Jun 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good spot! this is actually my mistake, now is corrected in 9ce2c5c

@haowang-bioinfo
Copy link
Member Author

haowang-bioinfo commented Jun 23, 2023

Except the pair of MAR03784 and MAR03783 is not documented.

thank you for the careful inspection that really helps in avoiding mistakes!

We also performed this kind of documentation in the #646 and should performed in every PR with removing duplicates both for Rxns and Mets

yes, we probably can follow this practice for all pending PRs and onward. What do you think?

@mihai-sysbio
Copy link
Member

mihai-sysbio commented Jun 23, 2023

From my side, this requires a bit more thinking. Let's take MAR03784 as an example:

  • what is the meaning that for a given reaction the same entries HMR_3784;HMR_3783 exist in two columns rxnHMR2ID and rxnRetired?
  • even though we now know if MAR03783 has been retired, there is no reference as to who/what/when/why, so in essence the core dilemma persists
  • the TSV files in model/ are used for id mapping, but this new column doesn't quite have the same purpose
  • for accuracy, shouldn't one look back in past PRs to populate this new column?

@haowang-bioinfo
Copy link
Member Author

haowang-bioinfo commented Jun 23, 2023

what is the meaning that for a given reaction the same entries HMR_3784;HMR_3783 exist in two columns rxnHMR2ID and rxnRetired?

no immediate answer, but this can definitely be figured out given the radically transparent history of data/code in Human-GEM (an investigation to the .deprecated folder is likely needed)

even though we now know if MAR03783 has been retired, there is no reference as to who/what/when/why, so in essence the core dilemma persists

no, the current action doesn't claim to solve everything, the aim is to move one step forward, and hopefully in the right direction.

the TSV files in model/ are used for id mapping, but this new column doesn't quite have the same purpose

I don't get your point (there is no new column added so far), please clarify what do you mean

for accuracy, shouldn't one look back in past PRs to populate this new column?

agree

@mihai-sysbio
Copy link
Member

mihai-sysbio commented Jun 23, 2023

what is the meaning that for a given reaction the same entries HMR_3784;HMR_3783 exist in two columns rxnHMR2ID and rxnRetired?

no immediate answer, but this can definitely be figured out given the radically transparent history of data/code in Human-GEM (an investigation to the .deprecated folder is likely needed)

To me, the purpose of the newly added column should be clear - almost self-explanatory even to a new user. This duplication of data to me signals that the purpose is not clear enough.

even though we now know if MAR03783 has been retired, there is no reference as to who/what/when/why, so in essence the core dilemma persists

no, the current action doesn't claim to solve everything, the aim is to move one step forward, and hopefully in the right direction.

The proposed solution is indeed a step forward. The question is, should it end up immediately in develop? If it does, it will be in main with the next release. In my view, more steps forward are needed before this ends up in main. Since none defined as of now, my recommendation is for this PR to be kept in draft mode until a more robust solution is found (edit: by continuing the work in this PR).

the TSV files in model/ are used for id mapping, but this new column doesn't quite have the same purpose

I don't get your point, please clarify what do you mean

As a general logic, the TSV files contain a mapping between Human-GEM ids and ids in other databases/models. The newly introduced column is not following this logic, so I am questioning if it really belongs in this file.

@haowang-bioinfo
Copy link
Member Author

it should be noted that there is no new column added so far @mihai-sysbio

@haowang-bioinfo haowang-bioinfo marked this pull request as draft June 23, 2023 11:09
@mihai-sysbio
Copy link
Member

it should be noted that there is no new column added so far @mihai-sysbio

Indeed! Sorry about the mistake, I relied on the online diff too much - it makes it look like the new column is called rx
Screenshot 2023-06-23 at 19 24 34

@haowang-bioinfo haowang-bioinfo marked this pull request as ready for review June 24, 2023 17:42
@haowang-bioinfo
Copy link
Member Author

haowang-bioinfo commented Jun 24, 2023

what is the meaning that for a given reaction the same entries HMR_3784;HMR_3783 exist in two columns rxnHMR2ID and rxnRetired?

no immediate answer, but this can definitely be figured out given the radically transparent history of data/code in Human-GEM (an investigation to the .deprecated folder is likely needed)

To me, the purpose of the newly added column should be clear - almost self-explanatory even to a new user. This duplication of data to me signals that the purpose is not clear enough.

now figured out that the introduction of column rxnRetired was discussed in #265 and implemented #270, it basically holds the original HMR ids, therefore appears to be duplicate to rxnHMR2ID column

even though we now know if MAR03783 has been retired, there is no reference as to who/what/when/why, so in essence the core dilemma persists

no, the current action doesn't claim to solve everything, the aim is to move one step forward, and hopefully in the right direction.

The proposed solution is indeed a step forward. The question is, should it end up immediately in develop? If it does, it will be in main with the next release. In my view, more steps forward are needed before this ends up in main. Since none defined as of now, my recommendation is for this PR to be kept in draft mode until a more robust solution is found (edit: by continuing the work in this PR).

both columns rxnHMR2ID and rxnRetired already existed in main since v1.8.0, so the current step should be fine

the TSV files in model/ are used for id mapping, but this new column doesn't quite have the same purpose

I don't get your point, please clarify what do you mean

As a general logic, the TSV files contain a mapping between Human-GEM ids and ids in other databases/models. The newly introduced column is not following this logic, so I am questioning if it really belongs in this file.

so far no new column has been introduced. Regarding the duplication between columns rxnRetired and rxnHMR2ID, the latter probably can be deprecated at some point while the former may become more and more useful once populated over time

@haowang-bioinfo
Copy link
Member Author

haowang-bioinfo commented Jun 25, 2023

based on discussions on #615, from this PR (#666) and #663 onward, we started to document removed duplicate reactions and metabolites by appending their IDs to the rows of kept ones in the columns rxnRetired and metRetired, respectively

@haowang-bioinfo haowang-bioinfo merged commit 97ba938 into develop Jun 25, 2023
@haowang-bioinfo haowang-bioinfo deleted the feat/logDeprecatedIDsIn#613 branch June 25, 2023 16:56
@mihai-sysbio
Copy link
Member

This PR improves documentation of deprecated reactions in #613 by adding their ids to corresponding rows in reactions.tsv, as a tentative implementation according to the discussion in #615

The discussion about the implementation has yet to lead to a concrete description of the solution (= an issue). In my view tentative implementations have their place in feature branches, not in develop.

both columns rxnHMR2ID and rxnRetired already existed in main since v1.8.0, so the current step should be fine

I see, yet I want to remember that this rxnRetired exists as consequence of the adoption of MA ids. If so, then to me this feels like a repurposing of that column, the vision for which was to be deleted.

  • new PRs should follow this practice as much as possible

There is no documentation of this practice in CONTRIBUTING.md, and we cannot expect new users to randomly stumble upon the discussion/PR.

@haowang-bioinfo
Copy link
Member Author

haowang-bioinfo commented Jun 27, 2023

This PR improves documentation of deprecated reactions in #613 by adding their ids to corresponding rows in reactions.tsv, as a tentative implementation according to the discussion in #615

The discussion about the implementation has yet to lead to a concrete description of the solution (= an issue). In my view tentative implementations have their place in feature branches, not in develop.

In my view, tentative implementation can be merged into develop if discussed (#615) and approved (#666). Of course, discussion and further change is still open as always

both columns rxnHMR2ID and rxnRetired already existed in main since v1.8.0, so the current step should be fine

I see, yet I want to remember that this rxnRetired exists as consequence of the adoption of MA ids. If so, then to me this feels like a repurposing of that column, the vision for which was to be deleted.

rxnRetired (metRetired for mets) does exist as consequence of the adoption of MA ids, and now repurposed to document removed duplicate ids (#615). This appears to be a good solution, what do you think?

  • new PRs should follow this practice as much as possible

There is no documentation of this practice in CONTRIBUTING.md, and we cannot expect new users to randomly stumble upon the discussion/PR.

good point - go ahead to modify CONTRIBUTING.md please

@mihai-sysbio
Copy link
Member

mihai-sysbio commented Jul 6, 2023

good point - go ahead to modify CONTRIBUTING.md please

Sorry, I can't do that, as I don't fully understand how the different columns are meant to be used.

@haowang-bioinfo haowang-bioinfo mentioned this pull request Sep 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants