From reading your code, my impression is that there is no real check of whether a particular protein complex, as defined in a particular model for a particular reaction, is actually satisfied. In my opinion this can lead to wrong GPRs.
Below I quote your code from carveme/reconstruction/scoring.py, which converts the 'gene_scores' table into 'protein_scores' and then 'protein_scores' into 'reaction_scores'. Then I give a practical example.
Applying the above code would result in this 'protein_score' table:
| protein | reaction | model | GPR | score |
|---|---|---|---|---|
| P_STM0843+STM0844 | R_PFL | STM_v1_0 | (gene_13306 and gene_7500) | 470.0 |
| P_STM0970+STM0973 | R_PFL | STM_v1_0 | gene_18818 | 432.0 |
| P_STM0970+STM3241 | R_PFL | STM_v1_0 | gene_18818 | 423.0 |
| P_STM4114+STM4115 | R_PFL | STM_v1_0 | None | 0.0 |
And finally in this 'reaction_score' table:
| reaction | GPR | score | normalized_score |
|---|---|---|---|
| R_PFL | ((gene_13306 and gene_7500) or gene_18818) | 470.0 | 1.0 |
As you can see, the final GPR is ((gene_13306 and gene_7500) or gene_18818).
The term “(gene_13306 and gene_7500)” is correct, as both components of the complex P_STM0843+STM0844 were well matched by Diamond.
The term “gene_18818”, however, should not appear in the final GPR in my opinion: neither the complex P_STM0970+STM0973 nor the complex P_STM0970+STM3241 is satisfied, because Diamond found no match for the gene STM_v1_0.STM0970 (score = NaN).
This simplified example assumes a 'gene_scores' table with just 1 model and 1 reaction, but I think it is enough to highlight the issue.
With a real-life 'gene_scores' table, other models containing the same reaction may happen to make the final GPR correct, compensating for the error. But that is just chance...
Maybe it would be useful to have a dedicated option, like carve --strictgpr, for users who want the original protein complex definitions to be strictly satisfied.
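To make the difference concrete, here is a minimal sketch in plain Python (not the pandas code in scoring.py). It assumes the current behaviour is roughly: unmatched subunits are dropped from the complex, subunit scores are averaged with missing ones counted as 0, and the reaction takes the best protein score. The per-subunit scores, the subunit-to-gene assignments, and the `strict` parameter are my own illustrative assumptions, chosen only so that the lenient results match the tables above.

```python
# Hypothetical Diamond hits for reaction R_PFL in model STM_v1_0.
# subunit -> (matched query gene, score), or None when there was no hit.
# Individual scores are illustrative, not taken from a real run.
complexes = {
    "P_STM0843+STM0844": {"STM0843": ("gene_13306", 480.0),
                          "STM0844": ("gene_7500", 460.0)},
    "P_STM0970+STM0973": {"STM0970": None,
                          "STM0973": ("gene_18818", 864.0)},
    "P_STM0970+STM3241": {"STM0970": None,
                          "STM3241": ("gene_18818", 846.0)},
    "P_STM4114+STM4115": {"STM4114": None, "STM4115": None},
}


def protein_gpr(subunits, strict):
    """Return (GPR string or None, score) for one protein complex."""
    hits = [hit for hit in subunits.values() if hit is not None]
    # Strict mode: a complex counts only if every subunit was matched.
    if not hits or (strict and len(hits) < len(subunits)):
        return None, 0.0
    genes = sorted({gene for gene, _ in hits})
    gpr = " and ".join(genes)
    if len(genes) > 1:
        gpr = "(" + gpr + ")"
    # Missing subunits contribute 0 to the average.
    score = sum(s for _, s in hits) / len(subunits)
    return gpr, score


def reaction_gpr(complexes, strict):
    """Combine all complexes of one reaction with 'or'; keep the best score."""
    results = [protein_gpr(subunits, strict) for subunits in complexes.values()]
    terms = sorted({gpr for gpr, _ in results if gpr is not None})
    score = max(score for _, score in results)
    if not terms:
        return None, score
    gpr = " or ".join(terms)
    if len(terms) > 1:
        gpr = "(" + gpr + ")"
    return gpr, score


print(reaction_gpr(complexes, strict=False))
print(reaction_gpr(complexes, strict=True))
```

With `strict=False` the sketch yields the GPR from the 'reaction_score' table, including the questionable gene_18818 term. With `strict=True` the two incomplete complexes are discarded and only (gene_13306 and gene_7500) remains.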
Using carveme v1.5.2.