CarveMe doesn’t satisfy the original protein complex definitions during the building of GPRs #182

lazzarigioele · 2023-06-15T11:40:17Z

Using carveme v1.5.2.

From reading the code, I believe there is no check that a protein complex, as defined in a given model for a given reaction, is actually satisfied. In my opinion this can lead to wrong GPRs.

Below I reproduce the code from carveme/reconstruction/scoring.py that converts the 'gene_scores' table into 'protein_scores', and then 'protein_scores' into 'reaction_scores'. After that I give a practical example.

From gene to protein scores:

def merge_subunits(genes):
    genes = genes.dropna()
    if len(genes) == 0:
        return None
    else:
        protein = ' and '.join(sorted(genes))
        if len(genes) > 1:
            return '(' + protein + ')'
        else:
            return protein

def merge_subunit_scores(scores):
    return scores.fillna(0).mean()

protein_scores = gene_scores.groupby(['protein', 'reaction', 'model'], as_index=False) \
        .agg({'query_gene': merge_subunits, 'score': merge_subunit_scores})
protein_scores.rename(columns={'query_gene': 'GPR'}, inplace=True)

From protein to reaction scores:

def merge_proteins(proteins):
    proteins = set(proteins.dropna())
    if not proteins:
        return None
    gpr_str = ' or '.join(sorted(proteins))
    if len(proteins) > 1:
        return '(' + gpr_str + ')'
    else:
        return gpr_str

def merge_protein_scores(scores):
    return scores.max(skipna=True)

reaction_scores = protein_scores.groupby(['reaction'], as_index=False) \
        .agg({'GPR': merge_proteins, 'score': merge_protein_scores}).dropna()
avg_score = reaction_scores.query('score > 0')['score'].median()
reaction_scores['normalized_score'] = (reaction_scores['score'] / avg_score).apply(lambda x: round(x, 1))

Suppose now this is my 'gene_scores' table:

query_gene	BiGG_gene	score	gene	protein	reaction	model
NaN	STM_v1_0.STM0970	NaN	G_STM0970	P_STM0970+STM0973	R_PFL	STM_v1_0
gene_7500	STM_v1_0.STM0843	784.0	G_STM0843	P_STM0843+STM0844	R_PFL	STM_v1_0
gene_18818	STM_v1_0.STM0973	864.0	G_STM0973	P_STM0970+STM0973	R_PFL	STM_v1_0
NaN	STM_v1_0.STM0970	NaN	G_STM0970	P_STM0970+STM3241	R_PFL	STM_v1_0
gene_18818	STM_v1_0.STM3241	846.0	G_STM3241	P_STM0970+STM3241	R_PFL	STM_v1_0
NaN	STM_v1_0.STM4114	NaN	G_STM4114	P_STM4114+STM4115	R_PFL	STM_v1_0
NaN	STM_v1_0.STM4115	NaN	G_STM4115	P_STM4114+STM4115	R_PFL	STM_v1_0
gene_13306	STM_v1_0.STM0844	156.0	G_STM0844	P_STM0843+STM0844	R_PFL	STM_v1_0

Applying the above code results in this 'protein_scores' table:

protein	reaction	model	GPR	score
P_STM0843+STM0844	R_PFL	STM_v1_0	(gene_13306 and gene_7500)	470.0
P_STM0970+STM0973	R_PFL	STM_v1_0	gene_18818	432.0
P_STM0970+STM3241	R_PFL	STM_v1_0	gene_18818	423.0
P_STM4114+STM4115	R_PFL	STM_v1_0	None	0.0

And finally in this 'reaction_scores' table:

reaction	GPR	score	normalized_score
R_PFL	((gene_13306 and gene_7500) or gene_18818)	470.0	1.0
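
The tables above can be reproduced with a short script. This is a minimal sketch, assuming pandas is installed: the four merge functions are copied from the scoring.py excerpt quoted earlier (only condensed slightly), and the 'gene_scores' table is typed in by hand rather than produced by a real Diamond run. The BiGG_gene and gene columns are left out because the aggregation does not use them.

```python
import pandas as pd

# Merge functions as quoted from carveme/reconstruction/scoring.py (v1.5.2).
def merge_subunits(genes):
    genes = genes.dropna()  # unmatched subunits are silently dropped here
    if len(genes) == 0:
        return None
    protein = ' and '.join(sorted(genes))
    return '(' + protein + ')' if len(genes) > 1 else protein

def merge_subunit_scores(scores):
    return scores.fillna(0).mean()

def merge_proteins(proteins):
    proteins = set(proteins.dropna())
    if not proteins:
        return None
    gpr_str = ' or '.join(sorted(proteins))
    return '(' + gpr_str + ')' if len(proteins) > 1 else gpr_str

def merge_protein_scores(scores):
    return scores.max(skipna=True)

# Hand-built copy of the example table (query_gene, score, protein).
nan = float('nan')
rows = [
    (None,         nan,   'P_STM0970+STM0973'),
    ('gene_7500',  784.0, 'P_STM0843+STM0844'),
    ('gene_18818', 864.0, 'P_STM0970+STM0973'),
    (None,         nan,   'P_STM0970+STM3241'),
    ('gene_18818', 846.0, 'P_STM0970+STM3241'),
    (None,         nan,   'P_STM4114+STM4115'),
    (None,         nan,   'P_STM4114+STM4115'),
    ('gene_13306', 156.0, 'P_STM0843+STM0844'),
]
gene_scores = pd.DataFrame(rows, columns=['query_gene', 'score', 'protein'])
gene_scores['reaction'] = 'R_PFL'
gene_scores['model'] = 'STM_v1_0'

# Gene scores -> protein scores.
protein_scores = gene_scores.groupby(['protein', 'reaction', 'model'], as_index=False) \
    .agg({'query_gene': merge_subunits, 'score': merge_subunit_scores})
protein_scores.rename(columns={'query_gene': 'GPR'}, inplace=True)

# Protein scores -> reaction scores.
reaction_scores = protein_scores.groupby(['reaction'], as_index=False) \
    .agg({'GPR': merge_proteins, 'score': merge_protein_scores}).dropna()

print(protein_scores)
print(reaction_scores)
```

The key step is the `dropna()` inside `merge_subunits`: a complex with one matched and one unmatched subunit collapses to the single matched gene, and that gene is then carried into the reaction-level "or" expression.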

As you can see, the final GPR is ((gene_13306 and gene_7500) or gene_18818).
The term “(gene_13306 and gene_7500)” is correct, as both components of the complex P_STM0843+STM0844 were matched by Diamond.

In contrast, I think the term “gene_18818” should not appear in the final GPR, because neither the complex P_STM0970+STM0973 nor the complex P_STM0970+STM3241 is satisfied: Diamond found no match for the gene STM_v1_0.STM0970 (score = NaN).

This simplified example uses a 'gene_scores' table with just 1 model and 1 reaction, but I think it is enough to highlight the issue.
With a real-life 'gene_scores' table, other models containing the same reaction may happen to make the final GPR correct, compensating for the error. But that is just a matter of chance...

Maybe it could be useful to have a dedicated option, like carve --strictgpr, if the user wants the original protein complex definitions to be strictly satisfied.
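
To make the proposal concrete, here is a rough sketch of what a strict mode could do at the gene-to-protein step. Everything in it is an assumption for illustration: `--strictgpr` does not exist in CarveMe, the names `merge_subunits_strict` and `merge_subunit_scores_strict` are hypothetical, and returning NaN as the score of an unsatisfied complex is just one possible choice. The idea is simply to reject a complex as soon as one of its subunits has no matched query gene.

```python
import pandas as pd

def merge_subunits_strict(genes):
    # Hypothetical strict variant: any unmatched subunit invalidates the complex.
    if len(genes) == 0 or genes.isna().any():
        return None
    protein = ' and '.join(sorted(genes))
    return '(' + protein + ')' if len(genes) > 1 else protein

def merge_subunit_scores_strict(scores):
    # Hypothetical choice: an unsatisfied complex gets no score at all (NaN).
    return scores.mean() if scores.notna().all() else float('nan')

# Same example as above, reduced to the columns used by the aggregation.
nan = float('nan')
rows = [
    (None,         nan,   'P_STM0970+STM0973'),
    ('gene_7500',  784.0, 'P_STM0843+STM0844'),
    ('gene_18818', 864.0, 'P_STM0970+STM0973'),
    (None,         nan,   'P_STM0970+STM3241'),
    ('gene_18818', 846.0, 'P_STM0970+STM3241'),
    (None,         nan,   'P_STM4114+STM4115'),
    (None,         nan,   'P_STM4114+STM4115'),
    ('gene_13306', 156.0, 'P_STM0843+STM0844'),
]
gene_scores = pd.DataFrame(rows, columns=['query_gene', 'score', 'protein'])
gene_scores['reaction'] = 'R_PFL'
gene_scores['model'] = 'STM_v1_0'

strict_protein_scores = gene_scores.groupby(['protein', 'reaction', 'model'], as_index=False) \
    .agg({'query_gene': merge_subunits_strict, 'score': merge_subunit_scores_strict})
strict_protein_scores.rename(columns={'query_gene': 'GPR'}, inplace=True)

# Only fully matched complexes survive and can enter the reaction-level GPR.
satisfied = strict_protein_scores.dropna(subset=['GPR'])
print(satisfied)
```

With this variant only P_STM0843+STM0844 remains for R_PFL, so the reaction-level GPR would reduce to (gene_13306 and gene_7500). Whether partially matched complexes should be dropped entirely or only down-weighted is a design decision for the maintainers.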

