Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

new interface for samples and fba, fva solutions and workaround for #96 #98

Open
wants to merge 9 commits into
base: develop
Choose a base branch
from
79 changes: 75 additions & 4 deletions README.md
Original file line number Diff line number Diff line change
@@ -218,6 +218,27 @@ max_biomass_objective = fva_output[3]

The output of FVA method is tuple that contains `numpy` arrays. The vectors `min_fluxes` and `max_fluxes` contains the minimum and the maximum values of each flux. The vector `max_biomass_flux_vector` is the optimal flux vector according to the biomass objective function and `max_biomass_objective` is the value of that optimal solution.

```python
fva_output_df = model.fva_to_df()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool. Do we still need the old method model.fva() that changes the internal state of model? I suppose that one function is enough. Then the new one could be called fva() it is more clear I think. And the old one should be made private. Actually it is even simpler if there exist only one function fva() with the new interface.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar comments hold for fva.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit tricky as the original fva returns a set of 4 things that we use in other parts of the PolytopeSampler class. Yet, I changes those parts with an _fva and _fba and we now have fva() and fba() to return the dataframes. 😉

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, but as the name suggests _fva and _fba should be private, right?

Also, I still cannot see the need of two fva (and fba) functions. I think it complicates the interface. Is it possible to keep the old functions and then the user (if needed) compute a dataframe on demand (it is one line of code).

```

returns the solution is an easier to query dataframe with the model's reactions as indices:
```python
>>> fva_output_df
minimum maximum
PFK 7.477306 7.477485
PFL 0.000000 0.000066
PGI 4.860632 4.861110
PGK -16.023610 -16.023451
PGL 4.959736 4.960214
>>>
>>> df.loc["NH4t"]
minimum 4.765316
maximum 4.765324
Name: NH4t, dtype: float64
```


To apply FBA method,

```python
@@ -229,6 +250,16 @@ max_biomass_objective = fba_output[1]

while the output vectors are the same with the previous example.

Again, one may use the `fba_to_df()` to get the solution as a pandas dataframe with model's reactions as indices.
```python
>>> model.fba_to_df()
fluxes
PFK 7.477382
PFL 0.000000
PGI 4.860861
PGK -16.023526
PGL 4.959985
```


### Set the restriction in the flux space
@@ -266,6 +297,46 @@ sampler = polytope_sampler(model)
steady_states = sampler.generate_steady_states()
```

### Change the medium on your model

To do that, you need to first describe the new medium and then assign it on your `model`.
For example:

```python
initial_medium = model.medium
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can be simplified avoiding to copy the entire dictionary (see #95 (comment))

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check for my reply on the issue.
if you have any ideas for alternatives, let me know!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I see so there is still an issue in setting the model correctly. Should we address it in a different PR?

model.medium
# {'EX_co2_e': 1000.0, 'EX_glc__D_e': 10.0, 'EX_h_e': 1000.0, 'EX_h2o_e': 1000.0, 'EX_nh4_e': 1000.0, 'EX_o2_e': 1000.0, 'EX_pi_e': 1000.0}
model.fba()[-1]
# 0.8739215069684305
new_medium = initial_medium.copy()

# Set anoxygenic conditions
new_medium['EX_o2_e'] = 0
model.medium = new_medium
model.fba()[-1]

# Check the difference in the optimal value
# 0.21166294973531055
```

### Who-is-who

Models may use ids for metabolites and reactions hard to interpret.
You may use the `reactions_map` and the `metabolites_map` that return the reactions/metabolites ids along with their corresponding names.
For example:

```python
>>> model.reactions_map
reaction_name
PFK Phosphofructokinase
PFL Pyruvate formate lyase
PGI Glucose-6-phosphate isomerase
PGK Phosphoglycerate kinase
PGL 6-phosphogluconolactonase
```





### Plot flux marginals
@@ -282,8 +353,8 @@ steady_states = sampler.generate_steady_states(ess = 3000)
# plot the histogram for the 14th reaction in e-coli (ACONTa)
reactions = model.reactions
plot_histogram(
steady_states[13],
reactions[13],
steady_states.loc["ACONTa"],
"ACONTa",
n_bins = 60,
)
```
@@ -306,8 +377,8 @@ steady_states = sampler.generate_steady_states(ess = 3000)
# plot the copula between the 13th (PPC) and the 14th (ACONTa) reaction in e-coli
reactions = model.reactions

data_flux2=[steady_states[12],reactions[12]]
data_flux1=[steady_states[13],reactions[13]]
data_flux2=[steady_states.loc["ACONTa"], "ACONTa"]
data_flux1=[steady_states.loc["PPC"], "PPC"]

plot_copula(data_flux1, data_flux2, n=10)
```
118 changes: 99 additions & 19 deletions dingo/MetabolicNetwork.py
Original file line number Diff line number Diff line change
@@ -7,12 +7,15 @@
# Licensed under GNU LGPL.3, see LICENCE file

import numpy as np
import pandas as pd
import sys
from typing import Dict
import cobra
from dingo.loading_models import read_json_file, read_mat_file, read_sbml_file, parse_cobra_model
from dingo.fva import slow_fva
from dingo.fba import slow_fba
import logging
logger = logging.getLogger(__name__)

try:
import gurobipy
@@ -33,10 +36,12 @@ def __init__(self, tuple_args):
import gurobipy

self._parameters["fast_computations"] = True
self._parameters["tol"] = 1e-06
except ImportError as e:
self._parameters["fast_computations"] = False
self._parameters["tol"] = 1e-03

if len(tuple_args) != 10:
if len(tuple_args) != 12:
raise Exception(
"An unknown input format given to initialize a metabolic network object."
)
@@ -51,6 +56,8 @@ def __init__(self, tuple_args):
self._medium = tuple_args[7]
self._medium_indices = tuple_args[8]
self._exchanges = tuple_args[9]
self._reactions_map = tuple_args[10]
self._metabolites_map = tuple_args[11]

try:
if self._biomass_index is not None and (
@@ -104,34 +111,55 @@ def from_cobra_model(cls, arg):

return cls(parse_cobra_model(arg))

def fva(self):
def _fva(self):
"""A member function to apply the FVA method on the metabolic network."""

if self._parameters["fast_computations"]:
return fast_fva(
min_fluxes, max_fluxes, max_biomass_flux_vector, max_biomass_objective = fast_fva(
self._lb,
self._ub,
self._S,
self._biomass_function,
self._parameters["opt_percentage"],
)
else:
return slow_fva(
min_fluxes, max_fluxes, max_biomass_flux_vector, max_biomass_objective = slow_fva(
self._lb,
self._ub,
self._S,
self._biomass_function,
self._parameters["opt_percentage"],
)
self._min_fluxes = min_fluxes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need them to be stored in the model?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My initial thought was about the sample_from_fva_output function that used to get as arguments the min_fluxes and the max_fluxes.

At first, I thought of keeping those in the model to provide them in the function but then I thought it would be simpler if the fva is performed with the sample_from_fva_output function.

Yet, maybe it's practical to keep them. No strong arguments on that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I was lost a bit. Are those class variables used even after that changes of this PR? As far as understand sample_from_fva_output function is not using them.

self._max_fluxes = max_fluxes
self._opt_vector = max_biomass_flux_vector
self._opt_value = max_biomass_objective
return min_fluxes, max_fluxes, max_biomass_flux_vector, max_biomass_objective

def fba(self):
def _fba(self):
"""A member function to apply the FBA method on the metabolic network."""

if self._parameters["fast_computations"]:
return fast_fba(self._lb, self._ub, self._S, self._biomass_function)
opt_vector, opt_value = fast_fba(self._lb, self._ub, self._S, self._biomass_function)
else:
return slow_fba(self._lb, self._ub, self._S, self._biomass_function)
opt_vector, opt_value = slow_fba(self._lb, self._ub, self._S, self._biomass_function)
self._opt_vector = opt_vector
self._opt_value = opt_value
return opt_vector, opt_value

def fba(self):
if not hasattr(self, '_opt_vector'):
self._fba()
fba_df = pd.DataFrame({'fluxes': self._opt_vector}, index=self._reactions)
return fba_df

def fva(self):
if not hasattr(self, '_min_fluxes'):
self._fva()
fva_df = pd.DataFrame({'minimum': self._min_fluxes, 'maximum': self._max_fluxes}, index=self._reactions)
return fva_df

# Descriptors
@property
def lb(self):
return self._lb
@@ -172,6 +200,30 @@ def exchanges(self):
def parameters(self):
return self._parameters

@property
def reactions_map(self):
return self._reactions_map

@property
def metabolites_map(self):
return self._metabolites_map

@property
def opt_value(self):
return self._opt_value

@property
def opt_vector(self):
return self._opt_vector

@property
def min_fluxes(self):
return self._min_fluxes

@property
def max_fluxes(self):
return self._max_fluxes

@property
def get_as_tuple(self):
return (
@@ -183,16 +235,11 @@ def get_as_tuple(self):
self._biomass_index,
self._biomass_function,
self._medium,
self._inter_medium,
self._exchanges
self._exchanges,
self._reactions_map,
self._metabolites_map
)

def num_of_reactions(self):
return len(self._reactions)

def num_of_metabolites(self):
return len(self._metabolites)

@lb.setter
def lb(self, value):
self._lb = value
@@ -221,7 +268,6 @@ def biomass_index(self, value):
def biomass_function(self, value):
self._biomass_function = value


@medium.setter
def medium(self, medium: Dict[str, float]) -> None:
"""Set the constraints on the model exchanges.
@@ -275,17 +321,51 @@ def set_active_bound(reaction: str, reac_index: int, bound: float) -> None:
# Turn off reactions not present in media
for rxn_id in exchange_rxns - frozen_media_rxns:
"""
is_export for us, needs to check on the S
order reactions to their lb and ub
is_export for us, needs to check on the S
order reactions to their lb and ub
"""
# is_export = rxn.reactants and not rxn.products
reac_index = self._reactions.index(rxn_id)
products = np.any(self._S[:,reac_index] > 0)
products = np.any(self._S[:,reac_index] > 0)
reactants_exist = np.any(self._S[:,reac_index] < 0)
is_export = True if not products and reactants_exist else False
set_active_bound(
rxn_id, reac_index, min(0.0, -self._lb[reac_index] if is_export else self._ub[reac_index])
)
self._medium = medium
self._opt_vector = None

@reactions_map.setter
def reactions_map(self, value):
self._reactions_map = value

@metabolites_map.setter
def metabolites_map(self, value):
self._metabolites_map = value

@min_fluxes.setter
def min_fluxes(self, value):
self._min_fluxes = value

@max_fluxes.setter
def max_fluxes(self, value):
self._max_fluxes = value

@opt_value.setter
def opt_value(self, value):
self._opt_value = value

@opt_vector.setter
def opt_vector(self, value):
self._opt_vector = value


# Attributes
def num_of_reactions(self):
return len(self._reactions)

def num_of_metabolites(self):
return len(self._metabolites)

def set_fast_mode(self):

Loading