Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pan draft #205

Merged
merged 19 commits into from
Mar 28, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
19 commits
Select commit Hold shift + click to select a range
2e808c5
first triall addition of pan-draft.R
nicola-debernardini Dec 1, 2023
6255e07
add the functions for pan-Draft.R
nicola-debernardini Dec 1, 2023
e050843
add option to call the panDraft module
nicola-debernardini Dec 6, 2023
511d94c
add files for panDraft example
nicola-debernardini Dec 6, 2023
d1951b6
add plotting functions for panDraft extended output
nicola-debernardini Dec 6, 2023
479126b
add missing files for panDraft example, RDS were ignored
nicola-debernardini Dec 6, 2023
ae391db
added documentation panDraft.md
nicola-debernardini Dec 7, 2023
45ee2f0
minor changes on index.rst
nicola-debernardini Dec 7, 2023
eda43b9
micropan installation on gapseq environment
nicola-debernardini Dec 7, 2023
503af0e
modified updating rxn weigths strategy: brought it back from varying …
nicola-debernardini Feb 8, 2024
97322c3
minor changes
nicola-debernardini Feb 8, 2024
a3c2a3c
moved files from the subfolder into the toy folder
nicola-debernardini Mar 25, 2024
cde9116
added the gapseq version to the description of pan.mod
nicola-debernardini Mar 26, 2024
b26b72c
remove the plotting options and the associated libraries from the env…
nicola-debernardini Mar 26, 2024
2890428
added new input options (comma separated files, wildcards and directo…
nicola-debernardini Mar 27, 2024
222bdc9
add files to toy/ excluded due to gitignore
nicola-debernardini Mar 27, 2024
f272d60
.gitignore back to original
nicola-debernardini Mar 27, 2024
56ec2ec
minor changes to input file handling
jotech Mar 27, 2024
c246303
minor changes: fix interpretation of extended options
nicola-debernardini Mar 28, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions docs/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -28,6 +28,7 @@ Installation
/usage/medium
/usage/output
/usage/adapt
/usage/panDraft


.. Database
Expand Down
62 changes: 62 additions & 0 deletions docs/usage/panDraft.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,62 @@
# panDraft: species-level models

When dealing with environmental samples, it's common for Metagenome-Assembled Genomes (MAGs) obtained from genome centric metagenomics to be incomplete and contaminated. Consequently, Genome-Scale Models (GEMs) derived from these incomplete MAGs lack a substantial portion of the metabolic potential of the corresponding species. In response, once a set of draft models is reconstructed, they can be conbined into a comprehensive model specific to the taxonomic group. This approach aims to fill the gaps present in individual models by combining homology-based searches and pan-reactome analysis.

Using `gapseq pan`, draft species-level models can be generated to enhance the representativeness of the taxa (species) metabolism.

## Build the panDraft model of a species

The files required by *pan-Draft* include the draft models of the MAGs of interest (.RDS), the corresponding reaction weight tables (-rxnWeights.RDS), the gene-to-reaction association tables (-rxnXgenes.RDS), and the pathways tables (-all-Pathways.tbl). Here are two examples of how to run *pan-Draft* using four models for MAGs of *Faecalibacillus intestinalis*. These genomes were selected for demonstration purposes of the module's operation.

Input can be provided as:
1) lists of filepaths separated by commas
2) filepaths using wildcards
3) path to folders containing the desired files
```
# Option 1:
./gapseq pan -m toy/MGYG000008797-draft.RDS,toy/MGYG000009889-draft.RDS,toy/MGYG000167774-draft.RDS,toy/MGYG000173413-draft.RDS -c toy/MGYG000008797-rxnWeights.RDS,toy/MGYG000009889-rxnWeights.RDS,toy/MGYG000167774-rxnWeights.RDS,toy/MGYG000173413-rxnWeights.RDS -g toy/MGYG000008797-rxnXgenes.RDS,toy/MGYG000009889-rxnXgenes.RDS,toy/MGYG000167774-rxnXgenes.RDS,toy/MGYG000173413-rxnXgenes.RDS -w toy/MGYG000008797-all-Pathways.tbl.gz,toy/MGYG000009889-all-Pathways.tbl.gz,toy/MGYG000167774-all-Pathways.tbl.gz,toy/MGYG000173413-all-Pathways.tbl.gz

# Option 2:
./gapseq pan -m toy/M*-draft.RDS -c toy/M*-rxnWeights.RDS -g toy/M*-rxnXgenes.RDS -w toy/M*-all-Pathways.tbl.gz

# Option 3:
mkdir toy/panDraft
mv toy/MGY* toy/panDraft
./gapseq pan -m toy/panDraft -c toy/panDraft -g toy/panDraft -w toy/panDraft
```
The tool generates several outputs, including the draft model of the taxon (`panModel-draft.RDS`), an updated version of the weights (`panModel-rxnWeigths.RDS`), gene-to-reaction (`panModel-rxnXgenes.RDS`) and pathway table (`panModel-tmp-Pathways.tbl`), statistics on the species reactome features (`pan-reactome_stat.tsv`), and a binary matrix summarizing the presence/absence of reactions in each GEMs (`rxnXmod.tsv`).

Additionally, you can opt to perform a simple reactome comparison using the option `--only.binary.rxn.tbl`. This will generate only the binary matrix summarizing the presence/absence of reactions.
```
./gapseq pan -m toy/M*-draft.RDS --only.binary.rxn.tbl TRUE
```
### Suggested number MAGs and mrf threshold value

We recommend utilizing a minimum of 30 MAGs for the reconstruction of a panDraft draft. However, there is no specified lower limit to this number.
Clearly, the minimum reaction frequency threshold (mrf), which determines whether a reaction should be included in the species-level model, is meaningful only when the number of MAGs used is above a certain minimum number. The parameter (mrf) can take values between 0 and 1 (default 0.07). A value of 0 means that all reactions present in any of the input models will be included in the draft model, while a value of 1 means that only reactions present in all input models will be included.

The mrf threshold can be modified using the option `--min.rxn.freq.in.mods`.
```
./gapseq pan -m toy/M*-draft.RDS -c toy/M*-rxnWeights.RDS -g toy/M*-rxnXgenes.RDS -w toy/M*-all-Pathways.tbl.gz --min.rxn.freq.in.mods 0.15
```

## Integration with further features of gapseq

### Generate a functional model

After reconstructing the draft species-level models of the taxa of interest, the gapseq pipeline can proceed with the gap filling step. In this case, the input to the `gapseq fill` module should be the updated version of the reaction weights and reactions to gene tables.
```
./gapseq fill -m toy/panModel-draft.RDS -c toy/panModel-rxnWeigths.RDS -g toy/panModel-rxnXgenes.RDS -n dat/media/TSBmed.csv
```
The resulting model is functional and can be used for further metabolic model simulations.

### Predict the growing medium

The output of **pan-Draft** can be also integrated with the module `gapseq medium` to predict the growing medium of the species-level model. Here it is an example:
```
# Using the species-level draft
./gapseq medium -m panModel-draft.RDS -p panModel-tmp-Pathways.tbl

# Using the species-level functional model
./gapseq medium -m panModel.RDS -p panModel-tmp-Pathways.tbl
```
11 changes: 9 additions & 2 deletions gapseq
Original file line number Diff line number Diff line change
Expand Up @@ -26,9 +26,10 @@ usage()
echo " gapseq find (-p pathways | -e enzymes) [-b bitscore] (genome)"
echo " gapseq find-transport [-b bitscore] (genome)"
echo " gapseq draft (-r reactions | -t transporter -c genome -p pathways) [-b pos|neg|archaea|auto]"
echo " gapseq medium (-m draft -p pathways) [-c manual_fluxes - o output_file]"
echo " gapseq medium (-m draft -p pathways) [-c manual_fluxes -o output_file]"
echo " gapseq fill (-m draft -n medium -c rxn_weights -g rxn_genes)"
echo " gapseq adapt (-a reactions/pathways | -r reactions/pathways| -w growh_compounds) -m model (-g rxn_genes, -c rxn_weights, -b reaction_blast_file)"
echo " gapseq pan (-m draft_list -c rxn_weights_list -g rxn_genes_list -w pathways_list)"
echo -e "\nExamples:"
echo " gapseq test"
echo " gapseq doall toy/ecoli.fna.gz"
Expand All @@ -41,6 +42,7 @@ usage()
echo " gapseq fill -m toy/ecoli-draft.RDS -n dat/media/ALLmed.csv -c toy/ecoli-rxnWeights.RDS -g toy/ecoli-rxnXgenes.RDS"
echo " gapseq adapt -a 14DICHLORBENZDEG-PWY -m toy/myb71.RDS"
echo " gapseq adapt -m toy/myb71.RDS -w cpd00089:TRUE -c toy/myb71-rxnWeights.RDS -g toy/myb71-rxnXgenes.RDS -b toy/myb71-all-Reactions.tbl"
echo " gapseq pan -m toy/MGYG000*-draft.RDS -c toy/MGYG000*-rxnWeights.RDS -g toy/MGYG000*-rxnXgenes.RDS -w toy/MGYG000*.tbl.gz"
echo -e "\nOptions:"
echo " test Testing dependencies and basic functionality of gapseq."
echo " find Pathway analysis, try to find enzymes based on homology."
Expand All @@ -50,6 +52,7 @@ usage()
echo " fill Gap filling of a model."
echo " doall Combine find, find-transport, draft, (medium,) and fill."
echo " adapt Add or remove reactions or pathways."
echo " pan Reconstruct a pan-Draft from a list of models."
echo " -v Show version."
echo " -h Show this screen."
echo " -n Enable noisy verbose mode."
Expand All @@ -75,7 +78,7 @@ if [[ "$1" == "doall" ]]; then
# If yes set OPTIND 2 to avoid reading "gapseq doall" and to parse all the options in the following while.
subcommand=$1
OPTIND=2
elif [[ "$1" =~ ^(test|find|find-transport|draft|adapt|medium|fill)$ ]]; then
elif [[ "$1" =~ ^(test|find|find-transport|draft|adapt|medium|fill|pan)$ ]]; then
# If another command line (like find), the options are not parsed with getopts (as they will be parsed in their own script).
subcommand=$1
parse_options=false
Expand Down Expand Up @@ -147,6 +150,10 @@ elif [ "$subcommand" == "fill" ]; then
shift
Rscript $dir/src/gf.suite.R "$@"

elif [ "$subcommand" == "pan" ]; then
shift
Rscript $dir/src/pan-draft.R "$@"

elif [ "$subcommand" == "doall" ]; then
parm=$(echo $@ | sed 's/doall//')
file=$(readlink -f $2)
Expand Down
2 changes: 1 addition & 1 deletion gapseq_env.yml
Original file line number Diff line number Diff line change
Expand Up @@ -36,4 +36,4 @@ dependencies:
- r-renv
- r-glpkapi
- r-rcurl
- r-httr
- r-httr
Loading