MULTISTRAP

Boosting phylogenetic bootstrap with structural information

Multistrap is a toolkit for computing and combining phylogenetic bootstrap support values. It derives support values from both sequence and structural data, then merges them into combined (multistrap) support values.

For more details, see the associated manuscript: "Boosting phylogenetic bootstrap with structural information".

Installation

Requirements

Nextflow version >= 23.10.

Make sure a Java version supported by Nextflow is installed; see the Nextflow installation instructions.

curl -s https://get.nextflow.io | bash
chmod +x nextflow
sudo mv nextflow /usr/local/bin

Either Docker version >= 20.10 or Singularity version >= 3.7.

Note: remember to start Docker before launching the pipeline.

Multistrap was tested on Scientific Linux release 7.2.
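The version requirements above can be checked before the first run. The following is a minimal sketch, not part of Multistrap: the helper names (first_version, version_ge, check_tool) are hypothetical, and it assumes a `sort` that supports `-V` (GNU coreutils or a recent BSD/macOS).

```shell
# first_version: read text on stdin, print the first X.Y or X.Y.Z number found.
first_version() {
  grep -oE '[0-9]+\.[0-9]+(\.[0-9]+)?' | head -n 1
}

# version_ge A B: succeeds when version A >= version B (relies on `sort -V`).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_tool NAME FOUND MINIMUM: print a one-line verdict for a tool.
check_tool() {
  if version_ge "$2" "$3"; then
    echo "$1 $2: OK (>= $3)"
  else
    echo "$1 $2: too old (need >= $3)"
  fi
}

# Only query tools that are actually on the PATH.
if command -v nextflow >/dev/null 2>&1; then
  check_tool nextflow "$(nextflow -version 2>/dev/null | first_version)" 23.10
fi
if command -v docker >/dev/null 2>&1; then
  check_tool docker "$(docker --version 2>/dev/null | first_version)" 20.10
fi
```

Singularity can be checked the same way by feeding the output of `singularity --version` to first_version and comparing against 3.7.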

Get Multistrap

Multistrap is distributed as a Nextflow pipeline. To obtain the source code:

curl -L -o main.zip https://github.com/l-mansouri/Phylo-IMD/archive/refs/heads/main.zip 
unzip main.zip
cd Phylo-IMD-main

Alternatively, use wget:

wget https://github.com/l-mansouri/Phylo-IMD/archive/refs/heads/main.zip 
unzip main.zip
cd Phylo-IMD-main

On a normal desktop computer, this step should take a few seconds. Now you are ready to run Multistrap!

Run Multistrap

By default, Multistrap will:

- compute the mTM-align MSA
- compute the sequence-based tree (ME or ML) and its bootstrap replicates
- compute the IMD tree and its bootstrap replicates
- return the ME (or ML) tree with:
  - the combined (multistrap) bootstrap support values
  - the sequence-based support values
  - the IMD support values

Please refer to the output section for a precise description of the output file naming.

On a test dataset

nextflow run main.nf -profile multistrap,test,docker

If you want to use Singularity instead:

nextflow run main.nf -profile multistrap,test,singularity

This runs Multistrap on the test data. The test uses --seq_tree ME, because ML takes longer and this is only meant as a basic check, and sets replicatesNum to 10 to speed up the run. On a normal desktop computer, it should complete in a few minutes.

On your dataset

To obtain the combined bootstrap support values for your own dataset, use the multistrap profile as shown below. To see how to prepare the input files, look at the example dataset in the data folder.

The command line:

nextflow run main.nf -profile multistrap --fasta <id.fasta> --templates <id.template> --pdbs "mypdbs/*" --seq_tree <ML|ME>

fasta is a FASTA file with the sequences you want to build the tree on.
pdbs is the set of PDB files associated with the sequences in your FASTA file.
templates is a file that explicitly maps each sequence in your FASTA file to the PDB file you provide for it. The template file should follow the syntax of the aligner in use (mTM-align or 3D-Coffee). Examples of both are in the data folder.
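Before launching a run, it can help to confirm that the inputs are consistent. The sketch below is only a sanity check under two assumptions that the pipeline itself does not state: that each sequence has exactly one structure, and that the structure files end in .pdb. The function names are hypothetical, and the check does not validate the template file.

```shell
# count_sequences FASTA: number of records (lines starting with '>').
count_sequences() {
  grep -c '^>' "$1" || true   # grep -c prints 0 but exits 1 on no match
}

# count_pdbs DIR: number of *.pdb files directly inside DIR.
count_pdbs() {
  find "$1" -maxdepth 1 -type f -name '*.pdb' | wc -l | tr -d ' '
}

# preflight FASTA PDB_DIR: compare the two counts and report the result.
preflight() {
  local n_seq n_pdb
  n_seq="$(count_sequences "$1")"
  n_pdb="$(count_pdbs "$2")"
  if [ "$n_seq" -eq "$n_pdb" ]; then
    echo "OK: $n_seq sequences, $n_pdb structures"
  else
    echo "MISMATCH: $n_seq sequences, $n_pdb structures"
    return 1
  fi
}
```

For example, `preflight id.fasta mypdbs` reports whether the number of FASTA records matches the number of PDB files in mypdbs/.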

Output files

results/dataset_id
- msas/*.fa: alignment files.
- trees_and_replicates/: the trees computed with your chosen sequence method (ME or ML; trees/<ME|ML> folder) and the IMD trees (trees/IMD folder). The bootstrap replicates are in the replicates folder inside each of the ME, ML and IMD folders.
- tree_supports/: the bootstrap support values are stored as node labels in the trees in this folder. Here you will find one folder with the <ME|ML> tree topology annotated with:
  - the <ME|ML> support values (ID_ME|ML_tree_ME|ML_bs.nwk)
  - the IMD support values (ID_ME|ML_tree_IMD_bs.nwk)
  - the multistrap support values (ID_ME|ML_tree_multistrap_bs.nwk)
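Because the support values are stored as node labels, they can be pulled out of a Newick file with standard tools. The following is a rough sketch rather than a full Newick parser: it assumes the labels are plain numbers written directly after a closing parenthesis, and the function names and the threshold are illustrative.

```shell
# support_values FILE: print each internal-node label (one per line).
# A label is taken to be the number that immediately follows a ')'.
support_values() {
  grep -oE '\)[0-9]+(\.[0-9]+)?' "$1" | tr -d ')'
}

# count_supported FILE THRESHOLD: number of nodes with support >= THRESHOLD.
# THRESHOLD must be on the same scale as the labels in the file.
count_supported() {
  support_values "$1" | awk -v t="$2" '$1 >= t { n++ } END { print n + 0 }'
}
```

Running `support_values` on one of the *_multistrap_bs.nwk files lists the combined support values, and `count_supported` on the same file with a cutoff of your choice summarizes how many branches pass it. Branch lengths are ignored because they follow a colon rather than a parenthesis.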

Pipeline parameters

You can override the default pipeline parameters on the command line by passing each one as --<name> <value> (for example, --seq_tree ME).

Parameters

Input parameters:
- fasta: a FASTA file with the sequences you want to build the tree on.
- pdbs: the PDB files associated with the sequences in your FASTA file.
- templates: a file that explicitly maps each sequence in your FASTA file to the PDB file you provide for it. The template file should follow the syntax of the aligner in use (mTM-align or 3D-Coffee). Examples of both are in the data folder.

Parameters for tree computation:
- seq_tree: the type of sequence-based tree to compute, either ME or ML. Default: ML.
- gammaRate: the gamma rate for FastME tree reconstruction. Default: 1.0.
- seedValue: the random seed for FastME tree reconstruction. Default: 5.
- replicatesNum: the number of bootstrap replicates. Default: 100.
- tree_mode: the distance mode used to compute the IMD distance matrix. Default: 10.

Output parameter:
- output: the directory where the pipeline publishes its outputs. Default: ./results.
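As an illustration of overriding several defaults at once, the sketch below assembles a command and prints it for review instead of launching it. The `run` wrapper and the DRY_RUN variable are hypothetical helpers, the input paths are placeholders, and the chosen values (200 replicates, ./my_results) are arbitrary. Note that Nextflow expects a double dash for pipeline parameters and a single dash for its own options such as -profile, and that the glob is quoted so that the shell does not expand it.

```shell
# run CMD...: print the command when DRY_RUN is 1 (the default here),
# otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Placeholder inputs; replace with your own files.
run nextflow run main.nf -profile multistrap,docker \
  --fasta id.fasta \
  --templates id.template \
  --pdbs 'mypdbs/*' \
  --seq_tree ME \
  --replicatesNum 200 \
  --output ./my_results
```

Setting DRY_RUN=0 in the environment makes the same script launch the pipeline.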

Overview of the repository

For a more detailed overview of the contents of the repository, please refer to Overview.md.

Analysis

In the paper, we perform an extensive benchmark and additional analyses to assess the robustness and validity of Multistrap. For more information on how to reproduce them, please refer to Analysis.md.
