FastTree Python implementation

FastTree is an algorithm for building a phylogenetic tree from nothing more than a dataset of genomes. It is specifically designed to run faster than the standard neighbor joining algorithm, which struggles to build a phylogenetic tree for large genome datasets.

Our implementation is based on the first version of the paper that describes the algorithm. We tried to stay as close as possible to the original implementation, but had to deviate from it in places because its details were often unclear.

Setup

We designed the Python implementation with ease of use in mind: after installing the dependencies with pip install -r requirements.txt, the algorithm is ready to use. The Python version should be greater than 3.10.

The algorithm can be used by running:

python main.py [-h] [--input INPUT] [--num NUM] [--nBootstrap NBOOTSTRAP] [--output_img OUTPUT_IMG] [--output_nwk OUTPUT_NWK]
options:
  -h, --help            show this help message and exit
  --input INPUT         Input sequences
  --num NUM             number of sequences, used for large files
  --nBootstrap NBOOTSTRAP
                        bootstrap iterations
  --output_img OUTPUT_IMG
                        path to save the output fasttree image
  --output_nwk OUTPUT_NWK
                        path to save the output fasttree file in newick format

Datasets

A few datasets are included in this repository, in the data folder. They differ considerably in length. The visualization at the bottom of the README is based on the fasttree-input.aln file; it has been confirmed to be a correct phylogenetic tree, which validates this implementation.

The algorithm

Parsing data

By default, .aln files are supported and can be parsed directly. We extended the implementation to also support .fna files, so that a broader range of datasets can be used.
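As a rough illustration of the parsing step, the sketch below reads FASTA-style records (a `>` header line followed by one or more sequence lines). It assumes that both .aln and .fna inputs use this layout, which this README does not confirm, and `parse_fasta` is a hypothetical name rather than a function from this repository.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-style text into {name: sequence}.

    Assumption: each record starts with '>' and its sequence may span
    several lines. The first token after '>' is used as the name.
    """
    sequences: Dict[str, str] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            tokens = line[1:].split()
            # Fall back to a generated name when the header is empty.
            name = tokens[0] if tokens else f"seq{len(sequences) + 1}"
            sequences[name] = ""
        elif name is not None:
            sequences[name] += line.upper()
    return sequences


example = """>seq1
ACGT
AC-T
>seq2 some description
acgtacgt
""".splitlines()

alignment = parse_fasta(example)
print(alignment)
```

For a file on disk, the same function could be called as `parse_fasta(open(path))`, since an open text file iterates over its lines.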

Neighbor joining

Neighbor joining is where the classic neighbor joining algorithm and the FastTree algorithm differ most. FastTree uses profiles instead of the raw distance calculations used in standard neighbor joining algorithms.

The added benefit of DNA profiles is that they give probabilistic insight into how likely a given nucleotide is to occur at a given position.
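To make the idea of a profile concrete, here is a minimal sketch. It assumes a four-letter DNA alphabet, treats gaps and any other character as "no information", joins two children with an unweighted average, and measures distance as the probability that bases drawn from the two profiles differ. The names `leaf_profile`, `join_profiles` and `profile_distance` are hypothetical, and the weighting is simpler than the formulas in docs/fasttree.md.

```python
from typing import List

ALPHABET = "ACGT"
Profile = List[List[float]]  # one frequency vector per alignment column


def leaf_profile(sequence: str) -> Profile:
    """One-hot vector per column; gaps and unknown characters give all zeros."""
    return [
        [1.0 if base == letter else 0.0 for letter in ALPHABET]
        for base in sequence.upper()
    ]


def join_profiles(left: Profile, right: Profile) -> Profile:
    """Profile of an internal node as the unweighted average of its children."""
    return [
        [(a + b) / 2 for a, b in zip(col_l, col_r)]
        for col_l, col_r in zip(left, right)
    ]


def profile_distance(p: Profile, q: Profile) -> float:
    """Average per-column probability that a base from p differs from one from q.

    Columns where either profile carries no information are skipped.
    """
    total, used = 0.0, 0
    for col_p, col_q in zip(p, q):
        weight_p, weight_q = sum(col_p), sum(col_q)
        if weight_p == 0 or weight_q == 0:
            continue
        # Normalise so partially gapped columns still act as probabilities.
        match = sum(a * b for a, b in zip(col_p, col_q)) / (weight_p * weight_q)
        total += 1.0 - match
        used += 1
    return total / used if used else 0.0


a = leaf_profile("ACGT")
b = leaf_profile("ACGA")
c = leaf_profile("AC-T")

print(profile_distance(a, b))   # one of four columns differs: 0.25
ab = join_profiles(a, b)        # last column becomes half A, half T
print(profile_distance(ab, c))  # the gapped third column is skipped
```

Because a joined node is itself a profile, the distance from a new internal node to every other node can be computed without revisiting the individual sequences beneath it.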

NNI

Nearest neighbor interchange (NNI) is a tree rearrangement used to find a more optimal tree layout. It is often used to identify, among a large number of candidate trees, the tree that best explains the evolutionary history of a species.

During the tree rearrangement phase of the algorithm for a binary phylogenetic tree, two cases can occur.

The first case is where it discovers four different subtrees; it then calculates the distance between each combination of subtrees using the log-corrected profile distance.
The second case is where it discovers only three subtrees, which mostly happens when the traversal is close to the leaf nodes. It still has to compare the same number of subtrees, but has to choose how to rearrange them in an orderly manner.
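The four-subtree case can be sketched as a comparison of the three possible pairings of subtrees A, B, C and D. The example assumes the log correction takes the Jukes-Cantor form and that a caller supplies the uncorrected distance between two subtrees; `log_corrected` and `best_pairing` are hypothetical names, and the three-subtree case is not covered.

```python
import math
from typing import Callable, Hashable, Tuple

Pairing = Tuple[Tuple[Hashable, Hashable], Tuple[Hashable, Hashable]]


def log_corrected(d: float) -> float:
    """Jukes-Cantor style correction of an uncorrected distance d."""
    # The logarithm is undefined for d >= 0.75, so clamp just below it.
    d = min(d, 0.7499)
    return -0.75 * math.log(1.0 - 4.0 * d / 3.0)


def best_pairing(
    dist: Callable[[Hashable, Hashable], float], a, b, c, d
) -> Pairing:
    """Return the pairing of four subtrees with the smallest summed distance."""
    candidates = [
        ((a, b), (c, d)),  # current topology
        ((a, c), (b, d)),  # first alternative
        ((a, d), (b, c)),  # second alternative
    ]

    def score(pairing: Pairing) -> float:
        (w, x), (y, z) = pairing
        return log_corrected(dist(w, x)) + log_corrected(dist(y, z))

    return min(candidates, key=score)


# Toy uncorrected distances between four subtrees, keyed by unordered pair.
toy = {
    frozenset("AB"): 0.40, frozenset("CD"): 0.40,
    frozenset("AC"): 0.05, frozenset("BD"): 0.05,
    frozenset("AD"): 0.40, frozenset("BC"): 0.40,
}


def toy_dist(x: str, y: str) -> float:
    return toy[frozenset((x, y))]


print(best_pairing(toy_dist, "A", "B", "C", "D"))
```

In this toy input A is much closer to C than to B, so the pairing that groups A with C and B with D scores lowest, which corresponds to performing the interchange.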

Local bootstrapping

Local bootstrapping helps validate a particular split decision: the split is recalculated N times with differently sampled values, giving an estimate of how likely that split is.
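A minimal sketch of the resampling idea follows. It assumes that the sampled values are alignment columns drawn with replacement, and it uses four single sequences and uncorrected distances in place of subtree profiles to stay short. `local_bootstrap_support` is a hypothetical name, and `n_bootstrap` only loosely mirrors the --nBootstrap option.

```python
import random
from typing import Sequence


def differing_fraction(x: str, y: str, columns: Sequence[int]) -> float:
    """Fraction of the sampled columns in which x and y differ."""
    return sum(x[i] != y[i] for i in columns) / len(columns)


def local_bootstrap_support(
    a: str, b: str, c: str, d: str, n_bootstrap: int = 100, seed: int = 0
) -> float:
    """Fraction of resamples in which the split (a, b | c, d) beats both alternatives."""
    rng = random.Random(seed)  # seeded so repeated runs give the same value
    length = len(a)
    supported = 0
    for _ in range(n_bootstrap):
        # Draw as many columns as the alignment has, with replacement.
        columns = [rng.randrange(length) for _ in range(length)]
        current = differing_fraction(a, b, columns) + differing_fraction(c, d, columns)
        alt_1 = differing_fraction(a, c, columns) + differing_fraction(b, d, columns)
        alt_2 = differing_fraction(a, d, columns) + differing_fraction(b, c, columns)
        if current < min(alt_1, alt_2):
            supported += 1
    return supported / n_bootstrap


support = local_bootstrap_support(
    "ACGTACGTAC", "ACGTACGTAA", "TTGTACCTAC", "TTGTACCTAG", n_bootstrap=200
)
print(f"support for (a, b | c, d): {support:.2f}")
```

A value close to 1 means the split survives almost every resample, while a value near 0 suggests that one of the alternative arrangements is usually preferred.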

Appendix

Detailed formulas for the distance calculations (including gaps) can be found in docs/fasttree.md.

Results

The final phylogenetic tree calculated on the sample dataset with our implementation is given below. It shows the branch distance from each internal node (a common ancestor) to a leaf. Note that circular tree generation is not implemented in the code, but it can easily be achieved from the Newick output, which is written to the results folder.
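For readers unfamiliar with Newick, the sketch below shows how a small tree with branch lengths maps to that text format. The nested-tuple tree representation and the `to_newick` name are assumptions made for illustration and are not taken from this repository.

```python
def to_newick(tree) -> str:
    """Serialise a tree to Newick text, without the trailing semicolon.

    A tree is either a leaf name (str) or a tuple of (child, branch_length)
    pairs, where each child is again a tree.
    """
    if isinstance(tree, str):
        return tree
    children = ",".join(
        f"{to_newick(child)}:{length:g}" for child, length in tree
    )
    return f"({children})"


# Leaves A and B share a common ancestor, which is a sibling of leaf C.
inner = (("A", 0.1), ("B", 0.2))
tree = ((inner, 0.05), ("C", 0.3))

newick = to_newick(tree) + ";"
print(newick)  # ((A:0.1,B:0.2):0.05,C:0.3);
```

Because the format is plain text, the generated file can be opened in an external tree viewer that offers a circular layout.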

trdvangraft/fasttree
