amarinderthind authored Feb 1, 2024
# CGRphylo Pipeline: Chaos Game Representation for Phylogeny

The CGRphylo pipeline combines an R core module with various packages to compare multiple whole-genome sequences using Chaos Game Representation (CGR). The CGR core function creates a frequency object for each sequence, which is then used to calculate distances among sequences. The resulting CGR-based distance matrices can be converted into a phylogenetic tree using neighbour-joining (NJ) or other methods. A major advantage of the CGRphylo R pipeline is its ability to handle large DNA sequences (depending on the user's machine) and its effectiveness at classifying very similar sequences.

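
The two steps described above (one frequency object per sequence, then pairwise distances) can be sketched in Python. This is a minimal illustration, not CGRphylo's R implementation: the corner layout (A, C, G, T at the corners of the unit square), the grid `resolution`, the skipping of non-ACGT bases, and the Euclidean distance are all assumptions, and every function name here is hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Assumed corner layout (a common CGR convention); CGRphylo may use another.
CORNERS: Dict[str, Tuple[float, float]] = {
    "A": (0.0, 0.0),
    "C": (0.0, 1.0),
    "G": (1.0, 1.0),
    "T": (1.0, 0.0),
}

Grid = List[List[float]]


def cgr_frequency_matrix(seq: str, resolution: int = 16) -> List[List[int]]:
    """Count CGR points in each cell of a resolution x resolution grid."""
    grid = [[0] * resolution for _ in range(resolution)]
    x, y = 0.5, 0.5  # start at the centre of the unit square
    for base in seq.upper():
        corner = CORNERS.get(base)
        if corner is None:
            continue  # assumption: ambiguous bases such as N are skipped
        # Move halfway towards the corner of the current nucleotide.
        x = (x + corner[0]) / 2
        y = (y + corner[1]) / 2
        # Clamp so a point that rounds to exactly 1.0 stays inside the grid.
        col = min(int(x * resolution), resolution - 1)
        row = min(int(y * resolution), resolution - 1)
        grid[row][col] += 1
    return grid


def normalise(grid: List[List[int]]) -> Grid:
    """Convert counts to relative frequencies so different lengths compare."""
    total = sum(sum(row) for row in grid)
    return [[c / total if total else 0.0 for c in row] for row in grid]


def euclidean_distance(a: Grid, b: Grid) -> float:
    """Euclidean distance between two frequency matrices (assumed metric)."""
    return math.sqrt(
        sum((p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb))
    )


def distance_matrix(
    seqs: Dict[str, str], resolution: int = 16
) -> Tuple[List[str], Grid]:
    """One frequency matrix per sequence, then all pairwise distances."""
    names = list(seqs)
    mats = [normalise(cgr_frequency_matrix(seqs[n], resolution)) for n in names]
    dist = [[euclidean_distance(a, b) for b in mats] for a in mats]
    return names, dist
```

With `resolution = 2**k`, each grid cell essentially counts one k-mer (apart from the first k - 1 bases), so the matrix is a k-mer frequency table laid out in CGR coordinates. The resulting distance matrix could then be handed to a neighbour-joining implementation, such as `nj()` from the R package ape.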
## Why CGRphylo?

- ✨ Precision: Nail the classification of closely related viral strains and recombinants.
- ✨ Speed: Rapidly analyse and classify massive viral genome datasets.
- ✨ Low resource needs: Runs well in low-resource settings.

In the world of viral genomics, CGRphylo stands out by accurately classifying closely related viral strains, including tricky recombinants. Imagine its power during epidemic outbreaks, when thousands of viral sequences overwhelm resources. CGRphylo is the superhero pipeline that steps in: efficient, accessible, and designed for both high- and low-resource settings.

## Computational efficiency

We assessed the computational efficiency of CGRphylo against multiple sequence alignment (MSA) with Clustal-Omega. CGRphylo processed 69 SARS-CoV-2 genomes 5 times faster than Clustal-Omega, and for a dataset of 106 genomes it outpaced Clustal-Omega by a remarkable 13.7 times. In MSA, the computational cost rises as the dataset grows, because each additional sequence must be compared pairwise with all other sequences. This is not the case for CGRphylo: adding one sequence requires only one additional frequency matrix calculation, and the distance calculations that follow are computationally inexpensive.

- 🚀 CGRphylo processed 69 SARS-CoV-2 genomes 5 times faster than Clustal-Omega.
- 🌐 But wait, there's more! For a dataset of 106 genomes, CGRphylo outpaced Clustal-Omega by an incredible 13.7 times.

In the world of MSAs, computational costs skyrocket as datasets grow. Not for CGRphylo! Adding a sequence is a breeze: just one more frequency matrix calculation, free of the computational burden that alignment-based methods face.
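
To illustrate why adding a sequence stays cheap, here is a hypothetical Python helper (not part of CGRphylo) that caches one CGR frequency vector per sequence and grows the distance matrix by one row and one column per new sequence. It assumes the A/C/G/T corner layout shown in the code, a fixed grid `resolution`, skipping of non-ACGT bases, and Euclidean distance; the pipeline itself may differ on any of these points.

```python
import math
from typing import List

# Assumed corner layout for the unit square (one common CGR convention).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_frequencies(seq: str, resolution: int = 16) -> List[float]:
    """Flattened, normalised CGR frequency matrix for one sequence."""
    counts = [0] * (resolution * resolution)
    x = y = 0.5  # start at the centre of the unit square
    for base in seq.upper():
        if base not in CORNERS:
            continue  # assumption: ambiguous bases are skipped
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2  # halfway to the base's corner
        col = min(int(x * resolution), resolution - 1)
        row = min(int(y * resolution), resolution - 1)
        counts[row * resolution + col] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


class IncrementalDistanceMatrix:
    """Hypothetical helper: keeps one cached frequency vector per sequence."""

    def __init__(self, resolution: int = 16) -> None:
        self.resolution = resolution
        self.names: List[str] = []
        self.freqs: List[List[float]] = []
        self.dist: List[List[float]] = []

    def add(self, name: str, seq: str) -> None:
        # One new CGR frequency calculation for the incoming sequence...
        f = cgr_frequencies(seq, self.resolution)
        # ...plus one cheap vector distance per existing sequence.
        new_row = [math.dist(f, g) for g in self.freqs]
        # Existing entries are reused; only a new column and row are appended.
        for row, d in zip(self.dist, new_row):
            row.append(d)
        self.dist.append(new_row + [0.0])
        self.names.append(name)
        self.freqs.append(f)
```

Adding the n-th sequence costs one pass over its bases plus n - 1 distance evaluations, and none of the existing entries are recomputed. Each distance compares two vectors of `resolution ** 2` entries, independent of genome length.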

## Cite this pipeline as:
Thind Singh Amarinder and Sinha Somdatta*, Using Chaos-Game-Representation for Analysing the SARS-CoV-2 Lineages, Newly Emerging Strains and Recombinants, Current Genomics 2023; 24(3). https://dx.doi.org/10.2174/0113892029264990231013112156