Home
gappa is a collection of commands for working with phylogenetic data. Its main focus is the evolutionary placement of short environmental sequences on a reference phylogenetic tree. See Phylogenetic Placement for an introduction describing a typical pipeline.
Many commands in gappa are implementations of our novel methods. gappa also offers some commands that are implemented in the excellent guppy tool as well. However, being written in C++, gappa is much faster and needs less memory for most of these tasks.
gappa is used via its command line interface, with subcommands for each task. The commands have the general structure:
gappa <module> <subcommand> <options>
The modules are simply a way of organizing the commands.
- Module analyze: Analyze and compare different jplace files, that is, find differences and patterns between different samples.
- Module edit: Edit, manipulate, and transform files in different formats.
- Module examine: Examine, visualize, and tabulate information in files.
- Module prepare: Prepare and generate data and files needed to run typical pipelines and analyses.
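The general structure `gappa <module> <subcommand> <options>` can be sketched in a few lines of Python. This is only an illustration, not part of gappa: the helper `build_gappa_command` and the table `GAPPA_COMMANDS` are hypothetical names, the module and subcommand names are copied from this page, and the `--help` option is assumed to be accepted by every subcommand.

```python
import shlex

# Module and subcommand names as listed on this page.
GAPPA_COMMANDS = {
    "analyze": [
        "correlation", "dispersion", "edgepca", "imbalance-kmeans", "krd",
        "phylogenetic-kmeans", "placement-factorization", "squash",
    ],
    "edit": ["accumulate", "filter", "merge", "multiplicity", "split"],
    "examine": [
        "assign", "edpl", "graft", "heat-tree", "info",
        "lwr-distribution", "lwr-histogram", "lwr-list",
    ],
    "prepare": [
        "chunkify", "clean-tree", "extract", "phat",
        "taxonomy-tree", "unchunkify",
    ],
    "simulate": ["random-alignment", "random-placements", "random-tree"],
    "tools": ["citation", "license", "version"],
}


def build_gappa_command(module, subcommand, options=()):
    """Return the argument list for `gappa <module> <subcommand> <options>`.

    Hypothetical helper: it only checks that the subcommand is listed
    under the given module, it does not validate the options.
    """
    if module not in GAPPA_COMMANDS:
        raise ValueError(f"unknown module: {module}")
    if subcommand not in GAPPA_COMMANDS[module]:
        raise ValueError(f"module {module} has no subcommand {subcommand}")
    return ["gappa", module, subcommand, *options]


if __name__ == "__main__":
    # `--help` is an assumption; replace it with the options you need.
    argv = build_gappa_command("examine", "info", ["--help"])
    print(shlex.join(argv))
```

The resulting list can be passed to `subprocess.run` if gappa is installed and on the `PATH`. Keeping the arguments as a list rather than a single string avoids shell quoting problems with file paths.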
Module analyze: Commands for analyzing and comparing placement data, that is, finding differences and patterns.
Subcommand | Description |
---|---|
correlation | Calculate the Edge Correlation of samples and metadata features. |
dispersion | Calculate the Edge Dispersion between samples. |
edgepca | Perform Edge PCA (Principal Component Analysis) for a set of samples. |
imbalance-kmeans | Run Imbalance k-means clustering on a set of samples. |
krd | Calculate the pairwise Kantorovich-Rubinstein (KR) distance matrix between samples. |
phylogenetic-kmeans | Run Phylogenetic k-means clustering on a set of samples. |
placement-factorization | Perform Placement-Factorization on a set of samples. |
squash | Perform Squash Clustering for a set of samples. |
Module edit: Commands for editing and manipulating files such as jplace, fasta, or newick.
Subcommand | Description |
---|---|
accumulate | Accumulate the masses of each query in jplace files into basal branches so that they exceed a given mass threshold. |
filter | Filter jplace files according to some criteria, that is, remove all queries and/or placement locations that do not pass the provided filter(s). |
merge | Merge jplace files by combining their pqueries into one file. |
multiplicity | Edit the multiplicities of queries in jplace files. |
split | Split the queries in jplace files into multiple files, for example, according to an OTU table. |
Module examine: Commands for examining, visualizing, and tabulating information in placement data.
Subcommand | Description |
---|---|
assign | Taxonomically assign placed query sequences and output a tabulated summary. |
edpl | Calculate the Expected Distance between Placement Locations (EDPL) for all pqueries. |
graft | Make a tree with each of the query sequences represented as a pendant edge. |
heat-tree | Make a tree with edges colored according to the placement mass of the samples. |
info | Print basic information about placement files. |
lwr-distribution | Print a summary table that represents the distribution of the likelihood weight ratios (LWRs) of all pqueries. |
lwr-histogram | Print a table with histograms of the likelihood weight ratios (LWRs) of all pqueries. |
lwr-list | Print a list of all pqueries with their likelihood weight ratios (LWRs). |
Module prepare: Commands for preparing and preprocessing phylogenetic and placement data.
Subcommand | Description |
---|---|
chunkify | Chunkify a set of fasta files and create abundance maps. |
clean-tree | Clean a tree in Newick format by removing parts that other parsers have difficulties with. |
extract | Extract placements from clades of the tree and write per-clade jplace files. |
phat | Generate consensus sequences from a sequence database according to the PhAT method. |
taxonomy-tree | Turn a taxonomy into a tree that can be used as a constraint for tree inference. |
unchunkify | Unchunkify a set of jplace files using abundance map files and create per-sample jplace files. |
Module simulate: Commands for random generation of phylogenetic and placement data.
Subcommand | Description |
---|---|
random-alignment | Create a random alignment with a given number of sequences of a given length. |
random-placements | Create a set of random phylogenetic placements on a given reference tree. |
random-tree | Create a random tree with a given number of leaf nodes. |
Module tools: Auxiliary commands of gappa.
Subcommand | Description |
---|---|
citation | Print references to be cited when using gappa. |
license | Show the license of gappa. |
version | Extended version information about gappa. |