
Subcommand: squash

Lucas Czech edited this page Jan 21, 2021 · 13 revisions

Perform Squash Clustering for a set of samples.

Usage: gappa analyze squash [options]

Options

Input
--jplace-path Required. TEXT:PATH(existing)=[] ...
List of jplace files or directories to process. For directories, only files with the extension `.jplace[.gz]` are processed.
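The input rule above (explicit files are taken as given, while directories contribute only their `.jplace` and `.jplace.gz` files) can be sketched as follows. This is a minimal illustration, not gappa's code. The function name `collect_jplace_files` is hypothetical, and the sketch assumes that directories are scanned non-recursively.

```python
from pathlib import Path


def collect_jplace_files(paths):
    """Expand a list of files/directories into the jplace files to process.

    Hypothetical helper. Assumptions: directories are scanned non-recursively,
    and explicitly listed files are used regardless of their extension.
    """
    result = []
    for p in map(Path, paths):
        if p.is_dir():
            # Inside directories, only .jplace and .jplace.gz files are used.
            for child in sorted(p.iterdir()):
                if child.is_file() and child.name.endswith((".jplace", ".jplace.gz")):
                    result.append(child)
        elif p.is_file():
            result.append(p)
        else:
            # Mirrors the PATH(existing) requirement of the option.
            raise FileNotFoundError(f"No such file or directory: {p}")
    return result
```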
Settings
--exponent FLOAT=1
Exponent for the Kantorovich-Rubinstein (KR) distance integration.
--point-mass Treat every pquery as a point mass concentrated on the highest-weight placement.
--ignore-multiplicities Set the multiplicity of each pquery to 1.
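The effect of `--point-mass` and `--ignore-multiplicities` on the mass that a sample places on each edge can be sketched as below. The `Pquery` class and `edge_masses` function are hypothetical stand-ins for a parsed jplace pquery, not gappa's data structures. The sketch assumes that a pquery's multiplicity is the sum of the multiplicities of its names, and that each placement contributes multiplicity times its like-weight ratio.

```python
from dataclasses import dataclass


@dataclass
class Pquery:
    # Hypothetical in-memory form of a jplace pquery:
    # (edge_num, like_weight_ratio) pairs plus the multiplicities of its names.
    placements: list
    multiplicities: list


def edge_masses(pqueries, point_mass=False, ignore_multiplicities=False):
    """Accumulate the placement mass per edge for one sample (sketch)."""
    masses = {}
    for pq in pqueries:
        # --ignore-multiplicities: every pquery counts with multiplicity 1.
        weight = 1.0 if ignore_multiplicities else float(sum(pq.multiplicities))
        placements = pq.placements
        if point_mass:
            # --point-mass: keep only the highest-weight placement, with full weight.
            edge, _ = max(placements, key=lambda pl: pl[1])
            placements = [(edge, 1.0)]
        for edge, lwr in placements:
            masses[edge] = masses.get(edge, 0.0) + weight * lwr
    return masses
```

For a pquery with placements `[(3, 0.7), (5, 0.3)]` and multiplicity 2, the default puts a mass of 1.4 on edge 3 and 0.6 on edge 5, whereas `point_mass=True` moves the full mass of 2 onto edge 3.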
Color
--color-list TEXT=BuPuBk
List of colors to use for the palette. This can be either the name of a color list, a file containing one color per line, or an explicit list of colors.
--reverse-color-list If set, the `--color-list` is reversed.
--log-scaling If set, the sequential color list is logarithmically scaled instead of linearly.
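The difference between linear and logarithmic scaling of a sequential color list can be illustrated by how a value is mapped to a position along the list. This is a simplified sketch. The function names are hypothetical, the color names are placeholders rather than the actual `BuPuBk` entries, and picking the nearest list entry (instead of interpolating between colors) is an assumption made for brevity.

```python
import math


def color_position(value, min_value, max_value, log_scaling=False):
    """Map a value to a position in [0, 1] along a sequential color list.

    Assumes min_value < max_value; values outside the range are clamped.
    Log scaling additionally requires min_value > 0.
    """
    value = min(max(value, min_value), max_value)
    if log_scaling:
        lo, hi = math.log(min_value), math.log(max_value)
        return (math.log(value) - lo) / (hi - lo)
    return (value - min_value) / (max_value - min_value)


def pick_color(colors, position):
    """Pick the nearest color in the list (no interpolation in this sketch)."""
    return colors[round(position * (len(colors) - 1))]
```

With a range of 1 to 100, the value 10 sits near the start of the list under linear scaling but exactly in the middle under log scaling. Reversing the list, as `--reverse-color-list` does, would correspond to passing `colors[::-1]`.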
Output
--out-dir TEXT=.
Directory to write files to.
--file-prefix TEXT
File prefix for output files.
--file-suffix TEXT
File suffix for output files.
Tree Output
--write-newick-tree If set, the tree is written to a Newick file.
--write-nexus-tree If set, the tree is written to a Nexus file.
--write-phyloxml-tree If set, the tree is written to a PhyloXML file.
--write-svg-tree If set, the tree is written to an SVG file.
SVG Tree Output
--svg-tree-shape TEXT:{circular,rectangular}=circular
Shape of the tree.
--svg-tree-type TEXT:{cladogram,phylogram}=cladogram
Type of the tree.
--svg-tree-stroke-width FLOAT=5
SVG stroke width for the branches of the tree.
--svg-tree-ladderize If set, the tree is ladderized.
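Ladderizing reorders the children of every node by the size of their subtrees, which gives the drawn tree a staircase-like shape without changing its topology. The sketch below is a generic version of that operation and is not gappa's implementation. The dict-of-children representation and the choice to put smaller subtrees first are assumptions, since the sort direction differs between tools.

```python
def ladderize(children, node):
    """Reorder children in place so that smaller subtrees come first.

    children maps each node to the list of its child nodes (tips map to []).
    Returns the number of tips below `node`.
    """
    sizes = {c: ladderize(children, c) for c in children[node]}
    # Python's sort is stable, so equally sized subtrees keep their order.
    children[node].sort(key=lambda c: sizes[c])
    return sum(sizes.values()) if children[node] else 1
```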
Global Options
--allow-file-overwriting Allow overwriting existing output files instead of aborting the command.
--verbose Produce more verbose output.
--threads UINT
Number of threads to use for calculations.
--log-file TEXT
Write all output to a log file, in addition to standard output to the terminal.
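The `--exponent` setting refers to the exponent p of the KR distance between two samples: the absolute difference of the sample masses on the far side of each point of the tree is raised to the power p, integrated over all branches, and the p-th root of the result is taken. For p = 1 this equals the earth mover's distance. The sketch below is a simplified illustration, not gappa's implementation. It assumes that all mass sits on tree nodes (so the mass difference is constant along each branch and the integral becomes a sum), that `parent[v]` gives the parent of node v with -1 for the root, and that every child has a larger index than its parent.

```python
def kr_distance(parent, branch_length, mass_a, mass_b, p=1.0):
    """Sketch of the KR distance between two mass distributions on a tree.

    branch_length[v] is the length of the branch above node v.
    Both samples are normalized to a total mass of 1 first.
    """
    n = len(parent)
    total_a, total_b = sum(mass_a), sum(mass_b)
    # diff[v] starts as the mass difference on node v itself.
    diff = [mass_a[v] / total_a - mass_b[v] / total_b for v in range(n)]
    result = 0.0
    # Visit children before parents, so diff[v] covers the whole subtree of v.
    for v in range(n - 1, -1, -1):
        if parent[v] >= 0:
            result += branch_length[v] * abs(diff[v]) ** p
            diff[parent[v]] += diff[v]
    return result ** (1.0 / p)
```

Moving all mass from one tip (branch length 1) to its sibling tip (branch length 2) gives a distance of 3 for p = 1, the total path length the mass has to travel. With p = 2 the same example gives the square root of 3.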

Description

Performs Squash Clustering. The command is a re-implementation of `guppy squash`; see its documentation for more details.

Details

The main output of the command is a cluster hierarchy tree that shows which input jplace samples are clustered close to each other. Although the tree is written in Newick format, it is not a phylogeny, as its tips represent samples (jplace files) rather than taxa. The inner nodes are labeled with consecutive numbers starting at n, where n is the number of input samples.
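The node numbering of the cluster tree can be illustrated by rendering a list of merges as Newick, with tips labeled by sample names and the k-th merge creating inner node n + k. This is a sketch only. The function name and the merge-list input format are hypothetical, and branch lengths, which the actual output may contain, are left out.

```python
def cluster_tree_newick(sample_names, merges):
    """Render a cluster hierarchy as a Newick string (sketch).

    sample_names[i] labels tip i (0 <= i < n); merges[k] = (a, b) joins
    nodes a and b into a new inner node numbered n + k.
    """
    n = len(sample_names)
    if len(merges) != n - 1:
        raise ValueError("n samples need exactly n - 1 merges")
    subtree = {i: name for i, name in enumerate(sample_names)}
    for k, (a, b) in enumerate(merges):
        # Each merge consumes two existing subtrees and labels the new node n + k.
        subtree[n + k] = f"({subtree.pop(a)},{subtree.pop(b)}){n + k}"
    (root,) = subtree.values()
    return root + ";"
```

For three samples, first merging samples 0 and 2 (creating node 3) and then node 3 with sample 1 (creating node 4) yields `((s1,s3)3,s2)4;`.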

If any of the --write-...-tree options are used, the command also writes, for visualization, the mass trees of the samples (the tips of the cluster tree) and the mass trees of the inner nodes (the average masses of the tips below each node). They are numbered 0 to n-1 for the tips (samples) and n to 2n-2 for the inner nodes (cluster averages). These trees help to explore how and why the samples were clustered during the algorithm.
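The clustering loop behind these numbers can be sketched as a greedy agglomerative procedure: repeatedly merge the two closest clusters under the KR distance, and represent each new cluster by the average mass of all samples it contains. This is a simplified, cubic-time illustration, not gappa's implementation, and it does not compute the branch lengths of the cluster tree. It assumes that each sample has already been reduced to a vector of distal masses per edge (the normalized mass on the far side of each edge). Under that representation, the KR distance becomes a weighted sum over edges, and averaging samples is a plain average of vectors. All names are hypothetical.

```python
import itertools


def kr_from_distal(a, b, lengths, p=1.0):
    # KR distance for samples given as per-edge distal masses (assumption).
    return sum(l * abs(x - y) ** p for x, y, l in zip(a, b, lengths)) ** (1.0 / p)


def squash(samples, lengths, p=1.0):
    """Sketch of squash clustering.

    Returns (merges, masses): merges[k] = (a, b, distance) creates node n + k;
    masses[i] is the mass vector of node i (tips 0..n-1, inner nodes n..2n-2).
    """
    n = len(samples)
    masses = [list(s) for s in samples]
    counts = {i: 1 for i in range(n)}  # active clusters -> number of tips
    merges = []
    while len(counts) > 1:
        a, b = min(
            itertools.combinations(sorted(counts), 2),
            key=lambda ab: kr_from_distal(masses[ab[0]], masses[ab[1]], lengths, p),
        )
        dist = kr_from_distal(masses[a], masses[b], lengths, p)
        ca, cb = counts.pop(a), counts.pop(b)
        # Weighting by tip count makes the new node the plain average of its tips.
        masses.append([(ca * x + cb * y) / (ca + cb) for x, y in zip(masses[a], masses[b])])
        counts[len(masses) - 1] = ca + cb
        merges.append((a, b, dist))
    return merges, masses
```

For n samples, the loop performs n - 1 merges, so `masses` ends up with 2n - 1 entries, matching the numbering of the written mass trees.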

Citation

When using this method, please do not forget to cite:

Lucas Czech, Pierre Barbera, Alexandros Stamatakis. Genesis and Gappa: Processing, Analyzing and Visualizing Phylogenetic (Placement) Data. Bioinformatics, 2020. doi:10.1093/bioinformatics/btaa070

Frederick Matsen, Steven Evans. Edge Principal Components and Squash Clustering: Using the Special Structure of Phylogenetic Placement Data for Sample Comparison. PLOS ONE, 2013. doi:10.1371/journal.pone.0056859
