Subcommand: squash
Perform Squash Clustering for a set of samples.
Usage: gappa analyze squash [options]
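As a rough illustration of the usage line above, the following sketch assembles an invocation from Python. It uses only options listed on this page. The helper name `build_squash_command` and the paths `samples/` and `squash_out` are hypothetical placeholders, not part of gappa.

```python
def build_squash_command(jplace_paths, out_dir=".", point_mass=False,
                         write_svg=False, threads=None):
    """Assemble an argument list for `gappa analyze squash` (hypothetical helper)."""
    cmd = ["gappa", "analyze", "squash",
           "--jplace-path", *jplace_paths,
           "--out-dir", out_dir]
    if point_mass:
        cmd.append("--point-mass")
    if write_svg:
        # Also write the mass trees as rectangular SVG figures.
        cmd += ["--write-svg-tree", "--svg-tree-shape", "rectangular"]
    if threads is not None:
        cmd += ["--threads", str(threads)]
    return cmd


# "samples/" and "squash_out" are placeholder paths for this example.
cmd = build_squash_command(["samples/"], out_dir="squash_out",
                           write_svg=True, threads=4)
print(" ".join(cmd))
```

With gappa installed, the resulting list could be passed to `subprocess.run(cmd, check=True)` to execute the command.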
Input | |
---|---|
--jplace-path | Required. TEXT:PATH(existing)=[] ... List of jplace files or directories to process. For directories, only files with the extension .jplace[.gz] are processed. |
Settings | |
--exponent | FLOAT=1 Exponent for KR integration. |
--point-mass | FLAG Treat every pquery as a point mass concentrated on the highest-weight placement. In other words, ignore all but the most likely placement location (the one with the highest LWR), and set its LWR to 1.0. |
--ignore-multiplicities | FLAG Set the multiplicity of each pquery to 1.0. The multiplicity is the equivalent of abundances for placements. |
Color | |
--color-list | TEXT=BuPuBk List of colors to use for the palette. Can either be the name of a color list, a file containing one color per line, or an actual comma-separated list of colors. Colors can be specified in the format #rrggbb using hex values, or by web color names. |
--reverse-color-list | FLAG If set, the order of colors of the --color-list is reversed. |
--log-scaling | FLAG If set, the sequential color list is logarithmically scaled instead of linearly. |
Output | |
--out-dir | TEXT=. Directory to write files to. |
--file-prefix | TEXT File prefix for output files. |
--file-suffix | TEXT File suffix for output files. |
Tree Output | |
--write-newick-tree | FLAG If set, the tree is written to a Newick file. |
--write-nexus-tree | FLAG If set, the tree is written to a Nexus file. |
--write-phyloxml-tree | FLAG If set, the tree is written to a PhyloXML file. |
--write-svg-tree | FLAG If set, the tree is written to an SVG file. |
Svg Tree Output | |
--svg-tree-shape | TEXT:{circular,rectangular}=circular Shape of the tree. |
--svg-tree-type | TEXT:{cladogram,phylogram}=cladogram Type of the tree. |
--svg-tree-stroke-width | FLOAT=5 SVG stroke width for the branches of the tree. |
--svg-tree-ladderize | FLAG If set, the tree is ladderized. |
Global Options | |
--allow-file-overwriting | FLAG Allow overwriting existing output files instead of aborting the command. |
--verbose | FLAG Produce more verbose output. |
--threads | UINT Number of threads to use for calculations. |
--log-file | TEXT Write all output to a log file, in addition to standard output to the terminal. |
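To make the effect of `--point-mass` and `--ignore-multiplicities` more concrete, here is a minimal Python sketch of how the two settings could change the mass that one pquery contributes to the tree. It is an illustration only, not gappa's implementation. The function names, the `(edge, LWR)` tuple layout, and the assumption that a placement's mass is its LWR times the pquery multiplicity are made up for this example.

```python
def to_point_mass(placements):
    """Keep only the placement with the highest LWR and set its LWR to 1.0.

    placements: list of (edge_num, lwr) tuples for a single pquery.
    """
    best_edge, _ = max(placements, key=lambda p: p[1])
    return [(best_edge, 1.0)]


def pquery_mass(placements, multiplicity,
                point_mass=False, ignore_multiplicities=False):
    """Return a dict edge_num -> mass contributed by one pquery.

    Assumption for this sketch: mass = LWR * multiplicity.
    """
    if point_mass:
        placements = to_point_mass(placements)
    if ignore_multiplicities:
        multiplicity = 1.0
    mass = {}
    for edge, lwr in placements:
        mass[edge] = mass.get(edge, 0.0) + lwr * multiplicity
    return mass


# One pquery with three candidate placements and a multiplicity of 4.
placements = [(3, 0.7), (5, 0.2), (8, 0.1)]
as_point_mass = pquery_mass(placements, 4.0, point_mass=True)
without_multiplicity = pquery_mass(placements, 4.0, ignore_multiplicities=True)
```

In this toy setting, the point-mass variant puts the whole mass of the pquery on edge 3, while ignoring multiplicities keeps all three placements but weights the pquery as a single sequence.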
Performs Squash Clustering. The command is a re-implementation of guppy squash; see its documentation for more details.
The main output of the command is a cluster hierarchy tree that shows which input jplace samples are clustered close to each other. Although the tree is written in Newick format, it is not a phylogeny, as its tips represent samples (jplace files). The inner node labels are numbered consecutively starting at n, with n being the number of samples used as input.
If the --write-...-tree options are used, the mass trees representing the samples (tips of the cluster tree) and the mass trees of the inner nodes (average masses of the corresponding tips) are written for visualization. Their numbering is 0 to n-1 for the tips (samples), and n to 2n-2 for the inner nodes (cluster averages). These trees can help to explore how and why the samples were clustered by the algorithm.
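The numbering scheme described above can be illustrated with a toy agglomerative clustering in Python. This is a sketch under simplifying assumptions, not gappa's algorithm: each sample is reduced to a plain list of per-edge masses, and an L1 distance stands in for the KR distance that the command actually uses. The names `squash_cluster` and `l1_distance` are hypothetical.

```python
from itertools import combinations


def l1_distance(a, b):
    """Stand-in distance for this sketch (NOT the KR distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def squash_cluster(masses, distance=l1_distance):
    """Cluster n samples; return a list of (new_id, left_id, right_id) merges.

    Tips are numbered 0 to n-1, inner nodes n to 2n-2.
    """
    n = len(masses)
    # Active clusters: id -> (mass vector, number of samples in the cluster).
    clusters = {i: (list(m), 1) for i, m in enumerate(masses)}
    merges = []
    next_id = n  # inner node labels start at n
    while len(clusters) > 1:
        # Find the pair of active clusters with the smallest distance.
        a, b = min(combinations(sorted(clusters), 2),
                   key=lambda p: distance(clusters[p[0]][0], clusters[p[1]][0]))
        (mass_a, count_a), (mass_b, count_b) = clusters.pop(a), clusters.pop(b)
        # The merged cluster carries the average mass of all its tips.
        total = count_a + count_b
        merged = [(x * count_a + y * count_b) / total
                  for x, y in zip(mass_a, mass_b)]
        clusters[next_id] = (merged, total)
        merges.append((next_id, a, b))
        next_id += 1
    return merges


# Three toy samples with masses on three edges.
samples = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]
merges = squash_cluster(samples)
```

Here samples 0 and 1 are merged first into inner node 3, which is then merged with sample 2 into the root node 4, that is, 2n-2 for n = 3.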
When using this method, please do not forget to cite:
Lucas Czech, Pierre Barbera, Alexandros Stamatakis. Genesis and Gappa: Processing, Analyzing and Visualizing Phylogenetic (Placement) Data. Bioinformatics, 2020. doi:10.1093/bioinformatics/btaa070
Frederick Matsen, Steven Evans. Edge Principal Components and Squash Clustering: Using the Special Structure of Phylogenetic Placement Data for Sample Comparison. PLOS ONE, 2013. doi:10.1371/journal.pone.0056859