
Subcommand: imbalance-kmeans

Lucas Czech edited this page Aug 9, 2020 · 10 revisions

Run Imbalance k-means clustering on a set of samples.

Usage: gappa analyze imbalance-kmeans [options]

Options

Input
--jplace-path Required. TEXT:PATH(existing)=[] ...
List of jplace files or directories to process. For directories, only files with the extension `.jplace[.gz]` are processed.
Settings
--k Required. TEXT
Number of clusters to find. Can be a single value or a comma-separated list of values and ranges for k, for example `1-5,8,10,12`.
--write-overview-file If provided, a table file is written that summarizes the average distance and variance of the clusters for each k. Useful for elbow plots.
--point-mass Treat every pquery as a point mass concentrated on the highest-weight placement.
--ignore-multiplicities Set the multiplicity of each pquery to 1.
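The Settings options can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not gappa's code. `parse_k_list` expands a `--k` value into sorted, de-duplicated integers (we assume duplicates are merged and order does not matter). The hypothetical `Placement` and `Pquery` classes show what `--point-mass` and `--ignore-multiplicities` do to a pquery before its mass is summed per edge. Keeping the multiplicity in point-mass mode is also our assumption.

```python
from dataclasses import dataclass
from typing import Dict, List


def parse_k_list(spec: str) -> List[int]:
    """Expand a spec such as "1-5,8,10,12" into sorted, unique values of k."""
    values = set()
    for token in spec.split(","):
        token = token.strip()
        if not token:
            continue
        if "-" in token:
            # A range "lo-hi" includes both end points.
            lo_str, hi_str = token.split("-", 1)
            lo, hi = int(lo_str), int(hi_str)
            if lo > hi:
                raise ValueError(f"invalid range: {token}")
            values.update(range(lo, hi + 1))
        else:
            values.add(int(token))
    if not values or min(values) < 1:
        raise ValueError("k values must be positive integers")
    return sorted(values)


@dataclass
class Placement:  # hypothetical, simplified placement record
    edge: int     # index of the edge the query is placed on
    lwr: float    # like-weight ratio of this placement


@dataclass
class Pquery:     # hypothetical, simplified pquery record
    placements: List[Placement]
    multiplicity: float = 1.0


def to_point_mass(pq: Pquery) -> Pquery:
    """Keep only the highest-weight placement and give it the full mass."""
    best = max(pq.placements, key=lambda p: p.lwr)
    return Pquery([Placement(best.edge, 1.0)], pq.multiplicity)


def ignore_multiplicity(pq: Pquery) -> Pquery:
    """Same placements, but the multiplicity is set to 1."""
    return Pquery(list(pq.placements), 1.0)


def edge_masses(pqueries: List[Pquery]) -> Dict[int, float]:
    """Sum lwr * multiplicity per edge over all pqueries of one sample."""
    masses: Dict[int, float] = {}
    for pq in pqueries:
        for p in pq.placements:
            masses[p.edge] = masses.get(p.edge, 0.0) + p.lwr * pq.multiplicity
    return masses
```

Here, a pquery with placements of weight 0.7 and 0.3 and multiplicity 2 contributes 1.4 and 0.6 mass to its two edges; in point-mass mode, all of its mass (2.0) would go to the edge with weight 0.7.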
Color
--color-list TEXT=BuPuBk
List of colors to use for the palette. Can either be the name of a color list, a file containing one color per line, or an actual list of colors.
--reverse-color-list If set, the --color-list is reversed.
--log-scaling If set, the sequential color list is scaled logarithmically instead of linearly.
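The difference between linear and logarithmic scaling of a sequential color list can be sketched as follows. This is a simplified illustration: `color_position` and `pick_color` are hypothetical names, the palette is a made-up three-color list rather than the actual `BuPuBk` colors, and the sketch picks the nearest listed color, whereas gappa may interpolate between colors.

```python
import math
from typing import List


def color_position(value: float, vmin: float, vmax: float, log_scaling: bool = False) -> float:
    """Map a value to a position in [0, 1] along a sequential color list."""
    if vmax <= vmin:
        return 0.0
    value = min(max(value, vmin), vmax)  # clamp into the value range
    if log_scaling:
        if vmin <= 0.0:
            raise ValueError("logarithmic scaling needs vmin > 0")
        t = (math.log(value) - math.log(vmin)) / (math.log(vmax) - math.log(vmin))
    else:
        t = (value - vmin) / (vmax - vmin)
    return min(1.0, max(0.0, t))


def pick_color(colors: List[str], t: float, reverse: bool = False) -> str:
    """Pick the listed color nearest to position t; reverse mimics a reversed list."""
    palette = colors[::-1] if reverse else colors
    return palette[round(t * (len(palette) - 1))]
```

With values between 1 and 100, the value 10 lands halfway along the palette under logarithmic scaling, but only about 9% of the way under linear scaling.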
Tree Output
--write-newick-tree If set, the tree is written to a Newick file.
--write-nexus-tree If set, the tree is written to a Nexus file.
--write-phyloxml-tree If set, the tree is written to a PhyloXML file.
--write-svg-tree If set, the tree is written to an SVG file.
Svg Tree Output
--svg-tree-shape TEXT:{circular,rectangular}=circular
Shape of the tree.
--svg-tree-type TEXT:{cladogram,phylogram}=cladogram
Type of the tree.
--svg-tree-stroke-width FLOAT=5
SVG stroke width for the branches of the tree.
--svg-tree-ladderize If set, the tree is ladderized.
Output
--out-dir TEXT=.
Directory to write files to.
--file-prefix TEXT=ikmeans_
File prefix for output files.
Global Options
--allow-file-overwriting Allow overwriting existing output files instead of aborting the command.
--verbose Produce more verbose output.
--threads UINT
Number of threads to use for calculations.
--log-file TEXT
Write all output to a log file, in addition to printing it to the terminal (standard output).

Description

Imbalance k-means is used almost exactly like Phylogenetic k-means; see that page for details. The difference lies in the distance measure: instead of the more involved phylogenetic Kantorovich-Rubinstein (KR) distance between samples, Imbalance k-means uses the simple Euclidean distance between the edge imbalance vectors of the samples.
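To make the distance measure concrete, here is a minimal Python sketch under a simplified convention that we assume for illustration. The tree is a hypothetical parent array in which edge v connects node v to its parent, and each sample is summarized by its placement mass per edge. The imbalance of an edge is the mass on its root side minus the mass on its far side, divided by the total mass; mass sitting on the edge itself is counted on neither side. The normalization and edge handling in genesis/gappa may differ. The clustering part is plain Lloyd's k-means seeded with the first k vectors. The `overview` function reports, per k, the mean and variance of each sample's distance to its assigned centroid. Both the seeding and these summary definitions are our assumptions, not necessarily what `--write-overview-file` computes. All function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def imbalance_vector(parent: Sequence[int], edge_mass: Dict[int, float]) -> Vector:
    """Simplified edge imbalances; edge v joins node v to parent[v], the root has -1."""
    children: List[List[int]] = [[] for _ in parent]
    for v, p in enumerate(parent):
        if p >= 0:
            children[p].append(v)
    # Iterative pre-order traversal starting at the root.
    preorder, stack = [], [list(parent).index(-1)]
    while stack:
        v = stack.pop()
        preorder.append(v)
        stack.extend(children[v])
    # below[v]: mass on all edges strictly below node v.
    below = [0.0] * len(parent)
    for v in reversed(preorder):  # every child is finished before its parent
        for c in children[v]:
            below[v] += below[c] + edge_mass.get(c, 0.0)
    total = sum(edge_mass.values())
    vec: Vector = []
    for e, p in enumerate(parent):
        if p < 0:
            continue  # the root has no edge above it
        distal = below[e]
        proximal = total - distal - edge_mass.get(e, 0.0)
        vec.append((proximal - distal) / total if total > 0 else 0.0)
    return vec


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def _mean(points: List[Vector]) -> Vector:
    return [sum(col) / len(points) for col in zip(*points)]


def kmeans(points: List[Vector], k: int, max_iter: int = 100) -> Tuple[List[int], List[Vector]]:
    """Lloyd's algorithm with Euclidean distance, seeded with the first k points."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of samples")
    centroids = [list(p) for p in points[:k]]
    assignment: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each sample goes to its nearest centroid.
        new_assignment = [min(range(k), key=lambda c: euclidean(p, centroids[c])) for p in points]
        if new_assignment == assignment:
            break  # converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster runs empty
                centroids[c] = _mean(members)
    return assignment, centroids


def overview(points: List[Vector], ks: Sequence[int]) -> List[Tuple[int, float, float]]:
    """Per k: (k, mean distance to assigned centroid, variance of those distances)."""
    rows = []
    for k in ks:
        assignment, centroids = kmeans(points, k)
        dists = [euclidean(p, centroids[a]) for p, a in zip(points, assignment)]
        mean = sum(dists) / len(dists)
        var = sum((d - mean) ** 2 for d in dists) / len(dists)
        rows.append((k, mean, var))
    return rows
```

For example, on a small tree, a sample whose mass sits entirely on one leaf edge and a sample whose mass sits on its sibling leaf edge differ only in the imbalances of those two edges, so their Euclidean distance is small even though they share no placements.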

Citation

When using this method, please do not forget to cite:

Lucas Czech, Pierre Barbera, Alexandros Stamatakis. Genesis and Gappa: Processing, Analyzing and Visualizing Phylogenetic (Placement) Data. Bioinformatics, 2020. doi:10.1093/bioinformatics/btaa070

Lucas Czech, Alexandros Stamatakis. Scalable Methods for Analyzing and Visualizing Phylogenetic Placement of Metagenomic Samples. PLOS ONE, 2019. doi:10.1371/journal.pone.0217050
