
Subcommand: imbalance kmeans

Lucas Czech edited this page Jun 8, 2018 · 10 revisions

Run Imbalance k-means clustering on a set of samples.

Usage: gappa analyze imbalance-kmeans [options]

Options

Input
--jplace-path Required. TEXT ...
List of jplace files or directories to process. For directories, only files with the extension .jplace are processed.
Settings
--k Required. TEXT
Number of clusters to find. Can be a comma-separated list of multiple values or ranges for k: 1-5,8,10,12
--write-overview-file If provided, a table file is written that summarizes the average distance and variance of the clusters for each k. Useful for elbow plots.
--point-mass Treat every pquery as a point mass concentrated on the highest-weight placement.
--ignore-multiplicities Set the multiplicity of each pquery to 1.
Color
--color-list TEXT=BuPuBk
List of colors to use for the palette. Can either be the name of a color list, a file containing one color per line, or an actual list of colors.
--reverse-color-list If set, the --color-list is reversed.
--log-scaling If set, the sequential color list is logarithmically scaled instead of linearly.
Tree Output
--write-newick-tree If set, the tree is written to a Newick file.
--write-nexus-tree If set, the tree is written to a Nexus file.
--write-phyloxml-tree If set, the tree is written to a PhyloXML file.
--write-svg-tree If set, the tree is written to an SVG file.
SVG Tree Output
--svg-tree-shape TEXT in {circular,rectangular}=circular
Shape of the tree.
--svg-tree-type TEXT in {cladogram,phylogram}=cladogram
Type of the tree.
--svg-tree-stroke-width FLOAT=5
SVG stroke width for the branches of the tree.
--svg-tree-ladderize If set, the tree is ladderized.
Output
--out-dir TEXT=.
Directory to write files to.
--file-prefix TEXT=ikmeans_
File prefix for output files.
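
The --k option accepts single values and ranges in one comma-separated list, such as 1-5,8,10,12. The following is a minimal sketch of how such a specification expands into individual values of k. It only illustrates the syntax; the function name parse_k_spec is hypothetical, and this is not gappa's own parser.

```python
from typing import List


def parse_k_spec(spec: str) -> List[int]:
    """Expand a spec such as '1-5,8,10,12' into a list of k values.

    Hypothetical helper for illustration only; gappa's actual parsing
    (for example its error handling) may differ.
    """
    values: List[int] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            # A range 'a-b' is taken to include both end points.
            low, high = part.split("-", 1)
            values.extend(range(int(low), int(high) + 1))
        else:
            values.append(int(part))
    return values


print(parse_k_spec("1-5,8,10,12"))  # prints [1, 2, 3, 4, 5, 8, 10, 12]
```

With this example, the clustering would be run once for each of the eight values of k.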

Description

Imbalance k-means is used in almost the same way as Phylogenetic k-means; see that subcommand's page for details. The difference lies in the distance measure: Imbalance k-means uses a simple Euclidean distance between the edge imbalances of the samples, instead of the more involved phylogenetic Kantorovich-Rubinstein (KR) distance between samples.
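
To make the distance measure concrete, here is a minimal sketch that treats each sample as a vector holding one imbalance value per edge and clusters those vectors with a plain Lloyd-style k-means under Euclidean distance. The imbalance values are made up, the function names are hypothetical, and the initialization (taking the first k samples as centroids) is an assumption chosen for reproducibility; it is not taken from gappa's implementation.

```python
import math
from typing import List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two per-edge imbalance vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(
    samples: List[List[float]], k: int, iterations: int = 20
) -> Tuple[List[int], List[List[float]]]:
    """Plain k-means; returns cluster assignments and centroids."""
    # Assumption: use the first k samples as initial centroids.
    centroids = [list(s) for s in samples[:k]]
    assignments = [0] * len(samples)
    for _ in range(iterations):
        # Assignment step: each sample goes to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: euclidean(s, centroids[c]))
            for s in samples
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [s for s, a in zip(samples, assignments) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centroids


# Made-up imbalance vectors: four samples, three edges each.
samples = [
    [0.9, -0.2, 0.1],
    [0.8, -0.1, 0.0],
    [-0.7, 0.5, 0.3],
    [-0.6, 0.6, 0.2],
]
assignments, centroids = kmeans(samples, k=2)
print(assignments)
```

Because the distance is an ordinary vector norm, each iteration is cheap compared to evaluating a KR distance between every sample and every cluster center.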
