Subcommand: imbalance kmeans
Run Imbalance k-means clustering on a set of samples.
Usage: `gappa analyze imbalance-kmeans [options]`
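As a minimal sketch of how such a call might be assembled from a script, the following Python snippet builds an argument list using only options documented on this page. The helper name `build_command`, the input directory `samples/`, and the output directory `ikmeans_out` are hypothetical placeholders, and the snippet only prints the command instead of running it.

```python
import shlex


def build_command(jplace_paths, k, out_dir=".", threads=None, overview=False):
    """Assemble an argv list for `gappa analyze imbalance-kmeans`.

    Hypothetical helper for illustration; it only uses options listed on this page.
    """
    argv = ["gappa", "analyze", "imbalance-kmeans"]
    argv += ["--jplace-path", *jplace_paths]  # one or more files or directories
    argv += ["--k", k]                        # e.g. "1-5,8"
    argv += ["--out-dir", out_dir]
    if overview:
        argv.append("--write-overview-file")  # flag without a value
    if threads is not None:
        argv += ["--threads", str(threads)]
    return argv


# Placeholder paths; replace with real jplace inputs.
argv = build_command(["samples/"], "1-5,8", out_dir="ikmeans_out", threads=4, overview=True)
print(shlex.join(argv))
```

Passing the resulting list to `subprocess.run(argv, check=True)` would execute the command, assuming `gappa` is installed and on the `PATH`.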
Input | |
---|---|
--jplace-path | Required. TEXT:PATH(existing)=[] ... List of jplace files or directories to process. For directories, only files with the extension `.jplace[.gz]` are processed. |
Settings | |
--k | Required. TEXT Number of clusters to find. Can be a comma-separated list of multiple values or ranges for k, such as `"1-5,8,10,12"`. |
--write-overview-file | If provided, a table file is written that summarizes the average distance and variance of the clusters for each k. Useful for elbow plots. |
--point-mass | Treat every pquery as a point mass concentrated on the highest-weight placement. |
--ignore-multiplicities | Set the multiplicity of each pquery to 1. |
Color | |
--color-list | TEXT=BuPuBk List of colors to use for the palette. Can either be the name of a color list, a file containing one color per line, or an actual list of colors. |
--reverse-color-list | If set, the `--color-list` is reversed. |
--log-scaling | If set, the sequential color list is scaled logarithmically instead of linearly. |
Output | |
--out-dir | TEXT=. Directory to write files to. |
--file-prefix | TEXT=ikmeans_ File prefix for output files. |
--file-suffix | TEXT File suffix for output files. |
Tree Output | |
--write-newick-tree | If set, the tree is written to a Newick file. |
--write-nexus-tree | If set, the tree is written to a Nexus file. |
--write-phyloxml-tree | If set, the tree is written to a PhyloXML file. |
--write-svg-tree | If set, the tree is written to an SVG file. |
SVG Tree Output | |
--svg-tree-shape | TEXT:{circular,rectangular}=circular Shape of the tree. |
--svg-tree-type | TEXT:{cladogram,phylogram}=cladogram Type of the tree. |
--svg-tree-stroke-width | FLOAT=5 SVG stroke width for the branches of the tree. |
--svg-tree-ladderize | If set, the tree is ladderized. |
Global Options | |
--allow-file-overwriting | Allow overwriting existing output files instead of aborting the command. |
--verbose | Produce more verbose output. |
--threads | UINT Number of threads to use for calculations. |
--log-file | TEXT Write all output to a log file, in addition to standard output to the terminal. |
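To make the `--k` syntax from the Settings above concrete, here is a small Python sketch that expands a value such as `"1-5,8,10,12"` into the individual values of k. It assumes that ranges are inclusive on both ends; the function `expand_k` is a hypothetical illustration and not gappa's own parser, which may treat edge cases differently.

```python
def expand_k(spec):
    """Expand a spec like "1-5,8,10,12" into a sorted list of unique ints.

    Assumption: a range "a-b" includes both a and b.
    """
    values = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            lo, hi = (int(x) for x in part.split("-", 1))
            if lo > hi:
                raise ValueError(f"invalid range: {part!r}")
            values.update(range(lo, hi + 1))
        else:
            values.add(int(part))
    return sorted(values)


print(expand_k("1-5,8,10,12"))  # [1, 2, 3, 4, 5, 8, 10, 12]
```

Under this reading, the example value would request eight clusterings, one per value of k.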
Imbalance k-means is used in almost the same way as Phylogenetic k-means; see the documentation of that subcommand for details. The difference lies in the distance measure: Imbalance k-means uses a simple Euclidean distance between the edge imbalances of the samples, instead of the more involved phylogenetic Kantorovich-Rubinstein (KR) distance between samples.
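The idea of clustering by Euclidean distance can be sketched in a few lines of Python. The snippet assumes that each sample has already been reduced to a vector with one imbalance value per edge; the numbers in `imbalances` are made up, and the plain Lloyd-style `kmeans` function is a generic textbook version for illustration, not gappa's implementation.

```python
import math
import random


def kmeans(vectors, k, iterations=100, seed=0):
    """Plain Lloyd-style k-means using Euclidean distance (math.dist)."""
    rng = random.Random(seed)
    # Start from k distinct samples as initial centroids.
    centroids = [list(v) for v in rng.sample(vectors, k)]
    assignment = None
    for _ in range(iterations):
        # Assign each sample to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors
        ]
        if new_assignment == assignment:
            break  # converged: no sample changed its cluster
        assignment = new_assignment
        # Move each centroid to the mean of its assigned samples.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


# Hypothetical imbalance vectors: four samples, three edges each.
imbalances = [
    [0.9, 0.1, -0.2],
    [0.8, 0.2, -0.1],
    [-0.7, -0.5, 0.6],
    [-0.8, -0.4, 0.5],
]
labels, centers = kmeans(imbalances, k=2)
print(labels)
```

With these toy values, the first two samples end up in one cluster and the last two in the other; which cluster receives which label depends on the random initialization.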
When using this method, please cite:
Lucas Czech, Pierre Barbera, Alexandros Stamatakis. Genesis and Gappa: Processing, Analyzing and Visualizing Phylogenetic (Placement) Data. Bioinformatics, 2020. doi:10.1093/bioinformatics/btaa070
Lucas Czech, Alexandros Stamatakis. Scalable Methods for Analyzing and Visualizing Phylogenetic Placement of Metagenomic Samples. PLOS ONE, 2019. doi:10.1371/journal.pone.0217050