
Subcommand: krd

Lucas Czech edited this page Jan 4, 2022 · 11 revisions

Calculate the pairwise Kantorovich-Rubinstein (KR) distance matrix between samples.

Usage: gappa analyze krd [options]
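As a minimal sketch of how an invocation might be assembled from a script, the following Python snippet builds the argument list using only options documented on this page. The helper name `krd_command` and the paths `samples/` and `krd_out` are hypothetical, chosen for illustration; the snippet only prints the command and does not run it.

```python
import shlex

# Values accepted by --matrix-format, as listed in the options on this page.
MATRIX_FORMATS = {"list", "matrix", "triangular"}


def krd_command(jplace_paths, out_dir=".", exponent=1.0,
                normalize=False, matrix_format="matrix"):
    """Build the argument list for `gappa analyze krd` (hypothetical helper)."""
    if matrix_format not in MATRIX_FORMATS:
        raise ValueError(f"unknown matrix format: {matrix_format}")
    cmd = ["gappa", "analyze", "krd", "--jplace-path", *jplace_paths]
    cmd += ["--exponent", str(exponent)]
    cmd += ["--matrix-format", matrix_format]
    cmd += ["--out-dir", out_dir]
    if normalize:
        # --normalize is a flag, so it takes no value.
        cmd.append("--normalize")
    return cmd


# Hypothetical input directory and output directory.
cmd = krd_command(["samples/"], out_dir="krd_out",
                  normalize=True, matrix_format="triangular")
print(shlex.join(cmd))
```

The resulting list could be passed to `subprocess.run(cmd, check=True)` on a machine where gappa is installed and the input paths exist.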

Options

Input
--jplace-path Required. TEXT:PATH(existing)=[] ...
List of jplace files or directories to process. For directories, only files with the extension .jplace[.gz] are processed.
Settings
--exponent FLOAT=1
Exponent for KR integration.
--normalize FLAG
Divide the KR distance by the tree length to get normalized values.
--point-mass FLAG
Treat every pquery as a point mass concentrated on the highest-weight placement. In other words, ignore all but the most likely placement location (the one with the highest LWR), and set its LWR to 1.0.
--ignore-multiplicities FLAG
Set the multiplicity of each pquery to 1.0. The multiplicity is the equivalent of abundance for placements; with this flag, it is ignored.
Matrix Output
--out-dir TEXT=.
Directory to write output files to.
--file-prefix TEXT
File prefix for output files. Most gappa commands use the command name as the base name for file output. This option amends the base name, to distinguish runs with different data.
--file-suffix TEXT
File suffix for output files. Most gappa commands use the command name as the base name for file output. This option amends the base name, to distinguish runs with different data.
--compress FLAG
If set, compress the output files using gzip. Output file extensions are automatically extended by .gz.
--matrix-format TEXT:{list,matrix,triangular}=matrix
Format of the output matrix file.
--omit-matrix-labels FLAG
If set, the output matrix is written without column and row labels.
Global Options
--allow-file-overwriting FLAG
Allow overwriting existing output files instead of aborting the command.
--verbose FLAG
Produce more verbose output.
--threads UINT
Number of threads to use for calculations.
--log-file TEXT
Write all output to a log file, in addition to standard output to the terminal.
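
The effect of the --point-mass and --ignore-multiplicities settings described above can be pictured with a small Python sketch. The `Pquery` class and the helper functions are hypothetical stand-ins, not gappa's data structures, and the sketch assumes that the mass a placement contributes is its LWR multiplied by the pquery's multiplicity.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Pquery:
    """Hypothetical stand-in for one pquery."""
    placements: tuple          # ((edge_id, lwr), ...)
    multiplicity: float = 1.0


def point_mass(pq):
    """Keep only the placement with the highest LWR and set its LWR to 1.0."""
    edge, _ = max(pq.placements, key=lambda placement: placement[1])
    return replace(pq, placements=((edge, 1.0),))


def ignore_multiplicities(pq):
    """Set the multiplicity to 1.0."""
    return replace(pq, multiplicity=1.0)


def edge_masses(pq):
    """Mass per edge, assuming mass = LWR * multiplicity (an assumption)."""
    masses = {}
    for edge, lwr in pq.placements:
        masses[edge] = masses.get(edge, 0.0) + lwr * pq.multiplicity
    return masses


# A pquery placed on three edges, observed four times.
pq = Pquery(placements=((3, 0.7), (5, 0.2), (8, 0.1)), multiplicity=4.0)

default_masses = edge_masses(pq)                       # mass spread over edges 3, 5, 8
point_masses = edge_masses(point_mass(pq))             # all mass on edge 3
unit_masses = edge_masses(ignore_multiplicities(point_mass(pq)))
```

With both settings combined, every pquery contributes exactly one unit of mass to a single edge.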

Description

Calculates the pairwise Kantorovich-Rubinstein distances between the samples in a collection of jplace files. The command is a re-implementation of guppy kr; see its documentation for more details.
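
For orientation, the distance can be written down as follows. This is our reading of the definition in the Evans and Matsen paper cited below, not a statement taken from the gappa source code. Here, P and Q are the mass distributions of two samples on the tree T, λ is the length measure along the branches, τ(y) is the part of the tree on the far side of point y as seen from the root, and p corresponds to --exponent.

```latex
Z_p(P, Q) = \left[ \int_T \bigl| P(\tau(y)) - Q(\tau(y)) \bigr|^{p} \, \lambda(dy) \right]^{\min(1/p,\, 1)}
```

For p = 1, this is the earth mover's distance: the minimal amount of work needed to move the mass of one sample along the branches until it matches the other. With --normalize, the resulting value is additionally divided by the total branch length of the tree.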

Details

The command reads the jplace samples and calculates their pairwise KR distances. By default, the result is written as a symmetric matrix; it can also be written as a list or as an upper triangular matrix (see --matrix-format).
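
The steps above can be sketched on a toy example. The following Python snippet is an illustration under simplifying assumptions, not gappa's implementation: the tree, the sample names, and all function names are hypothetical, masses sit on nodes rather than along branches, and the list layout (one entry per unordered pair) is an assumption about the idea, not about the exact file layout that gappa writes.

```python
from itertools import combinations

# Hypothetical toy tree: child -> (parent, branch length). "root" has no entry.
TREE = {
    "A": ("X", 1.0),
    "B": ("X", 2.0),
    "X": ("root", 0.5),
    "C": ("root", 3.0),
}

# Hypothetical samples: node -> mass, with each sample summing to 1.
SAMPLES = {
    "s1": {"A": 1.0},
    "s2": {"B": 1.0},
    "s3": {"A": 0.5, "C": 0.5},
}


def mass_below(masses):
    """For each branch (named by its child node), sum the mass at or below it."""
    below = {node: 0.0 for node in TREE}
    for node, mass in masses.items():
        # Walk up to the root, adding the mass to every branch on the way.
        while node in TREE:
            below[node] += mass
            node = TREE[node][0]
    return below


def kr_distance(p_masses, q_masses, exponent=1.0, normalize=False):
    """KR distance for node masses: sum over branches, then take the root."""
    p_below, q_below = mass_below(p_masses), mass_below(q_masses)
    total = sum(
        length * abs(p_below[child] - q_below[child]) ** exponent
        for child, (_, length) in TREE.items()
    )
    dist = total ** min(1.0 / exponent, 1.0)
    if normalize:
        # Divide by the tree length, as described for --normalize.
        dist /= sum(length for _, length in TREE.values())
    return dist


def pairwise(samples):
    """Full symmetric distance matrix with a zero diagonal."""
    names = list(samples)
    matrix = [[0.0] * len(names) for _ in names]
    for i, j in combinations(range(len(names)), 2):
        dist = kr_distance(samples[names[i]], samples[names[j]])
        matrix[i][j] = matrix[j][i] = dist
    return names, matrix


def as_list(names, matrix):
    """One (first, second, distance) entry per unordered pair (assumed layout)."""
    return [(names[i], names[j], matrix[i][j])
            for i, j in combinations(range(len(names)), 2)]


def as_upper_triangular(matrix):
    """Keep the diagonal and everything to its right in each row."""
    return [row[i:] for i, row in enumerate(matrix)]


names, matrix = pairwise(SAMPLES)
pairs = as_list(names, matrix)
upper = as_upper_triangular(matrix)
```

In this toy setup, the distance between s1 and s2 is simply the path length from A to B, because one unit of mass has to travel along both branches; the list and triangular layouts hold the same values as the full matrix without the redundant lower half.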

Citation

When using this method, please cite:

Lucas Czech, Pierre Barbera, Alexandros Stamatakis. Genesis and Gappa: Processing, Analyzing and Visualizing Phylogenetic (Placement) Data. Bioinformatics, 2020. doi:10.1093/bioinformatics/btaa070

Steven Evans, Frederick Matsen. The phylogenetic Kantorovich-Rubinstein metric for environmental sequence samples. Journal of the Royal Statistical Society, 2012. doi:10.1111/j.1467-9868.2011.01018.x
