Subcommand: graft
Make a tree with each of the query sequences represented as a pendant edge.
Usage: gappa examine graft [options]
Input | |
---|---|
--jplace-path | Required. TEXT:PATH(existing)=[] ... List of jplace files or directories to process. For directories, only files with the extension .jplace[.gz] are processed. |
Settings | |
--fully-resolve | FLAG If set, branches that contain multiple pqueries are resolved by creating a new branch for each of the pqueries individually, placed according to their distal/proximal lengths. If not set (default), all pqueries at one branch are collected in a subtree that branches off from the branch. |
--name-prefix | TEXT Specify a prefix to be added to all new leaf nodes, i.e., to the query sequence names. |
Output | |
--out-dir | TEXT=. Directory to write output files to. |
--file-prefix | TEXT File prefix for output files. Most gappa commands use the command name as the base name for file output. This option amends the base name, to distinguish runs with different data. |
--file-suffix | TEXT File suffix for output files. Most gappa commands use the command name as the base name for file output. This option amends the base name, to distinguish runs with different data. |
Newick Tree Output | |
--newick-tree-quote-invalid-chars | FLAG If set, node labels that contain characters that are invalid in the Newick format (i.e., spaces and :;()[],{} ) are put into quotation marks. If not set (default), these characters are instead replaced by underscores, which changes the names, but works better with most downstream tools. |
Global Options | |
--allow-file-overwriting | FLAG Allow to overwrite existing output files instead of aborting the command. |
--verbose | FLAG Produce more verbose output. |
--threads | UINT Number of threads to use for calculations. |
--log-file | TEXT Write all output to a log file, in addition to standard output to the terminal. |
The command takes the reference tree of the provided jplace file(s) and, for each pquery, attaches a new leaf node to the tree, positioned according to the proximal length and pendant length of its most likely placement. The resulting tree is useful to get an overview of the distribution of placements. It is mainly intended for viewing a few placements; for large samples, the tree can become cluttered.
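As an illustration of what "most likely placement" can mean in practice, the following minimal Python sketch reads a jplace-like document and picks one placement per pquery. It assumes that "most likely" corresponds to the highest `like_weight_ratio`, which the text above does not state explicitly. The field names follow the jplace format; the tree, values, and query name are made up. Note that jplace files typically store a `distal_length`, so the proximal length has to be derived from it and the branch length.

```python
import json

# A tiny jplace-like document. Field names follow the jplace format;
# the tree, the numbers, and the query name are invented for illustration.
JPLACE_TEXT = """
{
  "tree": "((A:1.0{0},B:1.0{1}):0.5{2},C:2.0{3});",
  "fields": ["edge_num", "likelihood", "like_weight_ratio",
             "distal_length", "pendant_length"],
  "placements": [
    {"p": [[0, -1234.5, 0.2, 0.3, 0.05],
           [3, -1230.1, 0.8, 1.5, 0.10]],
     "n": ["query_1"]}
  ],
  "version": 3,
  "metadata": {}
}
"""


def most_likely_placements(doc):
    """Yield (names, placement) per pquery.

    Assumption: the most likely placement is the one with the highest
    like_weight_ratio.
    """
    fields = doc["fields"]
    for pquery in doc["placements"]:
        # Turn each placement row into a dict keyed by field name.
        rows = [dict(zip(fields, row)) for row in pquery["p"]]
        best = max(rows, key=lambda r: r["like_weight_ratio"])
        # Names are stored under "n", or under "nm" as [name, multiplicity].
        names = pquery.get("n") or [nm[0] for nm in pquery.get("nm", [])]
        yield names, best


doc = json.loads(JPLACE_TEXT)
for names, best in most_likely_placements(doc):
    print(names, best["edge_num"], best["pendant_length"])
```

This is only a reading aid for the data that the command consumes, not a description of how gappa implements the selection.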
Similar trees are produced by RAxML-EPA, where the file is called `RAxML_labelledTree`, and by the `guppy tog` command. The two programs differ in the exact way the placements are added as edges. To control this behaviour, use the `--fully-resolve` parameter.
The provided `jplace` files are processed individually, producing a `newick` tree for each of them. The output files are named like the input files, with the file extension replaced by `.newick`.
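The labels written to these Newick files are affected by `--newick-tree-quote-invalid-chars`, as described in the options table. The following Python sketch mimics the two behaviours described there. It is not gappa's implementation: the function name is hypothetical, and the use of double quotes is an assumption, since the option text only says "quotation marks".

```python
# Characters that the option description lists as invalid in Newick labels.
INVALID_CHARS = set(" :;()[],{}")


def newick_label(name, quote_invalid_chars=False):
    """Return a label that is safe to write into a Newick string."""
    if not any(ch in INVALID_CHARS for ch in name):
        return name  # nothing to do for plain names
    if quote_invalid_chars:
        # Assumption: double quotes. Quote characters inside the name
        # are not handled in this sketch.
        return '"' + name + '"'
    # Default behaviour: replace each invalid character by an underscore.
    return "".join("_" if ch in INVALID_CHARS else ch for ch in name)


print(newick_label("sample 1;run(2)"))
print(newick_label("sample 1;run(2)", quote_invalid_chars=True))
```

The default changes the names, so downstream lookups by query name need to apply the same replacement.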
Important remark: The grafting simply attaches the pqueries to the tree at their most likely placement position. The phylogeny of the pqueries themselves, however, is not resolved at all.
If `--fully-resolve` is not provided (default), all placements at one edge are collected as children of one central base edge.
This method is similar to the way RAxML-EPA produces a grafted tree, where it is called a "labelled tree".
The base edge is positioned on the original edge at the average `proximal_length` of the placements. The base edge has a multifurcation if there are more than two placements on the edge.
The pendant length of the placements is used to calculate the branch length of the new placement edges. This calculation subtracts the shortest pendant length of the placements on the edge, so that the base edge is maximally "moved" towards the placement edges. This also implies that at least one of the placement edges has branch length == 0.0. Furthermore, the placements are sorted by their pendant length.
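The arithmetic described above can be sketched in a few lines of Python. This is a minimal illustration, not gappa's code: the function name and the tuple layout are hypothetical, the ascending sort order is an assumption, and taking the shortest pendant length as the length of the base edge is an inference from the description.

```python
def graft_multifurcating(placements):
    """Compute the base edge and child branch lengths for one edge.

    placements: list of (name, proximal_length, pendant_length) tuples,
    all located on the same edge of the reference tree.
    Returns (base_position, base_edge_length, children), where children
    is a list of (name, branch_length) sorted by branch length.
    """
    # The base edge attaches at the average proximal length.
    base_position = sum(p[1] for p in placements) / len(placements)
    # Inferred: the base edge absorbs the shortest pendant length.
    min_pendant = min(p[2] for p in placements)
    # Each placement edge keeps only what exceeds the shortest pendant length.
    children = sorted(
        ((name, pendant - min_pendant) for name, _, pendant in placements),
        key=lambda child: child[1],
    )
    return base_position, min_pendant, children


example = [
    ("Query 1", 0.25, 0.5),
    ("Query 2", 0.5, 0.125),
    ("Query 3", 0.75, 0.25),
]
position, base_length, children = graft_multifurcating(example)
print(position, base_length, children)
```

In this example, the placement with the shortest pendant length ends up with a branch length of 0.0, as stated above.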
Using this method, the new nodes of the resulting tree are easier to distinguish and collapse, as all placements are collected as children branching off from the base edge. However, this comes at the cost of losing the detailed information about the proximal length of each placement. If you want to keep this information, use `--fully-resolve` instead.
If `--fully-resolve` is provided, all placements per branch are turned into individual single leaf nodes.
This method is similar to the way `guppy tog` produces a grafted tree.
The original edge is split into separate parts where each placement edge is attached. The branch lengths between those parts are calculated using the proximal length of the placements, while the branch lengths of the placement edges use their pendant length.
Using this method gives the most detailed information, but results in a more crowded tree. The new placement edges are "sorted" along the original edge by their proximal length. For this reason, in the example image above, "Query 2" is closer to "Node A" than "Query 1": it has a higher proximal length. This information was lost in the multifurcating tree shown before (without `--fully-resolve`).
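The splitting of the original edge can be sketched as follows. Again, this is a hedged Python illustration with hypothetical names, assuming that the segments are listed from the proximal end of the edge to its distal end.

```python
def graft_fully_resolved(branch_length, placements):
    """Split one edge at each placement position.

    branch_length: length of the original edge.
    placements: list of (name, proximal_length, pendant_length) tuples.
    Returns (segments, leaves): the lengths of the pieces of the original
    edge, in order from the proximal end, and the new leaves as
    (name, pendant_length) in the same order.
    """
    # Placements are ordered along the edge by their proximal length.
    ordered = sorted(placements, key=lambda p: p[1])
    segments = []
    previous = 0.0
    for _, proximal, _ in ordered:
        segments.append(proximal - previous)
        previous = proximal
    # Remaining piece between the last placement and the distal end.
    segments.append(branch_length - previous)
    leaves = [(name, pendant) for name, _, pendant in ordered]
    return segments, leaves


segments, leaves = graft_fully_resolved(
    1.0, [("Query 2", 0.75, 0.125), ("Query 1", 0.25, 0.5)]
)
print(segments, leaves)
```

Two placements with identical proximal lengths yield a segment of length 0.0 between their attachment nodes in this sketch.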
For edges that contain only a single placement (or none at all), both versions (with and without `--fully-resolve`) behave the same. In this case, the placement is simply attached using its proximal length and pendant length.
Pqueries with multiple names are treated as if each name were a separate placement, i.e., for each of them, a new (identical) edge is added to the tree. If using `--fully-resolve`, this results in a branch length of 0.0 between the nodes of those placements.
`--name-prefix`: Specify a prefix to be added to all new leaf nodes (the ones that represent placements). This is useful if a pquery name also occurs as a name in the original tree. By default, the prefix is empty. To get the same naming as in grafted trees produced by RAxML, use `--name-prefix "QUERY___"`.
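To illustrate why a prefix can be useful, the following small Python sketch builds leaf names for a pquery with several names and checks them against the names already present in the reference tree. Both helper functions and all sample names are hypothetical; only the `QUERY___` prefix is taken from the text above.

```python
def new_leaf_names(pquery_names, name_prefix=""):
    """One new leaf per pquery name, with the prefix prepended."""
    return [name_prefix + name for name in pquery_names]


def name_collisions(reference_names, leaf_names):
    """Names that would occur twice in the grafted tree."""
    return sorted(set(reference_names) & set(leaf_names))


reference = ["seq_A", "seq_B", "seq_C"]
pquery = ["seq_A", "read_17"]  # one pquery carrying two names

plain = new_leaf_names(pquery)
prefixed = new_leaf_names(pquery, name_prefix="QUERY___")

print(name_collisions(reference, plain))
print(name_collisions(reference, prefixed))
```

Without a prefix, `seq_A` would appear both as a reference leaf and as a grafted leaf; with the prefix, the two are distinguishable.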
When using this method, please do not forget to cite
Lucas Czech, Pierre Barbera, Alexandros Stamatakis. Genesis and Gappa: Processing, Analyzing and Visualizing Phylogenetic (Placement) Data. Bioinformatics, 2020. doi:10.1093/bioinformatics/btaa070