Multitree #1442

jameshadfield · 2021-12-22T05:05:19Z

Description of proposed changes

This PR implements three major concepts:

The ability for multiple trees to be defined via dataset.tree -> Array<Tree>
The ability to cut a tree by a user-definable trait and display the resulting pieces as multiple (sub-)trees. AKA Exploded trees, see screen grab 👇
The concept of defining major / minor parents of nodes (see below).

In addition it includes the removal of a lot of technical debt within Auspice's code; as such extensive testing will be required. See commit messages for more information, however I'll use this PR description to foster discussion.

explode.mov

To-dos in this PR (Updated 2022-01-30)

(Not all of these need to be done in this PR, mind.)

To-dos in subsequent PRs (Updated 2022-01-30)

we should limit the explode options to those colourings which are defined on internal nodes / would not result in the currently displayed subtrees (currently a warning is shown)
genotype should be available to explode on

Testing (updated 2022-01-30)

I've tested this on a bunch of (local) datasets, although I've only created two multi-tree JSONs.

Use the review app -- https://auspice-multitree-h51yyzgffb90.herokuapp.com/ -- to test a variety of datasets
A multi-tree dataset has been created from the seattleflu's h3n2/1y dataset. https://auspice-multitree-h51yyzgffb90.herokuapp.com//test-multiple-subtrees
A small toy dataset may be useful as well: https://auspice-multitree-h51yyzgffb90.herokuapp.com/test-simple-exploded-tree

Descriptions of major/minor parents

Commit 1d5f725 introduces the concept of defining major / minor parents of nodes. This may include where in another tree this (sub-)tree originated (referred to here as “major”) or a recombination donor (“minor”). In the future, we may use solid/dashed lines to represent the connection to these parent nodes, using @mlewinsohn, @miparedes, Russell and @cassiawag's recombination project as inspiration. This commit only implements this concept for exploded trees (as we need to know how long the root stem should be). I'd like to make this user definable via a branch_attrs.parent annotation, the details of which haven't been fully worked out. The following situations are possible:

(i) Single tree, not exploded: Minor parents (via branch_attrs.parent) may be used to visualise recombination events (see above). Major parents are not applicable.
(ii) Multi tree, not exploded: Major parents (via branch_attrs.parent) can indicate the placement of the subtree (in another tree). Minor parents may also be applicable here.
(iii) Single tree, exploded: branch_attrs.parent is ignored. We set the major parent of each subtree to be the branch it originated from.
(iv) Multi tree, exploded: Same as (iii).

It'd be great to discuss this further before implementing it in a user-configurable fashion.

cc @evogytis
🙏 many thanks to @frogsquire whose work in #1105 was really helpful here

Variable names changed to better convey that these values represent node order - in rectangular layouts these are the same as y positions, but not for other layouts. Storing these on the <phyloNode> is consistent with other layout (position) variables. We now have the following on <phyloNode>:

* .displayOrder
* .displayOrderRange
* .y: y position in domain. Depends on layout.
* .py: y position of parent node.
* .yTip: y position in pixels (i.e. the range)
* .yBase: yTip of parent node.

Note that the untangling code is not currently used, but has been tested here by turning on `globals.attemptUntangle`.
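To make the distinction between node order and y position concrete, here is a minimal sketch of how `displayOrder` and `displayOrderRange` could be assigned. It works on plain objects rather than real <phyloNode> instances, and the rule for internal nodes (the mean of their direct children) is an assumption for illustration, not necessarily what Auspice does.

```javascript
// Assign display orders by post-order traversal (illustrative, on plain objects).
// `counter` is a hypothetical helper argument used to number tips consecutively.
function setDisplayOrder(node, counter = {next: 1}) {
  const children = node.children || [];
  if (children.length === 0) {
    // Tips are numbered consecutively in traversal order.
    node.displayOrder = counter.next++;
    node.displayOrderRange = [node.displayOrder, node.displayOrder];
    return node;
  }
  children.forEach((child) => setDisplayOrder(child, counter));
  // Assumption: an internal node sits at the mean order of its direct children.
  const total = children.reduce((sum, child) => sum + child.displayOrder, 0);
  node.displayOrder = total / children.length;
  // The range spans from the first descendant tip to the last.
  node.displayOrderRange = [
    children[0].displayOrderRange[0],
    children[children.length - 1].displayOrderRange[1]
  ];
  return node;
}

const orderedTree = setDisplayOrder({
  name: "root",
  children: [
    {name: "A", children: [{name: "a1"}, {name: "a2"}]},
    {name: "b"}
  ]
});
```

In a rectangular layout these orders can be used directly as y values in the domain; other layouts would map them to an angle or similar.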

huddlej · 2021-12-22T19:59:51Z

This is so cool, @jameshadfield! I don't know if I'm using this new feature correctly yet, but I noticed a couple of odd behaviors in an H3N2 tree:

explode by region and then filter to region (e.g., Africa) still shows branches for nodes not assigned to that region
explode and color by clade and then filter to two adjacent clades in the tree does not support zooming to selected (maybe this is expected behavior though?)

cassiawag · 2022-01-11T20:13:50Z

This is all really cool, @jameshadfield! Just a heads up that the review app does not seem to be working in any of the test links provided.

tsibley · 2022-01-26T22:12:07Z

@jameshadfield I couldn't find an example dataset file with multiple trees in the links above (I think most are currently inactive/dead?), but I'm curious if this would mean a new major version of the dataset schema, i.e. to dataset v3?

This introduces the ability to define multiple trees by allowing <json>.tree to be an array (of trees). Internally we add an extra root node (not displayed) whose children are the subtrees; this allows us to reuse all of our machinery which expects to traverse a single tree.
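The extra-root idea can be sketched as follows. This is a minimal illustration under stated assumptions: `wrapTrees`, `collectTipNames`, and the `hidden` flag are hypothetical names, not Auspice's actual functions or properties.

```javascript
// Hypothetical helper: normalise `tree` (a single tree object or an array of
// trees) into one traversable structure by adding an undisplayed root node.
function wrapTrees(tree) {
  const subtrees = Array.isArray(tree) ? tree : [tree];
  // `name` and `hidden` are illustrative fields only.
  return {name: "__ROOT", hidden: true, children: subtrees};
}

// A traversal written for a single tree works unchanged on the wrapped result.
function collectTipNames(node, tips = []) {
  const children = node.children || [];
  if (children.length === 0) {
    if (!node.hidden) tips.push(node.name);
  } else {
    children.forEach((child) => collectTipNames(child, tips));
  }
  return tips;
}

const multiTreeDataset = {
  tree: [
    {name: "rootA", children: [{name: "A1"}, {name: "A2"}]},
    {name: "rootB", children: [{name: "B1"}]}
  ]
};
const wrappedTree = wrapTrees(multiTreeDataset.tree);
const wrappedTipNames = collectTipNames(wrappedTree);
```

Because a single tree is wrapped the same way, downstream code only ever has to handle one shape.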

jameshadfield · 2022-01-30T05:16:05Z

> Just a heads up that the review app does not seem to be working in any of the test links provided.

> I couldn't find an example dataset file with multiple trees in the links above

Yeah, the heroku review app died after some amount of time (a week?) and when it gets recreated the URL changes so those links were dead. I've updated them in the above message.

> but I'm curious if this would mean a new major version of the dataset schema, i.e. to dataset v3?

I don't think so, however the schema will have to be extended to allow both a single tree and an array of trees, as well as allowing some extra branch_attrs if that capability gets implemented in this PR.

We represent tree nodes by two different objects, a "PhyloNode" and a "Node", although these names are not formalised. The former holds information necessary to render the phylogenetic tree and is solely used by `PhyloTree` while the latter holds more general information. They are linked: `phyloNode = node.n` and `node = phyloNode.shell`. This commit tackles deeply embedded technical debt where each object held a copy of the same data. Specifically,

* `children` is now only stored on a Node
* `parent` is similarly stored on a Node
* `terminal` is removed from PhyloNode (we can refer to node.hasChildren)

There is (at least) one remaining duplicated property - `inView` - which is set on <Node> when the dataset loads due to the corresponding <PhyloNode> not yet existing. This is left for future work as the concept of `inView` may change as we develop further methods for tree zoom / pan.
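A rough sketch of the linkage described in this commit message, using plain objects. The `.n` and `.shell` links and `hasChildren` come from the message itself; the helper names (`makeNode`, `attachPhyloNode`, `phyloChildren`, `isTerminal`) and the way `hasChildren` is maintained are assumptions for illustration.

```javascript
// Structural data (parent, children) lives only on the Node.
function makeNode(name, parent = null) {
  const node = {name, parent, children: [], hasChildren: false};
  if (parent) {
    parent.children.push(node);
    parent.hasChildren = true;
  }
  return node;
}

// The PhyloNode holds render-only state and points back at its Node via `shell`.
function attachPhyloNode(node) {
  const phyloNode = {shell: node, x: 0, y: 0};
  node.n = phyloNode;
  return phyloNode;
}

// Rendering code reaches tree structure through the shell instead of a copy.
function phyloChildren(phyloNode) {
  return phyloNode.shell.children.map((child) => child.n);
}

// Replaces the removed `terminal` property.
function isTerminal(phyloNode) {
  return !phyloNode.shell.hasChildren;
}

const linkRoot = makeNode("root");
const linkTipA = makeNode("tipA", linkRoot);
const linkTipB = makeNode("tipB", linkRoot);
[linkRoot, linkTipA, linkTipB].forEach(attachPhyloNode);
```

With a single source of truth for `children` and `parent`, reordering or cutting the tree only has to touch the Node objects.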

@frogsquire

Introduces the ability to cut trees at various branches and display the resulting trees as subtrees. These cut points are here defined to be a change in a (user-selected) state trait, which are simply the non-continuous colourings (to start with). Briefly, given a parent node (A) with children (B,C), and a change in trait moving from (A->B), we add B to the children of the overall tree root, and update the children of A to be only C. We store the original children (B,C) so that we can return to the starting state. This implementation, specifically the ability for PhyloTree to update the node ordering, should allow for branch rotations in the future.

There are a number of future improvements to be made in subsequent commits:

* The subtrees should be ordered on trait (or we may wish to make this a toggle)
* the stem length of subtrees is inconsistent
* exploded trait cannot be stored in URL state
* we should limit the explode options to those colourings which are defined on internal nodes (currently a warning is shown)
* genotype should be available to explode on
* remove console logging introduced here
* consider undefined (internal) traits, e.g. currently nodes (X -> undefined -> Y) would not be exploded
* multiple trees are not considered here

Prior art: I was first made aware of this style of tree rendering by Gytis Dudas. An implementation was attempted by @frogsquire <https://github.com/frogsquire> in #1105 and this commit uses ideas introduced there.
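The cut-and-reattach step described in this commit message could look roughly like the sketch below. It assumes a hidden overall root whose children are the displayed trees; `explodeTree`, `unexplodeTree`, and the `unexplodedChildren` property are hypothetical names, and a plain `!==` comparison stands in for whatever trait comparison Auspice really uses.

```javascript
// Cut the tree wherever the trait changes between a parent and a child.
function explodeTree(hiddenRoot, getTrait) {
  const subtreeRoots = hiddenRoot.children.slice();
  hiddenRoot.unexplodedChildren = hiddenRoot.children.slice();
  const stack = hiddenRoot.children.slice();
  while (stack.length) {
    const node = stack.pop();
    const children = node.children || [];
    if (children.length === 0) continue;
    // Remember the original children so the cut can be undone later.
    node.unexplodedChildren = children;
    node.children = children.filter((child) => getTrait(child) === getTrait(node));
    children.forEach((child) => {
      // A child whose trait differs becomes a subtree under the overall root.
      if (getTrait(child) !== getTrait(node)) subtreeRoots.push(child);
      stack.push(child);
    });
  }
  hiddenRoot.children = subtreeRoots;
}

// Restore every node's original children, returning to the starting state.
function unexplodeTree(root) {
  const stack = [root];
  while (stack.length) {
    const node = stack.pop();
    if (node.unexplodedChildren) {
      node.children = node.unexplodedChildren;
      delete node.unexplodedChildren;
    }
    (node.children || []).forEach((child) => stack.push(child));
  }
}

const nodeC = {name: "C", region: "X", children: [{name: "C1", region: "X"}, {name: "C2", region: "Z"}]};
const nodeB = {name: "B", region: "Y", children: [{name: "B1", region: "Y"}]};
const nodeA = {name: "A", region: "X", children: [nodeB, nodeC]};
const explodeRoot = {name: "__ROOT", children: [nodeA]};

const nameList = (nodes) => nodes.map((n) => n.name).join(",");
explodeTree(explodeRoot, (n) => n.region);
const explodedRootNames = nameList(explodeRoot.children);
const explodedChildrenOfA = nameList(nodeA.children);
unexplodeTree(explodeRoot);
const restoredRootNames = nameList(explodeRoot.children);
```

Storing the original children on each node keeps the operation reversible without re-parsing the dataset.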

This introduces the concept of defining additional parents of the node, in addition to the parent used to link all nodes together into one tree. By defining the original parent via `<Node>.parentInfo.original → <Node>` the roots of exploded (sub-)trees keep track of their original parent. This information is used here for branch (stem) length calculations; future work may use visual cues such as dashed lines to link subtrees together. https://cse512-21s.github.io/FP-Pathogen-Phylogenies/ (Lewinsohn, Paredes, Russell and Wagner) is one example of this. This structure will allow definition of recombination events (e.g. donor nodes), original parents of individual trees (in a multi-tree-dataset) etc. All of this is "future work".
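As a hedged illustration of how the original parent might feed into a stem length, the sketch below records `parentInfo.original` (the property path named in this commit message) and measures the distance along the x axis. The helper names and the subtraction rule are assumptions; the actual calculation in Auspice may differ.

```javascript
// Record the original parent on the root of an exploded (sub-)tree.
function setOriginalParent(subtreeRoot, originalParent) {
  subtreeRoot.parentInfo = {original: originalParent};
}

// Assumed rule: the stem spans from the original parent's x value
// (divergence or time) to the subtree root's x value.
function stemLength(subtreeRoot, getX) {
  const original = subtreeRoot.parentInfo && subtreeRoot.parentInfo.original;
  if (!original) return 0; // no known original parent, so no stem
  return getX(subtreeRoot) - getX(original);
}

const stemParent = {name: "A", x: 2};
const stemChild = {name: "B", x: 5};
setOriginalParent(stemChild, stemParent);
const exampleStem = stemLength(stemChild, (n) => n.x);
```

The same `parentInfo` object leaves room for other kinds of parent (for example a recombination donor) without changing the tree links themselves.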

We default to zooming out completely whenever we explode the tree. There are nicer behaviours here, such as re-calculating the MRCA of visible nodes, but this comes at the cost of increased complexity. Recalculating tip counts is needed for (a) branch thicknesses and (b) forthcoming work which will hide subtrees with no tips.
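Recalculating tip counts amounts to a post-order traversal. The sketch below is one way to do it, assuming a node without children is a tip; `calcTipCounts` and the `tipCount` property are illustrative names here.

```javascript
// Count visible tips beneath each node and store the result on the node.
function calcTipCounts(node, isVisible) {
  const children = node.children || [];
  if (children.length === 0) {
    // Assumption: a childless node is a tip and counts only when visible.
    node.tipCount = isVisible(node) ? 1 : 0;
  } else {
    node.tipCount = children.reduce(
      (sum, child) => sum + calcTipCounts(child, isVisible),
      0
    );
  }
  return node.tipCount;
}

const tipCountTree = {
  name: "root",
  children: [
    {name: "A", children: [{name: "a1", region: "Africa"}, {name: "a2", region: "Europe"}]},
    {name: "b", region: "Europe"}
  ]
};
const europeTipCount = calcTipCounts(tipCountTree, (n) => n.region === "Europe");
```

A subtree whose root ends up with a count of zero is a candidate for hiding, and non-zero counts can drive branch thickness.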

Note that in radial view separation is expressed by increasing the angle. If the root of a subtree is near the start (div or time) then changing the angle doesn't convey a sense of separation. One way to improve this would be to add a inner circle which the innermost nodes can then be spaced around, however this wouldn't be appropriate for non-exploded trees. Worth revisiting in the future, perhaps.

jameshadfield · 2022-02-09T19:01:46Z

This is now (finally) ready for release, however I've added an "experimental" label to the dropdown as these changes push auspice in new directions and this exposes some pre-existing limitations. There may also be some bugs / papercuts as this PR involves a lot of the different parts of the code.

"Zoom to selected" fundamentally doesn't play nicely when viewing subtrees, due to the way we calculate the links between in-view tips. We'll revisit this concept shortly with the accordion zoom & other ways of zooming the tree.
Connecting lines (e.g. between the subtree root and its parent in the unexploded tree) are for the next PR, however the tree layout knows about this link as it's used to compute the subtree stem length.
Augur needs a corresponding PR to allow the tree to be an array, however this is not blocking.
URL state is not implemented, as I wanted to wait for more testing here
Subtrees are separated in Radial & Unrooted views, however this separation is expressed by (i) a change in angle and (ii) the distance from the subtree root to the overall tree root (i.e. divergence or time). Therefore if each subtree root has a similar origin divergence/time, then there will be little to no separation no matter what the angle is! Plenty of work can be done here (see commit messages for more).
Internal branches with no tips (because their children have all been pruned) are still displayed. This isn't ideal, but is an edge case and will be tackled along with the connecting lines as they have a lot of overlap.

Testing URLs

4-tip tree with a few colourings which is useful to test different behavior is at https://auspice-multitree-s8i6oo6ewp4d.herokuapp.com/test-simple-exploded-tree
A multi-tree dataset (i.e. an array of trees in the dataset JSON) has been created from the seattleflu's h3n2/1y dataset. https://auspice-multitree-s8i6oo6ewp4d.herokuapp.com/test-multiple-subtrees
Or just test a working version of nextstrain.org ;) https://nextstrain-s-auspice-pr-tm15im.herokuapp.com/

P.S. The final commit is to be dropped before merge.

Multiple trees ("subtrees") have been available in Auspice since late 2021¹ and part of the associated schema since early 2022². Despite this there was no way to produce such datasets within Augur itself, and despite the schema changes the associated `augur validate` command was never updated to allow them. This commit adds multi-tree inputs to `augur export v2` as well as allowing them to validate with our associated validation commands. ¹ <nextstrain/auspice#1442> ² <#851>

nextstrain-bot temporarily deployed to auspice-multitree-puhaxxe7v6np December 22, 2021 05:05 Inactive

nextstrain-bot mentioned this pull request Dec 22, 2021

[bot] [DO NOT MERGE] Test auspice PR 1442 nextstrain/nextstrain.org#447

Closed

jameshadfield self-assigned this Dec 22, 2021

jameshadfield temporarily deployed to auspice-multitree-33rarnfdefdk January 12, 2022 20:05 Inactive

jameshadfield force-pushed the multitree branch from dc399ed to bd2e4a6 Compare January 30, 2022 04:59

jameshadfield temporarily deployed to auspice-multitree-h51yyzgffb90 January 30, 2022 05:01 Inactive

jameshadfield force-pushed the multitree branch from bd2e4a6 to efcf523 Compare January 30, 2022 05:06

jameshadfield temporarily deployed to auspice-multitree-h51yyzgffb90 January 30, 2022 05:07 Inactive

jameshadfield temporarily deployed to auspice-multitree-h51yyzgffb90 February 2, 2022 21:08 Inactive

jameshadfield force-pushed the multitree branch from 07e8c42 to 0a868bd Compare February 9, 2022 07:30

jameshadfield added 14 commits February 10, 2022 07:38

Order subtrees to mimic the legend

3577371

[explode] allow zoom to subtree w. single tip

944c013

Our previous behavior zoomed into the parent node, which made for a nicer UI. However when the zoom node is a subtree with a single tip, zooming into the parent node means zooming into the entire tree, which is not desirable.

[explode] Separate out undefined subtrees

ab69fa2

Previous behavior would only split trees when the traits of a parent-child link were different _and_ defined. This led to problems where three connected nodes with traits "X -> undefined -> Y" would not be split.
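The difference between the two rules can be shown on the three-node chain from this commit message. The predicate and helper names below are hypothetical, and the second predicate is only one reading of the new behaviour (any change, including to or from undefined, counts as a cut).

```javascript
// Old rule as described: split only when both traits are defined and differ.
const splitIfBothDefined = (parentTrait, childTrait) =>
  parentTrait !== undefined && childTrait !== undefined && parentTrait !== childTrait;

// Assumed new rule: any change in trait, including undefined, is a cut point.
const splitOnAnyChange = (parentTrait, childTrait) => parentTrait !== childTrait;

// Count the cut points along a chain of parent -> child traits.
function countCuts(traitChain, shouldSplit) {
  let cuts = 0;
  for (let i = 1; i < traitChain.length; i++) {
    if (shouldSplit(traitChain[i - 1], traitChain[i])) cuts++;
  }
  return cuts;
}

const traitChain = ["X", undefined, "Y"];
const cutsOldRule = countCuts(traitChain, splitIfBothDefined);
const cutsNewRule = countCuts(traitChain, splitOnAnyChange);
```

Under the old rule the chain is never cut, so X and Y stay in one subtree; under the assumed new rule the undefined node forms its own subtree between them.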

[explode] don't allow tanglegrams

b8cc27c

Tanglegrams would be useful for exploded trees but are added complexity for an already complex feature. For the time being, it's acceptable to force users to choose one or the other.

[explode] add padding between subtrees

937c68c

The chosen value is based off testing a few datasets and we may need to tweak this in the future.

[explode] don't display subtrees with zero tips

093b823

remove unused code

3a1c8f1

[explode] Separate unrooted subtrees

32593b4

This separates out the subtrees in an unrooted subtree using a similar conceptual approach as for radial trees. Code was somewhat cleaned up during this work.

[explode] add expermimental label

06dfb6b

jameshadfield force-pushed the multitree branch from 0a868bd to eb9eceb Compare February 9, 2022 18:41

jameshadfield requested a review from a team February 9, 2022 18:46

jameshadfield marked this pull request as ready for review February 9, 2022 18:47

jameshadfield temporarily deployed to auspice-multitree-s8i6oo6ewp4d February 9, 2022 19:03 Inactive

jameshadfield force-pushed the multitree branch from eb9eceb to 06dfb6b Compare February 14, 2022 04:54

jameshadfield temporarily deployed to auspice-multitree-s8i6oo6ewp4d February 14, 2022 04:54 Inactive

jameshadfield mentioned this pull request Feb 14, 2022

exploded-tree improvements #1460

Open

7 tasks

jameshadfield merged commit da64ea6 into master Feb 14, 2022

jameshadfield deleted the multitree branch February 14, 2022 05:05

jameshadfield mentioned this pull request Feb 14, 2022

Update dataset schema nextstrain/augur#848

Closed
