Multitree #1442

Merged
merged 16 commits into from
Feb 14, 2022

Conversation

jameshadfield
Member

@jameshadfield jameshadfield commented Dec 22, 2021

Description of proposed changes

This PR implements three major concepts:

  1. The ability for multiple trees to be defined via dataset.tree -> Array<Tree>
  2. The ability to cut a tree by a user-definable trait and display the pieces as multiple (sub-)trees, a.k.a. exploded trees; see screen grab 👇
  3. Introduces the concept of defining major / minor parents of nodes (see below).

In addition it includes the removal of a lot of technical debt within Auspice's code; as such extensive testing will be required. See commit messages for more information, however I'll use this PR description to foster discussion.

explode.mov

To-dos in this PR (Updated 2022-01-30)

(Not all of these need to be done in this PR, mind.)

  • add "beta" in the dropdown UI for the first release
  • store exploded trait in URL state
  • consider how to treat undefined (internal) traits, e.g. currently nodes (X -> undefined -> Y) would not be exploded. Update: we break on any trait change, including to/from undefined; a subtree with no tips (e.g. one internal node with an undefined trait) is no longer displayed.
  • multiple trees (i.e. two different tree datasets) are not considered here & have not been tested (much). Update: I've disabled the ability to use both features at once.
  • remove test dataset commit & some console logs
  • add connecting lines to indicate major/minor parents (see below)
  • add more space between subtrees (in rectangular, unrooted & radial layouts)
  • unrooted layout / clock / scatter layout need attention here (I haven't tested much, but they're definitely misleading at the moment)
  • consider the case of orphaned branches (i.e. those whose children have all become subtrees). Currently these are drawn without a tip, but they do have a tip label (actually the node name, e.g. "NODE_1234"). Update: these are no longer rendered.
  • branch thicknesses aren't quite right
  • Filtering may need to change & is different depending on the order it is applied (i.e. explode first vs explode last). This is due to the traversals operating on different trees (as noticed by @huddlej, see below)
  • Zoom-to-selected depends on the MRCA of selected tips, and thus if tips are selected in multiple subtrees the MRCA is __ROOT and we can't zoom to selected! (as noticed by @huddlej, see below)
  • branch names are sometimes disappearing
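
To illustrate the zoom-to-selected item above, here is a minimal sketch of an MRCA lookup over a tree with a hidden overall root. The helper names (`link`, `ancestors`, `mrca`) and the `parent` pointers are assumptions made for this example and are not Auspice's actual code; only the name `__ROOT` is taken from the discussion.

```javascript
// Hypothetical helpers for illustration; not Auspice's implementation.
// Adds `parent` pointers so we can walk from any node up to the root.
function link(node, parent) {
  node.parent = parent;
  (node.children || []).forEach((child) => link(child, node));
  return node;
}

// Path from a node up to the root, starting with the node itself.
function ancestors(node) {
  const path = [];
  for (let n = node; n; n = n.parent) path.push(n);
  return path;
}

// Most recent common ancestor: the deepest node shared by every tip's path.
function mrca(tips) {
  let common = ancestors(tips[0]);
  for (const tip of tips.slice(1)) {
    const seen = new Set(ancestors(tip));
    common = common.filter((n) => seen.has(n));
  }
  return common[0];
}

const a = { name: "a" };
const b = { name: "b" };
const c = { name: "c" };
link({
  name: "__ROOT", // hidden overall root whose children are the subtrees
  children: [
    { name: "S1", children: [a, b] },
    { name: "S2", children: [c] },
  ],
}, undefined);

console.log(mrca([a, b]).name); // "S1": both tips sit in the same subtree
console.log(mrca([a, c]).name); // "__ROOT": the tips span two subtrees
```

When the selection spans subtrees the only shared ancestor is the hidden root, so zooming to the MRCA is the same as not zooming at all.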

To-dos in subsequent PRs (Updated 2022-01-30)

  • we should limit the explode options to those colourings which are defined on internal nodes / would not result in the currently displayed subtrees (currently a warning is shown)
  • genotype should be available to explode on

Testing (updated 2022-01-30)

I've tested this on a bunch of (local) datasets, although I've only created two multi-tree JSONs.

Descriptions of major/minor parents

Commit 1d5f725 introduces the concept of defining major / minor parents of nodes. This may include where in another tree this (sub-)tree originated (referred to here as “major”) or a recombination donor (“minor”). In the future, we may use solid/dashed lines to represent the connection to these parent nodes, using @mlewinsohn, @miparedes, Russell and @cassiawag's recombination project as inspiration. This commit only implements this concept for exploded trees (as we need to know how long the root stem should be). I'd like to make this user definable via a branch_attrs.parent annotation, the details of which haven't been fully worked out. The following situations are possible:

(i) Single tree, not exploded: Minor parents (via branch_attrs.parent) may be used to visualise recombination events (see above). Major parents are not applicable.
(ii) Multi tree, not exploded: Major parents (via branch_attrs.parent) can indicate the placement of the subtree (in another tree). Minor parents may also be applicable here.
(iii) Single tree, exploded: branch_attrs.parent is ignored. We set the major parent of each subtree to be the branch it originated from.
(iv) Multi tree, exploded: Same as (iii).

It'd be great to discuss this further before implementing it in a user-configurable fashion.

cc @evogytis
🙏 many thanks to @frogsquire whose work in #1105 was really helpful here

Variable names changed to better convey that these values represent node order - in rectangular layouts these are the same as y positions, but not for other layouts. Storing these on the <phyloNode> is consistent with other layout (position) variables.

We now have:
<phyloNode>.displayOrder
                        .displayOrderRange
                        .y: y position in domain. Depends on layout.
                        .py: y position of parent node.
                        .yTip: y position in pixels (i.e. the range)
                        .yBase: yTip of parent node.

Note that the untangling code is not currently used, but has been tested here by turning on `globals.attemptUntangle`.
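
As a rough sketch of how node order could be assigned, the snippet below numbers tips in traversal order and derives values for internal nodes. The property names `displayOrder` and `displayOrderRange` come from the commit message; the specific rule (internal nodes take the mean of their children, the range spans the descendants' orders) is an assumption for illustration and may differ from what Auspice does.

```javascript
// Illustrative sketch; the averaging rule for internal nodes is an assumption.
function setDisplayOrder(node, counter = { next: 1 }) {
  const children = node.children || [];
  if (children.length === 0) {
    // Tips are numbered 1, 2, 3, ... in traversal order.
    node.displayOrder = counter.next++;
    node.displayOrderRange = [node.displayOrder, node.displayOrder];
    return node;
  }
  children.forEach((child) => setDisplayOrder(child, counter));
  // Internal node: mean of its children's order, range spanning its descendants.
  const total = children.reduce((sum, child) => sum + child.displayOrder, 0);
  node.displayOrder = total / children.length;
  node.displayOrderRange = [
    Math.min(...children.map((child) => child.displayOrderRange[0])),
    Math.max(...children.map((child) => child.displayOrderRange[1])),
  ];
  return node;
}

const tree = setDisplayOrder({
  name: "root",
  children: [
    { name: "A", children: [{ name: "t1" }, { name: "t2" }] },
    { name: "t3" },
  ],
});

console.log(tree.children[0].displayOrder); // 1.5
console.log(tree.displayOrderRange); // [ 1, 3 ]
```

In a rectangular layout these values map directly onto y positions; other layouts would transform them (for example into angles).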
@nextstrain-bot nextstrain-bot temporarily deployed to auspice-multitree-puhaxxe7v6np December 22, 2021 05:05 Inactive
@huddlej
Contributor

huddlej commented Dec 22, 2021

This is so cool, @jameshadfield! I don't know if I'm using this new feature correctly yet, but I noticed a couple of odd behaviors in an H3N2 tree:

@jameshadfield jameshadfield self-assigned this Dec 22, 2021
@cassiawag
Contributor

This is all really cool, @jameshadfield! Just a heads up that the review app does not seem to be working in any of the test links provided.

@jameshadfield jameshadfield temporarily deployed to auspice-multitree-33rarnfdefdk January 12, 2022 20:05 Inactive
@tsibley
Member

tsibley commented Jan 26, 2022

@jameshadfield I couldn't find an example dataset file with multiple trees in the links above (I think most are currently inactive/dead?), but I'm curious if this would mean a new major version of the dataset schema, i.e. to dataset v3?

This introduces the ability to define multiple trees by allowing
<json>.tree to be an array (of trees). Internally we add an extra root
node (not displayed) whose children are the subtrees; this allows us to
reuse all of our machinery which expects to traverse a single tree.
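
A minimal sketch of the idea described above, assuming a hypothetical helper `wrapTrees`: an array of trees is placed under a synthetic root so that existing single-tree traversals keep working. Returning a single tree unwrapped is a choice made for this example only, and the name `__ROOT` is borrowed from the PR discussion.

```javascript
// Hypothetical helper; not Auspice's actual implementation.
function wrapTrees(tree) {
  // Accept either a single tree object or an array of trees.
  const subtrees = Array.isArray(tree) ? tree : [tree];
  if (subtrees.length === 1) return subtrees[0];
  // Synthetic root which is never drawn; its children are the individual trees.
  return { name: "__ROOT", children: subtrees };
}

// An ordinary traversal that knows nothing about multiple trees.
function collectTipNames(node, out = []) {
  const children = node.children || [];
  if (children.length === 0) out.push(node.name);
  children.forEach((child) => collectTipNames(child, out));
  return out;
}

const dataset = {
  tree: [
    { name: "NODE_A", children: [{ name: "tip1" }, { name: "tip2" }] },
    { name: "NODE_B", children: [{ name: "tip3" }] },
  ],
};

const root = wrapTrees(dataset.tree);
console.log(collectTipNames(root)); // [ 'tip1', 'tip2', 'tip3' ]
```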
@jameshadfield jameshadfield temporarily deployed to auspice-multitree-h51yyzgffb90 January 30, 2022 05:01 Inactive
@jameshadfield jameshadfield temporarily deployed to auspice-multitree-h51yyzgffb90 January 30, 2022 05:07 Inactive
@jameshadfield
Member Author

> Just a heads up that the review app does not seem to be working in any of the test links provided.

> I couldn't find an example dataset file with multiple trees in the links above

Yeah, the heroku review app died after some amount of time (a week?) and when it gets recreated the URL changes so those links were dead. I've updated them in the above message.

> but I'm curious if this would mean a new major version of the dataset schema, i.e. to dataset v3?

I don't think so, however the schema will have to be extended to allow both a single tree and an array of trees, as well as allowing some extra branch_attrs if that capability gets implemented in this PR.

@jameshadfield jameshadfield temporarily deployed to auspice-multitree-h51yyzgffb90 February 2, 2022 21:08 Inactive
We represent tree nodes by two different objects, a "PhyloNode" and a "Node", although these names are not formalised. The former holds information necessary to render the phylogenetic tree and is solely used by `PhyloTree` while the latter holds more general information. They are linked: `phyloNode = node.n` and `node = phyloNode.shell`.

This commit tackles deeply embedded technical debt where each object held a copy of the same data. Specifically,
* `children` is now only stored on a Node
* `parent` is similarly stored on a Node
* `terminal` is removed from PhyloNode (we can refer to node.hasChildren)

There is (at least) one remaining duplicated property - `inView` - which is set on <Node> when the dataset loads due to the corresponding <PhyloNode> not yet existing. This is left for future work as the concept of `inView` may change as we develop further methods for tree zoom / pan.
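
The relationship between the two objects might be sketched as follows. The property names `n`, `shell`, `children`, `parent` and `hasChildren` come from the commit message; the factory functions and accessors are hypothetical and only illustrate reading topology through the shell instead of keeping a second copy.

```javascript
// Illustrative sketch; `makeNode`, `attachPhyloNode` and the accessors are assumed names.
function makeNode(name, children = []) {
  const node = { name, children, parent: undefined, hasChildren: children.length > 0 };
  children.forEach((child) => { child.parent = node; });
  return node;
}

// Creates the rendering-side object and links the pair in both directions.
function attachPhyloNode(node) {
  const phyloNode = { shell: node }; // no children / parent / terminal stored here
  node.n = phyloNode;
  return phyloNode;
}

// Rendering code derives topology from the Node rather than from duplicated fields.
const isTerminal = (phyloNode) => !phyloNode.shell.hasChildren;
const phyloChildren = (phyloNode) => phyloNode.shell.children.map((child) => child.n);

const tip = makeNode("tip1");
const internal = makeNode("NODE_1", [tip]);
const tipPhylo = attachPhyloNode(tip);
const internalPhylo = attachPhyloNode(internal);

console.log(isTerminal(tipPhylo)); // true
console.log(phyloChildren(internalPhylo)[0] === tipPhylo); // true
```

With a single source of truth, reordering or re-parenting a Node is immediately visible to the rendering code.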
Introduces the ability to cut trees at various branches and display the resulting trees as subtrees. These cut points are here defined to be a change in a (user-selected) state trait, which are simply the non-continuous colourings (to start with).

Briefly, given a parent node (A) with children (B,C), and a change in trait moving from (A->B), we add B to the children of the overall tree root, and update the children of A to be only C. We store the original children (B,C) so that we can return to the starting state.
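
The cut described above can be sketched as below. This is a simplified illustration under stated assumptions, not the code in this commit: the function names, the `originalChildren` property and the `trait` field are hypothetical, and the overall root is assumed to be a hidden node that is not itself compared by trait.

```javascript
// Sketch only; names other than `children` are hypothetical.
function explode(hiddenRoot, getTrait) {
  const cut = [];
  const visit = (node) => {
    const original = node.children || [];
    if (original.length === 0) return;
    node.originalChildren = original; // remembered so the cut can be undone
    node.children = original.filter((child) => getTrait(child) === getTrait(node));
    for (const child of original) {
      if (!node.children.includes(child)) cut.push(child); // trait changed: new subtree root
      visit(child);
    }
  };
  hiddenRoot.originalChildren = hiddenRoot.children;
  hiddenRoot.children.forEach((child) => visit(child));
  hiddenRoot.children = hiddenRoot.children.concat(cut);
}

// Returns every node to its starting state.
function restore(node) {
  if (node.originalChildren) {
    node.children = node.originalChildren;
    delete node.originalChildren;
  }
  (node.children || []).forEach((child) => restore(child));
}

const B = { name: "B", trait: "Y", children: [{ name: "b1", trait: "Y" }] };
const C = { name: "C", trait: "X" };
const A = { name: "A", trait: "X", children: [B, C] };
const overallRoot = { name: "__ROOT", children: [A] };

explode(overallRoot, (node) => node.trait);
console.log(overallRoot.children.map((n) => n.name)); // [ 'A', 'B' ]
console.log(A.children.map((n) => n.name)); // [ 'C' ]

restore(overallRoot);
console.log(A.children.map((n) => n.name)); // [ 'B', 'C' ]
```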

This implementation, specifically the ability for PhyloTree to update the node ordering, should allow for branch rotations in the future.

There are a number of future improvements to be made in subsequent commits:
* The subtrees should be ordered on trait (or we may wish to make this a toggle)
* the stem length of subtrees is inconsistent
* exploded trait cannot be stored in URL state
* we should limit the explode options to those colourings which are defined on internal nodes (currently a warning is shown)
* genotype should be available to explode on
* remove console logging introduced here
* consider undefined (internal) traits, e.g. currently nodes (X -> undefined -> Y) would not be exploded
* multiple trees are not considered here

Prior art: I was first made aware of this style of tree rendering by Gytis Dudas. An implementation was attempted by @frogsquire (https://github.com/frogsquire) in #1105 and this commit uses ideas introduced there.
This introduces the concept of defining additional parents of the node,
in addition to the parent used to link all nodes together into one tree.

By defining the original parent via `<Node>.parentInfo.original → <Node>`
the roots of exploded (sub-)trees keep track of their original parent.
This information is used here for branch (stem) length calculations;
future work may use visual cues such as dashed lines to link subtrees
together. https://cse512-21s.github.io/FP-Pathogen-Phylogenies/
(Lewinsohn, Paredes, Russell and Wagner) is one example of this.

This structure will allow definition of recombination events (e.g. donor
nodes), original parents of individual trees (in a multi-tree-dataset)
etc. All of this is "future work".
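
One way the stored original parent could feed a stem-length calculation is sketched here. `parentInfo.original` is the property named in the commit message; the `div` field, the `stemLength` helper and the subtraction itself are assumptions for illustration only.

```javascript
// Sketch under assumptions: `div` (divergence) and `stemLength` are hypothetical names.
function stemLength(subtreeRoot) {
  const original = subtreeRoot.parentInfo && subtreeRoot.parentInfo.original;
  if (!original) return 0; // no recorded original parent, so no stem to draw
  // Distance from the original parent to the subtree root along the x-axis.
  return subtreeRoot.div - original.div;
}

const originalParent = { name: "A", div: 10 };
const subtreeRoot = { name: "B", div: 14, parentInfo: { original: originalParent } };

console.log(stemLength(subtreeRoot)); // 4
console.log(stemLength({ name: "unlinked", div: 3 })); // 0
```

The same `parentInfo` object could later hold other kinds of parent (for example a recombination donor) without changing the tree topology.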
We default to zooming out completely whenever we explode the tree. There are nicer behaviours here, such as re-calculating the MRCA of visible nodes, but this comes at the cost of increased complexity.

Recalculating tip counts is needed for (a) branch thicknesses and (b) forthcoming work which will hide subtrees with no tips.
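
A possible shape for the tip-count recalculation is a post-order traversal like the one below. The `tipCount` property and the function name are hypothetical, and the sketch assumes `hasChildren` reflects the unexploded tree so that an internal node whose children have all been cut away counts as zero tips instead of one.

```javascript
// Illustrative sketch; `tipCount` and `recomputeTipCounts` are assumed names.
function recomputeTipCounts(node) {
  if (!node.hasChildren) {
    node.tipCount = 1; // a genuine tip
    return 1;
  }
  // Internal node: sum over the children it currently has (possibly none).
  node.tipCount = node.children.reduce((sum, child) => sum + recomputeTipCounts(child), 0);
  return node.tipCount;
}

const tipNode = (name) => ({ name, hasChildren: false });
const orphan = { name: "NODE_2", hasChildren: true, children: [] }; // all children cut away
const clade = { name: "NODE_1", hasChildren: true, children: [tipNode("t1"), tipNode("t2"), orphan] };

console.log(recomputeTipCounts(clade)); // 2
console.log(orphan.tipCount); // 0
```

A count of zero is what would let later code skip rendering a subtree with no tips, and the per-node counts can drive branch thickness.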
Our previous behavior zoomed into the parent node, which made for a nicer UI. However when the zoom node is a subtree with a single tip, zooming into the parent node means zooming into the entire tree, which is not desirable.
Previous behavior would only split trees when the traits of a parent-child link were different _and_ defined. This led to problems where three connected nodes with traits "X -> undefined -> Y" would not be split.
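
The difference between the two rules can be shown with a small comparison. Both predicate names are made up for this example; they only restate the behaviour described in the message.

```javascript
// Hypothetical predicates restating the old and new cut rules.
const cutsOnlyWhenBothDefined = (parentTrait, childTrait) =>
  parentTrait !== undefined && childTrait !== undefined && parentTrait !== childTrait;
const cutsOnAnyChange = (parentTrait, childTrait) => parentTrait !== childTrait;

// Traits along a chain of three connected nodes: X -> undefined -> Y.
const chain = ["X", undefined, "Y"];

// Counts how many parent-child links in the chain would be cut under a rule.
const countCuts = (rule) =>
  chain.slice(1).filter((trait, i) => rule(chain[i], trait)).length;

console.log(countCuts(cutsOnlyWhenBothDefined)); // 0: the chain is never split
console.log(countCuts(cutsOnAnyChange)); // 2: split at both links
```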
Tanglegrams would be useful for exploded trees but are added complexity for an already complex feature. For the time being, it's acceptable to force users to choose one or the other.
The chosen value is based off testing a few datasets and we may need to tweak this in the future.
Note that in radial view separation is expressed by increasing the
angle. If the root of a subtree is near the start (div or time) then
changing the angle doesn't convey a sense of separation. One way to
improve this would be to add an inner circle which the innermost nodes
can then be spaced around, however this wouldn't be appropriate for
non-exploded trees. Worth revisiting in the future, perhaps.
This separates out the subtrees in an unrooted tree using a similar
conceptual approach as for radial trees. Code was somewhat cleaned up
during this work.
@jameshadfield jameshadfield requested a review from a team February 9, 2022 18:46
@jameshadfield jameshadfield marked this pull request as ready for review February 9, 2022 18:47
@jameshadfield
Member Author

jameshadfield commented Feb 9, 2022

This is now (finally) ready for release, however I've added an "experimental" label to the dropdown as these changes push auspice in new directions and this exposes some pre-existing limitations. There may also be some bugs / papercuts as this PR involves a lot of the different parts of the code.

  • "Zoom to selected" fundamentally doesn't play nicely when viewing subtrees, due to the way we calculate the links between in-view tips. We'll revisit this concept shortly with the accordion zoom & other ways of zooming the tree.
  • Connecting lines (e.g. between the subtree root and its parent in the unexploded tree) are for the next PR, however the tree layout knows about this link as it's used to compute the subtree stem length.
  • Augur needs a corresponding PR to allow the tree to be an array, however this is not blocking.
  • URL state is not implemented, as I wanted to wait for more testing here
  • Subtrees are separated in Radial & Unrooted views, however this separation is expressed by (i) a change in angle and (ii) the distance from the subtree root to the overall tree root (i.e. divergence or time). Therefore if each subtree root has a similar origin divergence/time, then there will be little to no separation no matter what the angle is! Plenty of work can be done here (see commit messages for more).
  • Internal branches with no tips (because their children have all been pruned) are still displayed. This isn't ideal, but is an edge case and will be tackled along with the connecting lines as they have a lot of overlap.

Testing URLs

P.S. The final commit is to be dropped before merge.

@jameshadfield jameshadfield temporarily deployed to auspice-multitree-s8i6oo6ewp4d February 9, 2022 19:03 Inactive
@jameshadfield jameshadfield temporarily deployed to auspice-multitree-s8i6oo6ewp4d February 14, 2022 04:54 Inactive
@jameshadfield jameshadfield mentioned this pull request Feb 14, 2022
7 tasks
@jameshadfield jameshadfield merged commit da64ea6 into master Feb 14, 2022
@jameshadfield jameshadfield deleted the multitree branch February 14, 2022 05:05
jameshadfield added a commit to nextstrain/augur that referenced this pull request Apr 24, 2024
Multiple trees ("subtrees") have been available in Auspice since late
2021¹ and part of the associated schema since early 2022². Despite this
there was no way to produce such datasets within Augur itself, and
despite the schema changes the associated `augur validate` command was
never updated to allow them.

This commit adds multi-tree inputs to `augur export v2` as well as
allowing them to validate with our associated validation commands.

¹ <nextstrain/auspice#1442>
² <#851>
jameshadfield added a commit to nextstrain/augur that referenced this pull request May 6, 2024