Scatterplots #1310
This adds a new layout for scatterplots and allows users to choose the x and y variables from the available colorings. The defaults are the tree metric (x-axis) and the current color-by (y-axis). The layout algorithm is largely unchanged from the root-to-tip layout.

This presupposes that node trait values will be numeric, and thus map nicely to an axis. Future work will allow scales to map non-numeric values (e.g. categorical, ordinal, boolean scales) to a d3 domain for rendering. Currently these traits get assigned `0` as their x and/or y values.

Similarly, the algorithm presupposes that all nodes (internal and terminal) have values and should be rendered. There will be many cases where nodes (especially internal nodes) do not have traits assigned. In these cases we should hide them from view, and remove any connecting branches.

Future work needed:
* More testing is needed for rare use cases, e.g. trees without divergence, datasets with no colorings.
* Dataset JSONs and URL queries should be able to select the scatterplot variables.

This commit is based off previous work by trvrb.

Co-authored-by: Trevor Bedford <[email protected]>
5633ec8 to 1bbfca2
1bbfca2 to daae5be
Very excited about this work. Thank you for putting this together James. Notes from review:
I'd recommend addressing 4, 5 and 8 before merging.
I've updated seasonal flu with nextstrain/seasonal-flu@46f72c3, pushed new JSONs and redeployed the review app. Epitope mutations are now available for scatterplot for H3N2 and H1N1pdm. Fixes 7 above.
This is so cool, @jameshadfield! This is a killer feature and I can see myself using this all of the time. For instance, this view of antigenic advance (tree model) by date helps me immediately see which clades are more antigenically advanced, the range of the advance values, and how much variation there is.

I like the new scatter UI as a new layout option with dropdowns for x and y and the toggle buttons. I also like how the scatterplot view only plots points with assigned values. For example, in H3N2 HA 2y trees, we only calculate fitness for the most recent strains and all other strains do not get a fitness value. To plot fitness on the y-axis, I get the following view showing only those tips with fitness values:

Initially, I was surprised that the branches and all other tips disappeared, but after toggling between these views, the reason became clear. Given the fitness by date view above, I wish I knew how many samples are being shown in the display. Maybe this is an edge case, but by zooming to only points with assigned fitness values, I’ve effectively “filtered” by view without applying an actual filter. I don't know the best way to address this though.

I tested the scatterplot layout with a local auspice installation and with Sravani’s H3N2 HA embeddings (PCA, MDS, etc.). The following plot shows t-SNE x and y coordinates from HA sequence inputs. This is exactly what I was hoping to be able to do with this scatterplot interface! This view does highlight the need for x- and y-axis labels (I can imagine making screenshots like this all of the time, or embedding these views in a narrative where the scatterplot controls are not visible). I also find myself wanting to “zoom out” so I can still see the legend without it obscuring the data points in the top-left.

On a related note, I was also surprised by the view when I filtered to show just two clades as below. The x- and y-axes didn’t rescale like I expected and the view still shows clade labels for data that aren’t being shown. I guess both of these issues are related to the view “acting as if” the branches are still visible? I can turn off clade labels, but it would be cool if I could somehow zoom or pan to center on the current data. I also get that this could require a major amount of new code, so it’s not a blocking issue for this PR.

In terms of issues that would be important for this release, I second Trevor's issues 5 (loading scatterplot view from URL parameters) and 8 (axis titles).
I just found one more issue where switching from the scatterplot back to the rectangular tree view defaults to the divergence view instead of time tree view. Steps to recreate:
This fixes some issues highlighted by the previous commit to improve rendering of scatterplots. We now limit scatterplot x,y variable choices to continuous-scaled colorings, and leave the display of other scale types to future work as this requires PhyloTree to switch to a new d3 scale. As not all nodes may have traits assigned (contrary to other tree layouts), we detect and hide those nodes from view, as well as any joining branches. We also expose the ability to toggle branches on/off. We also improve the starting variable choices for x & y.
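The filtering described in this commit can be sketched roughly as follows. This is a minimal illustration, not the PhyloTree implementation: the data shapes (`colorings[key].type`, `node.traits`, `node.parent` as a name) and all function names are assumptions made for the example.

```javascript
// Hypothetical data shapes: `colorings` maps a key to {type}, and each node
// keeps its trait values under `node.traits`. These are assumptions for this
// sketch and do not mirror auspice's internal structures.

// Only continuous colorings are offered as x / y variable choices.
function continuousColoringKeys(colorings) {
  return Object.keys(colorings).filter((key) => colorings[key].type === "continuous");
}

function isNumeric(value) {
  return typeof value === "number" && Number.isFinite(value);
}

// A node is plotted only if both chosen variables have numeric values.
function visibleNodes(nodes, xKey, yKey) {
  return nodes.filter((n) => isNumeric(n.traits[xKey]) && isNumeric(n.traits[yKey]));
}

// A branch is drawn only if both of its end points are plotted.
function visibleBranches(nodes, xKey, yKey) {
  const shown = new Set(visibleNodes(nodes, xKey, yKey).map((n) => n.name));
  return nodes
    .filter((n) => n.parent !== undefined && shown.has(n.name) && shown.has(n.parent))
    .map((n) => ({from: n.parent, to: n.name}));
}
```

With this approach an internal node lacking one of the chosen traits drops out of the plot together with every branch that touches it, which matches the behaviour described above of hiding "any joining branches".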
daae5be to 59c5e08
Thanks so much for these revisions @jameshadfield. I can confirm that updates to this branch fully address my issues 4, 5 and 8. This is good to be merged from my perspective.
As the clock view is simply a specific type of scatterplot layout, this commit unifies the code and display between these two "separate" layouts. We preserve the clock button in the sidebar as this is a common action which we want to surface.

**Show branch toggles**
Are now rendered for both views. The layout of scatterplots does not consider internal nodes for calculating the domain if branches are not shown. Similarly, branch labels are not displayed if branches are not.

**Regression Lines**
These are now available for both layouts, and are toggled via a UI element similar to branches. Previously, the regression would be shown for clock layouts _if_ the branch metric was time, however the explicit UI element introduced here is better. For scatterplot views we calculate the regression with a free intercept, as the root node may not have co-ordinates defined (depending on chosen x,y variables), and additionally report the R^2. The display of the regression text can be improved in future commits.

**Persist chosen state**
To improve the UX, once a scatterplot has been viewed, we persist the x,y variables for future viewing. Similarly, the toggle state persists between clock & scatter layouts.
See added documentation for available queries
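For reference, a regression with a free intercept plus R^2, as mentioned under "Regression Lines", can be computed with ordinary least squares. The sketch below is a generic illustration rather than the code in this commit; the function name `linearRegression` and the `{x, y}` point shape are assumptions.

```javascript
// Ordinary least-squares fit of y = slope * x + intercept, with the
// intercept left free (the line is not forced through any particular node).
// Returns null when a line cannot be fitted.
function linearRegression(points) {
  const n = points.length;
  if (n < 2) return null;

  const meanX = points.reduce((sum, p) => sum + p.x, 0) / n;
  const meanY = points.reduce((sum, p) => sum + p.y, 0) / n;

  let sxx = 0; // sum of squared deviations in x
  let sxy = 0; // sum of cross deviations
  let syy = 0; // sum of squared deviations in y
  for (const {x, y} of points) {
    sxx += (x - meanX) * (x - meanX);
    sxy += (x - meanX) * (y - meanY);
    syy += (y - meanY) * (y - meanY);
  }
  if (sxx === 0) return null; // all x values identical: slope undefined

  const slope = sxy / sxx;
  const intercept = meanY - slope * meanX;
  // For simple linear regression R^2 equals the squared correlation.
  // If y is constant the fit is exact, so report 1.
  const r2 = syy === 0 ? 1 : (sxy * sxy) / (sxx * syy);
  return {slope, intercept, r2};
}
```

Only nodes with both coordinates defined would be passed in, which is why the fit cannot rely on the root having a position.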
59c5e08 to e3a96c4
This commit updates the logic for deciding the gridlines for both x and y axes for scatterplots. Previously we had a very limited range of cases to consider here. We now have two general functions available for creating grids - one for temporal scales and one for all other numeric scales (previously used only for divergence). We will need to add a third function when we expand scatterplots to plot non-continuous variables.
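As an illustration of what a general grid function for numeric (non-temporal) scales might look like, here is a small sketch that picks a "nice" step (1, 2 or 5 times a power of ten) and returns gridline positions inside the domain. It is not the function added in this commit; the names `niceStep` and `numericGridlines` and the `targetCount` parameter are hypothetical.

```javascript
// Choose a step of 1, 2, 5 or 10 times a power of ten so that roughly
// `targetCount` gridlines span [min, max]. Assumes max > min.
function niceStep(min, max, targetCount = 5) {
  const rawStep = (max - min) / targetCount;
  const magnitude = Math.pow(10, Math.floor(Math.log10(rawStep)));
  const residual = rawStep / magnitude; // lies in [1, 10)
  let multiplier = 10;
  if (residual < 1.5) multiplier = 1;
  else if (residual < 3) multiplier = 2;
  else if (residual < 7) multiplier = 5;
  return multiplier * magnitude;
}

// Gridline positions at multiples of the step that fall within the domain.
function numericGridlines(min, max, targetCount = 5) {
  if (!(max > min)) return [];
  const step = niceStep(min, max, targetCount);
  const first = Math.ceil(min / step);
  const last = Math.floor(max / step);
  const lines = [];
  for (let i = first; i <= last; i++) {
    lines.push(i * step); // multiply by the index to avoid accumulating error
  }
  return lines;
}
```

A temporal scale would need a separate function that steps in calendar units (years, months, days) instead of powers of ten, which is consistent with the commit keeping the two cases apart.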
Thanks for the great reviews @trvrb & @huddlej -- as per Trevor's last message the blocking issues have been resolved, so I am going to merge. I'll make notes of the changes here for posterity.
I've created #1316 which sketches out a path to implementing these.
👍 Done
👍 Fixed. I believe all state is now being restored appropriately, but there may be some rare edge cases I haven't run into.
I've improved how we calculated domains so that the zooming looks much better here. More generally, zooming in auspice doesn't map straightforwardly onto scatterplots (see #1317 for more).
👍 Good reminder. Done.
It's a bit confusing that we keep the "show branches" toggle in these situations, but unfortunately removing it isn't trivial (we only realise branches are never rendered inside PhyloTree, and there's no easy way to update the rest of the UI from there).
Agreed! I've created #1318 to fix this.
I didn't know about these - they look great!
Fixed using the same approach as #1302
Yes -- this (unfortunately) isn't an easy fix. I've written more about this in #1317.
Fixed 👍
In addition to fixes to the points raised above, I improved the logic behind axes grids, so that (e.g.) scatterplots with time on the y-axis look as they should.
This adds a new layout for scatterplots and allows users to choose the x and y variables from the available colorings. The defaults are the tree metric (x-axis) and the current color-by (y-axis). Choices are limited to continuous colorings (see below). Nodes without information are hidden from view, and branches can be toggled on/off.
As clock views are really just an instance of a scatterplot, the UI between the two is similar. Specifically, we now show toggles for both regression lines and display of branches for each layout. Regression lines for clock views are unchanged, however a new implementation is present for scatterplots which does not require the regression to pass through the root. Coefficients and R^2 are reported, although text formatting isn't perfect.
The relevant parameters are stored in URL queries to allow URLs to be shared (see documentation added here for details).
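To show the general idea of round-tripping scatterplot state through a URL query, here is a minimal sketch using the standard `URLSearchParams` API. The parameter names (`l`, `scatterX`, `scatterY`, `branches`, `regression`) and the state object shape are assumptions for illustration and may not match the names in the documentation added by this PR.

```javascript
// Serialise a (hypothetical) scatterplot state object into a query string.
// Parameter names are illustrative; check the added documentation for the
// names auspice actually uses.
function encodeScatterQuery(state) {
  const params = new URLSearchParams();
  params.set("l", "scatter");
  params.set("scatterX", state.x);
  params.set("scatterY", state.y);
  // Only non-default toggle states are written, keeping URLs short.
  if (!state.showBranches) params.set("branches", "hide");
  if (state.showRegression) params.set("regression", "show");
  return params.toString();
}

// Restore the state from a query string; returns null for other layouts.
function decodeScatterQuery(queryString) {
  const params = new URLSearchParams(queryString);
  if (params.get("l") !== "scatter") return null;
  return {
    x: params.get("scatterX"),
    y: params.get("scatterY"),
    showBranches: params.get("branches") !== "hide",
    showRegression: params.get("regression") === "show",
  };
}
```

Writing only the non-default toggles is a design choice of this sketch: a shared URL with no toggle parameters then restores the default view (branches shown, no regression line).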
There are a number of future feature improvements, which I think are best as self-isolated issues:
* `scaleLinear` and `scaleOrdinal`, which requires certain elements to re-render. Secondly, information about the variable (categories, ordering of categories etc) must be calculated and passed in to PhyloTree to calculate the appropriate coordinates. Medium priority.
* `displayDefaults` extended to allow specifying of scatterplot variables. Low priority.