Show root-to-tip mutations in tip-clicked info box #1280

jameshadfield · 2021-01-28T21:06:12Z

Initial version for testing and feedback, especially on datasets with lots of mutations!

trvrb · 2021-01-29T05:06:37Z

I've cleaned up the layout a bit in this commit:

trvrb · 2021-01-29T05:23:09Z

@jameshadfield: There is one complication here that I think I know how to handle, but I want to flag it because I suspect you're equipped to handle this quicker than I am. If you take a look at https://auspice-mutations-to-ti-bahczp.herokuapp.com/flu/seasonal/h3n2/ha/2y you'll see:

with R142G followed by G142R and K171N followed by N171K. You can see this is a real reversion on the tree here: https://auspice-mutations-to-ti-bahczp.herokuapp.com/flu/seasonal/h3n2/ha/2y?c=gt-HA1_171.

I think for these we need either to pick apart the mutations in the branch attrs to determine that site 171 is really K171K (and so can be dropped), or to do an actual comparison with the root node.

In the current `collectMutations` there will also be an issue for sites with a mutation of the form X171Y followed by Y171Z.

And in general, we could call these "Mutations relative to root" to make what we're showing more clear. Also, this is enough of an edge case that I'd be okay with merging the PR and filing an issue to resolve at a later point.
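One way to read the "comparison with the root node" option is a direct site-by-site diff of the root and tip sequences. This is a minimal sketch, assuming both sequences are available as aligned strings of equal length. That is a hypothetical input, since the dataset JSON may only carry per-branch mutations, and `diffFromRoot` is an illustrative name, not an auspice function.

```javascript
// Sketch: list root-to-tip differences by comparing sequences site by site.
// Assumes aligned root and tip sequences of equal length (hypothetical input).
function diffFromRoot(rootSeq, tipSeq) {
  if (rootSeq.length !== tipSeq.length) {
    throw new Error("sequences must be aligned to the same length");
  }
  const diffs = [];
  for (let i = 0; i < rootSeq.length; i++) {
    if (rootSeq[i] !== tipSeq[i]) {
      // 1-based positions, as in the "K171N" notation used in this thread
      diffs.push(`${rootSeq[i]}${i + 1}${tipSeq[i]}`);
    }
  }
  return diffs;
}

// A K->N->K reversion leaves the tip matching the root at that site
// (position 2 here), so it never appears in the output.
const rootSeq = "AKGT";
const tipSeq = "AKRT";
console.log(diffFromRoot(rootSeq, tipSeq)); // [ 'G3R' ]
```

Reversions and chained mutations disappear by construction here, at the cost of needing a full sequence for every tip. Working from branch mutations avoids that requirement.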

emmahodcroft · 2021-01-29T11:26:22Z

I really like this James - I can already see these lists being helpful!

I think Trevor's points about how to handle reversions are good ones - and agree it's probably least confusing to show 'difference from root' (not show the mutation and reversion). Especially if someone's scanning quickly for one mutation, they might not notice the reversion!

joverlee521 · 2021-01-30T01:21:34Z

src/components/tree/infoPanels/click.js

+    Object.entries(mutations)
+      .sort(geneSortFn)
+      .map(([gene, muts], index) => (
+        item(index === 0 ? "Mutations from root" : "", gene + ": " + [...muts].sort(mutSortFn).join(", ").substring(0, 200))


With this `substring`, the list of mutations gets cut off at an arbitrary character (possibly in the middle of a mutation) when it's too long.

If we want to limit the number of mutations displayed, it might be better to slice the array first:

const mutLimit = 30;
const sorted = [...muts].sort(mutSortFn);
sorted.slice(0, mutLimit).join(", ") + (sorted.length > mutLimit ? "..." : "")

That's definitely nicer.

If we are cutting off the display of mutations, we should also make this explicit in the box, so people don't mistakenly think it's a complete list. (Apologies if this is integrated but doesn't show up in Jover's example above!)
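A minimal sketch of count-based truncation that also states how many mutations were hidden. `formatMutations`, the default limit of 30 and the "(+N more)" wording are illustrative choices, and the position parser assumes single-character states such as `K171N`; the real code sorts with `mutSortFn`, which is not shown here.

```javascript
// Sketch: truncate by number of mutations (never mid-string) and say how many
// were left out, so the list is not mistaken for a complete one.
// Assumes single-character states, e.g. "K171N" -> position 171.
const position = (mut) => parseInt(mut.slice(1, -1), 10);

function formatMutations(gene, muts, mutLimit = 30) {
  const sorted = [...muts].sort((a, b) => position(a) - position(b)); // Array or Set
  const hidden = sorted.length - mutLimit;
  const shown = sorted.slice(0, mutLimit).join(", ");
  return `${gene}: ${shown}${hidden > 0 ? ` (+${hidden} more)` : ""}`;
}

console.log(formatMutations("HA1", new Set(["N171K", "K3N", "G142R"]), 2));
// HA1: K3N, G142R (+1 more)
```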

@trvrb given that #1259 asks to "display [the] full list of AA changes" why would we truncate this list? I changed back to displaying all AA muts in 4fee777 as I thought this would be the expected behaviour. Am I missing something?

Try this out in dengue. It ends up being pages long. Good to truncate in this case.

Yup. I tested this by showing nucleotide mutations (which are even longer than dengue's), and I think we have three options:

1. Truncate mutations.
2. Show all muts. The info panel scrolls but is functional and doesn't look bad.
3. Some future UI with a little "show all" button which expands mutations per gene.

Whilst (3) may be the best, I think (2) is preferable to (1).

Got it. This makes sense. Thanks James. I just spent some more time with this PR and I think that the scrolly overlay panel is a fine solution. Having "mutations from root" at the end of the list of fields is helpful here. I'd agree that (2) is preferable to (1) for the moment.

The function `collectMutations` now doesn't report reversions (where the tip state = the ancestral state) and combines multiple mutations (e.g. A->B->C is now A->C rather than two separate mutations).
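The behaviour described for `collectMutations` can be sketched as a fold over the root-to-tip path that keeps, per site, the first ancestral state and the latest derived state. This is not the auspice implementation. The input shape (one array of mutation strings per branch, for a single gene, ordered root to tip) and the name `netMutations` are assumptions.

```javascript
// Sketch: collapse the mutations on a root-to-tip path into net changes.
// Input (assumed shape): one array of mutation strings per branch, for a
// single gene, ordered from the root to the tip, e.g. [["K171N"], ["N171K"]].
function netMutations(branchMuts) {
  const bySite = new Map(); // position -> { from, to }
  for (const muts of branchMuts) {
    for (const mut of muts) {
      const pos = parseInt(mut.slice(1, -1), 10);
      const to = mut[mut.length - 1];
      if (bySite.has(pos)) {
        bySite.get(pos).to = to; // A->B then B->C becomes A->C
      } else {
        bySite.set(pos, { from: mut[0], to });
      }
    }
  }
  return [...bySite.entries()]
    .filter(([, { from, to }]) => from !== to) // drop reversions (tip state == root state)
    .sort(([a], [b]) => a - b)
    .map(([pos, { from, to }]) => `${from}${pos}${to}`);
}

const path = [["R142G", "K171N"], ["G142R"], ["N171K", "A5T"], ["T5S"]];
console.log(netMutations(path)); // [ 'A5S' ]
```

Both reversions discussed above (R142G/G142R and K171N/N171K) drop out, and the chain A5T, T5S is reported once as A5S.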

trvrb · 2021-03-10T00:14:35Z

Thanks for making these improvements @jameshadfield. I've just made a couple of styling changes. Before commit 0a6fb81, we had:

where "Mutations from root" on the left is not vertically lined up with the mutation list on the right. Also, the double-line "Mutations from root" was making the first <tr> wider than subsequent <tr>s.

This commit moves to a single <tr> for the entire mutation list and sets verticalAlign: "top" on infoPanelStyles.item, which results in the following behavior:

I prefer the verticalAlign: "top" for other entries in the info panel as well.

This PR is now good to go from my perspective.
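A rough sketch of the single-row layout as plain data, using hypothetical names (`mutationRow`, `cellStyle`). The real component renders React elements and orders genes with its own `geneSortFn`, which is replaced here by an alphabetical sort.

```javascript
// Hypothetical sketch: one row for every gene, with both cells top-aligned so
// the label lines up with the first line of the mutation list.
const cellStyle = { verticalAlign: "top" };

function mutationRow(mutations) {
  // Alphabetical gene order stands in for auspice's geneSortFn (not shown here).
  const lines = Object.keys(mutations)
    .sort()
    .map((gene) => `${gene}: ${mutations[gene].join(", ")}`);
  return {
    label: { text: "Mutations from root", style: cellStyle },
    value: { lines, style: cellStyle },
  };
}

const row = mutationRow({ HA2: ["A5T"], HA1: ["K3N", "G142R"] });
console.log(row.value.lines); // [ 'HA1: K3N, G142R', 'HA2: A5T' ]
```

Because there is only one row, the label cell can no longer make the first of several rows taller than the rest.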

A bug was introduced in PR #1280 where datasets which did not define `metadata.display_defaults` would crash, as the code assumed its existence. This property is optional in the dataset JSON. This commit ensures `display_defaults` exists in redux state after a dataset is loaded, thus allowing code to rely on its presence. This was preferred to checking for `display_defaults` in (all) the relevant sections of code, now and in future. (Using TypeScript, or expanding our smoke tests, would be approaches to avoiding these kinds of bugs in future.)
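The approach in this commit (normalise the optional property once, on load, then rely on it everywhere) can be illustrated outside redux with a hypothetical helper. `withDisplayDefaults` is not an auspice function; the actual fix sets the value in redux state.

```javascript
// Hypothetical helper: make the optional `display_defaults` always present
// once a dataset's metadata has been loaded, so downstream code can rely on it.
function withDisplayDefaults(metadata) {
  return {
    ...metadata,
    // Copy any dataset-supplied defaults; otherwise start from an empty object.
    display_defaults: { ...(metadata.display_defaults || {}) },
  };
}

const loaded = withDisplayDefaults({ title: "example dataset" });
console.log(loaded.display_defaults); // {}
```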

…:master (#7)

* Enforce unnormalized frequencies when data is lacking
  This commit forces controls.normalizeFrequencies to be false if there are any pivots where the total frequency is less than 0.1%. This addresses two issues:
  1. In the existing code, attempting to normalize situations where pivots have 0% total frequency results in bad looking "all equal" bands. This commit removes the capacity to get into this bad looking app state.
  2. When filtering to a particular clade, we often want to switch to unnormalized frequencies anyway (as opposed to filtering to geography). This commit accomplishes this automatically because filtering to an emerging clade will generally result in pivots with <0.1% frequency.
* Show root-to-tip mutations in tip-clicked info box
* Clean up mutation display in tip-clicked info box
* Cap string length in tip-clicked info box
* Added Polish language to locales
* Added Polish language to sidebar options
* fixed typos in sidebar.json
* controls/filter: Avoid spread syntax with potentially large arrays
  Push each value individually instead of all at once, which results in more method calls but a much smaller call stack size. Alternatively, Array.concat could be used, but this follows the pattern of surrounding code and avoids reassignment of "options", which would also necessitate removal of the "const" declaration.
  The /tb/global build has ~277k genotype states, which resulted in a call to Array.push with as many arguments when the spread syntax was used. This blew through the call stack size limit on Chrome with an error like:
      RangeError: Maximum call stack size exceeded
          at FilterData.eval [as makeOptions] (filter.js?6bcb:65)
          at FilterData.render (filter.js?6bcb:100)
          …
  Firefox was unaffected, so presumably has a larger limit.
  Debugging was waylaid for a bit by the assumption that exceeding the call stack size necessarily meant deep recursion, but the lack of a deep stack trace led to the realization that it can also occur when a function is called with too many arguments. Resolves nextstrain#1292.
* Correctly format BCE dates
  BCE dates were correctly interpreted but incorrectly rendered due to a bug in the final string-prettifying step. This would result in a tree with the correct layout and positioning, but incorrect labels ("-undefined"). This commit remedies this and adds a test.
* Increase bundlesize limits
  This simply increases the limits to allow CI tests to pass. Large bundlesizes are a long-term concern, but given the recent bugs regarding bundling and other priorities this change essentially pushes improvements here to "sometime in the future".
* Correctly handle reversions & multiple mutations
  The function `collectMutations` now doesn't report reversions (where the tip state = the ancestral state) and combines multiple mutations (e.g. A->B->C is now A->C rather than two separate mutations).
* Ensure unique component keys for rendering
* Report all root-to-selected-tip mutations
* Indicate if frequencies can be normalized in sidebar
  The previous implementation continued to display the toggle icon but disabled its functionality (by forcing `normalizeFrequencies` to be `false`). Here we replace the toggle with a "not available" message & update the info-popup text.
* Update normalizeFrequencies flag via redux actions
  Upon initial parsing of the frequencies JSON as well as frequency data updates we may set redux→controls→normalizeFrequencies→false. This commit modifies the LOAD_FREQUENCIES and FREQUENCY_MATRIX actions to pass this information to the reducer, rather than updating the redux state directly from within the actions. I couldn't find any bugs caused by the previous implementation, but this change is more in line with suggested behaviour and should help future-proof work here.
* Update frequencies panel when data changes
  The frequencies component performed some basic comparisons between the previously rendered data and new data to avoid unnecessary re-renders. This logic was too simplistic and caused a bug where the component wouldn't re-render the graph when the data had indeed changed. This commit skips these checks. This may result in some unnecessary re-renders, however I couldn't find any in my testing. Closes nextstrain#1224
* [frequencies panel] Don't round frequencies below 1%
  When a frequency value is below 1% we now display "<1%" rather than rounding to the nearest integer, which could lead to confusing output of "frequency: 0%". Closes nextstrain#1279
* Legend no longer obscures branches/tips.
  Shifts the top of the tree down slightly so that tips and branches cannot be hidden behind the (closed) legend, which prevents interacting with them. This only happens in rectangular / unrooted trees, as radial / clock views almost never have tips rendered in the top-left corner.
* Styling adjustments for mutation list
* Allow JSONs to define language
  This is a squashed & rebased version of PR nextstrain#1221, which itself superseded PR nextstrain#1218. Closes nextstrain#1049.
  Co-authored-by: Charlie Jones
  Co-authored-by: eharkins
* Fix spelling typo
* changelog
* version bump to 2.24.0 for release
* Ensure metadata.display_defaults exists
  A bug was introduced in PR nextstrain#1280 where datasets which did not define `metadata.display_defaults` would crash, as the code assumed its existence. This property is optional in the dataset JSON. This commit ensures `display_defaults` exists in redux state after a dataset is loaded, thus allowing code to rely on its presence. This was preferred to checking for `display_defaults` in (all) the relevant sections of code, now and in future. (Using TypeScript, or expanding our smoke tests, would be approaches to avoiding these kinds of bugs in future.)
* version bump to 2.24.1 for release
* Treat accessions and urls as special node traits
  The schema defines these as "special" properties, and we use them to render the value and link via `<AccessionAndUrl>` within the tip-clicked panel. These should not be available as valid traits for general display.
* Allow node_traits to define their own URLs
  This extends our interpretation of dataset-supplied traits to allow them to define a URL as well as a value. If a url is specified, then the value (in the tip-clicked panel) is rendered as a link. Closes nextstrain#1307
* Improve validation of URLs & add tests
  This improves our validation of URLs, which should improve app stability.
* Generic scatterplot layouts
  This adds a new layout for scatterplots and allows users to choose the x and y variables from the available colorings. The defaults are the tree metric (x-axis) and the current color-by (y-axis). The layout algorithm is largely unchanged from the root-to-tip layout.
  This presupposes that node trait values will be numeric, and thus map nicely to an axis. Future work will allow scales to map non-numeric values (e.g. categorical, ordinal, boolean scales) to a d3 domain for rendering. Currently these traits get assigned `0` as their x and/or y values.
  Similarly, the algorithm presupposes that all nodes (internal and terminal) have values and should be rendered. There will be many cases where nodes (especially internal nodes) do not have traits assigned. In these cases we should hide them from view, and remove any connecting branches.
  Future work needed:
    * More testing is needed for rare use cases, e.g. trees without divergence, datasets with no colorings.
    * Dataset JSONs and URL queries should be able to select the scatterplot variables.
  This commit is based on previous work by trvrb.
  Co-authored-by: Trevor Bedford
* Add support for "data_provenance" metadata
  In the early stages of COVID-19, we added support for acknowledging GISAID as the source of data in the Byline. This was inferred based on domain / dataset name heuristics. We now support data provenance to be defined in the dataset JSON (see nextstrain/augur#705) and all core nCoV builds have been updated to include this here. This commit parses and renders such information.
  Note that a previous commit removed the "Build info" from the byline for datasets displaying GISAID (see [0]), which I believe was an oversight. This commit reinstates it.
  [0] nextstrain@18d5d21
* Scatterplots improved for continuous variables
  This fixes some issues highlighted by the previous commit to improve rendering of scatterplots. We now limit scatterplot x,y variable choices to continuous-scaled colorings, and leave the display of other scale types to future work as this requires PhyloTree to switch to a new d3 scale. As not all nodes may have traits assigned (contrary to other tree layouts), we detect and hide those nodes from view, as well as any joining branches. We also expose the ability to toggle branches on/off. We also improve the starting variable choices for x & y.
* Link out to gisaid.org
  This adds a link to gisaid.org from the GISAID logo (when present). It also adjusts parsing so that data_provenance.name == "gisaid" will still get picked up.
* Unify clock and scatterplot layouts
  As the clock view is simply a specific type of scatterplot layout, this commit unifies the code and display between these two "separate" layouts. We preserve the clock button in the sidebar as this is a common action which we want to surface.
  **Show branch toggles** These are now rendered for both views. The layout of scatterplots does not consider internal nodes for calculating the domain if branches are not shown. Similarly, branch labels are not displayed if branches are not.
  **Regression Lines** These are now available for both layouts, and are toggled via a UI element similar to branches. Previously, the regression would be shown for clock layouts _if_ the branch metric was time, however the explicit UI element introduced here is better. For scatterplot views we calculate the regression with a free intercept, as the root node may not have co-ordinates defined (depending on chosen x,y variables), and additionally report the R^2. The display of the regression text can be improved in future commits.
  **Persist chosen state** To improve the UX, once a scatterplot has been viewed, we persist the x,y variables for future viewing. Similarly, the toggle state persists between clock & scatter layouts.
* Store scatterplot state in URL query
  See added documentation for available queries.
* Render appropriate scatterplot axes grids
  This commit updates the logic for deciding the gridlines for both x and y axes for scatterplots. Previously we had a very limited range of cases to consider here. We now have two general functions available for creating grids: one for temporal scales and one for all other numeric scales (previously used only for divergence). We will need to add a third function when we expand scatterplots to plot non-continuous variables.
* [bugfix] Initialize `filtersInFooter` for all datasets
  This fixes a bug noticed in auspice.us [1] where certain datasets would not have the `controls.filtersInFooter` state set, causing a crash when metadata was dropped on. Type / prop checking would have alerted us to this.
  [1] nextstrain#1304
* [phylotree] fix case where x-axis grid wasn't present
  A missed conditional resulted in certain configurations of the scatterplot "missing" the x-axis grid. (The grid was incorrectly being calculated for a temporal scale, which resulted in no meaningful grid lines.) Closes nextstrain#1323
* [phylotree] fix edge cases surrounding display of branch labels
  Fixes a couple of edge cases introduced by the scatterplot functionality which would result in a tree rendering with branch labels when it shouldn't have them (and vice versa).
* make frequencies tend to 0 in absence of data
* Use trait titles for data filter display
  Each coloring variable is defined by both a "key" and a "title". Keys are (largely) used internally, whereas titles are intended for user-facing display. This commit improves the "Filter Data" sidebar UI to use titles, resulting in a more consistent (and nicer) UI. Closes nextstrain#1322
* changelog
* version bump to 2.25.0 for release
* [PhyloTree] branch label bugfix
  Fixes a bug where we sometimes ask PhyloTree to update the branch labels for a view without any branch labels, which would cause auspice to crash. I first tried to fix this in a80e186 but that didn't cover all the situations when this could arise.
* increase padding value for frequencies
* lint appeasement
* changelog
* version bump to 2.25.1 for release
* set frequencies explicitly to 0 if total is too low
* reduce frequency normalization threshold to single constant
* remove unassigned variable
* Allow continuous colorings to define anchor points
  The schema currently allows datasets to provide a scale for non-continuous scales where specific trait values are given colour hexes (missing values are given greys by auspice). Here we extend this to continuous scales by interpreting the same data structure as anchor points which we interpolate between using the same method as we currently use for generating default continuous color scales (d3's `interpolateRgb`).
* Allow legend entries to be user-defined
  This allows continuous colour scales to define custom legend entries, via a `legend` key in the JSON. This allows control over the values in the scale which we use as legend elements, the displayed text, and the range of values which each entry covers. Bounds are enforced to be non-overlapping. If overlapping bounds are detected, we revert to Auspice dynamically generating these. (This is a requirement for future work which will map continuous tip values to a legend entry, which will allow pie-chart display using the legend swatches.)
* Legend bound matching is (a, b] for continuous scales
  This restores the algorithm used to associate a hovered legend item to tips for continuous variables. Commit 0f37b1a (Mar 2018) incorrectly changed this to `tip \in [a, b]` rather than the intended (and documented) `tip \in (a, b]`. This takes on more importance given that the previous commit allows user-defined bounds. Note that the frequencies panel already used `(a, b]` matching, so now the legend matching mirrors this.
* Extend user-provided legend info beyond continuous scales
* Use filterOptions to modify search alg
* GitHub Action to create nextstrain.org PR
  This action will run on each auspice PR and create a corresponding PR on nextstrain.org which includes a commit using the version of auspice from this (auspice) PR. This functionality is extremely useful for auspice development as it will allow us to use a Heroku review app to test auspice in the context of nextstrain.org.
  There are a number of future improvements to implement:
    * New auspice releases (tagged commits on `release` branch) would ideally create a PR on nextstrain.org which could be merged to update the version of auspice there.
    * Other consumers of auspice (e.g. auspice.us) could be added to this GitHub Action.
* Allow non-continuous scatterplot variables
  This implements a requested improvement to the original scatterplot implementation. The implementation hinges on two changes:
  (1) The collection of values for a given variable (e.g. x-var) need to be computed and passed to PhyloTree to act as the scale's domain. We reuse the colorScale machinery here, which could be optimised (see todo messages in code), but this has the advantage that the domain ordering matches the legend (unless user supplied).
  (2) PhyloTree needed to be modified to use non-linear scales, in this case `pointScale`.
  This commit should be fully functional, however there are some future improvements to be made: (i) Grid text is obscured and unreadable when there are many entries in the domain. (ii) Genotypes and Boolean scales are not yet available. (iii) Jitter should be added to nodes to avoid obfuscation.
* Layout changes occur via redux thunk
  This commit is in preparation for allowing genotype to be a scatterplot variable. This will complicate the allowable scatterplot variables and force these to update upon colorBy changes. This is much cleaner if layout is changed in a thunk.
* Allow genotype to be scatterplot variable
  Genotype is treated differently to other colorings in two important ways: (1) it can change value, for instance when changing the colorby to another genotype position and (2) it is stored in a different place to other colorings. These require scatterplot logic to be more complex as actions are no longer separate: we now require a NEW_COLOURS action to potentially update the layout, which was formerly within the remit of the CHANGE_LAYOUT actions. This is achieved through a middleware layer. This implementation makes it clear that jitter and better domain spacing are crucial for scatterplots.
* Improve padding for categorical scatterplot variables
  This prevents nodes falling on the axis itself or at the very end of the grid, which was especially noticeable for traits with small domains.
* Add jitter to categorical scatterplots
* Apply clipping to first column of legend
  We have had issues in the past with legend values from column 1 overflowing into column 2. For instance, issue nextstrain#899 was fixed by PR nextstrain#914 which implemented a maximum character limit for legend names. That solution can produce misleading views, such as those described in nextstrain#1306. This commit implements a clipping mask for column 1, avoiding the complication of limiting the string size. Column 2 already has similar behaviour because the SVG element of the legend itself performs the clipping.
* changelog
* version bump to 2.26.0 for release
* Always show regression toggle for clock layout
  Fixes a bug where the ability to toggle regression lines was hidden for clock views. (The ability to hide this toggle is only intended for scatter layouts, where we should not expose the toggle unless both axes are showing continuous variables.)
* Adjust grayscale color ramp
  The existing grayscale color ramp (used for values absent in an explicitly specified color scale) had values that were too dark and threw off the overall color balance. This commit narrows the grayscale color ramp to be more in line with the pastel color ramp.
* Inject a bit of color into the "grayscale" color ramp
  This adds a bit of blue into the grayscale color ramp. It still reads as mostly gray, but now colors seem to exist more in the same universe as the canonical auspice color ramp.
* changelog
* version bump to 2.27.0 for release
* Styling adjustments to footer text
* Remove metadata download from GISAID datasets
  This commit uses dataProvenance in metadata to identify datasets using "GISAID" data. For these datasets, the full metadata download is swapped to an "acknowledgments" download that only includes the following fields:
    - strain
    - gisaid_epi_isl
    - genbank_accession
    - originating_lab
    - submitting_lab
    - author
* Cleanup metadata headers
  This commit cleans up naming of metadata headers in downloaded metadata TSV. It does the following:
  1. Keeps headers as input into "augur export" rather than renaming by title. Thus it has "originating_lab" rather than "Originating lab", "pango_lineage" rather than "PANGO lineage", etc. This should make it easier for people to process downloaded metadata from Auspice alongside metadata provisioned by Nextstrain (via GISAID or via S3).
  2. Makes "date" the second column as this is often what's most important. I couldn't figure out a way to intelligently order remaining fields. My first thought was to use metadata.colorings, but this isn't sorted.
  3. Fixes "accession". It had been exporting as "[object Object]".
* Update changelog
* version bump to 2.28.0 for release

Co-authored-by: Trevor Bedford
Co-authored-by: James Hadfield
Co-authored-by: Michał Kowalski
Co-authored-by: Thomas Sibley
Co-authored-by: james hadfield
Co-authored-by: Charlie Jones
Co-authored-by: eharkins
Co-authored-by: Richard Neher
Co-authored-by: Muhammad Aditya Hilmy
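The `controls/filter` fix listed above can be sketched as follows. `appendAll` is a hypothetical helper. The point is that `target.push(...source)` passes every element as a separate argument, and a large enough array can exceed the engine's limit (the commit reports this on Chrome with ~277k values). A loop of single pushes keeps each call's frame small.

```javascript
// Sketch: append a potentially very large array without spread syntax.
// `target.push(...source)` would pass every element as its own argument,
// which can exceed the engine's call-stack limit for very large arrays.
function appendAll(target, source) {
  for (const value of source) {
    target.push(value); // one call per element: more calls, tiny stack frames
  }
  return target;
}

const big = Array.from({ length: 300000 }, (_, i) => i);
const options = [];
appendAll(options, big);
console.log(options.length); // 300000
```

As the commit notes, `Array.prototype.concat` avoids the problem too, but it returns a new array, so it would require reassigning the `const` variable.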

Show root-to-tip mutations in tip-clicked info box

8d7f78e

jameshadfield temporarily deployed to auspice-mutations-to-ti-bahczp January 28, 2021 21:06

Clean up mutation display in tip-clicked info box

dd6a679

trvrb temporarily deployed to auspice-mutations-to-ti-bahczp January 29, 2021 05:00

Cap string length in tip-clicked info box

9e3e4c3

trvrb temporarily deployed to auspice-mutations-to-ti-bahczp January 29, 2021 05:37

joverlee521 reviewed Jan 30, 2021


jameshadfield added 3 commits March 8, 2021 17:36

Correctly handle reversions & multiple mutations

ddaff9e

The function `collectMutations` now doesn't report reversions (where the tip state = the ancestral state) and combines multiple mutations (e.g. A->B->C is now A->C rather than two separate mutations).

Ensure unique component keys for rendering

0653954

Report all root-to-selected-tip mutations

4fee777

trvrb temporarily deployed to auspice-mutations-to-ti-q73a3p March 9, 2021 23:08

trvrb temporarily deployed to auspice-mutations-to-ti-q73a3p March 10, 2021 00:04

Styling adjustments for mutation list

0a6fb81

trvrb force-pushed the mutations-to-tip branch from 027d0aa to 0a6fb81 March 10, 2021 00:13

trvrb temporarily deployed to auspice-mutations-to-ti-q73a3p March 10, 2021 00:13

jameshadfield merged commit 1069d30 into master Mar 10, 2021

jameshadfield deleted the mutations-to-tip branch March 10, 2021 00:17



jameshadfield commented Jan 28, 2021

trvrb commented Jan 29, 2021

trvrb commented Jan 29, 2021 (edited)

emmahodcroft commented Jan 29, 2021

joverlee521 Jan 30, 2021

trvrb Jan 30, 2021

emmahodcroft Jan 30, 2021

jameshadfield Mar 8, 2021

trvrb Mar 8, 2021

jameshadfield Mar 9, 2021

trvrb Mar 9, 2021

trvrb commented Mar 10, 2021 (edited)
