Add support for categorical colorbys to scatterplot #1339

trvrb · 2021-04-27T00:20:53Z

The scatterplot functionality has proved very useful for viewing multiple aspects of the data in a single view. One chief example at the moment is comparing attributes of emerging lineages, where a couple of common views look like:

Color by emerging lineage, time on x axis, spike S1 mutations on y axis
https://nextstrain.org/ncov/global?branches=hide&c=emerging_lineage&l=scatter&scatterY=S1_mutations

Color by emerging lineage, time on x axis, logistic growth on y axis
https://nextstrain.org/ncov/global?c=emerging_lineage&l=scatter&scatterY=logistic_growth

Both of these views help to surface which emerging lineages have the most S1 mutations (P.1) and which have the highest rates of logistic growth (largely B.1.1.7), but other lineages get lost in the mix because tips occlude one another.

For the desired comparison of S1 mutations and logistic growth across emerging lineages, it would be preferable to have emerging lineage on the x axis and S1 mutations or logistic growth on the y axis.

In this case, rather than a regression line, I'd imagine a horizontal black line for each category marking its mean.
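
As a rough illustration of the per-category mean described above, here is a minimal Python sketch. It is not how Auspice computes anything; the `category_means` helper, the tip dictionaries, the placeholder lineage names, and the numbers are all made up for the example. It groups tips by a categorical attribute and averages a numeric attribute within each group, which gives the y position of each horizontal line.

```python
from collections import defaultdict
from statistics import mean


def category_means(tips, category_key, value_key):
    """Return {category: mean of value_key}, skipping tips missing either field."""
    grouped = defaultdict(list)
    for tip in tips:
        category = tip.get(category_key)
        value = tip.get(value_key)
        if category is None or value is None:
            continue  # tips without both attributes cannot be placed on the plot
        grouped[category].append(value)
    return {category: mean(values) for category, values in grouped.items()}


# Hypothetical tips; lineage names and mutation counts are illustrative only.
tips = [
    {"emerging_lineage": "lineage_a", "S1_mutations": 8},
    {"emerging_lineage": "lineage_a", "S1_mutations": 10},
    {"emerging_lineage": "lineage_b", "S1_mutations": 4},
    {"emerging_lineage": None, "S1_mutations": 2},
]

means = category_means(tips, "emerging_lineage", "S1_mutations")
```

Each entry in `means` would then be drawn as one horizontal line spanning the width of its category on the x axis.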

huddlej · 2021-04-27T20:57:59Z

One other use case that occurred to me recently would be a kind of transmission network view where we could plot time on the x-axis, regions on the y-axis, color by clades, and look for clades whose branches traverse region boundaries (sort of like a slope plot). For some subset of geographic locations, we could get a similar effect from the current functionality by plotting latitude or longitude on the y-axis.
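
To make the "branches that traverse region boundaries" idea concrete, here is a small Python sketch under stated assumptions: each branch is represented as a dictionary holding the inferred region of its parent and child nodes. That structure, the `region_crossings` helper, and the sample values are hypothetical and do not reflect Auspice's internal data model.

```python
def region_crossings(branches):
    """Return the branches whose parent and child nodes sit in different regions."""
    return [
        branch
        for branch in branches
        if branch["parent_region"] != branch["child_region"]
    ]


# Hypothetical branches; node names and regions are illustrative only.
branches = [
    {"child": "node_1", "parent_region": "region_a", "child_region": "region_a"},
    {"child": "node_2", "parent_region": "region_a", "child_region": "region_b"},
    {"child": "node_3", "parent_region": "region_b", "child_region": "region_b"},
    {"child": "node_4", "parent_region": "region_c", "child_region": "region_b"},
]

crossings = region_crossings(branches)
crossing_children = [branch["child"] for branch in crossings]
```

On a plot with time on the x-axis and region on the y-axis, these are the branches that would show up as diagonal lines between rows.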

Edit: Here is an example of what I was trying to describe above where I've plotted strains from a Washington-focused tree by sample date and latitude (of country as inferred by augur traits for internal nodes or missing data), colored by country:

Filter view to only strains from USA and identify transmissions into USA from other latitudes:

Zoom in to see possible transmission from Mexico to USA. The three red diagonal lines suggest three separate introductions and their slopes suggest different rates at which the introductions occurred (note that this tree is biased heavily toward Washington and North America, so it isn't a fair representation):

Switch back to tree view to see the phylogenetic context and confirm that there do appear to be three separate introductions into the US:

jameshadfield · 2021-06-10T04:40:11Z

This feature was released in v2.26.0

trvrb added the enhancement New feature or request label Apr 27, 2021

jameshadfield closed this as completed Jun 10, 2021
