Add support for categorical colorbys to scatterplot #1339

Closed
trvrb opened this issue Apr 27, 2021 · 2 comments
Labels: enhancement (New feature or request)

Comments

@trvrb (Member) commented Apr 27, 2021

The scatterplot functionality has proved very useful for viewing multiple aspects of the data in a single view. A chief example at the moment is comparing attributes of emerging lineages, where a couple of common views look like this:

Color by emerging lineage, time on the x-axis, spike S1 mutations on the y-axis
https://nextstrain.org/ncov/global?branches=hide&c=emerging_lineage&l=scatter&scatterY=S1_mutations

Color by emerging lineage, time on the x-axis, logistic growth on the y-axis
https://nextstrain.org/ncov/global?c=emerging_lineage&l=scatter&scatterY=logistic_growth

Both views help surface which emerging lineages have the most S1 mutations (P.1) and which have the highest rates of logistic growth (largely B.1.1.7), but other lineages get lost in the mix because their tips are occluded.

To compare S1 mutations and logistic growth across emerging lineages, it would be preferable to put emerging lineage (a categorical variable) on the x-axis and S1 mutations or logistic growth on the y-axis.

In this case, rather than a regression line, I'd imagine a horizontal black line for each category marking the mean of its tips.
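The mean-line idea can be sketched as a small grouping step. It collects the y values of tips by category and averages them. It then places each category at an integer x position, with a short horizontal segment drawn at that category's mean. This is a minimal sketch in Python under assumed inputs: a plain list of `(category, value)` pairs. The function names, the alphabetical ordering of categories, and the `band_width` parameter are hypothetical and are not taken from Auspice's implementation.

```python
from collections import defaultdict
from statistics import mean


def category_means(tips):
    """Group (category, value) pairs and return the mean value per category.

    Hypothetical helper: tips whose value is None (missing data) are skipped,
    and a category with no numeric values is left out entirely.
    """
    groups = defaultdict(list)
    for category, value in tips:
        if value is None:
            continue
        groups[category].append(value)
    return {category: mean(values) for category, values in groups.items()}


def mean_line_segments(tips, band_width=0.8):
    """Return {category: (x_start, x_end, y_mean)} for drawing mean lines.

    Assumption: categories are placed at x = 0, 1, 2, ... in sorted order,
    and each mean line spans `band_width` units centred on that position.
    """
    means = category_means(tips)
    half = band_width / 2
    segments = {}
    for position, category in enumerate(sorted(means)):
        segments[category] = (position - half, position + half, means[category])
    return segments


if __name__ == "__main__":
    # Made-up illustrative values, not real lineage data.
    example_tips = [("A", 2), ("A", 4), ("B", 7), ("B", None), ("C", 1), ("C", 3), ("C", 5)]
    for category, segment in mean_line_segments(example_tips).items():
        print(category, segment)
```

Sorting categories alphabetically is only a placeholder choice. A real view might instead order them by frequency or by the order in which the colorby lists them.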

@trvrb added the enhancement (New feature or request) label Apr 27, 2021
@huddlej (Contributor) commented Apr 27, 2021

Another use case that occurred to me recently is a kind of transmission-network view, where we could plot time on the x-axis and regions on the y-axis, color by clade, and look for clades whose branches cross region boundaries (sort of like a slope plot). For a subset of geographic locations, we could get a similar effect with the current functionality by plotting latitude or longitude on the y-axis.

Edit: Here is an example of what I was describing above. I've plotted strains from a Washington-focused tree by sample date and by the latitude of their country (as inferred by augur traits for internal nodes or missing data), colored by country:

Filter the view to strains from the USA only and identify transmissions into the USA from other latitudes:

Zoom in to see possible transmissions from Mexico to the USA. The three red diagonal lines suggest three separate introductions, and their slopes suggest that the introductions occurred at different rates (note that this tree is heavily biased toward Washington and North America, so it isn't a fair representation):

Switch back to the tree view to see the phylogenetic context and confirm that there do appear to be three separate introductions into the USA:

@jameshadfield (Member)

This feature was released in v2.26.0.