Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convey uncertainty via tip colors #1796

Merged
merged 2 commits into from
Jun 30, 2024
Merged

Conversation

jameshadfield
Copy link
Member

@jameshadfield jameshadfield commented Jun 28, 2024

The previous code conveyed uncertainty in node attrs for branches by making them appear grey-er, but we never implemented this for tips; most likely because we never had a dataset with such data when this was built.

Here we use the same approach for tips as for branches, but with a slightly different parameterisation of the interpolation. The mapping of the entropy value into [0,1] (tipOpacityFunction) was chosen so that tips with no (or very little) uncertainty look unchanged from previous Auspice versions, and uncertainty makes them appear more similar to the branch colour (for an equivalent uncertainty).

There should be no visible changes for views without any uncertainty (genotype is a good one to use to test this), as well as traits where there is uncertainty in the dataset but not in the tips (e.g. ebola country / division reconstruction). Here's a side-by-side with the h5n1-cattle-flu dataset from nextstrain/avian-flu#66, which identified this issue in Auspice (this PR LHS, current Auspice RHS):

Frame 12(1)

  • Checks pass (CI passes, RTD is currently broken however that's unrelated to this PR)
  • If making user-facing changes, add a message in CHANGELOG.md summarizing the changes in this PR
  • (to be done by a Nextstrain team member) [Create preview PRs on downstream repositories][1].

Consuming code expected this to be a boolean and all actions which
update this state set a boolean. Thankfully this means the default state
was never used in practice.
@nextstrain-bot nextstrain-bot temporarily deployed to auspice-james-uncertain-wu7geg June 28, 2024 01:12 Inactive
.clamp(true);
const tipOpacityFunction = branchOpacityFunction
.copy()
.range([0, 0.9]); // if entropy close to 0 return the original node color
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The range (and the domain) of this scale are the magic values which control how entropy values affect the stroke colour (via an interpolation between the original colour & grey). The tip fill colour is a brighter version of the stroke colour. Very happy for adjustments here, although I think it's important that tipOpacityFn(close-to-zero) -> 0 so that we don't change how the majority of datasets appear.

@jameshadfield
Copy link
Member Author

Some URLs to compare this PR on nextstrain.org vs released Auspice on nextstrain.org:

Cattle-flu new & old

Zika (country) new & old. Note that this does have uncertainty for tips, which is kind of strange, but that's how augur traits currently works.

H3N2 (genotype view) new & old - no uncertainty here.

Copy link
Contributor

@joverlee521 joverlee521 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would one be able to differentiate the grey scale for uncertainty vs grey scale for unprovided colorings? For example, imagine if zika's region had uncertainty, it would be mixed in with the "Asia" grey colorings.

(The default "Asia" coloring issue will be fixed when augur is released with nextstrain/augur#1490, but the question still stands for any other unprovided colorings)

* @param {bool} confidence enabled?
* @return {array} array of hex's. 1-1 with nodes.
*/
export const calcBranchStrokeCols = (tree, confidence, colorBy) => {
export const calculateStrokeColors = (tree, branch, confidence, colorBy) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not a request for change, just curious

Why combine branch/tip colors into one function with the branch flag when they are essentially completely different paths within the function?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While they are ~separate code paths, they both do the same thing: take a colour and modify it according to the node's uncertainty. Co-locating them feels natural to me and should help them to stay in-sync.

I actually thought about taking this further and refactoring it into a function calculateNodeColors -> {branchColors, tipStrokeColors, tipFillColors} so we calculate everything at once, but I'll leave that for another day (and I need to check that we never have a situation where we only need to recompute one of those sets).

@trvrb
Copy link
Member

trvrb commented Jun 28, 2024

Awesome! This behavior looks spot on to me. Here's the current H5N1 cattle outbreak

Screenshot 2024-06-28 at 11 42 31 AM

Note that recent SRA tips are not completely gray. For example, the top clade descends from viruses sampled from South Dakota. These are appropriately colored a gray/green indicating potential South Dakota, but with little certainty.

Screenshot 2024-06-28 at 11 45 19 AM

The interpolation between SRA tips close to known Ohio viruses in blue to the the Michigan human case in lime also seems very appropriate.

Screenshot 2024-06-28 at 11 47 02 AM

I think Auspice is now doing exactly what it should be doing. However, we still should have a way to have a more data-informed decision about how to set --sampling-bias-correction. We could be doing leave 10% out cross validation as Gytis did in the 2019 BMC Evol Biol paper.

@trvrb
Copy link
Member

trvrb commented Jun 28, 2024

How would one be able to differentiate the grey scale for uncertainty vs grey scale for unprovided colorings? For example, imagine if zika's region had uncertainty, it would be mixed in with the "Asia" grey colorings.

This is a really good point @joverlee521. The issue is that currently we use gray to mean either:

  1. Unknown or uncertain
  2. Uninteresting

This uninteresting take can be seen here https://nextstrain.org/ncov/gisaid/north-america/6m@2020-05-01 for example. This felt semantically appropriate to distinguish focal samples from background samples.

I think this is okay however... This example does DTA on samples with a focal vs contextual color ramp so that uncertain nodes and contextual nodes are both gray. This feels okay and appropriate (perhaps not ideal, but not broken). It highlights clades that are more certain to be in a focal region.

That said, we should be fixing colorings like the Zika example so that random location is not gray. In the Zika example, "Asia" should be blue, like it is for country.

The previous code conveyed uncertainty in node attrs for _branches_ by
making them appear grey-er, but we never implemented this for _tips_;
most likely because we never had a dataset with such data when this was
built.

Here we use the same approach for tips as for branches, but with a
slightly different parameterisation of the interpolation. The mapping of
the entropy value into `[0,1]` (`tipOpacityFunction`) was chosen so that
tips with no (or very little) uncertainty look unchanged from previous
Auspice versions, and uncertainty makes them appear more similar to the
branch colour (for an equivalent uncertainty).
@jameshadfield jameshadfield force-pushed the james/uncertain-tip-attrs branch from f9e654f to c2ffa90 Compare June 30, 2024 23:50
@nextstrain-bot nextstrain-bot temporarily deployed to auspice-james-uncertain-wu7geg June 30, 2024 23:50 Inactive
@jameshadfield jameshadfield merged commit 3e818f0 into master Jun 30, 2024
25 of 26 checks passed
@jameshadfield jameshadfield deleted the james/uncertain-tip-attrs branch June 30, 2024 23:56
@jameshadfield
Copy link
Member Author

How would one be able to differentiate the grey scale for uncertainty vs grey scale for unprovided colorings? For example, imagine if zika's region had uncertainty, it would be mixed in with the "Asia" grey colorings.

Very difficult at the moment! Let's continue discussion in [maybe] differentiate between nodes with uncertainty vs nodes missing from colour scale

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants