Frequency Panel Shows 'All Equal' Before Estimated Date of Ancestor on Zoomed-in Cluster #1225

Closed
emmahodcroft opened this issue Oct 27, 2020 · 3 comments · Fixed by #1278
Labels
enhancement New feature or request

Comments

@emmahodcroft
Member

Context
This is a proposed improvement to the current frequency panel view when "Normalize Frequencies" is turned on.

Description
Currently, if you zoom in to a more recent cluster, then for any dates prior to the estimated most recent common ancestor (MRCA) of that cluster, the frequency panel (with "Normalize frequencies" on) will show all available options as equally likely. This could be very misleading for some traits. Shown here is a cluster that is entirely clade 20A, yet prior to the estimated MRCA of the cluster (early May), it is implied that all clades were equally likely, perhaps on the whole tree or perhaps as the ancestor for this cluster.

image

To Reproduce
Go to https://nextstrain.org/ncov/europe?c=clade_membership&d=tree,map,frequencies&f_region=Europe&p=grid and click on a clade that begins roughly after May or so. (Can also try other color-by options)

Possible solution
One option would be to allow/enable x-axis zooming on the normalized frequencies panel, and only ever show 'starting from' the date of the MRCA of the currently visible tree section.
Another option would be to just not show frequencies for dates prior to the MRCA of the currently visible tree section - instead starting the estimation of frequencies only from the date of that MRCA. (Showing white, black, grey, etc, before that time.)
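
A minimal sketch of the second option, written in Python purely for illustration (not the panel's actual code). It assumes pivots are sorted decimal dates and that the MRCA date of the visible subtree is already known; `mask_before_mrca` is a hypothetical helper, and `None` stands in for "draw grey / nothing here".

```python
from bisect import bisect_left

def mask_before_mrca(pivots, frequencies, mrca_date):
    """Blank out frequency estimates at pivots earlier than the MRCA date.

    pivots      -- sorted decimal dates (assumed format)
    frequencies -- one value per pivot for a single category
    mrca_date   -- decimal date of the MRCA of the visible subtree (assumed known)
    """
    # Index of the first pivot that is not earlier than the MRCA.
    start = bisect_left(pivots, mrca_date)
    # None marks pivots with no data to inform an estimate.
    return [None] * start + list(frequencies[start:])

pivots = [2020.0, 2020.1, 2020.2, 2020.3, 2020.4, 2020.5]
clade_20a = [0.33, 0.33, 0.33, 0.33, 0.9, 1.0]
masked = mask_before_mrca(pivots, clade_20a, mrca_date=2020.35)
print(masked)
```

The same start index could also serve as the left edge of the x-axis for the first option (zooming the panel to begin at the MRCA).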

Notes
@rneher adds this (from slack discussion):

The normalization is tricky. There is no data to inform frequencies prior to the root of the clade. The KDE frequencies we use essentially send their Gaussian tail into the past, but this becomes a very small number, which causes underflow. We fix this by adding a small number to each category, which results in uniform frequencies when there is no data.
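
To illustrate the effect described above, here is a toy Python sketch. It is not the actual frequency estimation code: the kernel width `sigma`, the `pseudocount` value, and both function names are assumptions chosen only to show why uniform frequencies appear before the root of the clade.

```python
import math

def gaussian_tail(pivot, tip_date, sigma=0.05):
    """Unnormalized Gaussian kernel weight of one tip, evaluated at a pivot.

    sigma is an assumed kernel width in years, for illustration only.
    """
    return math.exp(-0.5 * ((pivot - tip_date) / sigma) ** 2)

def normalized_frequencies(raw, pseudocount=1e-10):
    """Add a small number to every category, then normalize to sum to 1."""
    padded = {name: value + pseudocount for name, value in raw.items()}
    total = sum(padded.values())
    return {name: value / total for name, value in padded.items()}

# A cluster that is entirely 20A, with its tips sampled around 2020.5.
# Well before the cluster exists, the Gaussian tail is vanishingly small,
# so the pseudocount dominates and every category comes out near 1/3.
early = normalized_frequencies(
    {"20A": gaussian_tail(2020.0, 2020.5), "20B": 0.0, "20C": 0.0}
)

# At the sampling date the data dominate and 20A is close to 1.
late = normalized_frequencies(
    {"20A": gaussian_tail(2020.5, 2020.5), "20B": 0.0, "20C": 0.0}
)

print(early)
print(late)
```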

@jameshadfield feel free to re-label as bug if more appropriate!

@trvrb
Member

trvrb commented Jan 21, 2021

I'd prioritize this based on interest in "clades" / "variants". I find this behavior super confusing. My suggestion here would be to gray out pivots in the frequency panel if the total frequency is under some threshold, say 1%. This would make the above example a box of gray until you get to June 2020.
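
A rough Python sketch of the gray-out check, assuming the panel has the unnormalized frequency of each visible category at every pivot. The function name and the dictionary layout are hypothetical; only the 1% threshold comes from the suggestion above.

```python
def pivots_to_gray_out(frequencies, threshold=0.01):
    """Return one boolean per pivot: True if that pivot should be grayed out.

    frequencies -- dict mapping category name to a list of unnormalized
                   frequencies, one per pivot (assumed layout)
    threshold   -- minimum total frequency needed to draw the pivot
    """
    # zip(*...) walks the pivots, yielding one tuple of category values each.
    totals = [sum(column) for column in zip(*frequencies.values())]
    return [total < threshold for total in totals]

# Visible subtree is a single recent clade: nearly absent at the early pivots.
visible = {"20A": [0.0, 0.002, 0.05, 0.2]}
gray = pivots_to_gray_out(visible)
print(gray)
```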

@trvrb
Member

trvrb commented Jan 26, 2021

After thinking about this further, I think the best way to tackle this is to automatically toggle "normalize frequencies" based on whether a pivot with less than 1% total frequency is detected.

Thus, filtering to: https://nextstrain.org/ncov/global?f_clade_membership=20I/501Y.V1 would automatically switch to "unnormalized frequencies" because there are pivots with less than 1% total frequency.

However, filtering to: https://nextstrain.org/ncov/global?f_country=USA would not automatically switch to "unnormalized frequencies" because all pivots are more than 1% total frequency.

Generally, this should make it so that filtering by geography keeps normalized frequencies, while filtering to a clade toggles to unnormalized frequencies. This kills two birds with one stone: it papers over the issue above (you can still get to the problematic app state, but it's harder to reach), and it also addresses a separate core issue of wanting to understand the frequency of a variant / clade over time.
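
The toggle rule could be sketched roughly as follows, in Python for illustration only. `should_normalize` and the input layout are hypothetical, the numbers are made up, and the 1% threshold is the one proposed in this comment.

```python
def should_normalize(frequencies, threshold=0.01):
    """Decide whether "normalize frequencies" should stay on.

    frequencies -- dict mapping category name to a list of unnormalized
                   frequencies of the filtered selection, one per pivot
                   (assumed layout)
    Returns False as soon as any pivot has a total below the threshold.
    """
    totals = [sum(column) for column in zip(*frequencies.values())]
    return all(total >= threshold for total in totals)

# Filtering to a recent clade: early pivots have almost no frequency,
# so the panel would switch to unnormalized frequencies.
clade_filter = {"20I/501Y.V1": [0.0, 0.0, 0.001, 0.03, 0.12]}

# Filtering to a country: every pivot keeps a sizeable total,
# so the panel would stay normalized.
country_filter = {
    "20A": [0.10, 0.08, 0.06, 0.05, 0.04],
    "20C": [0.05, 0.07, 0.09, 0.10, 0.11],
}

print(should_normalize(clade_filter))
print(should_normalize(country_filter))
```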

@trvrb
Member

trvrb commented Jan 27, 2021

I've bumped this to "next up" and labeled it "high priority" due to how frequently this will be encountered with the new genotype filtering feature. For example, it will be common to link to URLs like: https://dev.nextstrain.org/ncov/north-america?f_clade_membership=20C&gt=S.452R (in this case showing the "California variant" 452R). The default view for frequencies in this case is pretty bad, with the "rainbow" all-equal pattern before Sep 2020.
