Incorrect frequency plot from augur frequencies #1279

Closed
ps120195 opened this issue Jan 27, 2021 · 1 comment · Fixed by #1301

@ps120195

[Attached image: augur_frequencies]

In the attached figure, the yellow colour shows a mutation frequency of 0%, which should not be the case.
Please help.

jameshadfield added a commit that referenced this issue Mar 9, 2021
When a frequency value is below 1%, we now display "<1%" rather than rounding to the nearest integer, which could lead to the confusing output "frequency: 0%".

Closes #1279
@jameshadfield
Member

@ps120195 this was a rounding error -- in this case the frequency was below 0.5% and was being rounded to 0%. PR #1301 will improve this output.
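
The rounding behaviour described here can be sketched as follows. This is only an illustration, not the code from PR #1301: the function name `formatFrequency` and the convention that frequencies arrive as fractions between 0 and 1 are assumptions.

```javascript
// Hypothetical helper (name and input convention are assumptions).
// `frequency` is a fraction in [0, 1]; the result is a display string.
function formatFrequency(frequency) {
  const percent = frequency * 100;
  // Non-zero values under 1% are reported as "<1%" instead of being rounded,
  // so a rare but present mutation is never shown as "0%".
  if (percent > 0 && percent < 1) return "<1%";
  return `${Math.round(percent)}%`;
}

console.log(formatFrequency(0.004)); // a frequency of 0.4%
console.log(formatFrequency(0.256)); // a frequency of 25.6%
```

With plain rounding, the first call would have produced "0%", which is the confusing output reported in this issue.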

tomkinsc added a commit to broadinstitute/auspice that referenced this issue Jul 6, 2021
…:master (#7)

* Enforce unnormalized frequencies when data is lacking

This commit forces controls.normalizeFrequencies to be false if there are any pivots where the total frequency is less than 0.1%.

This addresses two issues:
1. In the existing code, attempting to normalize situations where pivots have 0% total frequency results in bad looking "all equal" bands. This commit removes the capacity to get into this bad looking app state.
2. When filtering to a particular clade, we often want to switch to unnormalized frequencies anyway (as opposed to filtering to geography). This commit accomplishes this automatically because filtering to an emerging clade will generally result in pivots with <0.1% frequency.
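
A minimal sketch of the check described above, assuming the frequency matrix is an object mapping each category to an array with one frequency per pivot. The names `canNormalize` and `NORMALIZATION_THRESHOLD`, and the matrix shape, are hypothetical.

```javascript
// 0.1% expressed as a fraction; the cut-off named in the commit message.
const NORMALIZATION_THRESHOLD = 0.001;

// Hypothetical helper. `matrix` is assumed to look like
// {categoryA: [f0, f1, ...], categoryB: [f0, f1, ...]} with one entry per pivot.
function canNormalize(matrix, pivotCount) {
  for (let i = 0; i < pivotCount; i++) {
    let total = 0;
    for (const frequencies of Object.values(matrix)) {
      total += frequencies[i];
    }
    // A single pivot with (almost) no data is enough to disable normalization.
    if (total < NORMALIZATION_THRESHOLD) return false;
  }
  return true;
}
```

The caller would then set `normalizeFrequencies` to `false` whenever this returns `false`.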

* Show root-to-tip mutations in tip-clicked info box

* Clean up mutation display in tip-clicked info box

* Cap string length in tip-clicked info box

* Added Polish language to locales

* Added Polish language to sidebar options

* fixed typos in sidebar.json

* controls/filter: Avoid spread syntax with potentially large arrays

Push each value individually instead of all at once, which results in
more method calls but a much smaller call stack size.  Alternatively,
Array.concat could be used, but this follows the pattern of surrounding
code and avoids reassignment of "options", which would also necessitate
removal of the "const" declaration.

The /tb/global build has ~277k genotype states, which resulted in a call
to Array.push with as many arguments when the spread syntax was used.
This blew through the call stack size limit on Chrome with an error
like:

    RangeError: Maximum call stack size exceeded
        at FilterData.eval [as makeOptions] (filter.js?6bcb:65)
        at FilterData.render (filter.js?6bcb:100)
        …

Firefox was unaffected, so presumably has a larger limit.

Debugging was waylaid for a bit by the assumption that exceeding the
call stack size necessarily meant deep recursion, but the lack of a deep
stack trace led to the realization that it can also occur when a
function's arguments are too many.

Resolves nextstrain#1292.
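
The pattern can be illustrated with stand-in data. The array contents below are made up; only the approach (one `push` per value rather than one `push` with every value as an argument) follows the commit message.

```javascript
// Stand-in data; the real case had ~277k genotype states.
const values = Array.from({length: 300000}, (_, i) => `state-${i}`);
const options = [];

// `options.push(...values)` would pass every element as a separate argument,
// which can exceed the engine's limit and throw a RangeError on large arrays.
// Pushing one value at a time keeps each call small.
for (const value of values) {
  options.push(value);
}

console.log(options.length);
```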

* Correctly format BCE dates

BCE dates were correctly interpreted but incorrectly rendered
due to a bug in the final string-prettying step. This would
result in a tree with the correct layout and positioning, but
incorrect labels ("-undefined"). This commit remedies this and
adds a test.

* Increase bundlesize limits

This simply increases the limits to allow CI tests to pass.
Large bundlesizes are a long-term concern, but given the
recent bugs regarding bundling and other priorities this change
essentially pushes improvements here to "sometime in the future"

* Correctly handle reversions & multiple mutations

The function `collectMutations` no longer reports reversions (where the tip state equals the ancestral state) and now combines multiple mutations at the same position (e.g. A->B->C is reported as A->C rather than as two separate mutations).
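
As a rough illustration of this behaviour (not the actual `collectMutations` implementation), the sketch below assumes mutations arrive in root-to-tip order as strings such as "A123T" with single-character states. The helper name `collapseMutations` is hypothetical.

```javascript
// Hypothetical helper. `mutations` is ordered root-to-tip and each entry is
// "<from><position><to>", e.g. "A123T" (single-character states assumed).
function collapseMutations(mutations) {
  const byPosition = new Map(); // position -> {from, to}
  for (const mutation of mutations) {
    const from = mutation[0];
    const to = mutation[mutation.length - 1];
    const position = Number(mutation.slice(1, -1));
    const seen = byPosition.get(position);
    if (seen) {
      seen.to = to; // keep the ancestral state, update the final state
    } else {
      byPosition.set(position, {from, to});
    }
  }
  return Array.from(byPosition.entries())
    .filter(([, states]) => states.from !== states.to) // drop reversions
    .sort(([a], [b]) => a - b)
    .map(([position, states]) => `${states.from}${position}${states.to}`);
}

console.log(collapseMutations(["A10B", "B10C", "G5T", "T5G"])); // [ 'A10C' ]
```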

* Ensure unique component keys for rendering

* Report all root-to-selected-tip mutations

* Indicate if frequencies can be normalized in sidebar

Previous implementation continued to display the toggle icon but disabled its functionality (by forcing `normalizeFrequencies` to be `false`). Here we replace the toggle with a "not available" message & update the info-popup text.

* Update normalizeFrequencies flag via redux actions

Upon initial parsing of the frequencies JSON as well as frequency data updates we may set redux→controls→normalizeFrequencies→false. This commit modifies the LOAD_FREQUENCIES and FREQUENCY_MATRIX actions to pass this information to the reducer, rather than updating the redux state directly from within the actions.

I couldn't find any bugs caused by the previous implementation, but this change is more in line with suggested behaviour and should help future-proof work here.
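
A sketch of the idea, assuming the actions carry a `normalizeFrequencies` field in their payload. That field name on the action and the reducer shape are assumptions; only the action types come from this commit message.

```javascript
// Hypothetical slice of a controls reducer. The reducer, not the action
// creator, is the only place where the state is updated.
function controlsReducer(state = {normalizeFrequencies: true}, action) {
  switch (action.type) {
    case "LOAD_FREQUENCIES":
    case "FREQUENCY_MATRIX":
      // Only update the flag when the action actually carries it.
      if (action.normalizeFrequencies === undefined) return state;
      return {...state, normalizeFrequencies: action.normalizeFrequencies};
    default:
      return state;
  }
}
```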

* Update frequencies panel when data changes

The frequencies component performed some basic comparisons between the previously rendered data and new data to avoid unnecessary re-renders. This logic was too simplistic and caused a bug where the component wouldn't re-render the graph when the data had indeed changed. This commit skips these checks. This may result in some unnecessary re-renders, however I couldn't find any in my testing.

Closes nextstrain#1224

* [frequencies panel] Don't round frequencies below 1%

When a frequency value is below 1%, we now display "<1%" rather than rounding to the nearest integer, which could lead to the confusing output "frequency: 0%".

Closes nextstrain#1279

* Legend no longer obscures branches/tips.

Shifts the top of the tree down slightly so that tips and branches cannot be hidden behind the (closed) legend, which prevents interacting with them. This only happens in rectangular / unrooted trees, as radial / clock views almost never have tips rendered in the top-left corner.

* Styling adjustments for mutation list

* Allow JSONs to define language

This is a squashed & rebased version of PR nextstrain#1221,
which itself superseded PR nextstrain#1218.

Closes nextstrain#1049.

Co-authored-by: Charlie Jones <[email protected]>
Co-authored-by: eharkins <[email protected]>

* Fix spelling typo

* changelog

* version bump to 2.24.0 for release

* Ensure metadata.display_defaults exists

A bug was introduced in PR nextstrain#1280 where datasets which did not
define `metadata.display_defaults` would crash, as the code assumed
its existence. This property is optional in the dataset
JSON.

This commit ensures `display_defaults` exists in redux state after
a dataset is loaded, thus allowing code to rely on its presence.
This was preferred to checking for `display_defaults` in (all) the
relevant sections of code, now and in future.

(Using TypeScript, or expanding our smoke tests, would be approaches
to avoiding these kinds of bugs in future.)
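
The approach amounts to normalizing the metadata once at load time. A minimal sketch, with a hypothetical function name and assuming the default is an empty object:

```javascript
// Hypothetical helper: returns a copy of `metadata` in which
// `display_defaults` is always an object, so later code can read
// `metadata.display_defaults.someKey` without checking for existence.
function ensureDisplayDefaults(metadata) {
  return {...metadata, display_defaults: metadata.display_defaults || {}};
}
```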

* version bump to 2.24.1 for release

* Treat accessions and urls as special node traits

The schema defines these as "special" properties, and we use them to render the value and link via `<AccessionAndUrl>` within the tip-clicked panel.

These should not be available as valid traits for general display.

* Allow node_traits to define their own URLs

This extends our interpretation of dataset-supplied traits to allow them to define a URL as well as a value. If a url is specified, then the value (in the tip-clicked panel) is rendered as a link.

Closes nextstrain#1307

* Improve validation of URLs & add tests

This improves our validation of URLs which should improve app stability.
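
One possible shape for such a check, using the standard `URL` constructor. The function name and the decision to accept only http(s) links are assumptions, not necessarily what auspice does.

```javascript
// Hypothetical validator: accepts only absolute http(s) URLs.
function isValidUrl(candidate) {
  if (typeof candidate !== "string") return false;
  try {
    const url = new URL(candidate); // throws a TypeError on malformed input
    return url.protocol === "http:" || url.protocol === "https:";
  } catch (err) {
    return false;
  }
}
```

Restricting the protocol means values such as `javascript:` URIs are never rendered as links.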

* Generic scatterplot layouts

This adds a new layout for scatterplots and allows users to choose
the x and y variables from the available colorings. The defaults are
the tree metric (x-axis) and the current color-by (y-axis).

The layout algorithm is largely unchanged from the root-to-tip layout.
This presupposes that node trait values will be numeric, and thus
map nicely to an axis. Future work will allow scales to map
non-numeric values (e.g. categorical, ordinal, boolean scales) to
a d3 domain for rendering. Currently these traits get assigned `0`
as their x and/or y values.

Similarly, the algorithm presupposes that all nodes (internal and
terminal) have values and should be rendered. There will be many
cases where nodes (especially internal nodes) do not have traits
assigned. In these cases we should hide them from view, and remove
any connecting branches.

Future work needed:
* More testing is needed for rare use cases, e.g. trees without
divergence, datasets with no colorings.
* Dataset JSONs and URL queries should be able to select the
scatterplot variables.

This commit is based off previous work by trvrb.

Co-authored-by: Trevor Bedford <[email protected]>

* Add support for "data_provenance" metadata

In the early stages of COVID-19, we added support for acknowledging
GISAID as the source of data in the Byline. This was inferred based
on domain / dataset name heuristics.

We now support data provenance to be defined in the dataset JSON
(see nextstrain/augur#705) and all core
nCoV builds have been updated to include this here. This commit
parses and renders such information.

Note that a previous commit removed the "Build info" from the byline
for datasets displaying GISAID (see [0]), which I believe was an
oversight. This commit reinstates it.

[0] nextstrain@18d5d21

* Scatterplots improved for continuous variables

This fixes some issues highlighted by the previous commit to improve
rendering of scatterplots. We now limit scatterplot x,y variable
choices to continuous-scaled colorings, and leave the display of
other scale types to future work as this requires PhyloTree to switch
to a new d3 scale.

As not all nodes may have traits assigned (contrary to other tree
layouts), we detect and hide those nodes from view, as well as any
joining branches. We also expose the ability to toggle branches
on/off.

We also improve the starting variable choices for x & y.

* Link out to gisaid.org

This adds a link to gisaid.org from the GISAID logo (when present). It also adjusts parsing so that data_provenance.name == "gisaid" will still get picked up.

* Unify clock and scatterplot layouts

As the clock view is simply a specific type of scatterplot layout,
this commit unifies the code and display between these two
"separate" layouts. We preserve the clock button in the sidebar
as this is a common action which we want to surface.

**Show branch toggles**
Are now rendered for both views. The layout of scatterplots does
not consider internal nodes for calculating the domain if branches
are not shown. Similarly, branch labels are not displayed if
branches are not.

**Regression Lines**
These are now available for both layouts, and are toggled via a UI
element similar to branches. Previously, the regression would be
shown for clock layouts _if_ the branch metric was time, however
the explicit UI element introduced here is better. For scatterplot
views we calculate the regression with a free intercept, as the root
node may not have co-ordinates defined (depending on chosen x,y
variables), and additionally report the R^2.
The display of the regression text can be improved in future commits.

**Persist chosen state**
To improve the UX, once a scatterplot has been viewed, we persist the
x,y variables for future viewing. Similarly, the toggle state persists
between clock & scatter layouts.
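
For reference, a regression with a free intercept and its R^2 can be computed by ordinary least squares as sketched below. This is a generic textbook formulation, not the auspice code, and the function name and `[x, y]` input shape are assumptions.

```javascript
// Ordinary least squares fit of y = slope * x + intercept.
// `points` is an array of [x, y] pairs with at least two distinct x values
// (otherwise the slope is undefined and the result contains NaN).
function linearRegression(points) {
  const n = points.length;
  let sumX = 0;
  let sumY = 0;
  for (const [x, y] of points) {
    sumX += x;
    sumY += y;
  }
  const meanX = sumX / n;
  const meanY = sumY / n;
  let sxx = 0;
  let sxy = 0;
  let syy = 0;
  for (const [x, y] of points) {
    sxx += (x - meanX) ** 2;
    sxy += (x - meanX) * (y - meanY);
    syy += (y - meanY) ** 2;
  }
  const slope = sxy / sxx;
  const intercept = meanY - slope * meanX; // free intercept, not pinned to the root
  const r2 = (sxy * sxy) / (sxx * syy); // squared correlation coefficient
  return {slope, intercept, r2};
}
```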

* Store scatterplot state in URL query

See added documentation for available queries

* Render appropriate scatterplot axes grids

This commit updates the logic for deciding the gridlines for both
x and y axes for scatterplots. Previously we had a very limited range
of cases to consider here. We now have two general functions available
for creating grids - one for temporal scales and one for all other
numeric scales (previously used only for divergence). We will need
to add a third function when we expand scatterplots to plot
non-continuous variables.

* [bugfix] Initialize `filtersInFooter` for all datasets

This fixes a bug noticed in auspice.us [1] where certain
datasets would not have the `controls.filtersInFooter` state set,
causing a crash when metadata was dropped on. Type / prop checking
would have alerted us to this.

[1] nextstrain#1304

* [phylotree] fix case where x-axis grid wasn't present

A missed conditional resulted in certain configurations of the scatterplot "missing" the x-axis grid. (The grid was incorrectly being calculated for a temporal scale, which resulted in no meaningful grid lines.)

Closes nextstrain#1323

* [phylotree] fix edge cases surrounding display of branch labels

Fixes a couple of edge cases introduced by the scatterplot functionality which would result in a tree rendering with branch labels when it shouldn't have them (and vice versa).

* make frequencies tend to 0 in absence of data

* Use trait titles for data filter display

Each coloring variable is defined by both a "key" and a "title". Keys
are (largely) used internally, whereas titles are intended for user-
facing display. This commit improves the "Filter Data" sidebar UI
to use titles, resulting in a more consistent (and nicer) UI.

Closes nextstrain#1322

* changelog

* version bump to 2.25.0 for release

* [PhyloTree] branch label bugfix

Fixes a bug where we sometimes ask PhyloTree to update the branch labels
for a view without any branch labels, which would cause auspice
to crash.
I first tried to fix this in a80e186
but that didn't cover all the situations when this could arise.

* increase padding value for frequencies

* lint appeasement

* changelog

* version bump to 2.25.1 for release

* set frequencies explicitly to 0 if total is too low

* reduce frequency normalization threshold to single constant

* remove unassigned variable

* Allow continuous colorings to define anchor points

The schema currently allows datasets to provide a scale for non-
continuous scales where specific trait values are given colour hexes
(missing values are given greys by auspice).

Here we extend this to continuous scales by interpreting the same
data structure as anchor points which we interpolate between using
the same method as we currently use for generating default continuous
color scales (d3's `interpolateRgb`)
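
The interpolation between anchor points can be sketched without d3 as follows. This is a dependency-free stand-in for `interpolateRgb`: it blends linearly in RGB space but returns hex strings, and the anchor shape (`[value, "#rrggbb"]` pairs sorted by value) and the clamping outside the anchor range are assumptions.

```javascript
// Convert "#rrggbb" into [r, g, b] integers.
function hexToRgb(hex) {
  return [1, 3, 5].map((i) => parseInt(hex.slice(i, i + 2), 16));
}

// Hypothetical helper. `anchors` is assumed to be an array of
// [value, "#rrggbb"] pairs sorted by ascending value.
function colourFromAnchors(anchors, value) {
  const first = anchors[0];
  const last = anchors[anchors.length - 1];
  if (value <= first[0]) return first[1]; // clamp below the first anchor
  if (value >= last[0]) return last[1]; // clamp above the last anchor
  for (let i = 1; i < anchors.length; i++) {
    const [v0, c0] = anchors[i - 1];
    const [v1, c1] = anchors[i];
    if (value <= v1) {
      const t = (value - v0) / (v1 - v0); // position between the two anchors
      const a = hexToRgb(c0);
      const b = hexToRgb(c1);
      const mixed = a.map((channel, k) => Math.round(channel + t * (b[k] - channel)));
      return "#" + mixed.map((c) => c.toString(16).padStart(2, "0")).join("");
    }
  }
  return last[1];
}
```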

* Allow legend entries to be user-defined

This allows continuous colour scales to define custom legend
entries, via a `legend` key in the JSON. This allows control
over the values in the scale which we use as legend elements,
the displayed text, and the range of values which each entry covers.

Bounds are enforced to be non-overlapping. If overlapping bounds
are detected, we revert to Auspice dynamically generating these.
(This is a requirement for future work which will map continuous
tip values to a legend entry, which will allow pie-chart display
using the legend swatches.)
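
A sketch of how overlapping bounds might be detected, assuming each legend entry supplies a `[lower, upper]` pair. The function name is hypothetical, and treating a shared endpoint (one entry ending where the next begins) as non-overlapping is also an assumption.

```javascript
// Hypothetical validator. `bounds` is an array of [lower, upper] pairs.
// Returns false if any pair is empty/inverted or overlaps another pair;
// the caller would then fall back to dynamically generated legend entries.
function boundsAreValid(bounds) {
  const sorted = bounds.slice().sort((a, b) => a[0] - b[0]);
  for (let i = 0; i < sorted.length; i++) {
    const [lower, upper] = sorted[i];
    if (!(lower < upper)) return false;
    // After sorting by lower bound, only neighbours need comparing.
    if (i > 0 && sorted[i - 1][1] > lower) return false;
  }
  return true;
}
```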

* Legend bound matching is (a, b] for continuous scales

This restores the algorithm used to associate a hovered legend
item to tips for continuous variables. Commit
0f37b1a (Mar 2018) incorrectly changed
this to `tip \in [a, b]` rather than the intended (and documented)
`tip \in (a, b]`.

This takes on more importance given that the previous commit allows
user-defined bounds.

Note that the frequencies panel already used `(a, b]` matching, so now
the legend matching mirrors this.
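
In code, half-open `(a, b]` matching is a strict comparison on the lower bound and an inclusive one on the upper bound. The function name below is hypothetical.

```javascript
// A value matches a legend entry when lower < value <= upper.
function isInLegendBounds(value, [lower, upper]) {
  return value > lower && value <= upper;
}
```

With adjacent entries `(0, 1]` and `(1, 2]`, a tip with value 1 therefore matches only the first entry, whereas `[a, b]` matching would have associated it with both.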

* Extend user-provided legend info beyond continuous scales

* Use filterOptions to modify search alg

* GitHub Action to create nextstrain.org PR

This action will run on each auspice PR and create a corresponding
PR on nextstrain.org which includes a commit using the version
of auspice from this (auspice) PR. This functionality is extremely
useful for auspice development as it will allow us to use a Heroku
review app to test auspice in the context of nextstrain.org

There are a number of future improvements to implement:

* New auspice releases (tagged commits on `release` branch)
would ideally create a PR on nextstrain.org which could be merged
to update the version of auspice there.

* Other consumers of auspice (e.g. auspice.us) could be added to this
GitHub Action.

* Allow non-continuous scatterplot variables

This implements a requested improvement to the original
scatterplot implementation. The implementation hinges on two changes:
(1) The collection of values for a given variable (e.g. x-var) need
to be computed and passed to PhyloTree to act as the scale's domain.
We reuse the colorScale machinery here, which could be optimised
(see todo messages in code), but this has the advantage that the
domain ordering matches the legend (unless user supplied).
(2) PhyloTree needed to be modified to use non-linear scales, in this
case `pointScale`.

This commit should be fully functional, however there are some
future improvements to be made:

(i) Grid text is obscured and unreadable when there are many entries
in the domain.
(ii) Genotypes and Boolean scales are not yet available.
(iii) Jitter should be added to nodes to avoid obfuscation.

* Layout changes occur via redux thunk

This commit is in preparation for allowing genotype to be a scatterplot
variable. This will complicate the allowable scatterplot variables
and force these to update upon colorBy changes. This is much cleaner
if layout is changed in a thunk.

* Allow genotype to be scatterplot variable

Genotype is treated differently to other colorings in two important
ways: (1) it can change value, for instance when changing the
colorby to another genotype position and (2) it is stored in a
different place to other colorings. These require scatterplot logic
to be more complex as actions are no longer separate - we now require
a NEW_COLOURS action to potentially update the layout which was
formerly within the remit of the CHANGE_LAYOUT actions. This is
achieved through a middleware layer.

This implementation makes it clear that jitter and better domain
spacing are crucial for scatterplots.

* Improve padding for categorical scatterplot variables

This prevents nodes falling on the axis itself or at the very end of
the grid, which was especially noticeable for traits with small domains.

* Add jitter to categorical scatterplots

* Apply clipping to first column of legend

We have had issues in the past with legend values from column 1 overflowing into column 2. For instance, issue nextstrain#899 was fixed by PR nextstrain#914 which implemented a maximum character limit for legend names. This solution can produce misleading views, such as those described in nextstrain#1306.

This solution implements a clipping mask for column 1, avoiding the complication of limiting the string size. Column 2 already has similar behaviour because the SVG element of the legend itself performs the clipping.

* changelog

* version bump to 2.26.0 for release

* Always show regression toggle for clock layout

Fixes a bug where the ability to toggle regression lines was hidden for clock views. (The ability to hide this toggle is only intended for scatter layouts, where we should not expose the toggle unless both axes are showing continuous variables.)

* Adjust grayscale color ramp

The existing grayscale color ramp (used for values absent in an explicitly specified color scale) had values that were too dark and threw off the overall color balance. This commit narrows the grayscale color ramp to be more in line with the pastel color ramp.

* Inject a bit of color into the "grayscale" color ramp

This adds a bit of blue into the grayscale color ramp. It still reads as mostly gray, but now the colors seem to exist more in the same universe as the canonical auspice color ramp.

* changelog

* version bump to 2.27.0 for release

* Styling adjustments to footer text

* Remove metadata download from GISAID datasets

This commit uses dataProvenance in metadata to identify datasets using "GISAID" data. For these datasets, the full metadata download is swapped to an "acknowledgments" download that only includes the following fields:
 - strain
 - gisaid_epi_isl
 - genbank_accession
 - originating_lab
 - submitting_lab
 - author
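
The restricted download can be sketched as a projection onto the listed fields. The function name, the row shape (one plain object per strain) and the choice of an empty string for missing values are assumptions; only the field list comes from this commit message.

```javascript
// Fields permitted in the "acknowledgments" download, as listed above.
const ACKNOWLEDGMENT_FIELDS = [
  "strain",
  "gisaid_epi_isl",
  "genbank_accession",
  "originating_lab",
  "submitting_lab",
  "author"
];

// Hypothetical helper: builds a TSV string containing only the permitted
// fields, regardless of what else each row holds.
function acknowledgmentsTsv(rows) {
  const lines = [ACKNOWLEDGMENT_FIELDS.join("\t")];
  for (const row of rows) {
    const cells = ACKNOWLEDGMENT_FIELDS.map((field) =>
      (row[field] === undefined ? "" : String(row[field])));
    lines.push(cells.join("\t"));
  }
  return lines.join("\n");
}
```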

* Cleanup metadata headers

This commit cleans up naming of metadata headers in downloaded metadata TSV. It does the following:
1. Keeps headers as input into "augur export" rather than renaming by title. Thus it has "originating_lab" rather than "Originating lab", "pango_lineage" rather than "PANGO lineage", etc... This should make it easier for people to process downloaded metadata from Auspice alongside metadata provisioned by Nextstrain (via GISAID or via S3).
2. Makes "date" the second column as this is often what's most important. I couldn't figure out a way to intelligently order remaining fields. My first thought was to use metadata.colorings, but this isn't sorted.
3. Fixes "accession". It had been exporting as "[object Object]".

* Update changelog

* version bump to 2.28.0 for release

Co-authored-by: Trevor Bedford <[email protected]>
Co-authored-by: James Hadfield <[email protected]>
Co-authored-by: Michał Kowalski <[email protected]>
Co-authored-by: Thomas Sibley <[email protected]>
Co-authored-by: james hadfield <[email protected]>
Co-authored-by: Charlie Jones <[email protected]>
Co-authored-by: eharkins <[email protected]>
Co-authored-by: Richard Neher <[email protected]>
Co-authored-by: Muhammad Aditya Hilmy <[email protected]>