Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rescale geom_density for non-linear scales #4785

Closed
wants to merge 2 commits into from

Conversation

davidchall
Copy link

This is a proof-of-concept to address #4783. There might be some context I'm missing here, so I'm interested to hear feedback before proceeding any further.

library(tidyverse)

p <- tibble(x = rnorm(1000, mean = 10)) %>%
  ggplot(aes(x)) +
  geom_density() +
  geom_function(fun = dnorm, args = list(mean = 10), color = "red")

p + scale_x_continuous(limits = c(1, NA))

p + scale_x_log10(limits = c(1, NA))

p + scale_x_sqrt(limits = c(1, NA))

Created on 2022-03-30 by the reprex package (v2.0.1)

total_post <- integrate_density(trans$inverse(dens$x), dens$y)

dens$y <- dens$y * total_pre / total_post

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few observations

  1. If stats::density guarantees that the area is functionally 1, then total_pre is not needed.
  2. Instead of integrate_density why not hide all this diligence under a rescale_density function.
  3. Rescale the density only if the trans is not an identity transformation.

Copy link
Author

@davidchall davidchall Mar 31, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, Hassan. For now, I'm just hoping to understand if the ggplot authors think we should be rescaling the density. I don't expect this will be the final implementation.

Having said that, you've raised some excellent points. I'll address them below:

If stats::density guarantees that the area is functionally 1, then total_pre is not needed.

Unfortunately, stats::density() doesn't guarantee this. In fact, the output is a linear interpolation of a fast Fourier transform and is represented by floating point numbers, so it's very unlikely the area equals exactly 1. Given this, we need to take a ratio of the areas before and after the transformation.

Instead of integrate_density why not hide all this diligence under a rescale_density function.

When it comes to integrating the area under the curve, there are quite a few choices to make (e.g., use stats::integrate(), use the trapezoidal rule, etc). I've just done a very simple approximation (the Riemann sum). So my integrate_density() function is acting as a placeholder.

Rescale the density only if the trans is not an identity transformation.

I totally agree on this. We shouldn't waste CPU time when there's no transformation to be performed. I'm not sure of the best way to identify this though, so I'd love feedback on how to achieve this.

@davidchall davidchall closed this Apr 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants