Non-linear scale changes geom_density values #4783

Closed
davidchall opened this issue Mar 31, 2022 · 3 comments

davidchall commented Mar 31, 2022

It is often useful to use geom_density() and geom_function() together, to compare data to a model. This is even an example in the geom_function() man page. However, the density values output by geom_density() change when a logarithmic axis is used, making such comparisons difficult.

I suspect that geom_density() is performing the kernel density estimation upon the data after the scale transformation has been applied. This will yield smoother curves in the logarithmic scale. However, it also reduces the distance between points, and this increases the density value.

If this is what's going on, perhaps we can find a way to apply an inverse transformation to the density values?

library(tidyverse)

p <- tibble(x = rnorm(1000, mean = 10)) %>%
  ggplot(aes(x)) +
  geom_density() +
  geom_function(fun = dnorm, args = list(mean = 10), color = "red")

p + scale_x_continuous(limits = c(1, NA))

p + scale_x_log10(limits = c(1, NA))

Created on 2022-03-30 by the reprex package (v2.0.1)
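A quick way to check this suspicion without ggplot2 is to run the kernel density estimate on the raw and on the log10-transformed values. This is only a base-R sketch using `stats::density()`; it assumes the log axis transforms the data before the estimate is computed and does not inspect ggplot2's internals. The helper `area()` is a made-up name for a trapezoid-rule integral.

```r
set.seed(1234)
x <- rnorm(1000, mean = 10)

# Kernel density estimates on the original scale and on the log10 scale
d_lin <- density(x)
d_log <- density(log10(x))

# Trapezoid-rule area under an estimated density curve (hypothetical helper)
area <- function(d) sum(diff(d$x) * (head(d$y, -1) + tail(d$y, -1)) / 2)

area_lin <- area(d_lin)  # close to 1, in units of x
area_log <- area(d_log)  # also close to 1, but in units of log10(x)

# The log-scale curve is much taller because the same probability mass
# is squeezed into a much narrower range of axis values
peak_ratio <- max(d_log$y) / max(d_lin$y)

print(c(area_lin = area_lin, area_log = area_log, peak_ratio = peak_ratio))
```

Both estimates are valid densities on their own axis, so the values only look wrong when compared against a function that is still expressed per unit of x.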

@davidchall davidchall changed the title Log scale changes geom_density values Non-linear scale changes geom_density values Mar 31, 2022
@davidchall
Author

Related: r-lib/scales#322

@mjskay
Contributor

mjskay commented Apr 4, 2022

Right, density functions need to have the Jacobian correction applied when being transformed. In this case, given a density function f(x) (here dnorm(x)) and a scale transformation function g(x) (here log(x, base = 10)), you must multiply f by the absolute value of the derivative of the inverse of g, evaluated at the transformed x values:

$$f(x)\,\bigl|\,[g^{-1}]'(g(x))\,\bigr|$$

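In this case g(x) is log10(x) and [g^-1]'(x) is 10^x log(10), where log is the natural logarithm as in R. Evaluated at g(x) = log10(x), that is x log(10), so the correction simplifies to f(x)|x log(10)|. You can apply this correction manually: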

library(tidyverse)

set.seed(1234)

tibble(x = rnorm(1000, mean = 10)) %>%
  ggplot(aes(x)) +
  geom_density(aes(x)) +
  geom_function(fun = \(x) dnorm(x, mean = 10) * abs(x * log(10)), color = "red") +
  scale_x_log10()

Created on 2022-04-03 by the reprex package (v2.0.1)
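As a sanity check on the correction, the sketch below integrates the corrected and uncorrected curves over the log10 axis. It uses only base R; the names `corrected` and `on_log_axis` are illustrative, and the integration limits (x from 1 to 100) are an assumption chosen to cover essentially all of the probability mass.

```r
# Density on the original scale
f <- function(x) dnorm(x, mean = 10)

# Jacobian-corrected density, written as a function of x on the data scale,
# matching the function passed to geom_function() above
corrected <- function(x) f(x) * abs(x * log(10))

# The same curve as a function of the axis position y = log10(x)
on_log_axis <- function(y) corrected(10^y)

# Integrate over y from 0 to 2, i.e. x from 1 to 100
total_corrected <- integrate(on_log_axis, lower = 0, upper = 2)$value
total_uncorrected <- integrate(function(y) f(10^y), lower = 0, upper = 2)$value

# Height of the corrected curve at x = 10
peak <- corrected(10)

print(c(corrected = total_corrected, uncorrected = total_uncorrected, peak = peak))
```

The corrected curve integrates to about 1 over the log axis, while plain dnorm evaluated along the same axis integrates to far less, which is why the uncorrected red line sits so far below the density estimate.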

Of course it would be nice if the correction did not need to be applied manually. Shameless plug: {ggdist} currently supports this by finding the derivatives of scale transformations and applying them automatically to distributions' density functions, for example:

library(tidyverse)
library(ggdist)
library(distributional)

set.seed(1234)

tibble(x = rnorm(1000, mean = 10)) %>%
  ggplot() +
  stat_slab(aes(xdist = dist), data = data.frame(dist = dist_normal(10, 1)), normalize = "none", scale = 1) +
  geom_density(aes(x)) +
  scale_x_log10()

Created on 2022-04-03 by the reprex package (v2.0.1)

Currently ggdist does this through a combination of symbolic and numeric differentiation, but it would be nice not to have to do that (this is the motivation for r-lib/scales#322).

I'm not sure whether there's a good way of handling that in geom_function(), or what it would look like. The simplest thing I can imagine is an option like jacobian = TRUE or density = TRUE which would apply the Jacobian correction to the function's return values, but I'm not sure if that is too idiosyncratic a solution.
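To make the idea concrete, here is a rough sketch of what such an option might do, using a central finite difference in place of a known derivative. `jacobian_adjust()` is a hypothetical helper, not part of ggplot2, scales, or ggdist, and the step size `h` is an arbitrary assumption. Note that the returned function takes positions on the transformed axis, unlike the function passed to geom_function() above, which takes x on the data scale.

```r
# Hypothetical helper: wrap a density f (defined on the data scale) so that it
# returns Jacobian-corrected values for positions y on the transformed scale.
# `inverse` maps transformed values back to the data scale.
jacobian_adjust <- function(f, inverse, h = 1e-6) {
  function(y) {
    # Central finite-difference approximation of the derivative of the inverse
    d_inverse <- (inverse(y + h) - inverse(y - h)) / (2 * h)
    f(inverse(y)) * abs(d_inverse)
  }
}

f <- function(x) dnorm(x, mean = 10)
f_on_log10 <- jacobian_adjust(f, inverse = function(y) 10^y)

# Compare with the analytic correction f(x) * |x * log(10)| at a few points
x <- c(8, 10, 12)
numeric_result <- f_on_log10(log10(x))
analytic_result <- f(x) * abs(x * log(10))

print(all.equal(numeric_result, analytic_result, tolerance = 1e-6))
```

A numeric derivative keeps the helper independent of any particular transformation, at the cost of some precision; an exact derivative supplied by the scale itself would avoid that trade-off.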

@hadley
Member

hadley commented Apr 14, 2022

Thanks for filing this issue! Unfortunately, I think it's out of scope for this package: developing good software requires relentless focus, which means that we have to say no to many good ideas. Even though I'm closing this issue, I really appreciate the feedback, and hope you'll continue to contribute in the future 😄

@hadley hadley closed this as completed Apr 14, 2022