type_ridge() #252

Open
wants to merge 28 commits into
base: main

Conversation

vincentarelbundock
Collaborator

@vincentarelbundock vincentarelbundock commented Nov 13, 2024

#71

This is pretty easy to implement (says the guy who couldn't figure it out for 3 hours).

library(tinyplot)
tinyplot(Month ~ Ozone, data = airquality, type = "ridge")

@vincentarelbundock vincentarelbundock marked this pull request as ready for review November 13, 2024 12:19
@zeileis
Collaborator

zeileis commented Nov 13, 2024

This is really cool and I'm just starting to work through the examples. Quick first comment: For grid = TRUE I would have hoped to get horizontal lines matching the tick marks on the y-axis, e.g.:

tinyplot(Species ~ Sepal.Length, data = iris, type = "ridge", grid = TRUE)

@grantmcdermott
Owner

A similar quick comment. I'd like to be able to do

tinyplot(Species ~ Sepal.Length | Species, data = iris, type = "ridge")

so that colors vary by the y-axis entries.

Accounting for this kind of x==by or y==by logic normally requires some internal bookkeeping, since we want to avoid splitting y (or x) by itself. But we've managed to do it in a few places. For example, R/type_spineplot.R:

tinyplot/R/type_spineplot.R

Lines 122 to 132 in c4d4e2c

x_by = identical(datapoints$x, datapoints$by)
y_by = identical(datapoints$y, datapoints$by)
# if either x_by or y_by are TRUE, we'll only split by facets and then
# use some simpl logic to assign colouring on the backend
if (isTRUE(x_by) || isTRUE(y_by)) {
  datapoints = split(datapoints, list(datapoints$facet))
  datapoints = Filter(function(k) nrow(k) > 0, datapoints)
} else {
  datapoints = split(datapoints, list(datapoints$by, datapoints$facet))
  datapoints = Filter(function(k) nrow(k) > 0, datapoints)
}

tinyplot/R/type_spineplot.R

Lines 209 to 211 in c4d4e2c

# catch for x_by / y/by
if (isTRUE(x_by)) datapoints$by = rep(xaxlabels, each = ny) # each x label extends over ny rows
if (isTRUE(y_by)) datapoints$by = rep(yaxlabels, length.out = nrow(datapoints))

(In the specific case of type_spineplot we have to do some more work after this to handle custom color sequencing. But for adapting the logic to type_ridge I think that copying across the above two code chunks should suffice.)
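
To make the x==by / y==by idea concrete, here is a minimal base-R sketch of the same split-then-colour pattern. It is only an illustration under assumptions: the toy `datapoints` data frame and its columns mimic the internal object quoted above, and `ridge_cols` and the "Dark 3" palette are hypothetical choices, not what type_ridge actually does.

```r
# Toy stand-in for the internal `datapoints` data frame (assumed columns).
datapoints = data.frame(
  x = iris$Sepal.Length,
  y = iris$Species,
  by = iris$Species,
  facet = factor(rep("all", nrow(iris)))
)

# Detect the special case where the grouping variable is the y variable itself.
y_by = identical(datapoints$y, datapoints$by)

# If `by` duplicates `y`, split on facet only; otherwise split on by x facet.
split_vars = if (isTRUE(y_by)) {
  list(datapoints$facet)
} else {
  list(datapoints$by, datapoints$facet)
}
pieces = split(datapoints, split_vars)
pieces = Filter(function(k) nrow(k) > 0, pieces)

# Assign one colour per y level after the split (hypothetical palette choice).
ridge_cols = setNames(
  hcl.colors(nlevels(datapoints$y), "Dark 3"),
  levels(datapoints$y)
)
```

The point of the pattern is that the data are never split by the variable that also defines the ridges, so each facet still arrives at the drawing step as a single piece.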

@zeileis
Collaborator

zeileis commented Nov 13, 2024

Great minds think alike, I was playing with the same thing. 🤓 More generally, faceting does not seem to work yet.

Also, browsing the ggridges vignette, it would be really nice to have color gradients that help to compare the x-axis values across density curves. Are you planning to add this?

@grantmcdermott
Owner

grantmcdermott commented Nov 13, 2024

Also, browsing the ggridges vignette, it would be really nice to have color gradients that help to compare the x-axis values across density curves. Are you planning to add this?

This would be quite a lot of work, no? Off the top of my head, I guess it would require either looping over the sequence of x values and drawing mini polygons (similar to this), or converting the polygon to an appropriate matrix and then rasterising it.

Perhaps there's a simpler solution. But I think that gradient fill support is probably out of scope for this PR. We can revisit the idea once we manage to fix #243, since the logic would probably carry over to regular density plots too.

Edit: To clarify, I think that this would be very cool. But I worry that supporting x gradient fill will require quite a lot of additional work.

@vincentarelbundock
Collaborator Author

I added support for facets and fixed the grid problem.

I think that any fancier col or |by support would require a complete refactor of the by_aesthetics() functions. This is probably a good idea anyway (will open a different issue).

Unfortunately, I don't have the bandwidth for this right now. I can do minor fixes on PR review, but any major change will have to wait. We can merge this close to "as-is" (perhaps with an "experimental" tag), or we can wait a few weeks (months?) until I have more time.

library(tinyplot)

dat = transform(airquality, Late = ifelse(Day > 15, "Late", "Early"))
tinyplot(Month ~ Ozone,
  facet = ~Late,
  data = dat,
  type = "ridge",
  grid = TRUE,
  col = "white",
  bg = "light blue")

@grantmcdermott
Owner

Great, thanks @vincentarelbundock. I want to take a stab at tweaking a few things so have cloned your fork locally and will test things. I'll push any changes that look good and then we can merge. Will probably be a few days.

@zeileis
Collaborator

zeileis commented Nov 14, 2024

In the meantime, I'll have a look at how hard it would be to add a type_ridge(gradient = ...) specification. I hope that this shouldn't be excessive. If you merge before, it's probably straightforward to address it in a separate PR.

@zeileis
Collaborator

zeileis commented Nov 14, 2024

OK, quick proof of concept:

[Image: tinyplot-ridge]

To implement this I used a fixed grid of 1000 rectangles across the full range of the x variable. In the for() loop of the draw_ridge function:

  for (i in rev(seq_along(dsplit))) {
    if (gradient) {
      gn = 1000
      gc = hcl.colors(gn)
      gx = seq(from = min(d$x), to = max(d$x), length.out = gn + 1)
      gy = with(dsplit[[i]], approx(x = x, y = ymax, xout = gx)$y)
      gm = dsplit[[i]]$ymin[1]
      gy[is.na(gy)] = gm
      rect(gx[-(gn + 1)], gm, gx[-1], (gy[-1] + gy[-(gn + 1)])/2, col = gc, border = "transparent")
    }
    with(dsplit[[i]], polygon(x, ymax, col = if (gradient) "transparent" else ibg, border = icol))
  }

For the rect() approach to work, it is crucial that gn is large enough that the individual rectangles are no longer visible.

Instead one could also use polygon() to draw multiple polygons simultaneously. This would be more flexible and could also incorporate customized breaks and fewer colors. But the preprocessing of the data would require a bit more work...
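
As a rough illustration of the polygon() alternative mentioned above, the sketch below builds one closed outline per interval, joins them with NA separators, and draws them in a single polygon() call. It assumes ten equally spaced breaks on the Nile density, which is an arbitrary choice for the example and not the preprocessing used in the PR.

```r
d = density(Nile)
breaks = seq(min(d$x), max(d$x), length.out = 11)  # 10 intervals (arbitrary)
cols = hcl.colors(length(breaks) - 1)              # one colour per interval

# Density height at each break, so neighbouring segments share an edge.
yb = approx(d$x, d$y, xout = breaks, rule = 2)$y

# One closed outline per interval, each terminated by NA.
seg = lapply(seq_along(cols), function(i) {
  inside = d$x > breaks[i] & d$x < breaks[i + 1]
  list(
    x = c(breaks[i], breaks[i], d$x[inside], breaks[i + 1], breaks[i + 1], NA),
    y = c(0, yb[i], d$y[inside], yb[i + 1], 0, NA)
  )
})
px = head(unlist(lapply(seg, `[[`, "x")), -1)  # drop the trailing NA
py = head(unlist(lapply(seg, `[[`, "y")), -1)

pdf(NULL)  # null device so the sketch runs without opening a window
plot(d, main = "Segmented polygon sketch")
polygon(px, py, col = cols, border = NA)  # a single call draws all segments
lines(d)
invisible(dev.off())
```

Because polygon() treats each NA as a break between separate polygons and applies the col vector polygon by polygon, custom breaks and a short palette fit naturally into the same call.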

@vincentarelbundock
Collaborator Author

This looks amazing!

@zeileis
Collaborator

zeileis commented Nov 16, 2024

OK, I now have a version which uses polygon() to draw multiple shaded polygons instead of drawing 1000 rect().

  • For the example I posted above, the outcome looks virtually identical.
  • The advantage is that it is sufficient to draw fewer polygons, say 100, while still producing a seemingly continuous gradient.
  • Moreover, one can also draw just a few, say 10, colors and select the breaks in between the intervals.
  • The disadvantage is that the code is slower than the one based on 1000 rectangles.

Personally, I would still go for the more general code. What do you think?

Should I modify type_ridge correspondingly? The changes are still clearly manageable, but I added an internal helper function for drawing shaded segmented polygons.

@vincentarelbundock
Collaborator Author

Cool. I don't have a view so I'll let Grant trace the path forward.

@zeileis
Collaborator

zeileis commented Nov 17, 2024

Grant, what do you think about this? First complete the PR without color gradients and then make a new separate PR afterwards - or integrate my proposed changes into the existing PR?

If the latter, I would also export some of the density() arguments so that one can tweak kernel/bandwidth, in particular also supporting a common bandwidth for all groups.
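
For readers wondering what a common bandwidth means in practice, here is a small base-R sketch. It assumes the shared value is the rule-of-thumb bandwidth of the pooled data (bw.nrd0); that is only one possible choice, and the PR may settle on a different rule.

```r
x = iris$Sepal.Width
g = iris$Species

# Assumption: use the pooled-data rule-of-thumb bandwidth as the shared value.
bw_common = bw.nrd0(x)

# Default behaviour: each group picks its own bandwidth.
dens_own = lapply(split(x, g), density)

# Shared bandwidth: every group is smoothed by the same amount.
dens_shared = lapply(split(x, g), density, bw = bw_common)

# Compare the bandwidths that were actually used.
sapply(dens_own, `[[`, "bw")
sapply(dens_shared, `[[`, "bw")
```

With a shared bandwidth, differences in ridge shape reflect the data rather than differences in smoothing between groups.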

@grantmcdermott
Owner

Grant, what do you think about this? First complete the PR without color gradients and then make a new separate PR afterwards - or integrate my proposed changes into the existing PR?

Would the latter be easier? I don't mind and still have to integrate my own changes for this PR. (I also noticed some weird behaviour when y is a factor, which we'll have to fix.) So am happy to go with the path of least resistance.

P.S. Sorry for being slow on this. I've been solo parenting the last few days and also juggling an important deadline at work.

@grantmcdermott
Owner

grantmcdermott commented Nov 17, 2024

Personally, I would still go for the more general code. What do you think?

Go for it. For posterity, I also played with some as.raster-based code last week, which I include as a proof of concept below. We obviously don't have to use this, but it does have the virtues of (a) being fast and (b) having built-in interpolation.

dens = density(Nile)
x = dens$x
y = dens$y

# How many y "bins"?
# (higher numbers mean a smoother looking density function)
ny = 1000L

# create an ny x length(x) matrix along the color gradient
m = matrix(
  rep(hcl.colors(length(x), "Viridis"), ny),
  ncol = length(x),
  byrow = TRUE
)

# Use an internal tinyplot function for rescaling/normalizing
y = tinyplot:::rescale_num(y, to = c(1, ny))
y2 = round(y)

# idea: "blank" out the matrix cells above the top edge of the distribution
# note that raster plots rowwise, so we have to do this a bit back-to-front
for (i in seq_along(y2)) m[1:(nrow(m)-y2[i]+1), i] = NA

plot(y, type = "n")
plot(as.raster(m), add = TRUE)
# lines(y2)
lines(y)

Created on 2024-11-17 with reprex v2.1.1

GM: Slight edits to make this example look and read better.

@zeileis
Collaborator

zeileis commented Nov 18, 2024

Grant, I've now pushed my relatively slow version using polygon(). If you have the time to take a look that would be great. I have added various examples to the documentation that highlight the main new arguments gradient = FALSE, breaks = NULL.

Meanwhile I'm no longer convinced that polygon() is the best option, at least not in general. Its main advantage is that I can exactly specify certain breaks on the x-axis. This will be fast and have no "fuzz" for a small number of breaks.

However, for a large number of breaks, your raster-based idea seems to be much faster. By definition this will break things down into a regular raster grid which might be somewhat less precise than the polygon(). However, for continuous gradients drawing is much faster. Do you have any thoughts on how to separate the case with "few" and "many" breaks?

I also adapted your code so that we rescale the raster rather than rescaling the density:

## compute density
d <- density(Nile)

## set up raster matrix on x-grid and 1000 y-pixels 
n <- length(d$x) - 1
r <- matrix(1:n, ncol = n, nrow = 1000, byrow = TRUE)

## fill colors by column
r[] <- hcl.colors(n)[r]

## clip raster pixels above density line
ymax <- round(1000 * (d$y - min(d$y))/(max(d$y) - min(d$y)))
ix <- lapply(1:n, function(i) if(ymax[i] < 1000) cbind(setdiff(1:1000, 1001 - 0:ymax[i]), i) else NULL)
r[do.call("rbind", ix)] <- NA

## plot density and add raster gradient
plot(d)
rasterImage(as.raster(r), min(d$x), min(d$y), max(d$x), max(d$y))
lines(d)

@zeileis
Collaborator

zeileis commented Nov 18, 2024

OK, I couldn't go to sleep before finishing the rasterImage()-based solution. This is now the new default but you can select via type_ridge(gradient = TRUE, raster = FALSE) vs. the default raster = TRUE. More later, need to get some sleep now...

@grantmcdermott
Owner

grantmcdermott commented Nov 18, 2024 via email

@zeileis
Collaborator

zeileis commented Nov 18, 2024

OK, some more updates. I tweaked the color gradient. By default, it uses rasterImage() now unless there are 20 intervals or fewer. In the latter case the segmented polygon() is used because it is more precise regarding the breaks and a little bit faster.

Example: On the left via raster, on the right via polygon.

tinyplot(Species ~ Sepal.Width, data = iris, type = type_ridge(gradient = TRUE))
tinyplot(Species ~ Sepal.Width, data = iris, type = type_ridge(gradient = TRUE, breaks = seq(2, 4.5, by = 0.5)))

[Image: tinyplot-ridge]

If you want to play around with the two implementations, you can explicitly set raster = TRUE or FALSE. My idea would be to get rid of that argument, though, once we are happy with the implementation. See also the FIXME remarks in the source code.

Additionally, I have implemented the option to use group-specific quantiles (at probs) rather than the same breaks across all groups. The two examples below highlight the center 50% of each density (between 25% and 75% quantile) and the entire distribution using a smooth gradient. The former uses the polygon code, the latter the raster code.

tinyplot(Species ~ Sepal.Width, data = iris, col = "white", type = type_ridge(
  gradient = hcl.colors(3, "Dark Mint")[c(2, 1, 2)], probs = c(0.25, 0.75)))
tinyplot(Species ~ Sepal.Width, data = iris, type = type_ridge(
  gradient = hcl.colors(250, "Dark Mint")[c(250:1, 1:250)], probs = 0:500/500))

[Image: tinyplot-ridge2]
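
The group-specific quantile breaks described above can be sketched in base R as follows. This is an assumption-laden illustration: it takes the breaks from the empirical quantiles of the raw data, whereas the PR may derive them from the estimated density, and `interval_id` is a hypothetical helper name.

```r
probs = c(0.25, 0.75)
x_by_group = split(iris$Sepal.Width, iris$Species)

# Assumption: breaks are the empirical quantiles of each group's raw data.
breaks_by_group = lapply(x_by_group, quantile, probs = probs, names = FALSE)

# Three fill colours per ridge: outer, centre, outer.
fill = hcl.colors(3, "Dark Mint")[c(2, 1, 2)]

# Interval (1 = lower tail, 2 = centre, 3 = upper tail) of each observation,
# computed separately within each group.
interval_id = Map(
  function(x, b) findInterval(x, b) + 1L,
  x_by_group, breaks_by_group
)
```

Each group gets its own pair of breaks, so the highlighted centre band sits in a different place on the x-axis for each ridge.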

Finally, all density() arguments can be specified via bw, kernel, ... and tinyAxis() is used for the y-axis so that we can specify axes and yaxt. Some examples are on the manual page.

I think that this covers all features that I had in mind. Suggestions for improvement are very welcome. Also, let me know if I added something that you don't feel is so useful.

@vincentarelbundock
Collaborator Author

nothing to add but just wanted to say that these last few plots look insanely cool

@grantmcdermott

This comment was marked as outdated.

@zeileis
Collaborator

zeileis commented Nov 18, 2024

Thanks for the kind words! The examples are essentially stolen from the ggridges vignette plus a little tweaking...

Re: polygon with NAs inserted. Yes, that's what my code had been doing all along. Separate polygons would have been hopeless. But even the single segmented polygon becomes quite slow - and it can even create awkward artefacts if the segments are too narrow. Try

tinyplot(Species ~ Sepal.Width, data = iris, type = type_ridge(gradient = hcl.colors(1000), raster = FALSE))

@grantmcdermott
Owner

Re: polygon with NAs inserted. Yes, that's what my code had been doing all along. Separate polygons would have been hopeless. But even the single segmented polygon becomes quite slow - and it can even create awkward artefacts if the segments are too narrow.

Ah, sorry. I should have read your code to start with. Too many balls in the air at the moment...

@zeileis
Collaborator

zeileis commented Nov 18, 2024

No worries, I know that feeling. And take your time with looking at the code - just do it when you have the capacity for it. Now that I have implemented the things that I wanted to implement, I will sleep well 😴

@grantmcdermott
Owner

@zeileis I took a stab at improving the polygon logic and now think that it's at a point where we can safely default to it for everything instead of rasters.

The new polygon version (which is now the default) is slightly faster than the raster equivalent for gradients and doesn't leave any artifacts either.

I can post some examples here, but I think the best thing is for you to clone and test locally. Let me know if you agree. Thanks!

@zeileis
Collaborator

zeileis commented Nov 21, 2024

Thank you so much, most of this looks great. But we need to be more careful about dropping polygon intervals that are empty. In this case we need to make sure that the intervals remain aligned with the color palette (see the left panel below).

Another small issue is that in the case without gradient but with breaks, we should keep the default light gray shading. Currently, this is dropped (see right panel below).

set.seed(0)
d <- data.frame(y = rep(1:4, each = 100), x = c(
  rnorm(100, mean = 5, sd = 2),
  rnorm(100, mean = 2, sd = 1),
  rnorm(100, mean = 8, sd = 1),
  rnorm(50, mean = 1, sd = 0.5), rnorm(50, mean = 9, sd = 0.5)
))
tinyplot(y ~ x, data = d, type = type_ridge(bw = 0.5, gradient = TRUE, breaks = -1:6 * 2))
tinyplot(y ~ x, data = d, type = type_ridge(bw = 0.5, breaks = -1:6 * 2))

[Image: tinyplot-breaks]
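
The alignment concern raised above can be illustrated with a short base-R sketch. It assumes one colour per break interval and a single ridge that covers only part of the shared x range; the variable names are hypothetical and this is not the PR's implementation.

```r
set.seed(0)
x = rnorm(100, mean = 8, sd = 1)       # one ridge covering part of the range
breaks = -1:6 * 2                      # shared breaks: -2, 0, ..., 12
cols = hcl.colors(length(breaks) - 1)  # one colour per interval

d = density(x, bw = 0.5)

# Interval index of every density grid point.
id = findInterval(d$x, breaks, all.inside = TRUE)

# Intervals this ridge actually touches.
present = sort(unique(id))

# Indexing by interval id keeps colours aligned after empty ones are dropped.
cols_aligned = cols[present]

# Naive alternative: restarting at the first colour shifts the palette.
cols_shifted = cols[seq_along(present)]
```

Carrying the interval index along with each retained segment, rather than renumbering the segments that survive, is what keeps the same x value mapped to the same colour in every ridge.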

@grantmcdermott
Owner

grantmcdermott commented Nov 22, 2024

Thanks @zeileis. I believe that I've managed to plug those two cases now:

pkgload::load_all("~/Documents/Projects/tinyplot_vincent/")
#> ℹ Loading tinyplot

set.seed(0)
d <- data.frame(y = rep(1:4, each = 100), x = c(
  rnorm(100, mean = 5, sd = 2),
  rnorm(100, mean = 2, sd = 1),
  rnorm(100, mean = 8, sd = 1),
  rnorm(50, mean = 1, sd = 0.5), rnorm(50, mean = 9, sd = 0.5)
))
tinyplot(y ~ x, data = d, type = type_ridge(bw = 0.5, gradient = TRUE, breaks = -1:6 * 2))

tinyplot(y ~ x, data = d, type = type_ridge(bw = 0.5, breaks = -1:6 * 2))

Bonus: Replicating a fun example from the ggridges package/vignette. Note that this is a case where grid = TRUE gives misaligned horizontal lines (due to the y-axis scaling?). But we can deploy draw as a workaround. (Something to think about fixing. Maybe part of a dedicated tinytheme("ridges") theme that also does things like removing the y-axis label?)

data(lincoln_weather, package = "ggridges")

op = tpar(las = 1, mgp = c(3, 0, 0))
tinyplot(
  Month ~ `Max Temperature [F]`, data = lincoln_weather,
  type = type_ridge(gradient = "plasma", scale = 3),
  # grid = grid(nx = NA, ny = 12),
  draw = abline(h = 0:11, col = "lightgray"),
  axes = "l",
  main = "Temperatures in Lincoln NE",
  ylab = NA
)

tpar(op)

Created on 2024-11-22 with reprex v2.1.1

@vincentarelbundock
Collaborator Author

One mistake I made (and corrected) in type_abline() is to include arguments like col and lty in the type_*() function itself, rather than using the top level tinyplot() values.

I don't know if this is a concern here, but I'm just flagging this in case gradient could be a logical flag and we could rely on the palette top-level settings.

@grantmcdermott
Owner

grantmcdermott commented Nov 22, 2024

Still to do / fix:

  • by isn't working consistently. E.g. tinyplot(Month ~ Temp | Late, data = airq, type = "ridge").

    • Special case: Support by == x. E.g. tinyplot(Species ~ Sepal.Width | Sepal.Width, data = iris, type = "ridge"). Potential simple solution is to automatically trigger type_ridge(gradient = TRUE)?
    • Special case: Fix by == y. E.g. tinyplot(Species ~ Sepal.Width | Species, data = iris, type = "ridge", fill = "by") kind of works, but the drawing order of the ridges is reversed and the y-axis is wrong.
  • Shouldn't faceting with frame = FALSE turn off the duplicated axes?

  • Fix grid alignment. Maybe as part of a dedicated tinytheme("ridge") theme?

  • Add tests

@grantmcdermott
Owner

I just realised another issue: Back when we first implemented gradient legends, we agreed that low values would correspond to light colors and high values to dark colours. See #122 (comment)

What is high and what is low? This depends on the context. The folklore is that on a white background the dark colors should stand out as extreme - while on dark/black background the light colors should represent the extreme. As the factory-fresh default is a white background, dark colors should be extreme. And usually extreme means large. So our default should be a reversed hcl.colors palette.

However, we're doing the opposite here for gradient = TRUE: low x values are dark and high x values are light.

Do we just want to live with this inconsistency, or reverse the palette direction?
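
For reference, the two palette directions under discussion can be compared with a few lines of base R. The `lightness` helper is a hypothetical, crude proxy (mean of the RGB channels) used only for this illustration.

```r
n = 5

# Default direction: dark colours at low values, light colours at high values.
default_dir = hcl.colors(n, "Viridis")

# Reversed direction: light colours at low values, dark colours at high values.
reversed_dir = hcl.colors(n, "Viridis", rev = TRUE)

# Crude lightness proxy: mean of the red, green and blue channels.
lightness = function(cols) colMeans(col2rgb(cols))

lightness(default_dir)
lightness(reversed_dir)
```

Under the convention agreed for gradient legends, the reversed ordering is the one in which high values are shown in dark colours.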

@zeileis
Collaborator

zeileis commented Nov 23, 2024

This all looks great!! Some comments/thoughts:

  • Gradient palette specification: When we merge this with themes, then gradient = TRUE can imply using "palette.sequential". But I'm not sure whether we should use tinyplot(..., palette = ...) for this. My understanding is that palette is an alternative specification of col which just specifies the border color.
  • Default gradient palette: Anticipating the merge with the themes, we should probably already switch from "Viridis" to "ag_Sunset" as the default gradient palette.
  • Default color order: I agree that it would be more consistent to use hcl.colors(..., rev = TRUE) by default so that dark typically corresponds to high values.
  • By handling: I agree that y ~ x | x could be a nice alias for y ~ x with gradient = TRUE. Similarly, y ~ x | y could give separate fill (and/or line?) colors for each ridge line. But I'm not sure what to do with y ~ x | z then.

@grantmcdermott
Owner

grantmcdermott commented Nov 23, 2024

  • But I'm not sure what to do with y ~ x | z then.

Just quickly on this topic: I have some mock-up code that yields the results below. What we should do is pick one of these cases as the default for y ~ x | z and then try to update the code to give us that automatically (i.e., without having to manually specify fill etc.).

  1. Border color varies by groups. Fill remains grey for all.
tinyplot(Month ~ Temp | Late, data = airq, type = "ridge")

  2. Border color varies by groups, and so does fill (with no transparency).
tinyplot(Month ~ Temp | Late, data = airq, type = type_ridge(), fill = "by")

  3. Border color varies by groups, and so does fill but with alpha transparency.
tinyplot(Month ~ Temp | Late, data = airq, type = "ridge", fill = 0.7)

  4. Border color is fixed (here "white", but would default to par("col")), whilst fill varies by groups.
tinyplot(Month ~ Temp | Late, data = airq, type = type_ridge(), fill = 1, col = "white")

My own order of preference is probably 3, 4, 1, 2. But interested to hear what you both think.

@zeileis
Collaborator

zeileis commented Nov 23, 2024

I would recommend a slightly different variation of 3. Maybe you can try that with your code? The idea is to borrow the strategy for lightening colors as we do in the spineplots from #233 (comment)

  • For each by color apply seq_palette(by_col[i], n = 2).
  • Use the first color (original dark color) for the border.
  • Use the second color (light version) as the fill color.
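
A rough base-R sketch of the border/fill pairing described in the list above. Note the assumptions: `lighten()` is a hypothetical stand-in that blends a colour towards white, not tinyplot's internal seq_palette(), which may work differently (for example in HCL space), and the two Okabe-Ito colours are arbitrary example group colours.

```r
# Hypothetical stand-in for seq_palette(): blend a colour towards white.
lighten = function(col, amount = 0.6) {
  rgb_col = col2rgb(col)[, 1] / 255
  mixed = rgb_col + (1 - rgb_col) * amount
  rgb(mixed[1], mixed[2], mixed[3])
}

# Example `by` colours (arbitrary choice for the illustration).
by_col = unname(palette.colors(3, "Okabe-Ito")[2:3])

# Original (darker) colour for the border, lightened version for the fill.
border_col = by_col
fill_col = vapply(by_col, lighten, character(1), USE.NAMES = FALSE)
```

Pairing each border with a lighter fill of the same hue keeps groups distinguishable without relying on alpha transparency.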


3 participants