Support 2D density visualizations #6043

jheer · 2020-03-09T14:38:39Z

Hi all,

I would really like to expand Vega-Lite with support for 2D density representations, leveraging Vega's kde2d, isocontour, and heatmap transforms. Adding these would make Vega-Lite largely "feature complete" for my current visualization teaching needs, and make it more comparable to other popular plotting libraries. However, it is not clear how best to do this.

I see two general approaches:

1. Use existing mark types only (geoshape for contours, image for heatmaps) and leverage new transforms (density2d, contour, and heatmap) to generate the appropriate input data to the mark. This is the same approach we've followed so far when adding transforms such as density and regression. However, Vega's kde2d and heatmap transforms both interleave data-space and encoding-space concerns in ways that make this difficult.
   For example, the kde2d transform takes x and y field accessors that must return pixel-space values. Typically this is done using an expression that maps an underlying data field through a defined scale transform. So, for this to work in Vega-Lite we need a way to generate / access appropriate scale transforms, introducing a cross-cutting concern. Moreover, we'd like Vega-Lite to also use those scales to add appropriate axes, so we'd also have error-prone redundancy in the specification if we need to provide x/y fields in both the transform and encoding. (This is further complicated by the fact that geoshape doesn't take x and y encodings anyway...)
   The heatmap transform has a separate issue, which is that it accepts expressions for determining pixel color and opacity. While these can be stand-alone, most times we actually want to use a defined color scale (and corresponding legend), again mixing transform and encoding concerns.
2. An alternative is to create new mark types, such as contour and heatmap. Ideally these marks could accept either pre-calculated raster grid data (from which contours / heatmaps can be directly generated) or point data (to which the kde2d transform would be applied). The Vega-Lite compiler would need to generate appropriate transforms and encodings. I imagine transform parameters could be passed as mark properties.
   There are still some limitations to this approach: in Vega we can separately generate heatmap images and then use them as input to image marks, such that we could in theory do things like create an entire scatter plot where each point is a small density heatmap. However, I don't think this extra expressiveness is critical for Vega-Lite.
   The biggest hurdle to this approach is that I have no idea how to implement it in the current VL compiler, so I can't estimate the feasibility or difficulty. That said, I would be happy to collaborate with someone more knowledgeable.
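
To make the cross-cutting concern in option 1 concrete, here is a minimal sketch of the kind of Vega kde2d transform a Vega-Lite compiler would have to emit. The helper name `kde2dTransform` and the default scale names `x` and `y` are assumptions for illustration only; the transform shape follows Vega's published contour plot example.

```javascript
// Hypothetical helper (not part of Vega or Vega-Lite): builds a Vega kde2d
// transform whose x/y accessors map data fields into pixel space via scales.
function kde2dTransform(xField, yField, xScale = 'x', yScale = 'y') {
  // Use bracket access with a quoted name so fields containing spaces or
  // dots still produce a valid expression.
  const accessor = (scale, field) =>
    ({expr: `scale('${scale}', datum[${JSON.stringify(field)}])`});
  return {
    type: 'kde2d',
    size: [{signal: 'width'}, {signal: 'height'}],
    x: accessor(xScale, xField),
    y: accessor(yScale, yField)
  };
}

const exampleKde2d = kde2dTransform('Horsepower', 'Miles_per_Gallon');
console.log(JSON.stringify(exampleKde2d, null, 2));
```

Note that the scale names appear inside the transform itself, which is why the same fields would have to be repeated in the encoding in order to get axes.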

While neither solution is completely satisfactory, I'm leaning towards the approach of adding new mark types. I think the interleaving of data-space and encoding-space operations in VL transforms breaks too much, in terms of both output and user mental models.

Any thoughts or feedback, particularly relative to the feasibility of option 2?

jheer · 2020-03-09T14:41:11Z

To help seed thinking, here are some hypothetical VL-API specs.

I also recommend studying these Vega examples:

// CONTOUR PLOTS

// contour plot from input raster grid data
vl.data(rasterData)
  .markContour({
    raster: true || field,
    thresholds, levels, nice, resolve, zero, smooth, scale, translate, // isocontour params
  })
  .project(...projection...)
  .encode(
    vl.color(),   // standard encoding
    vl.opacity(), // standard encoding
    vl.column(),  // standard facet
    vl.row(),     // standard facet
  )

// contour plot from input point data
vl.data(pointData)
  .markContour({
    raster: false || null,
    cellSize, bandwidth, counts, // kde2d params
    thresholds, levels, nice, resolve, zero, smooth, scale, translate, // isocontour params
  })
  .project(...projection...)
  .encode(
    vl.x(),         // -> kde2d x
    vl.y(),         // -> kde2d y
    vl.longitude(), // alternative, output goes to kde2d (x,y)
    vl.latitude(),  // alternative, output goes to kde2d (x,y)
    vl.weight(),    // -> kde2d weight
    vl.color(),     // -> kde2d groupby, standard encoding
    vl.opacity(),   // -> kde2d groupby, standard encoding
    vl.column(),    // -> kde2d groupby, standard facet
    vl.row(),       // -> kde2d groupby, standard facet
  )

// HEATMAP PLOTS

// heatmap from input raster grid data
vl.data(rasterData)
  .markHeatmap({
    raster: true || field,
    resolve
  })
  .encode(
    vl.color(),   // heatmap color
    vl.opacity(), // heatmap opacity
    vl.column(),  // standard facet
    vl.row(),     // standard facet
  );

// heatmap from input point data
vl.data(pointData)
  .markHeatmap({
    raster: false || null,
    cellSize, bandwidth, counts, // kde2d params
    resolve,                     // heatmap param
  })
  .encode(
    vl.x(),         // -> kde2d x
    vl.y(),         // -> kde2d y
    vl.longitude(), // alternative, output goes to kde2d (x,y)
    vl.latitude(),  // alternative, output goes to kde2d (x,y)
    vl.weight(),    // -> kde2d weight
    vl.color(),     // -> kde2d groupby, heatmap color
    vl.opacity(),   // -> kde2d groupby, heatmap opacity
    vl.column(),    // -> kde2d groupby, standard facet
    vl.row(),       // -> kde2d groupby, standard facet
  );
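
The channel-to-parameter mapping described in the `->` comments above could be sketched roughly as follows. The function name `kde2dParamsFromEncoding` and the plain-object encoding shape are assumptions, as is the rule that x/y win over longitude/latitude; a real compiler would also project longitude/latitude before passing them to kde2d, which this sketch does not attempt.

```javascript
// Hypothetical sketch: derive kde2d parameters from an encoding object,
// following the "->" comments in the proposal. Not an existing Vega-Lite API.
function kde2dParamsFromEncoding(encoding) {
  const field = (channel) => encoding[channel] && encoding[channel].field;

  // Assumption: x/y take precedence; longitude/latitude are the alternative.
  const x = field('x') || field('longitude');
  const y = field('y') || field('latitude');

  // Each field-mapped color/opacity/column/row channel becomes a groupby key,
  // with duplicates removed.
  const groupby = [];
  for (const channel of ['color', 'opacity', 'column', 'row']) {
    const f = field(channel);
    if (f && !groupby.includes(f)) groupby.push(f);
  }

  const params = {x, y, groupby};
  if (field('weight')) params.weight = field('weight');
  return params;
}

const exampleParams = kde2dParamsFromEncoding({
  x: {field: 'Horsepower'},
  y: {field: 'Miles_per_Gallon'},
  color: {field: 'Origin'},
  column: {field: 'Origin'}
});
console.log(exampleParams);
```

In this example the `Origin` field is used by both color and column but appears only once in `groupby`.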

kanitw · 2020-03-14T22:24:13Z

Hi Jeff,

Thanks for the proposal.

I think I agree we should pursue option 2, since a post-encoding transform isn't a primitive that we provide in VL. It also likely produces a much more concise specification.

I think it should be doable (I don't see why it wouldn't be), but it might take a few days to investigate / prototype. Do you have a specific time frame you'd like to have this by?
(I'm thinking after VIS deadline would be a good time to investigate more.)

The parameters above make sense for the most part, but I still have some comments / questions:

1) Raster as mark property or encoding channel?

The raster property currently accepts a field name. It's worth noting that we have never accepted a field in a mark property before. So this diverges from the existing pattern that a mark property only accepts a value directly, while an encoding channel can accept a field.

For consistency, it might be better to support raster field via an encoding channel?

2) When is kde triggered? (raster = false)

If I understand correctly, it seems like raster will automatically be true (no kde2d) or false (apply kde2d), based on whether the x/y encoding channels (input for kde2d x/y) are specified?

(I think this makes sense, but it's not explicitly clear from the proposal above.)
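
As a sketch of the inference rule being asked about here (an assumption about the intended behavior, not something the proposal states), the check could look like this. The name `needsKde2d` is hypothetical.

```javascript
// Hypothetical inference rule: apply kde2d only when a full pair of
// positional channels (x/y or longitude/latitude) is present.
function needsKde2d(encoding) {
  const has = (channel) => encoding[channel] !== undefined;
  return (has('x') && has('y')) || (has('longitude') && has('latitude'));
}

// Point data with x/y: kde2d is applied (raster = false).
const pointCase = needsKde2d({x: {field: 'a'}, y: {field: 'b'}}); // true
// Only color specified: input is treated as raster data (raster = true).
const rasterCase = needsKde2d({color: {}}); // false
console.log(pointCase, rasterCase);
```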

3) Heatmap Color Expr?

From the proposal above, it's still a bit unclear how the color encoding for heatmap:

vl.color(),     // -> kde2d groupby, heatmap color

could generate an appropriate expression like

"color": {"expr": "scale('density', datum.$value / datum.$max)"},

in https://vega.github.io/vega/examples/density-heatmaps/

To brainstorm, I could see us adding expr support to the color channel for just heatmap, like this:

encoding: {
  ...,
  color: {
    expr: "datum.$value / datum.$max",
    scale: ..., // optional scale customization
    legend: ... // optional legend customization
  }
}

Note that we currently have experimental signal support for color as well, but I think this is a bit different, since datum in the expr isn't really the original datum (data point) from the data source that we feed directly into the Vega-Lite marks anymore.
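
One way the brainstormed color channel could be turned into the heatmap transform's color expression is sketched below. The function name `heatmapColorExpr`, the default scale name `density` (borrowed from the Vega example linked above), and the fallback expression are all assumptions for illustration.

```javascript
// Hypothetical sketch: wrap a color channel's expr in a scale lookup to
// produce the "color" parameter of a Vega heatmap transform.
function heatmapColorExpr(colorDef, scaleName = 'density') {
  // Assumed default: normalize each cell value by the grid maximum.
  const inner = colorDef.expr || 'datum.$value / datum.$max';
  return {expr: `scale('${scaleName}', ${inner})`};
}

const exampleColor = heatmapColorExpr({expr: 'datum.$value / datum.$max'});
console.log(exampleColor.expr);
// prints: scale('density', datum.$value / datum.$max)
```

Here `datum` refers to the raster grid cell inside the heatmap transform, not to a row of the original data source, which is the distinction drawn in the note above.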

4) Scattered Heat Map Case

> There are still some limitations to this approach: in Vega we can separately generate heatmap images and then use them as input to image marks, such that we could in theory do things like create an entire scatter plot where each point is a small density heatmap. However, I don't think this extra expressiveness is critical for Vega-Lite.

If we want to support this, I could see us supporting special imageX/Y channels (or something similar) for heatmap marks. We could then map imageX / imageY to the final image mark's x/y encoding channels.

We probably shouldn't do this in the initial implementation, but I'm just mentioning here that it's not totally impossible.

LeCyberDucky · 2020-06-07T18:28:18Z

Adding this would be great. I'm specifically looking for contour and 2D density plots, because I would like to create something like this:

I'm using Altair, and so far the only thing I have found in this direction is this hack to create a 2D density plot (vega/altair#2047), which is not ideal.

I think I'd prefer option 2. Adding those mark types sounds good.

metasoarous · 2020-08-12T19:55:46Z

Just wanted to chime in that I'm super excited to see this issue getting some attention, and would find this feature extremely useful. Thanks so much for thinking about it!

FWIW, option 2 sounds ideal from a user standpoint. Sounds like it makes things simpler on the implementation side as well. IIUC, option 1 might in theory offer more flexibility/leverage in certain cases? Seems not critical, however, and may be fine to say "if you need more than this, see Vega."

Should this issue be considered as superseding #1919?

Again, thanks for all of your work on this!

domoritz · 2020-08-12T19:59:23Z

I think contour plots are a special case of a density visualization. I consider this issue more about mapping density to color/count rather than contour lines. Let's leave #1919 open.

metasoarous · 2020-08-12T22:35:10Z

@domoritz Oh, I see. Thanks for the clarification!

mattijn · 2023-03-17T15:00:25Z

I'd like to add another reference here. I noticed Observable Plot has introduced marks for contour and raster. That might also be a route to explore.

But at the same time, if the transform route is chosen, then it would also make it possible to plug in server-side accelerators (like vegafusion) so that not all the data has to be pushed into the JSON specification. Or would that also be possible when introducing new marks?

jonathanzong · 2023-05-29T19:43:43Z

fyi i've made a quick and dirty VL contour plot spec and prototype compiler, motivated by getting lightweight contour plots into olli.

i can't really commit to thinking this all the way through at the moment, but sharing in case it's useful as we figure out use cases: https://github.com/jonathanzong/vl-contour

example (based on vega example):

{
  "description": "A contour plot of the Maungawhau volcano in New Zealand.",
  "data": {"url": "data/volcano.json"},
  "mark": "contour",
  "encoding": {
    "stroke": {"value": "#ccc"},
    "color": {"scale": {"scheme": "blueorange"}},
    "smooth": {"value": true},
    "thresholds": {"value": {"expr": "sequence(90, 195, 5)"}}
  }
}
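
For orientation, the sketch below shows one way a spec like this could map onto a Vega isocontour transform. It is not the vl-contour implementation; the function name `isocontourFromSpec` is hypothetical, and the conversion of `{"expr": ...}` values into `{"signal": ...}` is an assumption modeled on how Vega-Lite usually handles expression references.

```javascript
// Hypothetical sketch (not the vl-contour compiler): map the value-based
// "smooth" and "thresholds" entries onto a Vega isocontour transform.
function isocontourFromSpec(spec) {
  // Assumption: {expr: "..."} values become Vega signal references.
  const toVega = (v) =>
    v !== null && typeof v === 'object' && 'expr' in v ? {signal: v.expr} : v;

  const transform = {type: 'isocontour'};
  for (const param of ['smooth', 'thresholds']) {
    const def = spec.encoding[param];
    if (def && 'value' in def) transform[param] = toVega(def.value);
  }
  return transform;
}

const volcanoSpec = {
  mark: 'contour',
  encoding: {
    stroke: {value: '#ccc'},
    color: {scale: {scheme: 'blueorange'}},
    smooth: {value: true},
    thresholds: {value: {expr: 'sequence(90, 195, 5)'}}
  }
};
const exampleIso = isocontourFromSpec(volcanoSpec);
console.log(JSON.stringify(exampleIso));
```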

jheer added Feature Request 🙋‍♀️ Area - Visual Encoding Area - Data & Transform labels Mar 9, 2020

jheer changed the title ~~Discuss: Add support 2D density visualizations~~ Discuss: Add support for 2D density visualizations Mar 12, 2020

kanitw added the P2 Important Issues that should be fixed soon label Mar 15, 2020

saulshanabrook mentioned this issue Mar 24, 2020

Vega, Datashader, and Holoviews Collaboration Quansight/omnisci#67

Open

jakevdp mentioned this issue Mar 28, 2020

Heatmap smoothing/interpolation in Altair vega/altair#2047

Open

domoritz mentioned this issue May 10, 2020

Support weights in kde transform vega/vega#2601

Open

kanitw added Enhancement 🎉 and removed Feature Request 🙋‍♀️ labels Jun 16, 2020

joelostblom mentioned this issue Jun 7, 2021

Altair seems to be missing mark_* method for contour vega/altair#1784

Closed

mattijn mentioned this issue Mar 28, 2022

Improve documentation on geographical visualizations vega/altair#2580

Closed

mattijn mentioned this issue Mar 7, 2023

xarray support vega/altair#891

Open

mattijn mentioned this issue Jun 4, 2023

Support array interchange protocols vega/altair#3077

Open

joelostblom added this to Roadmap Apr 12, 2024

joelostblom moved this to Statistical visualizations in Roadmap Apr 12, 2024

joelostblom changed the title ~~Discuss: Add support for 2D density visualizations~~ Support 2D density visualizations Apr 12, 2024

mattijn mentioned this issue Jul 12, 2024

introduce an array mark utilizing the heatmap transform for array data #9389

Open
