introduce an array mark utilizing the heatmap transform for array data #9389

Open
mattijn opened this issue Jul 12, 2024 · 2 comments

Comments

@mattijn
Contributor

mattijn commented Jul 12, 2024

This feature request proposes the addition of a new array mark to Vega-Lite.

This mark aims to improve support for the visualization of various types of 2D data, including heatmaps, image data, and other matrix-based representations, with built-in support for color scales, axis labels, and faceting. I see this as an initial step towards #6043, since it focuses on just a single transform in Vega, but many of the points discussed in that issue also apply here.

The following variants explore how the heatmap transform in Vega behaves and how data can be prepared for ingestion within the specification. This is an initial attempt that can hopefully serve as a starting point for exploring this field further, in the hope that someone is brave enough to turn it into a PR.

variants explored so far

  1. heatmap transform single array only
  2. heatmap transform with color scale
  3. heatmap transform with color scale and axis
  4. heatmap transform double array faceted with color scale and axis
  5. heatmap transform single array with non-zero x and y scale
  6. heatmap transform double array with non-zero x and y scale

Note: in the specs below, I've reduced the length of the grid values. In the accompanying Vega-Editor links all values of the grids are included.

heatmap transform values only

A basic implementation using numpy to generate a heatmap from a single array, displaying it with Vega. The image is rendered with opacity levels only.

import numpy as np
import matplotlib.pyplot as plt
from skimage import data
from skimage.transform import rescale
import pyperclip

array = data.camera()

array_small = rescale(array, 0.245, anti_aliasing=False)
array_round = (array_small * 255).astype(np.uint8)

plt.imshow(array_round, cmap='gray')
print('shape', array_round.shape)

array_as_flatlist = array_round.flatten(order='C').tolist()  # row-major

print('head', array_as_flatlist[0:5])
pyperclip.copy(str(array_as_flatlist))

image

We can make it work using the heatmap transform in Vega, using the following specification (Vega-Editor):

{
  "$schema": "https://vega.github.io/schema/vega/v5.json",  
  "data": [
    {
      "name": "GRID_ARRAY",
      "values": [
        {
          "width": 125,
          "height": 125,
          "values": [199, 200, 200, 198, 198, 118, 135, 161, 161, 140]
        }
      ]
    },
    {
      "name": "GRID_IMAGE",
      "source": "GRID_ARRAY",
      "transform": [{"type": "heatmap"}]
    }
  ],
  "marks": [
    {
      "type": "image",
      "from": {"data": "GRID_IMAGE"},
      "encode": {
        "update": {
          "x": {"value": 0},
          "y": {"value": 0},
          "image": {"field": "image"},
          "width": {"signal": "datum.width"},
          "height": {"signal": "datum.height"}
        }
      }
    }
  ]
}

The result looks like this:

image

It seems the image is drawn using opacity levels only.
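
The copy-and-paste step can be avoided by assembling the same specification in Python. The following is a minimal sketch and not part of the original exploration: the helper name grid_to_vega_spec is hypothetical, and it expects a plain flat list in row-major order (for a NumPy array, array.flatten(order='C').tolist() as above).

```python
import json


def grid_to_vega_spec(values, width, height):
    """Build the basic heatmap spec shown above for one flat, row-major grid.

    Hypothetical helper: it only mirrors the structure of the hand-written
    spec and is not an official Vega or Altair API.
    """
    values = list(values)
    if len(values) != width * height:
        raise ValueError(
            f"expected {width * height} values, got {len(values)}"
        )
    return {
        "$schema": "https://vega.github.io/schema/vega/v5.json",
        "data": [
            {
                "name": "GRID_ARRAY",
                "values": [
                    {"width": width, "height": height, "values": values}
                ],
            },
            {
                "name": "GRID_IMAGE",
                "source": "GRID_ARRAY",
                "transform": [{"type": "heatmap"}],
            },
        ],
        "marks": [
            {
                "type": "image",
                "from": {"data": "GRID_IMAGE"},
                "encode": {
                    "update": {
                        "x": {"value": 0},
                        "y": {"value": 0},
                        "image": {"field": "image"},
                        "width": {"signal": "datum.width"},
                        "height": {"signal": "datum.height"},
                    }
                },
            }
        ],
    }


# Tiny 3x2 grid, just to show the call; real grids are much larger.
spec = grid_to_vega_spec([1, 2, 3, 4, 5, 6], width=3, height=2)
spec_json = json.dumps(spec, indent=2)
```

The length check is there because a values list that does not match width * height is easy to produce when a spec is shortened by hand.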

heatmap transform with color scale

Adding a color scale to the heatmap to enhance visual differentiation of values. This example replicates a grayscale image using Vega's color scale functionality.

Let's add a color scale (Vega-Editor):

{
  "$schema": "https://vega.github.io/schema/vega/v5.json",  
  "data": [
    {
      "name": "GRID_ARRAY",
      "values": [
        {
          "width": 125,
          "height": 125,
          "values": [199, 200, 200, 198, 198, 130, 118, 135, 161, 161, 140]
        }
      ]
    },
    {
      "name": "GRID_IMAGE",
      "source": "GRID_ARRAY",
      "transform": [
        {
          "type": "heatmap",
          "color": {"expr": "scale('COLOR_SCALE', datum.$value / datum.$max)"},
          "opacity": 1
        }
      ]
    }
  ],
  "scales": [
    {
      "name": "COLOR_SCALE",
      "type": "linear",
      "zero": true,
      "domain": [0, 1],
      "range": {"scheme": "viridis"}
    }
  ],
  "marks": [
    {
      "type": "image",
      "from": {"data": "GRID_IMAGE"},
      "encode": {
        "update": {
          "x": {"value": 0},
          "y": {"value": 0},
          "image": {"field": "image"},
          "width": {"signal": "datum.width"},
          "height": {"signal": "datum.height"}
        }
      }
    }
  ]
}

The result will look like this:

image

Using this approach, I can also reproduce the grayscale image that plt.imshow() produces in Python, by modifying the color scale as follows (Vega-Editor):

{
  "name": "COLOR_SCALE",
  "type": "linear",
  "zero": true,
  "domain": [0, 1],
  "range": {"scheme": "greys"},
  "reverse": true
}

image

heatmap transform with color scale and axis

Enhancing the previous example by including axis labels, providing context to the grid values. This facilitates interpretation of the data.

The next step is to add axes to the image.
The Vega specification now looks as follows (Vega-Editor):

{
  "$schema": "https://vega.github.io/schema/vega/v5.json",
  "width": 250,
  "height": 250,
  "data": [
    {
      "name": "GRID_ARRAY",
      "values": [
        {
          "width": 125,
          "height": 125,
          "values": [199, 200, 200, 198, 198, 118, 135, 161, 161, 140]
        }
      ]
    },
    {
      "name": "GRID_IMAGE",
      "source": "GRID_ARRAY",
      "transform": [
        {
          "type": "heatmap",
          "color": {"expr": "scale('COLOR_SCALE', datum.$value / datum.$max)"},
          "opacity": 1
        }
      ]
    }
  ],
  "scales": [
    {
      "name": "COLOR_SCALE",
      "type": "linear",
      "zero": true,
      "domain": [0, 1],
      "range": {"scheme": "viridis"}
    },
    {
      "name": "X_SCALE",
      "type": "linear",
      "zero": true,
      "domain": [0, 125],
      "range": "width"
    },
    {
      "name": "Y_SCALE",
      "type": "linear",
      "zero": true,
      "domain": [0, 125],
      "range": "height"
    }
  ],
  "axes": [
    {
      "scale": "X_SCALE",
      "domain": false,
      "orient": "bottom",
      "tickCount": 5,
      "labelFlush": true
    },
    {
      "scale": "Y_SCALE",
      "domain": false,
      "orient": "left",
      "titlePadding": 5,
      "offset": 2
    }
  ],
  "marks": [
    {
      "type": "image",
      "from": {"data": "GRID_IMAGE"},
      "encode": {
        "update": {
          "x": {"value": 0},
          "y": {"value": 0},
          "image": {"field": "image"},
          "width": {"signal": "width"},
          "height": {"signal": "height"}
        }
      }
    }
  ]
}

image

So far so good.

heatmap transform double array faceted with color scale and axis

Faceting multiple grids within a single visualization. This example demonstrates handling of two separate arrays with independent color scales and axis labels.

Can we facet grids if we have, for example, two grids as input?

I've adapted my Python code to prepare the data arrays:

import numpy as np
from skimage import data
from skimage import color
from skimage.transform import rescale
import pyperclip
import json

def array2vega(array):
    grid = {
        'height': array.shape[0],
        'width': array.shape[1],
        'values': array.flatten(order='C').tolist()  # row-major
    }
    return grid

array = data.camera()
array_small = rescale(array, 0.245, anti_aliasing=False)
array_round = np.round(array_small, 2)

grid0 = array2vega(array_round)
grid1 = array2vega(1 - array_round)
arrays = [{'grid':grid0, 'variant': 'A'}, {'grid':grid1, 'variant': 'B'}]

pyperclip.copy(json.dumps(arrays))

I also modified the Vega specification, which now looks as follows (Vega-Editor):

{
  "$schema": "https://vega.github.io/schema/vega/v5.json",
  "width": 250,
  "height": 250,
  "data": [
    {
      "name": "GRID_ARRAY",
      "values": [{"grid": {"width": 125, "height": 125, "values": [0.78, 0.78, 0.78, 0.78, 0.78, 0.46, 0.53, 0.63, 0.63, 0.55]}, "variant": "A"}, {"grid": {"width": 125, "height": 125, "values": [0.21999999999999997, 0.21999999999999997, 0.21999999999999997, 0.21999999999999997, 0.21999999999999997, 0.54, 0.47, 0.37, 0.37, 0.44999999999999996]}, "variant": "B"}]
    },
    {
      "name": "GRID_IMAGE",
      "source": "GRID_ARRAY",
      "transform": [
        {
          "type": "heatmap",
          "field": "grid",
          "color": {"expr": "scale('COLOR_SCALE', datum.$value / datum.$max)"},
          "opacity": 1
        }
      ]
    }
  ],
  "scales": [
    {
      "name": "COLOR_SCALE",
      "type": "linear",
      "zero": true,
      "domain": [0, 1],
      "range": {"scheme": "viridis"}
    },
    {
      "name": "X_SCALE",
      "type": "linear",
      "zero": true,
      "domain": [0, 125],
      "range": "width"
    },
    {
      "name": "Y_SCALE",
      "type": "linear",
      "zero": true,
      "domain": [0, 125],
      "range": "height"
    }
  ],
  "axes": [
    {
      "scale": "Y_SCALE",
      "domain": false,
      "orient": "left",
      "offset": 2
    }
  ],
  "layout": {
    "columns": 2
  },
  "marks": [
    {
      "type": "group",
      "from": {
        "facet": {
          "name": "facet",
          "data": "GRID_IMAGE",
          "groupby": "variant"
        }
      },
      "title": {
        "text": {"signal": "parent.variant"}
      },
      "encode": {
        "update": {
          "width": {"signal": "width"},
          "height": {"signal": "height"}
        }
      },
      "axes": [
        {
          "scale": "X_SCALE",
          "domain": false,          
          "orient": "bottom"
        }
      ],
      "marks": [
        {
          "type": "image",
          "from": {"data": "facet"},
          "encode": {
            "update": {
              "x": {"value": 0},
              "y": {"value": 0},
              "image": {"field": "image"},
              "width": {"signal": "width"},
              "height": {"signal": "height"}
            }
          }
        }
      ]
    }
  ]
}

image

Not bad!
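
The values of variant B above contain floating-point noise such as 0.21999999999999997, because 1 - array_round is computed after the rounding step. A small sketch of one way to avoid this, assuming two decimals are sufficient (the helper name invert_and_round is made up for illustration; with NumPy the equivalent would be np.round(1 - array_round, 2)):

```python
import json


def invert_and_round(values, decimals=2):
    """Invert values in [0, 1] and round afterwards to drop float noise."""
    return [round(1 - v, decimals) for v in values]


# A few values taken from variant A in the spec above.
grid_a = [0.78, 0.46, 0.53, 0.63, 0.55]
grid_b = invert_and_round(grid_a)

print(json.dumps(grid_b))  # [0.22, 0.54, 0.47, 0.37, 0.45]
```

Rounding last keeps the JSON payload shorter, which matters once the full 125 x 125 grid is embedded in the specification.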

heatmap transform single array with non-zero x and y scale

Handling grids with custom scales, such as geographical data. This example showcases the challenges of aligning non-zero axes with grid dimensions and values.

This variant is still a bit difficult. The array is in units of degrees and spans longitudes -180 to 180 on the x-axis and latitudes -81 to 87 on the y-axis. The step size is 1 degree in both directions.
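
To make the geometry concrete, here is a minimal sketch of how a cell of the flat array relates to a longitude and latitude. It assumes that row 0 is the northernmost row (the same convention as plt.imshow with an extent and the default origin) and that values are stored row-major; the function name cell_center is hypothetical.

```python
def cell_center(row, col, extent, width, height):
    """Return (lon, lat) of the center of grid cell (row, col).

    extent is (xmin, xmax, ymin, ymax). Row 0 is assumed to be the top
    (northernmost) row; this is an assumption, not something the heatmap
    transform itself defines.
    """
    xmin, xmax, ymin, ymax = extent
    dx = (xmax - xmin) / width   # cell size in x (1 degree here)
    dy = (ymax - ymin) / height  # cell size in y (1 degree here)
    lon = xmin + (col + 0.5) * dx
    lat = ymax - (row + 0.5) * dy
    return lon, lat


extent = (-180, 180, -81, 87)
width, height = 360, 168

top_left = cell_center(0, 0, extent, width, height)
bottom_right = cell_center(height - 1, width - 1, extent, width, height)

# Index into the flat, row-major values list for a given cell.
flat_index = 10 * width + 20  # row 10, column 20
```

Under these assumptions the first value in the list sits at (-179.5, 86.5) and the last at (179.5, -80.5), so the image rows run from north to south while a conventional y-axis increases upwards.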

See Vega-Editor:

{
  "$schema": "https://vega.github.io/schema/vega/v5.json",
  "width": 360,
  "height": 168,
  "data": [
    {
      "name": "GRID_ARRAY",
      "values": [{
        "year":2016,
        "grid":{
          "x1_":-180,
          "x2_":180,
          "y1_":-81,
          "y2_":87,
          "height":168,
          "width":360,
          "values":[392,392,392,392,393,166,163,165,168,169]
        }
      }]
    },
    {
      "name": "GRID_IMAGE",
      "source": "GRID_ARRAY",
      "transform": [
        {
          "type": "heatmap",
          "field": "grid",
          "color": {"expr": "scale('COLOR_SCALE', datum.$value / datum.$max)"},
          "opacity": 1
        }
      ]
    }
  ],
  "scales": [
    {
      "name": "COLOR_SCALE",
      "type": "linear",
      "zero": true,
      "domain": [0, 1],
      "range": {"scheme": "viridis"}
    },
    {
      "name": "X_SCALE",
      "type": "linear",
      "zero": false,
      "domain": [-180, 180],
      "range": "width"
    },
    {
      "name": "Y_SCALE",
      "type": "linear",
      "zero": false,
      "domain": [-81, 87],
      "range": "height"
    }
  ],
  "axes": [
    {
      "scale": "X_SCALE",
      "domain": false,
      "orient": "bottom"
    },
    {
      "scale": "Y_SCALE",
      "domain": false,
      "orient": "left",
      "titlePadding": 5,
      "offset": 2
    }
  ],
  "marks": [
    {
      "type": "image",
      "from": {"data": "GRID_IMAGE"},
      "encode": {
        "update": {
          "x": {"value": 0},
          "y": {"value": 0},
          "image": {"field": "image"},
          "width": {"signal": "datum.grid.width"},
          "height": {"signal": "datum.grid.height"}
        }
      }
    }
  ]
}

This results in:

image

Basically, the grid's height and width are used only to allocate the canvas size, and the transform iterates over the 1D array to colorize each pixel.
For X_SCALE and Y_SCALE we use the x1/x2 and y1/y2 information (still set manually). We use "datum.grid.width" and "datum.grid.height" as signals within the image mark encoding. Since the scales also need a width and height, the global width/height are currently set to the same width and height as the grid.

But if I change the grid input object to:

"x1":-180,
"x2":180,
"y1":-81,
"y2":87,
"height":168,
"width":360,

(removing the trailing _ from x1/x2/y1/y2), the result is this:

image

I have the feeling that all negative values of our scales malfunction in the iterator within heatmap.js (here). It also seems that the drawn y-axis is reversed relative to the canvas iterator. If I add "reverse": true to the scale Y_SCALE, it becomes clearer that only positive values are colorized in the canvas:

image

But then the latitude values on the y-axis do not match the input array.

heatmap transform double array with non-zero x and y scale

A more complex scenario with faceted charts using custom scales. This variant highlights the issues with global versus array-specific dimensions and independent color scales.

Let's make it a bit more complex: a faceted chart with non-zero x and y scales. Let's start with data preparation in Python:

import json
import urllib.request

import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import pyperclip

# define data
source = 'https://raw.githubusercontent.com/vega/vega-datasets/main/data/annual-precip.json'
with urllib.request.urlopen(source) as url:
    data = json.load(url)
values = data['values']
width = data['width']  # 360
height = data['height']  # 168
extent = [-180, 180, -81, 87]  # xmin, xmax, ymin, ymax

# prepare array and plot
array = np.array(values).reshape(height, width)
plt.imshow(array, extent=extent)

image

def array2vega(array, extent):
    grid = {
        'extent': extent,
        'height': array.shape[0],
        'width': array.shape[1],
        'values': array.flatten(order='C').tolist()  # row-major
    }
    return grid

grid0 = array2vega(array, extent)
grid1 = array2vega(1 - array, extent)
arrays = [{'grid': grid0, 'variant': 'A'}, {'grid': grid1, 'variant': 'B'}]
df = pd.DataFrame.from_dict(arrays)

# copy and display
pyperclip.copy(df.to_json(orient='records'))
df

image

When preparing a Vega chart for this as follows (see Vega-Editor):

{
  "$schema": "https://vega.github.io/schema/vega/v5.json",
  "width": 250,
  "height": 250,
  "data": [
    {
      "name": "GRID_ARRAY",
      "values": [
        {
          "grid": {
            "extent": [-180, 180, -81, 87],
            "height": 168,
            "width": 360,
            "values": [392, 392, 392, 169, 187, 196]
          },
          "variant": "A"
        },
        {
          "grid": {
            "extent": [-180, 180, -81, 87],
            "height": 168,
            "width": 360,
            "values": [-391, -391, -391, -164, -167, -168]
          },
          "variant": "B"
        }
      ]
    },
    {
      "name": "GRID_IMAGE",
      "source": "GRID_ARRAY",
      "transform": [
        {
          "type": "heatmap",
          "field": "grid",
          "color": {"expr": "scale('COLOR_SCALE', datum.$value / datum.$max)"},
          "opacity": 1
        }
      ]
    }
  ],
  "scales": [
    {
      "name": "COLOR_SCALE",
      "type": "linear",
      "zero": true,
      "domain": [0, 1],
      "range": {"scheme": "viridis"}
    },
    {
      "name": "X_SCALE",
      "type": "linear",
      "zero": true,
      "domain": [-180, 180],
      "range": "width"
    },
    {
      "name": "Y_SCALE",
      "type": "linear",
      "zero": true,
      "domain": [-81, 87],
      "range": "height"
    }
  ],
  "axes": [
    {"scale": "Y_SCALE", "domain": false, "orient": "left", "offset": 2}
  ],
  "layout": {"columns": 2},
  "marks": [
    {
      "type": "group",
      "from": {
        "facet": {"name": "facet", "data": "GRID_IMAGE", "groupby": "variant"}
      },
      "title": {"text": {"signal": "parent.variant"}},
      "encode": {
        "update": {"width": {"signal": "width"}, "height": {"signal": "height"}}
      },
      "axes": [{"scale": "X_SCALE", "domain": false, "orient": "bottom"}],
      "marks": [
        {
          "type": "image",
          "from": {"data": "facet"},
          "encode": {
            "update": {
              "x": {"value": 0},
              "y": {"value": 0},
              "image": {"field": "image"},
              "width": {"signal": "datum.grid.width"},
              "height": {"signal": "datum.grid.height"}
            }
          }
        }
      ]
    }
  ]
}

image

Two issues become clear from this:

  • The globally defined width and height interfere with the array-defined grid.width and grid.height.
  • The color scale is currently not applied independently per array.
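
For the first issue, one possible way to reconcile the two sizes is to scale the grid into the facet while preserving its aspect ratio. This is only a sketch of the arithmetic, with a hypothetical helper name fit_grid; it does not reflect how Vega or a future array mark would resolve the conflict.

```python
def fit_grid(grid_width, grid_height, view_width, view_height):
    """Scale a grid to fit inside a view while keeping its aspect ratio."""
    scale = min(view_width / grid_width, view_height / grid_height)
    return grid_width * scale, grid_height * scale


# The 360 x 168 grid from the spec above inside a 250 x 250 facet.
fitted_width, fitted_height = fit_grid(360, 168, 250, 250)
```

With these numbers the image would fill the facet width and use roughly 117 pixels of its height, and the y scale range would need to match that reduced height for the axis to line up.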

Proposed Specification

This is discussed in more detail in #6043, but something like the following should be sufficient for many use cases (note that there is no need for an x and y encoding channel, as the 2D array data comes prepared).

{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "data": {
    "values": {
      "grid": {
        "extent": [-180, 180, -81, 87],
        "height": 168,
        "width": 360,
        "values": [392, 392, 392, 169, 187, 196]
      },
      "variant": "A"
    }
  },
  "mark": "array",
  "encoding": {
    "color": {"scale": {"scheme": "viridis"}},
    "row": {},
    "column": {}
  }
}

With a new array mark, the hope is to simplify the syntax for specifying array data while still supporting color schemes, with customization options that include integration with Vega-Lite's axis and scale system for both zero and non-zero scales.
Moreover, the variants above show that faceting multiple arrays is a real possibility, even though maintaining independent scales and axes needs to be explored more deeply.

Performance optimization has not been part of this exploration, but it would be great if the result of a heatmap transform, a canvas image, could be included within the JSON specification, so that the heatmap transform can be applied server-side. It is currently unclear whether this is accepted within the JSON standard.
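
As a rough illustration of the server-side idea, the sketch below renders a grid to a grayscale PNG and encodes it as a base64 data URI, which is an ordinary JSON string. It is not the heatmap transform: it uses a plain min-max gray ramp, the helper name grid_to_png_data_uri is hypothetical, and whether a Vega image mark accepts such a URI in its url property in a given environment is an assumption that would need to be checked.

```python
import base64
import json
import struct
import zlib


def _png_chunk(tag, payload):
    """Assemble one PNG chunk: length, tag, payload, CRC."""
    crc = zlib.crc32(tag + payload) & 0xFFFFFFFF
    return struct.pack(">I", len(payload)) + tag + payload + struct.pack(">I", crc)


def grid_to_png_data_uri(values, width, height):
    """Encode a flat, row-major grid as an 8-bit grayscale PNG data URI."""
    if len(values) != width * height:
        raise ValueError("values must contain width * height entries")
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid division by zero for constant grids
    raw = bytearray()
    for row in range(height):
        raw.append(0)  # PNG filter type 0 (none) at the start of each row
        for col in range(width):
            v = values[row * width + col]
            raw.append(round(255 * (v - lo) / span))
    # IHDR: width, height, bit depth 8, color type 0 (grayscale),
    # compression 0, filter 0, interlace 0.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    png = (
        b"\x89PNG\r\n\x1a\n"
        + _png_chunk(b"IHDR", ihdr)
        + _png_chunk(b"IDAT", zlib.compress(bytes(raw)))
        + _png_chunk(b"IEND", b"")
    )
    return "data:image/png;base64," + base64.b64encode(png).decode("ascii")


uri = grid_to_png_data_uri([392, 392, 393, 166, 163, 165], width=3, height=2)
payload = json.dumps({"url": uri})  # the URI round-trips as a JSON string
```

The trade-off is that a pre-rendered image can no longer be recolored by a Vega scale on the client, so this would mainly suit static output.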

This issue is one of the results of a spontaneous attempt to bring vega/altair#891 further. Thanks for brainstorming on this topic @kanitw, @timtreis, @melonora and @joelostblom!

@jonmmease
Contributor

Thanks @mattijn, I'm still reading through in detail, but I was taken aback by the expression scale('COLOR_SCALE', datum.$value / datum.$max). I've never seen this datum.$foo syntax before. What does it mean, and where did you learn about it?

@mattijn
Contributor Author

mattijn commented Jul 13, 2024

If I reverse engineer my mind, I think I found it in the heatmap transform docs: https://vega.github.io/vega/docs/transforms/heatmap/

A color value or expression for setting each individual pixel’s color. If an expression is provided, it will be invoked with an input datum that includes $x, $y, $value, and $max fields for the grid. If unspecified, the color defaults to gray ("#888").

So it is basically a normalizer for all values in the grid, where datum.$value represents each single value and datum.$max the maximum in the grid. By normalizing these values, I can combine them with the "domain": [0, 1] in the color scale.

I don't think this approach will hold if I have negative values in my grid.
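
The concern about negative values can be checked outside Vega with a few lines of Python. This is a sketch that mimics datum.$value / datum.$max on the shortened variant B values from the faceted spec above, and compares it with a min-max normalization done before the data reaches the spec. The min-max variant is my own assumption about a possible workaround, not something the heatmap transform provides.

```python
def normalize_by_max(values):
    """Mimic datum.$value / datum.$max for one grid."""
    vmax = max(values)
    if vmax == 0:
        raise ValueError("maximum is zero, cannot divide")
    return [v / vmax for v in values]


def normalize_min_max(values):
    """Rescale values to [0, 1] using the grid's own min and max."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


variant_b = [-391, -391, -391, -164, -167, -168]

by_max = normalize_by_max(variant_b)    # every entry is >= 1
by_range = normalize_min_max(variant_b)  # entries span 0.0 to 1.0
```

For an all-negative grid, dividing by the maximum pushes every value to 1 or above, outside the [0, 1] color domain, while the min-max version stays within it. The normalized values could then be passed to the scale with datum.$value alone.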
