Design discussion for composite charts #682

monfera · 2020-05-25T18:24:44Z

📦 Meta issue for Small Multiples project

Composite charts

Also known as compound charts

What's a composite chart?

A single chart is a familiar thing, eg. a pie chart or a bar chart(*).

A composite chart has multiple constituents, best known in the form of distinct, non-overlapping rectangular areas, typically with their own axes. They look like several charts laid out in some tiling, grid or freeform arrangement, but layering (analogy: z-axis) is possible as well.

The planar composition can be homogeneous, eg. a grid of identically sized rectangles with a lot of commonality in terms of the axes, their data and the chart type (eg. all line charts), known as small multiples, faceted, lattice, trellis or matrix chart. Some of the name proliferation is historical baggage.

Alternatively, the planar composition may be heterogeneous(**), ie. its constituents don't usually share a common width/height, chart type etc., ie. it's not based on repetition but on partly shared projection. A common example is the marginal scatterplot:

Side notes:

(*) Compositing isn't constrained to non-overlapping rectangles. For example, multilayer plots are also composites. Their data (or at least a common root data) may be shared, as well as their Cartesian projections. In an EDA tool, there might be a single toggle that switches from a multilayer view such as this one to a small multiples view, with the scatterplot on the left and the contour plot on the right.

(**) We'll see that while it's possible to come up with special cases, and it's useful to do (we'll do it here too), there's a continuum or degree of sharing projections, and any boundary is somewhat arbitrary and artificial. You can change projections (incl. screen projections that hinge on width/height) one step at a time to morph a perfectly regular trellis into a quite heterogeneous chart set, without crossing any fundamental boundary. In particular, there's no fundamental difference between a grouped (clustered) bar chart and small multiples of non-grouped bar charts for the same series. Generally, even planar compositing vs layering is not a significant boundary, except where particulars such as readability are concerned. Even data processing and rendering wise, there's a ton of sharing between the two.

Typical ways for compositing

There are no sharp boundaries for compositing to form ontologies or "chart typology" because the pertinent abstraction from which we're constructing is the projection, not the chart. A simple chart is usually a projection of 2-3 variables onto a rectangle, maybe with color, while a composite chart is the projection of more variables onto

the same rectangle (layering)
different rectangles (tiling)

Some composite projection strategies have become popular and earned a name, leading to the illusion that it's worth putting them in pigeonholes. So a future proof implementation goal is the enablement of a constructive, declarative grammar, rather than the implementation of a singular composite "type".
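To make the layering vs tiling distinction concrete, here is a minimal sketch of what a constructive, declarative grammar could look like. It is an illustration under assumptions, not the elastic-charts API: the names `Spec`, `Projection`, `Placed` and `place`, and the variable names in the usage, are all hypothetical.

```typescript
// Hypothetical types for illustration only; not an existing library API.
type Rect = { x: number; y: number; width: number; height: number };

// A projection maps named variables onto visual channels within a rectangle.
type Projection = { x: string; y: string; color?: string };

type Spec =
  | { kind: 'simple'; projection: Projection }
  | { kind: 'layer'; children: Spec[] } // children share the same rectangle
  | { kind: 'tile'; direction: 'row' | 'column'; children: Spec[] }; // children get different rectangles

type Placed = { projection: Projection; rect: Rect };

// Resolve a spec into the rectangles its simple constituents project onto.
function place(spec: Spec, rect: Rect): Placed[] {
  switch (spec.kind) {
    case 'simple':
      return [{ projection: spec.projection, rect }];
    case 'layer':
      // Layering: every child reuses the parent's rectangle.
      return spec.children.reduce<Placed[]>((acc, child) => acc.concat(place(child, rect)), []);
    case 'tile': {
      // Tiling: the parent's rectangle is split evenly among the children.
      const { direction, children } = spec;
      const n = children.length;
      return children.reduce<Placed[]>((acc, child, i) => {
        const sub: Rect =
          direction === 'row'
            ? { x: rect.x + (i * rect.width) / n, y: rect.y, width: rect.width / n, height: rect.height }
            : { x: rect.x, y: rect.y + (i * rect.height) / n, width: rect.width, height: rect.height / n };
        return acc.concat(place(child, sub));
      }, []);
    }
  }
}

// Usage: the same two constituents, composed two ways.
const frame: Rect = { x: 0, y: 0, width: 400, height: 200 };
const scatter: Spec = { kind: 'simple', projection: { x: 'a', y: 'b' } };
const contour: Spec = { kind: 'simple', projection: { x: 'a', y: 'b', color: 'density' } };
const layered = place({ kind: 'layer', children: [scatter, contour] }, frame);
const tiled = place({ kind: 'tile', direction: 'row', children: [scatter, contour] }, frame);
```

Because `layer` and `tile` nest freely, a trellis of multilayer plots or a layer over a tiling needs no dedicated composite "type" in this sketch.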

It's still worth enumerating the most useful compositing types to capture the right level of freedom without trying to chase the mirage of ultimate genericity and abstraction. Also, some composition strategies need to be done earlier than others.

Planar partitioning

Planar partitioning doesn't imply gridding, ie. splits need not be at regular x, y or matrix intervals. However, most of the enumerations happen to project onto regularly partitioned spaces. The reasons for this are

inherent in the data, and our knowledge (or lack thereof) about it: we may be interested in the top N view for a phenomenon, where N may vary; or simply, the dataset has a certain, maybe unknown at design time, number of categories for a given variable to tile by
perceptual: if there's rhythm and regularity in the data, which is the case even with something as simple as enumerating categories or bins, the initial comprehension and subsequent navigation (eg. via saccadic eye movements) are aided by a predictable, regular arrangement

Trellis - a special case of small multiples

Trellis plots may keep everything shared, except that different subsets of the data are shown per pane (aka panel or facet):

for categorical variables, it's typically one pane per category
for quantitative variables, one pane per whatever bin of a chosen binning strategy
for multiple variables, one pane for each combination of categorical value and/or bin
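The three cases above reduce to one operation: derive a pane key per row, then group rows by that key. A minimal sketch follows, with hypothetical helper names (`byCategory`, `byBin`, `byCombination`, `partition`), made-up rows, and fixed-width binning assumed as just one possible binning strategy.

```typescript
type Row = Record<string, string | number>;
type PaneKeyFn = (row: Row) => string;

// Categorical variable: one pane per category.
const byCategory = (field: string): PaneKeyFn => (row) => String(row[field]);

// Quantitative variable: one pane per bin (fixed-width bins assumed here).
const byBin = (field: string, binWidth: number): PaneKeyFn => (row) => {
  const start = Math.floor(Number(row[field]) / binWidth) * binWidth;
  return `[${start}, ${start + binWidth})`;
};

// Multiple variables: one pane per combination of the individual keys.
const byCombination = (...fns: PaneKeyFn[]): PaneKeyFn => (row) => fns.map((f) => f(row)).join(' | ');

// Group rows into panes; everything else (scales, chart type) can stay shared.
function partition(rows: Row[], keyOf: PaneKeyFn): Record<string, Row[]> {
  const panes: Record<string, Row[]> = {};
  rows.forEach((row) => {
    const key = keyOf(row);
    (panes[key] = panes[key] || []).push(row);
  });
  return panes;
}

// Made-up rows, for illustration only.
const sampleRows: Row[] = [
  { supp: 'OJ', len: 10 },
  { supp: 'OJ', len: 20 },
  { supp: 'VC', len: 5 },
  { supp: 'VC', len: 25 },
];

const panesBySupp = partition(sampleRows, byCategory('supp'));
const panesByLenBin = partition(sampleRows, byBin('len', 10));
const panesByBoth = partition(sampleRows, byCombination(byCategory('supp'), byBin('len', 10)));
```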

(ggplot2 example source)

Layout of panes:

univariate trellising usually arranges panes in a sawtooth pattern to fill a grid (called facet_wrap in ggplot2)
if there are few shown categories or bins, univariate layout is often just horizontal or vertical, depending on embedding layout, desired pane aspect ratio and which axis needs better sharing - eg. it's preferred to stack time series (x-axis: time) vertically, as the pane aspect ratio is usually large (long, horizontal box) and it's useful to preattentively or consciously correlate phenomena around a given event or time
bivariate trellising often spreads panes along the x-axis for one variable and along y for the other, but it doesn't have to be so - if the variables are not independent and the combinations with shown data are sparse, then it can be a "global" sawtooth pattern
multivariate trellising just takes the above one step further: with independent dimensions, more than one dimension can map to the same axis, for a hierarchical-like dual (or more) axis; or add another variable to what you sawtooth by
other layouts are permissible, though less usual, and there are diminishing returns: for example, variable 1: x-axis, variable 2: y-axis, variable 3: inside the x/y addressable pane, follow a sawtooth layout (eg. show small panes for the variable 3 values that rank as eg. the first N most important to show)
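The two basic layouts in this list, the univariate sawtooth (wrap) and the bivariate x/y grid, can be sketched as index arithmetic. The function names `wrapLayout` and `gridLayout` and the `Cell` type are hypothetical, chosen for illustration.

```typescript
type Cell = { row: number; col: number };

// Sawtooth (wrap) layout: fill a row left to right, then start the next row.
function wrapLayout(paneCount: number, columns: number): Cell[] {
  const cells: Cell[] = [];
  for (let i = 0; i < paneCount; i++) {
    cells.push({ row: Math.floor(i / columns), col: i % columns });
  }
  return cells;
}

// Bivariate grid layout: one variable spreads panes along x, the other along y.
function gridLayout(xCategories: string[], yCategories: string[]): Record<string, Cell> {
  const cells: Record<string, Cell> = {};
  yCategories.forEach((yCat, row) => {
    xCategories.forEach((xCat, col) => {
      cells[`${xCat}|${yCat}`] = { row, col };
    });
  });
  return cells;
}

// Five panes wrapped into three columns occupy two rows.
const wrapped = wrapLayout(5, 3);
// Two x categories by two y categories form a 2x2 grid.
const gridded = gridLayout(['a', 'b'], ['p', 'q']);
```

A horizontal or vertical strip is the special case `wrapLayout(n, n)` or `wrapLayout(n, 1)`. A "global" sawtooth for sparse combinations would apply `wrapLayout` to the list of combinations that actually have data.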

Trellis: variations

Trellis plots can vary in several ways:

varying level of scale unification: any given scale for some variable mapped as eg. a Cartesian within a pane may be shared across some subset of other panes (or all), or be independent - eg. ggplot2 calls it fixed and free, respectively; the value domain is computed as an aggregation on whatever GROUP BY criteria (here, the dose domains are unioned for all panes, while the len domains are unioned only per row (dose) - a necessity as axes aren't repeated)
axis variations: repeat or share axes; redundantly place dual (but identical) axes on either side of a small multiples grid; alternately place axis left/right or top/bottom so that both the minimum and maximum value can be shown without much padding, yet without overlap
to cope with incomplete data - eg. due to space constraints - there may be "others" panes similar to an "others" slice in a pie chart, or there needs to be some other way to convey lack of completeness or lack of granularity
sorting shouldn't be taken for granted, it should be configurable, because the subjective call for importance, therefore positioning, may demand a layout order different from the default order for categories or even bins
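The scale unification item above describes the value domain as an aggregation over some GROUP BY criterion. A minimal sketch of that idea, with a hypothetical `domainsBy` helper and made-up data: the grouping function alone decides whether the scale is shared by all panes, shared per row, or independent per pane.

```typescript
type Datum = { pane: string; row: string; value: number };
type Domain = [number, number];

// Compute a [min, max] domain per group; the grouping decides the level of sharing.
function domainsBy(data: Datum[], groupOf: (d: Datum) => string): Record<string, Domain> {
  const out: Record<string, Domain> = {};
  data.forEach((d) => {
    const key = groupOf(d);
    const current = out[key];
    out[key] = current
      ? [Math.min(current[0], d.value), Math.max(current[1], d.value)]
      : [d.value, d.value];
  });
  return out;
}

// Made-up data: panes a and b sit in row r1, pane c sits in row r2.
const trellisData: Datum[] = [
  { pane: 'a', row: 'r1', value: 1 },
  { pane: 'a', row: 'r1', value: 5 },
  { pane: 'b', row: 'r1', value: 3 },
  { pane: 'b', row: 'r1', value: 9 },
  { pane: 'c', row: 'r2', value: 20 },
  { pane: 'c', row: 'r2', value: 40 },
];

const sharedByAll = domainsBy(trellisData, () => 'all'); // "fixed" in ggplot2 terms
const sharedPerRow = domainsBy(trellisData, (d) => d.row); // unioned per row only
const independent = domainsBy(trellisData, (d) => d.pane); // "free" in ggplot2 terms
```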

An example unannotated trellis where the spreading by x-axis is driven by two variables:

Small multiples for varying the dimensions

Vega/Altair term: repeated charts; varying the encodings; partial Altair example that varies which variable is cast on the x-axis:

While trellising varies the subset of the data shown, it's often useful to show the full data in each panel, but from another angle, typically, Cartesian projection for various variable pairs. Often, one axis will be the same variable (eg. x-axis: time) while the other axis indicates a different variable per panel. The layout strategies mentioned with trellis also apply here.

The difference between varying dimensions, and varying category values within a dimension (trellising) is vague, because both may give the same result, depending on the shape of the tabular input data (long vs wide data). Ie. the same content can be packed differently in the input table, but of course it doesn't fundamentally alter what the chart is. There are many ways to bridge over long, wide and mixed tabular data.
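As one illustration of bridging wide and long data, the sketch below unpivots a wide table into a long one. The `wideToLong` helper, the field names and the values are assumptions for illustration. After the reshape, "one panel per variable" over the wide table becomes ordinary trellising by the `variable` column of the long table.

```typescript
type WideRow = Record<string, number>;
type LongRow = { id: number; variable: string; value: number };

// Unpivot: every non-id column becomes a (variable, value) pair in its own row.
function wideToLong(rows: WideRow[], idField: string): LongRow[] {
  const out: LongRow[] = [];
  rows.forEach((row) => {
    Object.keys(row).forEach((variable) => {
      if (variable !== idField) {
        out.push({ id: row[idField], variable, value: row[variable] });
      }
    });
  });
  return out;
}

// Made-up wide table: one column per measured variable.
const wideTable: WideRow[] = [
  { time: 0, cpu: 0.2, memory: 0.5 },
  { time: 1, cpu: 0.4, memory: 0.6 },
];

// Long table: one row per (time, variable) pair.
const longTable = wideToLong(wideTable, 'time');
```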

Scatterplot matrix (SPLOM)

A special case of varying dimensions is the SPLOM, where the purpose is to show all possible variable pairs for a complete overview. It's one of the most unopinionated, metadata independent visualizations that exist (another one is the parallel coordinates, a kind of small multiples visualization, described further down).

(source)

Variations:

show both the upper and lower triangle; it's redundant, but the data ink orientation will differ, maybe useful
only show the upper or lower triangle
put another chart type in the diagonal, as bivariate plots often degenerate if both variables are the same - eg. scatterplot in the triangles but univariate histograms or kernel density estimates in the diagonals
use different chart types for the panes in the upper vs lower triangle - eg. the top one, scatterplot, bottom one, heatmap (or a collection of statistics eg. correlation)
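These variations can be expressed as a single enumeration of matrix cells. In this sketch the names `splomCells`, `TriangleMode` and `SplomCharts` are hypothetical, and the chart type strings are placeholders rather than identifiers from any library.

```typescript
type SplomCharts = { upper: string; lower: string; diagonal: string };
type TriangleMode = 'both' | 'upper' | 'lower';
type SplomCell = { row: number; col: number; x: string; y: string; chart: string };

// Enumerate SPLOM panes: the column picks the x variable, the row picks the y variable.
function splomCells(variables: string[], mode: TriangleMode, charts: SplomCharts): SplomCell[] {
  const cells: SplomCell[] = [];
  variables.forEach((yVar, row) => {
    variables.forEach((xVar, col) => {
      const region: keyof SplomCharts = row === col ? 'diagonal' : col > row ? 'upper' : 'lower';
      // The diagonal is always kept; a triangle is dropped when the other one is requested.
      const skipped = region !== 'diagonal' && mode !== 'both' && mode !== region;
      if (!skipped) cells.push({ row, col, x: xVar, y: yVar, chart: charts[region] });
    });
  });
  return cells;
}

const splomCharts: SplomCharts = { upper: 'scatterplot', lower: 'heatmap', diagonal: 'histogram' };
const fullMatrix = splomCells(['a', 'b', 'c'], 'both', splomCharts);
const upperOnly = splomCells(['a', 'b', 'c'], 'upper', splomCharts);
```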

Parallel coordinates (parcoords), parallel categories (parcats)

Here the trellising is based on adjacent dimensions. The order of the dimensions can often be changed, and so can the ordering within a dimension. Sometimes it's 2D gridding, with each row having a differently ordered variable. A pairwise complete SPLOM-like layout is also possible.

(source)

Just like SPLOM, parcoords, parcats are also great first views when approaching a new dataset for exploration, or as a starting point for navigating in the dashboard design space, as dozens of variables can be shown simultaneously; there are very few assumptions about the data, and weaknesses can be generally addressed (eg. overplotting due to discrete categories can be solved by jittering).

Small multiples

It's a general idea that covers all of the trellising, and much more. The term was coined or popularized by Edward Tufte's Envisioning Information which dedicates an entire chapter to the topic. There's no definition there - a good thing, as chart ontologies are a mirage - and there are not just charts in the facets but also images, composite charts and schematic drawings. Highlight quotes:

At the heart of quantitative reasoning is a single question: Compared to what?
Information slices are positioned within the eyespan, so that viewers make comparisons at a glance - uninterrupted visual reasoning. ... Comparisons must be enforced within the scope of the eyespan, a fundamental point occasionally forgotten in practice.
Constancy of design puts the emphasis on changes in data, not changes in data frames.

One example of the many diverse examples in the book:

Tufte also cites an example for small multiples which is a freeform arrangement of similarly sized panels, rather than identically sized or grid arranged. There are links and annotations among the panels.

An often neglected point: by asking "Compared to what?", Tufte suggests the possibility and utility of small multiples where, beyond partitioning a dataset into facets, there would be aggregated facets for various levels of detail to aid comparability, eg. compare asset price/volatility/... time series to those of asset groups.

What's inside the panels?

Usually each panel has what looks like a single chart, except with no legend and sometimes no axes or only partial axes, as in previous examples. But it doesn't have to be so.

It could be maps: full blown GIS maps or fast to render, simple locator or choropleth maps:

Or vertical stacking of in themselves composite rows (or horizontal stacking of columns) for a heterogeneous column grid:

Pane size need not be uniform, but it's a good idea if they're rooted in something, eg. constant scale while reducing inkless area:

Grids may reflect something other than a dense matrix of independent x/y variables to split by, eg. tile grid map:

In sum,

pretty much everything, even composite charts, can be usefully composited
compositing may be a simple or hierarchical x/y partitioning but it can also follow some projection logic which we traditionally identify as projections for simple charts or maps
grammar wise, most simple charts can be seen as composite charts of even simpler charts:
- multiline chart: layers of line charts
- scatterplot: layers of x/y projected individual points (or variable sized bubbles, or heaven forbid, pie charts)
- heatmap: a trellis of quantitatively colored rectangles along ordinal variables for x and y (eg. binned quantitative)

Layering

[write about pure additive layering, eg. line over barchart; boxplot over binned scatter etc.]
[write about constraint layering when the presence or absence of layers impact each other; eg. a clustered barchart requires thinner bars as the number of layers - series - grows]
[add screenshots from elastic-charts]

Unity between gridding and layering

They're often alternatives to one another; examples:
- single scatterplot with one color per category, vs. a trellis of small scatterplots
- grouped bar chart vs. small multiple of single-series bar chart
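The grouped bar chart example can be made concrete with a little arithmetic. In the sketch below (a hypothetical `barX` function, padding ignored as a simplifying assumption), the two forms differ only in which index is the outer one when bars are assigned to horizontal slots.

```typescript
type BarMode = 'grouped' | 'smallMultiples';

// Left edge of a bar, ignoring padding: both modes fill the same set of slots,
// they only swap which index (category or series) is the outer one.
function barX(
  mode: BarMode,
  categoryIndex: number,
  seriesIndex: number,
  categoryCount: number,
  seriesCount: number,
  totalWidth: number,
): number {
  const bandWidth = totalWidth / (categoryCount * seriesCount);
  const slot =
    mode === 'grouped'
      ? categoryIndex * seriesCount + seriesIndex // series clustered within each category
      : seriesIndex * categoryCount + categoryIndex; // categories repeated within each series panel
  return slot * bandWidth;
}

// 3 categories, 2 series, 600 units wide: the second category of the first series.
const groupedX = barX('grouped', 1, 0, 3, 2, 600);
const multiplesX = barX('smallMultiples', 1, 0, 3, 2, 600);
```

Under these assumptions, toggling between the two views is a one-parameter change rather than a switch between chart types.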

Consequences of unity:

in a data exploration user interface, or a chart configurator / data binder tool like Lens, it should be easy for the user to toggle; the number of clicks shouldn't be noticeably more between a grouped bar chart and bars small multiples than between eg. a line and bar chart, or simple vs grouped bar chart
it's good to have a fairly unopinionated charting API for the incoming data; eg. a flat, tabular, multivariate dataset with categorical, quantitative and (pre- or runtime-) binned quantitative variables is conducive to single charting; single charting with series splits (eg. one point color, one line or one bar series per category) and small multiples - this way, the library user need not reshape the data into preconceived notions per chart template
it's good to have a fairly unopinionated charting API for the specification; for example, extracting out the color and cartesian projections removes a lot of structural boilerplate and lets them be used in various templated charts, small multiple or not
the current chart should not be the primary organizational unit, ie. a small multiple of [whatever] shouldn't simply be a number of [whatever]s placed side by side

Goals

Have a specification structure and implementation that can eventually serve broad use cases even if buildup of capabilities is gradual
Follow the DRY principle, eg. by exploiting commonality between layering and trellising
Allow the prioritized release of compositing strategies, eg. trellising by category; trellising by bin; trellising by varying dimension, maybe in this order

Motive for considering diverse examples of planar partitioning and layering

Arrive at commonality across functions (mappings); eg. a single chart with two lines requires almost exactly the same functions as two charts with one line each; same if the single chart had a different Y axis for the two lines
Be able to make new variations efficiently
Over time,
- expose increasing configuration capabilities to support Elastic data exploration, chart design etc.
- expose projections as first-class entities to advance Beyond palettes
Architecting in terms of projections rather than individual charts is important for future render targets eg. WebGL
Accessibility is better if the eventual a11y annotations (DOM elements, WAI-ARIA properties) reflect fundamental structures (eg. scale ranges) instead of redundant auditory repetition of visually rendered axis ticks panel by panel

Motive for prioritized release

Product demand
Make it easy and compact to specify the most common trellising variation(s)
Refrain from exposing all capabilities right away: the grammar will inevitably change over time, so exposing compositing strategies one by one through more dedicated, compact API specs lets the grammar stay internal while it improves

Why not just put existing charts side by side internally?

even this requires work, and inventory of what we'll need over time
it still needs formulation of a specification, as asking the end user to specify the panels one by one is laborious
runtime efficiency: calculation of apparently simple things eg. axis ticks can still be computationally intensive, require text length measurements by Canvas2d rendering, computing min/max on possibly large datasets; multiply it by 100 for a 10x10 grid and the redundancy will make it slow
shared projections: most often, at least some of the scales are computed over the entire data, or groups of data rather than data just in the individual panel
trellis aware axis placement strategies, eg. don't render all the redundant axes; ensure there's no overlap of axis tick labels in neighboring charts, but still try to be compact
legends should be common
tooltips, events, ...
eventual WebGL render target: it's not feasible to render each chart into its own <canvas> due to max. GL context count
logic may be needed to (re)arrange the charts, save space or even collapse them if the containing area is too small (eg. for a narrower rectangle, omit redundant axes, but on a wide display, useful to add them to avoid large distance eye travels to look up values)
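To illustrate the axis redundancy point, here is a minimal sketch of a trellis-aware axis placement rule: render the x-axis only under the bottom row and the y-axis only beside the left column, unless repetition is explicitly requested (eg. on a wide display). The names `axisVisibility` and `countAxes` are hypothetical.

```typescript
type AxisVisibility = { showXAxis: boolean; showYAxis: boolean };

// Trellis-aware placement: only the bottom row gets x-axes and only the left
// column gets y-axes, unless axes are deliberately repeated in every pane.
function axisVisibility(row: number, col: number, rowCount: number, repeatAxes: boolean): AxisVisibility {
  return {
    showXAxis: repeatAxes || row === rowCount - 1,
    showYAxis: repeatAxes || col === 0,
  };
}

// Count how many axes need tick computation and text measurement for a grid.
function countAxes(rowCount: number, colCount: number, repeatAxes: boolean): number {
  let count = 0;
  for (let row = 0; row < rowCount; row++) {
    for (let col = 0; col < colCount; col++) {
      const v = axisVisibility(row, col, rowCount, repeatAxes);
      count += (v.showXAxis ? 1 : 0) + (v.showYAxis ? 1 : 0);
    }
  }
  return count;
}

const sharedAxisCount = countAxes(10, 10, false);
const repeatedAxisCount = countAxes(10, 10, true);
```

For the 10x10 grid mentioned above, this rule yields 20 rendered axes instead of 200. If the scales are also shared per row or column, the tick computation itself only needs to run once per shared scale.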


monfera added enhancement New feature or request wip work in progress labels May 25, 2020

markov00 added the meta ...meta issue label May 26, 2020

markov00 mentioned this issue May 26, 2020

Support for faceted charts (small multiples) #500

Closed

1 task

monfera mentioned this issue Sep 30, 2022

ARIA Data Visualisation module w3c/aria#991

Open

nickofthyme mentioned this issue Jun 3, 2024

[Legend] Align bottom of legend to the zero axis #2442

Open
