Design discussion for composite charts #682

Open
monfera opened this issue May 25, 2020 · 0 comments
Labels
enhancement (New feature or request), meta (...meta issue), wip (work in progress)

monfera commented May 25, 2020

📦 Meta issue for Small Multiples project

Composite charts

Also known as compound charts

What's a composite chart?

A single chart is a familiar thing, eg. a pie chart or a bar chart(*).

[image: examples of single charts]

A composite chart has multiple constituents, best known as distinct, non-overlapping rectangular areas, typically with their own axes. They look like several charts laid out in a tiling, grid or freeform arrangement, but layering (analogy: z-axis) is possible as well.

The planar composition can be homogeneous, eg. a grid of identically sized rectangles with a lot in common in terms of axes, data and chart type (eg. all line charts). This is known as small multiples, or as a faceted, lattice, trellis or matrix chart; some of the name proliferation is historical baggage.
[image: small multiples example]

Alternatively, the planar composition may be heterogeneous(**), ie. its constituents don't usually share a common width, height, chart type etc.; it's not based on repetition but on partly shared projections. A common example is the marginal scatterplot:

[image: marginal scatterplot]

Side notes:

(*) Compositing isn't constrained to non-overlapping rectangles. For example, multilayer plots are also composites. Their data (or at least a common root data) may be shared, as well as their Cartesian projections. In an EDA tool, there might be a single toggle that switches between a multilayer view such as this one and a small multiples view with the scatterplot on the left and the contour plot on the right.
[image: multilayer plot combining a scatterplot and a contour plot]

(**) We'll see that while it's possible and useful to come up with special cases (we'll do it here too), there's a continuum, or degree, of sharing projections, and any boundary is somewhat arbitrary and artificial. You can change projections (incl. screen projections that hinge on width/height) one step at a time to morph a perfectly regular trellis into a quite heterogeneous chart set without crossing any fundamental boundary. In particular, there's no fundamental difference between a grouped (clustered) bar chart and small multiples of non-grouped bar charts for the same series. Even the distinction between planar compositing and layering is not a significant boundary, except for particulars such as readability. In terms of data processing and rendering, too, there's a ton of sharing between the two.

Typical ways for compositing

Compositing has no sharp boundaries from which to form ontologies or a "chart typology", because the pertinent abstraction from which we're constructing is the projection, not the chart. A simple chart is usually a projection of 2-3 variables onto a rectangle, maybe with color, while a composite chart is the projection of more variables onto

  • the same rectangle (layering)
  • different rectangles (tiling)

Some composite projection strategies have become popular and earned a name, leading to the illusion that it's worth putting them in pigeonholes. So a future-proof implementation goal is to enable a constructive, declarative grammar, rather than to implement a singular composite "type".

It's still worth enumerating the most useful compositing types, to capture the right level of freedom without chasing the mirage of ultimate genericity and abstraction. Also, some composition strategies need to be implemented earlier than others.

Planar partitioning

Planar partitioning doesn't imply gridding, ie. splits may not be at regular x, y or matrix intervals. However, most of the enumerations happen to project onto regularly partitioned spaces. The reasons for this are:

  • inherent in the data, and our knowledge (or lack thereof) about it: we may be interested in the top N view for a phenomenon, where N may vary; or simply, the dataset has a certain number of categories (possibly unknown at design time) for a given variable to tile by
  • perceptual: if there's rhythm and regularity in the data, which is the case even with something as simple as enumerating categories or bins, a predictable, regular arrangement aids the initial comprehension and subsequent navigation (eg. via saccadic eye movements)

Trellis - a special case of small multiples

Trellis plots may keep everything shared, except that different subsets of the data are shown per pane (aka panel or facet):

  • for categorical variables, it's typically one pane per category
  • for quantitative variables, one pane per whatever bin of a chosen binning strategy
  • for multiple variables, one pane for each combination of categorical value and/or bin

[image: ggplot2 trellis example]
(ggplot2 example source)
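The one-pane-per-category / per-bin / per-combination rule amounts to a GROUP BY over the flat input table. A minimal TypeScript sketch of that partitioning, assuming rows are plain objects and using equal-width binning as one example strategy; `byCategory`, `byBin` and `partitionIntoPanes` are hypothetical names, not elastic-charts APIs:

```typescript
// Hypothetical sketch, not an elastic-charts API.
type Row = Record<string, string | number>;

// A facet key function maps a row to the label of its pane along one variable.
type FacetKey = (row: Row) => string;

// Categorical variable: one pane per distinct value.
const byCategory = (field: string): FacetKey => (row) => String(row[field]);

// Quantitative variable: one pane per equal-width bin (an assumed binning strategy).
const byBin = (field: string, binWidth: number): FacetKey => (row) => {
  const lo = Math.floor(Number(row[field]) / binWidth) * binWidth;
  return `[${lo}, ${lo + binWidth})`;
};

// Multiple variables: one pane per combination of keys. Panes appear in order of
// first occurrence; any configurable sorting would be applied afterwards.
function partitionIntoPanes(rows: Row[], facetBy: FacetKey[]): Map<string, Row[]> {
  const panes = new Map<string, Row[]>();
  for (const row of rows) {
    const key = facetBy.map((facet) => facet(row)).join(" | ");
    const pane = panes.get(key);
    if (pane) pane.push(row);
    else panes.set(key, [row]);
  }
  return panes;
}
```

Only combinations that actually occur in the data get a pane, which is the sparse case where a "global" sawtooth layout can beat a full x/y grid.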

Layout of panes:

  • univariate trellising usually arranges panes in a sawtooth pattern to fill a grid (called facet_wrap in ggplot2)
  • if there are few categories or bins shown, univariate layout is often just horizontal or vertical, depending on the embedding layout, the desired pane aspect ratio and which axis needs better sharing - eg. it's preferable to stack time series (x-axis: time) vertically, as the pane aspect ratio is usually large (a long, horizontal box) and it's useful to correlate phenomena around a given event or time, preattentively or consciously
  • bivariate trellising often spreads panes along the x-axis for one variable and along y for the other, but it doesn't have to be so: if the variables are not independent and the combinations with shown data are sparse, then it can be a "global" sawtooth pattern
  • multivariate trellising just takes the above one step further: with independent dimensions, more than one dimension can map to the same axis, for a hierarchical-like dual (or more) axis; or another variable can be added to what you sawtooth by
  • nested layouts are permissible, though less usual, and have diminishing returns: for example, variable 1: x-axis, variable 2: y-axis, variable 3: a sawtooth layout inside the x/y addressable pane (eg. show small panes for the variable 3 values that rank among the first N most important to show)
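The univariate sawtooth and the bivariate row/column layouts can be sketched as plain index arithmetic. This is a minimal sketch under assumptions: the square-ish default column count is an illustrative heuristic, not ggplot2's exact rule, and `wrapLayout` / `gridLayout` are hypothetical names:

```typescript
// Hypothetical layout helpers; names are illustrative only.
interface PanePosition {
  key: string;
  row: number;
  col: number;
}

// Roughly square default column count when none is configured (assumed heuristic).
function defaultColumns(paneCount: number): number {
  return Math.max(1, Math.ceil(Math.sqrt(paneCount)));
}

// Univariate "sawtooth" wrap (cf. ggplot2's facet_wrap): fill a row left to right,
// then continue on the next row.
function wrapLayout(keys: string[], columns: number = defaultColumns(keys.length)): PanePosition[] {
  return keys.map((key, i) => ({ key, row: Math.floor(i / columns), col: i % columns }));
}

// Bivariate grid: one variable picks the row, the other the column. The value
// orders are passed in explicitly so that sorting stays configurable; combinations
// without data get no pane, leaving an empty cell.
function gridLayout(
  combos: Array<[string, string]>,
  rowOrder: string[],
  colOrder: string[],
): PanePosition[] {
  return combos.map(([r, c]) => ({
    key: `${r} | ${c}`,
    row: rowOrder.indexOf(r),
    col: colOrder.indexOf(c),
  }));
}
```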

Trellis: variations

Trellis plots can vary in several ways:

  • varying level of scale unification: any given scale for some variable mapped to eg. a Cartesian axis within a pane may be shared across some subset of other panes (or all of them), or be independent - ggplot2 calls these fixed and free, respectively; the value domain is computed as an aggregation over the applicable GROUP BY criteria (here, the dose domains are unioned for all panes, while the len domains are unioned only per row (dose) - a necessity, as axes aren't repeated)
    [image: trellis with shared and per-row scales]
  • axis variations: repeat or share axes; redundantly place dual (but identical) axes on either side of a small multiples grid; alternately place axes left/right or top/bottom so that both the minimum and maximum values can be shown without much padding, yet without overlap
  • to cope with incomplete data - eg. due to space constraints - there may be "others" panes similar to an "others" slice in a pie chart, or there needs to be some other way to convey lack of completeness or lack of granularity
  • sorting shouldn't be taken for granted; it should be configurable, because the subjective call for importance, and therefore positioning, may demand a layout order different from the default order of categories or even bins
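Scale unification as a GROUP BY can be sketched by choosing a grouping key per pane and unioning extents within each group. A minimal sketch assuming a numeric variable and four hypothetical sharing modes ("fixed" and "free" borrow ggplot2's terms; "row" and "column" cover the per-row case described above); none of these names are an existing API:

```typescript
// Hypothetical sketch of scale unification; not an elastic-charts API.
type Extent = [number, number];

interface Pane {
  row: number;
  col: number;
  values: number[]; // values of the scaled variable that fall into this pane
}

// "fixed": one domain for all panes; "free": one per pane; "row"/"column": per row/column.
type ScaleSharing = "fixed" | "free" | "row" | "column";

function groupKey(pane: Pane, index: number, sharing: ScaleSharing): string {
  switch (sharing) {
    case "fixed":
      return "all";
    case "row":
      return `row ${pane.row}`;
    case "column":
      return `col ${pane.col}`;
    default:
      return `pane ${index}`; // "free"
  }
}

// Union the extents of all panes in the same group; the min/max pass runs once
// over the data instead of once per rendered axis.
function paneDomains(panes: Pane[], sharing: ScaleSharing): Extent[] {
  const domains = new Map<string, Extent>();
  panes.forEach((pane, i) => {
    const key = groupKey(pane, i, sharing);
    const prev: Extent = domains.get(key) ?? [Infinity, -Infinity];
    let min = prev[0];
    let max = prev[1];
    for (const v of pane.values) {
      if (v < min) min = v;
      if (v > max) max = v;
    }
    domains.set(key, [min, max]);
  });
  return panes.map((pane, i) => domains.get(groupKey(pane, i, sharing))!);
}
```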

An example unannotated trellis where the spreading along the x-axis is driven by two variables:
[image: trellis spread along the x-axis by two variables]

Small multiples for varying the dimensions

Vega/Altair term: repeated charts (varying the encodings). A partial Altair example that varies which variable is cast on the x-axis:
[image: Altair repeated chart example]

While trellising varies the subset of the data shown, it's often useful to show the full data in each panel but from another angle, typically a Cartesian projection of a different variable pair. Often, one axis will be the same variable (eg. x-axis: time) while the other axis shows a different variable per panel. The layout strategies mentioned for trellis also apply here.

The difference between varying dimensions and varying category values within a dimension (trellising) is blurry, because both may give the same result, depending on the shape of the tabular input data (long vs wide data). Ie. the same content can be packed differently in the input table, but that doesn't fundamentally alter what the chart is. There are many ways to bridge long, wide and mixed tabular data.
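One such bridge is a wide-to-long reshape: after it, "one pane per dimension" becomes "one pane per value of a `variable` column", ie. ordinary trellising. A minimal sketch in the spirit of pandas `melt` or tidyr `pivot_longer`; the `toLong` name and the `variable` / `value` field names are assumptions chosen for illustration:

```typescript
// Hypothetical reshape helper; not an elastic-charts API.
type Row = Record<string, string | number>;

// Wide rows (one column per measure) become long rows with explicit
// `variable` and `value` fields; identifier columns are copied to every output row.
function toLong(rows: Row[], idFields: string[], valueFields: string[]): Row[] {
  const out: Row[] = [];
  for (const row of rows) {
    for (const field of valueFields) {
      const longRow: Row = {};
      for (const id of idFields) longRow[id] = row[id];
      longRow.variable = field;
      longRow.value = row[field];
      out.push(longRow);
    }
  }
  return out;
}
```

Trellising the long output by `variable` yields the same panes as repeating the chart over `valueFields` on the wide input.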

Scatterplot matrix (SPLOM)

A special case of varying dimensions is the SPLOM, where the purpose is to show all possible variable pairs for a complete overview. It's one of the most unopinionated, metadata-independent visualizations that exist (another is parallel coordinates, a kind of small multiples visualization, described further down).

[image: scatterplot matrix]
(source)

Variations:

  • show both the upper and lower triangles; it's redundant, but the data ink orientation differs, which may be useful
  • only show the upper or lower triangle
  • put another chart type in the diagonal, as bivariate plots often degenerate if both variables are the same - eg. scatterplots in the triangles but univariate histograms or kernel density estimates in the diagonal
  • use different chart types for the panes in the upper vs lower triangle - eg. scatterplots in the upper one, heatmaps (or a collection of statistics, eg. correlation) in the lower one
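These variations reduce to choosing a chart type per region of the dimension-by-dimension grid. A minimal sketch assuming rows index the y variable and columns the x variable; the `PaneKind` values and `splomPanes` are hypothetical, and "empty" stands for hiding a triangle:

```typescript
// Hypothetical SPLOM pane enumeration; not an elastic-charts API.
type PaneKind = "scatter" | "histogram" | "heatmap" | "empty";

interface SplomPane {
  row: number;
  col: number;
  x: string; // variable on the pane's x-axis (the column's dimension)
  y: string; // variable on the pane's y-axis (the row's dimension)
  kind: PaneKind;
}

interface SplomOptions {
  upper: PaneKind; // chart type above the diagonal ("empty" hides the triangle)
  lower: PaneKind; // chart type below the diagonal
  diagonal: PaneKind; // univariate chart, since x === y degenerates a bivariate plot
}

function splomPanes(dims: string[], opts: SplomOptions): SplomPane[] {
  const panes: SplomPane[] = [];
  dims.forEach((y, row) => {
    dims.forEach((x, col) => {
      const kind = row === col ? opts.diagonal : row < col ? opts.upper : opts.lower;
      if (kind !== "empty") panes.push({ row, col, x, y, kind });
    });
  });
  return panes;
}
```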

Parallel coordinates (parcoords), parallel categories (parcats)

Here the trellising is based on adjacent dimensions. The order of the dimensions can often be changed, and so can the ordering of values within a dimension. Sometimes it's 2D gridding, with each row having a differently ordered variable. A pairwise complete SPLOM-like layout is also possible.

[image: parallel coordinates / parallel categories]
(source)

Just like SPLOM, parcoords and parcats are great first views when approaching a new dataset for exploration, or as a starting point for navigating the dashboard design space, as dozens of variables can be shown simultaneously; they make very few assumptions about the data, and their weaknesses can generally be addressed (eg. overplotting due to discrete categories can be solved by jittering).
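Two small pieces of this can be sketched under assumptions: the panes between neighboring axes follow directly from the current dimension order, so reordering just recomputes the pairs; and jittering offsets discrete values by a small, deterministic pseudo-random amount. The `sin`-based hash is a common shader-style trick assumed here for brevity, not a statistically strong generator; all names are hypothetical:

```typescript
// Hypothetical helpers; not an elastic-charts API.

// Panes between neighboring axes of a parallel coordinates / categories plot.
function adjacentPairs(order: string[]): Array<[string, string]> {
  const pairs: Array<[string, string]> = [];
  for (let i = 0; i + 1 < order.length; i++) pairs.push([order[i], order[i + 1]]);
  return pairs;
}

// Move one dimension to a new position; the pairs are then recomputed from the new order.
function moveDimension(order: string[], from: number, to: number): string[] {
  const next = order.slice();
  const [moved] = next.splice(from, 1);
  next.splice(to, 0, moved);
  return next;
}

// Deterministic jitter within [value - amplitude / 2, value + amplitude / 2), so that
// re-renders place each record at the same spot.
function jitter(value: number, recordIndex: number, amplitude: number): number {
  const h = Math.abs(Math.sin(recordIndex * 12.9898) * 43758.5453) % 1; // in [0, 1)
  return value + (h - 0.5) * amplitude;
}
```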

Small multiples

Small multiples is a general idea that covers all of the trellising, and much more. The term was coined or popularized by Edward Tufte's Envisioning Information, which dedicates an entire chapter to the topic. There's no definition there (a good thing, as chart ontologies are a mirage), and the facets contain not just charts but also images, composite charts and schematic drawings. Highlight quotes:

  • At the heart of quantitative reasoning is a single question: Compared to what?
  • Information slices are positioned within the eyespan, so that viewers make comparisons at a glance — uninterrupted visual reasoning. ... Comparisons must be enforced within the scope of the eyespan, a fundamental point occasionally forgotten in practice.
  • Constancy of design puts the emphasis on changes in data, not changes in data frames.

One of the many diverse examples in the book:
[image: small multiples example from Envisioning Information]

Tufte also cites an example of small multiples that is a freeform arrangement of similarly sized panels, rather than identically sized or grid-arranged ones. There are links and annotations among the panels.

Often neglected: by asking "Compared to what?", Tufte suggests the possibility and utility of small multiples where, beyond partitioning a dataset into facets, there are aggregated facets at various levels of detail to aid comparability, eg. comparing asset price/volatility/... time series to those of asset groups.

What's inside the panels?

Usually each panel has what looks like a single chart, except with no legends and sometimes no or only partial axes, as in the previous examples. But it doesn't have to be so.

It could be maps: full-blown GIS maps or fast-to-render, simple locator or choropleth maps:
[image: small multiple maps]

Or a vertical stacking of rows that are themselves composite (or a horizontal stacking of columns), for a heterogeneous column grid:
[image: heterogeneous column grid]

Pane sizes need not be uniform, but it's a good idea if they're rooted in something, eg. a constant scale while reducing inkless area:

[image: panes of varying size at a constant scale]
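Keeping a constant scale while dropping inkless area means every pane gets the same pixels per data unit, so pane widths become proportional to each pane's domain extent (similar in spirit to ggplot2's `facet_grid(space = "free")`). A minimal sketch, assuming a single row of panes separated by a fixed gap; `proportionalWidths` is a hypothetical name:

```typescript
// Hypothetical helper; not an elastic-charts API.
// Split the available width among panes so that every pane uses the same
// pixels-per-data-unit, ie. widths proportional to each pane's domain extent.
function proportionalWidths(
  extents: Array<[number, number]>,
  totalWidth: number,
  gap: number,
): number[] {
  const spans = extents.map(([min, max]) => Math.max(0, max - min));
  const totalSpan = spans.reduce((a, b) => a + b, 0);
  const usable = totalWidth - gap * Math.max(0, extents.length - 1);
  // Degenerate case: no extent at all, so fall back to equal widths.
  if (totalSpan === 0) return extents.map(() => usable / extents.length);
  const pxPerUnit = usable / totalSpan;
  return spans.map((span) => span * pxPerUnit);
}
```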

Grids may reflect something other than a dense matrix of independent x/y variables to split by, eg. a tile grid map:
[image: tile grid map]

In sum,

  • pretty much everything, even composite charts, can be usefully composited
  • compositing may be a simple or hierarchical x/y partitioning but it can also follow some projection logic which we traditionally identify as projections for simple charts or maps
  • grammar wise, most simple charts can be seen as composite charts of even simpler charts:
    • multiline chart: layers of line charts
    • scatterplot: layers of x/y projected individual points (or variable sized bubbles, or heaven forbid, pie charts)
    • heatmap: a trellis of quantitatively colored rectangles along ordinal variables for x and y (eg. binned quantitative)

Layering

[write about pure additive layering, eg. line over barchart; boxplot over binned scatter etc.]
[write about constraint layering when the presence or absence of layers impact each other; eg. a clustered barchart requires thinner bars as the number of layers - series - grows]
[add screenshots from elastic-charts]

Unity between gridding and layering

  • They're often alternatives to one another; examples:
    • single scatterplot with one color per category, vs. a trellis of small scatterplots
    • grouped bar chart vs. small multiple of single-series bar chart

Consequences of unity:

  • in a data exploration user interface, or a chart configurator / data binder tool like Lens, it should be easy for the user to toggle; switching between a grouped bar chart and bar small multiples shouldn't take noticeably more clicks than switching between eg. a line and a bar chart, or a simple and a grouped bar chart
  • it's good to have a fairly unopinionated charting API for the incoming data; eg. a flat, tabular, multivariate dataset with categorical, quantitative and (pre- or runtime-) binned quantitative variables is conducive to single charts, single charts with series splits (eg. one point color, one line or one bar series per category) and small multiples; this way, the library user need not reshape the data into preconceived notions per chart template
  • it's good to have a fairly unopinionated charting API for the specification; for example, extracting the color and Cartesian projections removes a lot of structural boilerplate and lets them be used in various templated charts, small multiples or not
  • the current chart should not be the primary organizational unit, ie. a small multiple of [whatever] shouldn't simply be a number of [whatever]s placed side by side
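To illustrate how small the toggle can be, here is a hypothetical declarative spec where the same series split is routed either to color (layering, ie. a grouped bar chart) or to a facet (tiling, ie. bar small multiples), with the x/y encodings unchanged. `BarSpec`, `plan` and the `channel` field are illustrative assumptions, not an elastic-charts API:

```typescript
// Hypothetical spec and planning step; not an elastic-charts API.
type Channel = "color" | "facet";

interface BarSpec {
  x: string; // categorical field on the x-axis (unchanged by the toggle)
  y: string; // quantitative field (unchanged by the toggle)
  splitBy?: { field: string; channel: Channel };
}

interface RenderPlan {
  panes: string[]; // one entry per pane
  seriesPerPane: Map<string, string[]>; // series drawn in each pane
}

function plan(spec: BarSpec, rows: Array<Record<string, string | number>>): RenderPlan {
  const split = spec.splitBy;
  const splitValues = split ? Array.from(new Set(rows.map((r) => String(r[split.field])))) : ["all"];
  if (!split || split.channel === "color") {
    // Layering: a single pane with all series grouped side by side.
    return { panes: ["main"], seriesPerPane: new Map<string, string[]>([["main", splitValues]]) };
  }
  // Tiling: one pane per split value, each holding a single series.
  return {
    panes: splitValues,
    seriesPerPane: new Map(splitValues.map((v): [string, string[]] => [v, [v]])),
  };
}
```

Everything downstream of the plan (scales, axes, bar geometry) can then be shared, which is the DRY opportunity between layering and trellising.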

Goals

  • Have a specification structure and implementation that can eventually serve broad use cases even if buildup of capabilities is gradual
  • Follow the DRY principle, eg. by exploiting commonality between layering and trellising
  • Allow the prioritized release of compositing strategies, eg. trellising by category; trellising by bin; trellising by varying dimension, maybe in this order

Motive for considering diverse examples of planar partitioning and layering

  • Arrive at commonality across functions (mappings); eg. a single chart with two lines requires almost exactly the same functions as two charts with one line each; the same holds if the single chart had a different Y axis for each line
  • Be able to make new variations efficiently
  • Over time,
    • expose increasing configuration capabilities to support Elastic data exploration, chart design etc.
    • expose projections as first-class entities to advance Beyond palettes
  • Architecting in terms of projections rather than individual charts is important for future render targets eg. WebGL
  • Accessibility is better if the eventual a11y annotations (DOM elements, WAI-ARIA properties) reflect fundamental structures (eg. scale ranges) instead of redundant auditory repetition of visually rendered axis ticks panel by panel

Motive for prioritized release

  • Product demand
  • Make it easy and compact to specify the most common trellising variation(s)
  • Refrain from exposing all capabilities right away: the grammar will inevitably change over time, so exposing compositing strategies one by one through more dedicated, compact API specs lets the grammar stay internal while it improves

Why not just put existing charts side by side internally?

  • even this requires work, and an inventory of what we'll need over time
  • it still requires formulating a specification, as asking the user to specify the panels one by one is laborious
  • runtime efficiency: calculating apparently simple things, eg. axis ticks, can still be computationally intensive, requiring text length measurements via Canvas2d rendering and min/max computations on possibly large datasets; multiply that by 100 for a 10x10 grid and the redundancy will make it slow
  • shared projections: most often, at least some of the scales are computed over the entire data, or over groups of data, rather than just the data in the individual panel
  • trellis-aware axis placement strategies, eg. don't render all the redundant axes; ensure that axis tick labels in neighboring charts don't overlap, while still trying to be compact
  • legends should be common
  • tooltips, events, ...
  • eventual WebGL render target: it's not feasible to render each chart into its own <canvas> due to the browser's limit on the number of active WebGL contexts
  • logic may be needed to (re)arrange the charts, save space or even collapse them if the containing area is too small (eg. for a narrower rectangle, omit redundant axes, but on a wide display, useful to add them to avoid large distance eye travels to look up values)
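One trellis-aware axis placement strategy can be sketched as follows, assuming fixed (shared) scales and the sawtooth wrap layout: only the first column keeps its y-axis, and only panes with nothing directly below them keep their x-axis, which also covers a partially filled last row. A wide-display variant could re-enable some of the redundant axes. `outerAxes` and `AxisPlan` are hypothetical names:

```typescript
// Hypothetical helper; not an elastic-charts API.
interface AxisPlan {
  left: boolean; // render the y-axis on this pane
  bottom: boolean; // render the x-axis on this pane
}

// Panes are indexed in wrap order: row = floor(i / columns), col = i % columns.
function outerAxes(paneCount: number, columns: number): AxisPlan[] {
  return Array.from({ length: paneCount }, (_, i) => {
    const col = i % columns;
    // The last row may be partial, so a pane is "bottom" when no pane sits directly below it.
    const hasPaneBelow = i + columns < paneCount;
    return { left: col === 0, bottom: !hasPaneBelow };
  });
}
```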