Scatter Plot Matrix (aka SPLOM) discussion #2372
Solution 1 (aka splom overlord)
Add a new do-it-all trace:
trace = {
dimensions: [{
values: [/* */],
// some scatter style props ...
// some axis props reused from cartesian axes
}],
// some splom-wide options e.g.:
showdiagonal: true || false,
showupperhalf: true || false,
showlowerhalf: true || false,
direction: 'top-left-to-bottom-right' || 'bottom-left-to-top-right',
// ...
} PROs
CONs
|
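To make the splom-wide visibility options above concrete, here is a minimal sketch of how the three flags could decide which cells of an N x N matrix get drawn. The function name `visibleCells` is hypothetical, and the convention that rows are indexed top to bottom (so "upper half" means column index greater than row index) is an assumption for illustration, not something the proposal fixes.

```javascript
// Hypothetical helper: list the [row, col] cells a splom would draw.
// Assumes rows are indexed top to bottom, so "upper half" means col > row.
function visibleCells(n, opts) {
  var cells = [];
  for (var i = 0; i < n; i++) {
    for (var j = 0; j < n; j++) {
      var keep =
        (i === j && opts.showdiagonal) ||
        (j > i && opts.showupperhalf) ||
        (j < i && opts.showlowerhalf);
      if (keep) cells.push([i, j]);
    }
  }
  return cells;
}
```

With the `direction` option flipped, the same test would apply after reversing the row index, which is one reason a single flaglist plus a direction may be easier to reason about than three independent booleans.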
Solution 2 (tooling)
Port
var Plotly = require('plotly.js')
var fields = [
[/* */],
[/* */],
// ...
]
var layout = Plotly.makeSubplots({rows: fields.length, cols: fields.length})
var data = []
for (var i = 0; i < fields.length; i++) {
for (var j = 0; j < fields.length; j++) {
var trace = {
mode: 'markers',
x: fields[i],
y: fields[j]
}
Plotly.linkToSubplot(trace, i, j)
data.push(trace)
}
}
Plotly.newPlot(gd, data, layout) PROs
CONs
|
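For comparison, the same loop can be sketched using only trace and layout attributes, since `Plotly.makeSubplots` and `Plotly.linkToSubplot` above appear to be proposed helpers rather than existing functions. The function name `buildSplom` and the even, gutter-free split of the [0, 1] domain are assumptions made for illustration.

```javascript
// Hypothetical helper: one scatter trace per (column, row) pair, with
// column i using x axis i and row j using y axis j.
function buildSplom(fields) {
  var n = fields.length;
  var data = [];
  var layout = {};

  // Axis IDs are 'x', 'x2', 'x3', ... (the first axis has no numeric suffix).
  function suffix(k) { return k ? String(k + 1) : ''; }

  for (var k = 0; k < n; k++) {
    // Assumption: split [0, 1] evenly, with no gutter between axes.
    var domain = [k / n, (k + 1) / n];
    layout['xaxis' + suffix(k)] = {domain: domain};
    layout['yaxis' + suffix(k)] = {domain: domain};
  }

  for (var i = 0; i < n; i++) {
    for (var j = 0; j < n; j++) {
      data.push({
        type: 'scatter',
        mode: 'markers',
        x: fields[i],
        y: fields[j],
        xaxis: 'x' + suffix(i),
        yaxis: 'y' + suffix(j)
      });
    }
  }
  return {data: data, layout: layout};
}

// Usage (assumes a graph div `gd` and the plotly.js bundle are available):
// var fig = buildSplom([[1, 2, 3], [2, 4, 8], [5, 3, 1]]);
// Plotly.newPlot(gd, fig.data, fig.layout);
```

Each field array is referenced 2N times across the N² traces here, which is the data-array duplication that shows up once the figure is serialized.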
Solution 3 (data-array reusing)
This could be combined with solution 2 to solve the data-array-duplication problem, but it would also require some backend work for plot.ly support. In short, we could add a new top-level argument, columns:
var columns = [
{name: 'col 0', values: [/* */]},
{name: 'col 1', values: [/* */]},
// ...
]
// unfortunately, in this paradigm columns should really be labeled data,
// and data -> traces
var data = [{
x: 'col 0',
y: 'col 1'
}, {
x: 'col 1',
y: 'col 0'
}]
Plotly.newPlot(gd, {
columns: columns,
data: data,
layout: {}
}) PROs
CONs
|
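One way to read this proposal: traces hold column names instead of arrays, and a resolution step swaps the names for the shared arrays before plotting. A minimal sketch of that step follows. The function name `resolveColumns` and the restriction to the `x` and `y` attributes are assumptions for illustration; this is not an existing plotly.js function.

```javascript
// Hypothetical helper: replace column names in traces with the column arrays.
// The arrays are shared by reference, so no data is copied.
function resolveColumns(columns, traces) {
  var byName = {};
  columns.forEach(function (col) { byName[col.name] = col.values; });

  return traces.map(function (trace) {
    var out = {};
    Object.keys(trace).forEach(function (key) {
      var val = trace[key];
      // Assumption: only string-valued x and y are treated as column references.
      var isRef = (key === 'x' || key === 'y') && typeof val === 'string';
      if (isRef && !Object.prototype.hasOwnProperty.call(byName, val)) {
        throw new Error('Unknown column: ' + val);
      }
      out[key] = isRef ? byName[val] : val;
    });
    return out;
  });
}
```

Because the resolved traces point at the same array objects, an N-dimension splom built this way stores each column once, however many subplots use it.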
I think it's clear we want to encapsulate a The question in my mind is whether we can do it by linking the
Preferred option: refer to regular cartesian axes
trace = {
dimensions: [{
values: [/* */],
name: 'Sepal Width', // used as default x/y axis titles
xaxis: 'x' | 'x2' ... // defaults to ith x axis ID for dimension i
yaxis: 'y' | 'y2' ...
}],
marker: {
// just like scatter, and all the same ones are arrayOk.
// goes outside the `dimensions` array because the same data point should get
// the same marker in all subplots.
},
// domain settings - not used directly, just fed into the defaults for all the
// individual x/y axis domains
domain: {
// total domain to be divided among all x / y axes
x: [0, 1],
y: [0, 1],
// blank space between x axes, as a fraction of the length of each axis
// possibly xgutter and ygutter?
gutter: 0.1
},
// some splom-wide options e.g.:
// maybe turn these into a flaglist 'upper+lower+diagonal'?
// these and related attrs will affect the default x/y axis anchor and/or side attributes
showdiagonal: true || false,
showupperhalf: true || false,
showlowerhalf: true || false,
// maybe xdirection and ydirection?
direction: 'top-left-to-bottom-right' || 'bottom-left-to-top-right',
// ...
};
layout = {
xaxis: { /* overriding any of the defaults set by SPLOM */ },
xaxis2: { /* */ },
xaxis3: { /* */ },
... ,
yaxis: { /* */ },
...
};
One variation that might be nice but I'm not sure: separate the list of axes from the dimensions. This could make it easier for example to reorder the dimensions without having to do all sorts of gymnastics with swapping axis attributes (though we might need to swap axis titles still, if they're not inherited from the dimension names):
trace = {
dimensions: [{
values: [/* */],
name: 'Sepal Width' // used as default x/y axis titles
// some scatter style props ...
}],
xaxes: ['x', 'x2', 'x3', ...], // defaults to the first N x axis IDs. info_array, Not data_array.
yaxes: ['y', 'y2', 'y3', ...],
...
}
Bonus: layout.grid
Also, it might be nice to move the axis arrangement to
// splom trace would still have axis ids in it but no axis layout info (domain or gutter)
layout = {
grid: {
xaxes: ['x', 'x2', 'x3', ...],
yaxes: ['y', 'y2', 'y3', ...],
domain: { x: [0, 1], y: [0, 1] },
gutter: 0.1
}
}
Cases like splom would use 1D arrays of x/y axes, as all rows share the same x axes and all columns share the same y axes, but we could also allow 2D arrays for when you want a grid of uncoupled axes. And if you put '' in any entry it leaves that row/col/cell blank, and at some point we can make a way to refer to empty cells in other trace/subplot types - so in a Actually, this would make it easy to support multiple
That way all of this would happen automatically if you just make a
Alternative: axes also encapsulated in the trace
What I'm trying to avoid above, but might be even higher performance at the expense of flexibility,
trace = {
dimensions: [{
values: [/* */],
xaxis: { /* all the x axis attributes like title, tick/grid specs, fonts, etc */ },
yaxis: { /* same for y - or these could go in xaxes/yaxes arrays but still in the trace */ }
}]
} or in My hope though is that the SVG axis machinery is fast enough, especially if we avoid having |
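The domain arithmetic described in the preferred option can be written out explicitly. If N axes of equal length w share a total span, with a gap of gutter * w between neighbours, then N * w + (N - 1) * gutter * w equals the span, which fixes w. The function name `axisDomains` is hypothetical; the sketch only follows the comment "as a fraction of the length of each axis" and is not taken from the plotly.js source.

```javascript
// Hypothetical helper: split a total domain among n axes, leaving a gap of
// gutter * (axis length) between neighbouring axes.
function axisDomains(total, n, gutter) {
  var span = total[1] - total[0];
  // Solve n * w + (n - 1) * gutter * w = span for the axis length w.
  var w = span / (n + (n - 1) * gutter);
  var step = w * (1 + gutter);
  var domains = [];
  for (var i = 0; i < n; i++) {
    var start = total[0] + i * step;
    domains.push([start, start + w]);
  }
  return domains;
}

// Usage: axisDomains([0, 1], 4, 0.1) gives four equal-length domains whose
// first start is 0 and whose last end is 1 (up to floating-point error).
```

Separate xgutter and ygutter values, as floated in the comment, would simply mean calling this once per direction with different arguments.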
Thanks for the 📚 @alexcjohnson I'm a big fan of those About your Now, to give a more concrete example (to e.g. @dfcreative 😉), the Iris splom (e.g. https://codepen.io/etpinard/pen/Vbzxqa) would be declared as:
var url = 'https://cdn.rawgit.com/plotly/datasets/master/iris.csv'
var colors = ['red', 'green', 'blue']
Plotly.d3.csv(url, (err, rows) => {
var keys = Object.keys(rows[0]).filter(k => k !== 'Name')
var names = rows.map(r => r.Name).filter((v, i, self) => self.indexOf(v) === i)
var xaxes = keys.map((_, i) => 'x' + (i ? i + 1 : ''))
var yaxes = keys.map((_, i) => 'y' + (i ? i + 1 : ''))
var data = names.map((name, i) => {
var rowsOfName = rows.filter(r => r.Name === name)
var trace = {
type: 'splom',
name: name,
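// wrap the object literal in parentheses so the arrow function returns it
dimensions: keys.map(k => ({
// 'label' would be better here than 'name' (parcoords uses 'label')
label: k,
// rows parsed by d3.csv are objects keyed by column name
values: rowsOfName.map(r => r[k])
})),
marker: {color: colors[i]},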
// the default (for clarity)
showlegend: true,
xaxes: xaxes,
yaxes: yaxes
}
return trace
})
var layout = {
grid: {
xaxes: xaxes,
yaxes: yaxes,
domain: { x: [0, 1], y: [0, 1] },
gutter: 0.1
}
}
Plotly.newPlot('graph', data, layout)
})
That is, one splom trace per 🥀 type and one dimension per observed field in each trace. |
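The axis ID convention used in this example ('x', 'x2', 'x3', ...) pairs with layout keys ('xaxis', 'xaxis2', ...) and with subplot IDs formed by joining an x ID and a y ID ('xy', 'x2y', ...). A small sketch of those conversions follows. The function names are hypothetical, and skipping '' entries follows the blank-cell idea floated for layout.grid earlier in the thread rather than any settled behaviour.

```javascript
// Hypothetical helpers for the axis ID conventions used in this thread.

// axisIds('x', 3) -> ['x', 'x2', 'x3']
function axisIds(letter, n) {
  var ids = [];
  for (var i = 0; i < n; i++) ids.push(letter + (i ? i + 1 : ''));
  return ids;
}

// 'x' -> 'xaxis', 'y2' -> 'yaxis2'
function idToLayoutKey(id) {
  return id.charAt(0) + 'axis' + id.slice(1);
}

// One subplot ID per (y axis, x axis) pair, skipping blank '' entries.
function subplotIds(xaxes, yaxes) {
  var out = [];
  yaxes.forEach(function (y) {
    xaxes.forEach(function (x) {
      if (x && y) out.push(x + y);
    });
  });
  return out;
}
```

For the four Iris dimensions this gives 16 subplot IDs per trace, and all three species traces reuse the same 16, since they share `xaxes` and `yaxes`.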
Interesting point here about the grid lines. It shouldn't be too hard to draw them in WebGL (much easier than axis labels 😉 at least), if we find SVG too slow. |
May I add my 2¢?
Plotly.newPlot(document.body, [{
type: 'scattermatrix',
x: [[], [], ...xdata],
y: [[], [], [], ...ydata]
}])
That would already be familiar to users who know trace types and options. |
Usually it's 2¢ but we like you so sure :)
Two things I don't like about this:
Anyway we do have a precedent for the structure I'm proposing, in |
I suppose we could let
I guess ^^ could be massaged into the grid format with concepts like So I still think we'll need something like #2274 (comment) but perhaps grid would be allowed to provide defaults to that when the layout is conducive to it. @dfcreative don't worry about |
Branch |
Things to note:
|
Just a couple of clarifying questions:
Sounds great, just as long as this doesn't restrict us from displaying other data (be it splom or some other trace type) on the same axes.
I'm not really sure what a |
Yes, for sure 👌
Here's a sneak peek: |
Here are some observations on splom-generated cartesian subplots: Off the
var Nvars = ???
var Nrows = 2e4 // makes no difference for now
var dims = []
for(var i = 0; i < Nvars; i++) {
dims.push({values: []})
for(var j = 0; j < Nrows; j++) {
dims[i].values.push(Math.random())
}
}
Plotly.purge(gd);
console.time('splom')
Plotly.plot(gd, [{
type: 'splom',
dimensions: dims
}])
console.timeEnd('splom')
I got: where I added A few quick hits:
|
Work in progress https://dfcreative.github.io/regl-scattermatrix/ |
Quick update:
|
Interesting finding:
|
There's also |
too bad. Although https://developer.mozilla.org/en-US/docs/Web/API/Node/baseURI is incomplete: |
New benchmarks post 5887104 (which I pushed to #2474 - hopefully @alexcjohnson won't mind): Things are looking up 🎸 Next steps:
|
A first attempt at drawing grid lines using @dfcreative 's Here are the numbers (in ms) with all axes having the same
In brief, we start to see improvements over SVG at around 15 dimensions (i.e. 15x15=225 subplots). |
SPLOMs are coming to plotly.js.
For the uninitiated, docs on the Python API scatterplotmatrix figure factory are here. Seaborn calls it a pairplot. Matlab has a plotmatrix draw function.
Some might say that SPLOMs are already part of plotly.js: all we have to do is generate traces for each combination of variables and plot them on an appropriate axis layout (example).
But, this technique has a few limitations and inconveniences:
Numerous solutions are available. This issue will attempt to spec out the best one.
cc @dfcreative @alexcjohnson @cldougl @chriddyp