scatter plots #107

leeoniya · 2020-01-28T23:05:39Z

uPlot.Scatter

e.g. https://academy.datawrapper.de/article/65-how-to-create-a-scatter-plot

spatial index via e.g. https://github.com/mourner/kdbush or https://github.com/mourner/flatbush

data format e.g.:

[
  [x,y,v,l,x,y,v,l],  // series 1
  [x,y,v,l,x,y,v,l],  // series 2
  [x,y,v,l,x,y,v,l],  // series 3
]

v value (size of point/shape)
l label
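
A minimal sketch of how one such interleaved series might be walked, assuming a fixed stride of four slots per point (x, y, v, l). The `forEachPoint` helper and the sample values are illustrative and not part of uPlot:

```javascript
// Hypothetical helper: visit each point of one interleaved series.
// Layout assumed from the format above: [x, y, v, l, x, y, v, l, ...]
const STRIDE = 4;

function forEachPoint(series, fn) {
  for (let i = 0; i < series.length; i += STRIDE) {
    fn(series[i], series[i + 1], series[i + 2], series[i + 3]);
  }
}

// Illustrative data: two points in a single series.
const series1 = [1, 10, 2.2, "a", 2, 20, 1.5, "b"];

const described = [];
forEachPoint(series1, (x, y, v, l) => {
  described.push(`${l}@(${x},${y}) size=${v}`);
});
```

One thing this makes visible: string labels stored in the same array as the numbers rule out typed arrays for that series.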

should use a Path2D shape cache


ryantxu · 2020-01-29T01:59:37Z

I don't understand why this would need a different data format/layout from the line variant? I would expect something like:

data:[
  [a1,a2,a3,...],  
  [b1,b2,b3,...], 
  [c1,c2,c3,...], 
]
config: {
  x: 0, // a
  y: 1, // b
  label: 2, // c
}

or something

leeoniya · 2020-01-29T02:58:45Z

if you look at the linked example, each point needs to encode at least x,y and the size of the point (i'm calling it "v"alue). i guess the per-point label can be left out, but i feel like it's pretty fundamental to scatter plots. in addition to this, my gut feeling is that scatterplots are less likely to be easily alignable than line charts (the majority of which are time series). if they cannot be aligned, then i cannot do a simple binary search as i do with uPlot.Line, and need a spatial index (quadtree, kd tree, etc.).
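
To illustrate why alignment matters, here is a generic closest-index binary search over a sorted x array. It is a sketch of the general technique, not uPlot's internal code, and `closestIdx` is a hypothetical name. It only works because the x values are sorted and shared; unaligned 2D points have no such ordering, hence the spatial index.

```javascript
// Hypothetical helper: index of the value in a sorted array closest to `target`.
function closestIdx(sortedXs, target) {
  let lo = 0;
  let hi = sortedXs.length - 1;
  while (hi - lo > 1) {
    const mid = (lo + hi) >> 1;
    if (sortedXs[mid] < target) lo = mid;
    else hi = mid;
  }
  // lo and hi are now adjacent (or equal); pick the nearer one.
  return target - sortedXs[lo] <= sortedXs[hi] - target ? lo : hi;
}

const xs = [0, 10, 20, 30, 40];
const hovered = closestIdx(xs, 27);
```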

if your a1, a2 and a3 are objects, then 100k of these will take up a lot of memory. to avoid this, uPlot sticks to using flat arrays.

fundamentally, i think scatter plots are different enough to justify a different data format for performance reasons. a lot of charting libs fall for the temptation of complete uniformity across chart types, or more human-friendly formats and pay for it with performance; there's a reason why uPlot.Line is as fast as it is.

ryantxu · 2020-01-29T05:05:17Z

Obviously many ways to skin this ;) I like your columnar approach in the line chart.

Plotly uses something worth looking at -- essentially a vector for each of the attributes: x,y,size,text, etc: https://plot.ly/javascript/line-and-scatter/#data-labels-on-the-plot

trace1 = {
  x: [1, 2, 3, 4, 5],
  y: [1, 6, 3, 6, 1],
  mode: 'markers+text',
  type: 'scatter',
  name: 'Team A',
  text: ['A-1', 'A-2', 'A-3', 'A-4', 'A-5'],
  textposition: 'top center',
  textfont: {
    family:  'Raleway, sans-serif'
  },
  marker: { size: 12 }
};

leeoniya · 2020-01-29T05:19:46Z

yea, that feels right...perhaps with a tweaked uPlot take:

[
  [  // series 1
     [1,2,3],  // x
     [10,20,30],  // y
     [2.2,1.5,6.5], // v
     ["a","b","c"],  // l
  ],
  [  // series 2
     [1,2,3],  // x
     [10,20,30],  // y
     [2.2,1.5,6.5], // v
     ["a","b","c"],  // l
  ]
]

nice benefit is we can use typed arrays too, and the value and label arrays can be optional in a series.
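
A small sketch of what that could look like with typed arrays and optional columns. The column order (x, y, v, l) follows the layout above; the `pointAt` helper and its default size are assumptions made for illustration:

```javascript
// Per-series columnar layout using typed arrays for the numeric columns.
const series1 = [
  Float64Array.of(1, 2, 3),       // x
  Float64Array.of(10, 20, 30),    // y
  Float64Array.of(2.2, 1.5, 6.5), // v (optional)
  ["a", "b", "c"],                // l (optional; strings need a plain array)
];

const series2 = [
  Float64Array.of(1, 2, 3),       // x
  Float64Array.of(10, 20, 30),    // y
  // v and l omitted
];

// Hypothetical accessor: read one point, substituting defaults for absent columns.
function pointAt(series, i, defaultSize = 1) {
  const [xs, ys, vs, ls] = series;
  return {
    x: xs[i],
    y: ys[i],
    v: vs ? vs[i] : defaultSize,
    l: ls ? ls[i] : null,
  };
}

const data = [series1, series2];
```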

ryantxu · 2020-01-29T07:14:42Z

FYI, what you list above is essentially the native grafana data format -- that is optionally backed by apache arrow tables

backspaces · 2020-03-21T16:21:53Z

Not to be a grumpy old man (which I am!), but I'd favor scatter plots only if they do not significantly increase the size of the library. I say this as someone currently using chartjs who desperately needs a smaller library like yours!

How in the world do you keep it so small! Sounds like a good Medium article.

leeoniya · 2020-03-21T17:29:25Z

@backspaces

scatter should not add much code (definitely still within 30K).

also, scatter would be feature-gated like many of uPlot's features, which can be compiled out:

uPlot/rollup.config.js

Lines 36 to 43 in c912bec

    
const FEATS = {
	FEAT_TIME: true,
	FEAT_CURSOR: true,
	FEAT_PATHS: true,
	FEAT_POINTS: true,
	FEAT_LEGEND: true,
//	FEAT_GAPS: false,
};

i'm not sure if i'll even need a spatial index, where the indexing costs may outweigh the querying costs. most scatter plots are within 1k points, so a dumb linear scan might be quite sufficient. if i do ingest a spatial index, i'm able to get flatbush down to 3.66 KB [1] and kdbush [2] is even smaller if i don't account for variable point diameters when testing cursor proximity.
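
For reference, a "dumb linear scan" that does account for variable point diameters could be as small as the sketch below. It assumes parallel arrays of pixel-space coordinates and radii; `hitTest` is a hypothetical name, not a uPlot function.

```javascript
// Hypothetical hit test: linear scan over one series, honoring per-point radius.
// xs, ys and radii are parallel arrays, assumed to already be in pixel units.
function hitTest(xs, ys, radii, cx, cy) {
  let best = -1;
  let bestDist = Infinity;
  for (let i = 0; i < xs.length; i++) {
    const dx = xs[i] - cx;
    const dy = ys[i] - cy;
    const dist = Math.sqrt(dx * dx + dy * dy);
    // Only points whose circle contains the cursor are candidates;
    // among those, the one with the nearest center wins.
    if (dist <= radii[i] && dist < bestDist) {
      best = i;
      bestDist = dist;
    }
  }
  return best; // -1 when the cursor is over no point
}

const px = [10, 50, 52];
const py = [10, 50, 50];
const pr = [5, 10, 3];
const overPoint = hitTest(px, py, pr, 51.5, 50);
```

The cost is one pass over every point per cursor move, which is the trade-off against paying the index build cost up front.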

the main issue here is one of different data layout. a lot of internal util functions & loops assume aligned data across series. i would need to create an additional branch in every place that references data i0 and i1, which would make for much messier code - you can see how often those are used by searching [3]. i'm going to prototype this over the next week or two to find a path forward. i feel like scatter and log scales are the last significant missing pieces in uPlot.

> How in the world do you keep it so small! Sounds like a good medium article.

i have the opposite question: how are the other libraries so huge!? some include data parsing & statistical aggregation, animations, declarative options for every possible combo of desires and styles. some have complex area fillers for stacked series and include radar, donut, pie, and other chart types. if they do timezone or DST handling, many include Luxon or Moment (which are huge), while uPlot relies on a neat hack i found [4] (but does not support IE11). another reason is that uPlot is monolithic and not prototype- or class-based: many things live inside the uPlot constructor closure, so most variables can be mangled and minified without issue; uPlot only publicly exposes what the user-facing API needs. the drawback, of course, is that a lot of the code has to live in one giant 2K LOC file [5], and adding an additional data format like scatter is harder. Chart.js, for example, is more java-esque: every component is a derived class, but it must expose all its non-minifiable innards as private APIs. as with everything, trade-offs abound.

[1] mourner/flatbush#27 (comment)
[2] https://github.com/mourner/kdbush
[3] https://github.com/leeoniya/uPlot/blob/master/dist/uPlot.esm.js
[4] https://github.com/leeoniya/uPlot/blob/master/src/fmtDate.js#L126
[5] https://github.com/leeoniya/uPlot/blob/master/src/uPlot.js
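
The closure-versus-class point can be illustrated with a toy example. This is a generic sketch and not uPlot or Chart.js code; `makeCounter` and `Counter` are made-up names.

```javascript
// Closure style: internals are plain local variables, which a minifier
// is free to rename because nothing outside can refer to them by name.
function makeCounter(start) {
  let count = start; // private; unreachable from outside
  function clamp(n) { return Math.max(0, n); }

  // Only these property names must survive minification.
  return {
    inc() { count = clamp(count + 1); return count; },
    dec() { count = clamp(count - 1); return count; },
  };
}

// Class style: helpers and state are properties on the instance/prototype,
// so their names are generally kept unless property mangling is turned on.
class Counter {
  constructor(start) { this._count = start; }
  _clamp(n) { return Math.max(0, n); }
  inc() { this._count = this._clamp(this._count + 1); return this._count; }
}

const closureCounter = makeCounter(0);
const classCounter = new Counter(1);
```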

leeoniya · 2020-03-22T07:06:10Z

sneak peek

30,000 scatter points (10k per series) in 100ms:

this is without building a spatial point index, which turns out to be a fairly expensive task (~40ms for kdbush).

if the points were square (instead of circles) and pixel-aligned, this would likely run 30-50% faster since there'd be no need for anti-aliasing.

leeoniya · 2020-03-22T19:58:39Z

rendering solid circles instead of hollow circles shaves 20-25ms (since hollow circles are stroked and filled by separate Path2D objects).

leeoniya · 2020-03-25T00:17:37Z

ok, so i optimized point rendering in 12b6beb, to use a single Path2D.

since i had no baseline for whether 75ms was slow or fast for a 30k point scatter plot, i tried the same with the next-fastest lib (Chart.js 3.0 alpha). spoiler: it was never actually 75ms (that's just JS exec time):

uPlot:

build: 450ms (75ms js + 375ms "system")
toggle last series on/off: 625ms

Chart.js 3.0 alpha:

build: 900ms
toggle last series on/off: 3157ms

so about 2x as fast for init (but still without building a spatial index). and much faster for series toggle, even with rebuilding all paths. once i cache the Path2D objects, toggle will be faster still (by a lot).

another question is whether the design should accommodate multi-scale scatter plots. these are not very common, but they do exist:

tboerstad · 2020-05-29T18:43:28Z

I discovered uPlot lately on Hacker News, and was really impressed by the speed.

I have a project where speed has been important, a web page for creating scatter plots from CSV files. I'm currently using Plotly.js, with webgl option, and it's been quite performant, but not anything near uPlot. Plotly also has a lot more bells and whistles than I need.

I just wanted to let everyone know that there exists an excited user who is waiting for scatter plots!

If people reading this are shaking their heads at CSV and performance in the same sentence, they'd be correct, as CSV parsing is currently the slowest part of the process.

leeoniya · 2020-06-01T05:54:41Z

@tboerstad csvplot looks cool :)

i paused on scatter when i ran into various questions about how and if y auto-scaling should work and some other internals that assume aligned data, etc. - quite a lot of the core needs to be ifd away for scatter/bubble plots.

i'd like to figure out #184 before moving forward here.

backspaces · 2020-12-16T01:58:04Z

Just a clarification: we may want to include "points" in this:
Canvas 2D-based chart for plotting time series, lines, areas, ohlc & bars;

I needed a points graph, i.e. sorting x,y pairs. By simply sorting by x, then extracting the x, y arrays, it worked fine. And adjacent x values can be the same, no issue.
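
The sort-then-split step described here might look like the following sketch, assuming the input is an array of [x, y] pairs and the target is the aligned [xs, ys] column layout; `toColumns` is a hypothetical helper name.

```javascript
// Sort [x, y] pairs by x, then split them into aligned column arrays: [xs, ys].
function toColumns(pairs) {
  // Copy before sorting so the caller's array is left untouched.
  const sorted = [...pairs].sort((a, b) => a[0] - b[0]);
  return [sorted.map(p => p[0]), sorted.map(p => p[1])];
}

// Two pairs share x = 2; the sort is stable, so they keep their input order.
const pairs = [[3, 9], [1, 5], [2, 7], [2, 4]];
const columns = toColumns(pairs);
```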

So the distinction between "points" and "scatter" graphs might be useful to make.

leeoniya · 2020-12-16T02:28:33Z

> And adjacent x values can be the same, no issue.

well, mostly. you'll run into issues with being unable to hover a point properly and zooming will get wonky if either edge ends up in a same-xs territory since there's a lot internally that relies on a binary search which will fail to converge to one value. for scatter, you also expect the cursor to work by cursor proximity / one point at a time, which it obviously doesn't do in the current x-oriented mode. also, the x-auto-range won't add a nice extra buffer as it does when auto-scaling y.

for static / non-interactive charts i think it'll work fine, though.
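
A toy illustration of the hover problem with repeated x values, using a generic lower-bound search (an assumption for demonstration, not uPlot's internal search). Because the lookup only sees x, the three points at x = 2 cannot be told apart:

```javascript
// Generic lower-bound binary search: first index whose value is >= target.
function lowerBound(sortedXs, target) {
  let lo = 0;
  let hi = sortedXs.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (sortedXs[mid] < target) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

const xs = [1, 2, 2, 2, 3];
const ys = [5, 7, 4, 6, 9];

// Wherever the cursor sits vertically, x = 2 resolves to the same single index.
const idx = lowerBound(xs, 2);
const hoveredY = ys[idx];
```

A proximity-based lookup would need the cursor's y as well, which is what a scatter cursor is expected to use.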

hkang1 · 2021-02-09T00:42:43Z

hey @leeoniya, big fan of uPlot! we incorporated it into Determined - open source deep learning platform for visualizing large datasets and it's been very performant! We are just looking into scatter plots as well, and could really use it if and when it becomes available.

We're able to replicate a scatter plot with the current form of uPlot as seen below:

But we don't have the ability to render the points differently via size or color (fill) depending on the point value. Here is an example of what we are trying to achieve (apologies for the really poor resolution on this image!):

One thought is to update the series[n].points properties (show, fill, size, space, stroke, width) to accept a callback for each. So to render different sizes based on value, series[n].points.size can take a callback that is in a form of something like:

(self: uPlot, seriesIdx: number, pointIdx: number): number | undefined => {
  // use the `pointIdx` to get the data value and return a different size based on the value.
  ...
};

Let me know what you think
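
For concreteness, a callback of the proposed shape could be built like the sketch below. This is hypothetical: it is not an existing uPlot option, and it assumes `self.data[seriesIdx][pointIdx]` holds the point's value. The `sizeByValue` factory and the stand-in chart object are made up for illustration.

```javascript
// Hypothetical factory for a (self, seriesIdx, pointIdx) => size callback
// that maps a value range linearly onto a size range.
function sizeByValue(minSize, maxSize, minVal, maxVal) {
  return (self, seriesIdx, pointIdx) => {
    const val = self.data[seriesIdx][pointIdx];
    if (val == null) return undefined;     // let the caller use its default
    if (maxVal === minVal) return minSize; // avoid dividing by zero
    const t = Math.min(1, Math.max(0, (val - minVal) / (maxVal - minVal)));
    return minSize + t * (maxSize - minSize);
  };
}

// Stand-in for a chart instance, only to exercise the callback.
const fakeChart = { data: [[1, 2, 3], [0, 50, 100]] };
const size = sizeByValue(4, 20, 0, 100);
const midSize = size(fakeChart, 1, 1);
```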

leeoniya · 2021-02-11T19:49:00Z

hey @hkang1, there's actually quite a bit more work here than meets the eye. hover points would also need to take this into account during interaction, not just draw. point size is used to determine whether or not to show or hide points based on data density, so that needs to be tweaked. once you start getting into larger points (like a bubble scatter chart), you need to actually have hover detection for the circle boundaries. and a lot more stuff. i'd like to avoid making a partial, non-holistic api adjustment that only solves a small part of the issue, and maybe not in an optimal way.

if it's sufficient for you to simply adjust sizes, you can just implement a custom points renderer that handles size variation: https://leeoniya.github.io/uPlot/demos/draw-hooks.html. but unless you only need this for static charts, i think you'll find that this alone leaves a lot to be desired.

i'm fairly confident that proper scatter and bubble support will land in the next 6 months, but cannot give a definitive timeline for it yet.

jjech · 2021-04-07T20:35:56Z

Any news on the scatterplot roadmap?

We've been able to make do with dygraphs for line graphs and Chart.js for scatterplots, but migrating to uPlot would greatly improve the user experience for our tool!

ghost · 2021-09-23T17:48:14Z

COSMOS is looking forward to adding X/Y support to our graphing tool. Thanks for such an awesome graphing library!

leeoniya · 2021-09-23T19:13:26Z

there is some support for this now (still being refined) via mode: 2 and series.facets api.

here's a demo, for early testing: https://leeoniya.github.io/uPlot/demos/scatter.html

a lot of the implementation is still done in userland (such as quadtree construction and path renderer).

hkang1 · 2022-01-07T05:43:15Z

> there is some support for this now (still being refined) via mode: 2 and series.facets api.
>
> here's a demo, for early testing: https://leeoniya.github.io/uPlot/demos/scatter.html
>
> a lot of the implementation is still done in userland (such as quadtree construction and path renderer).

This demo has been great to work off of!

So we ran into a weird issue where the scatter plot doesn't render properly when:

- there's only one data point
- or ALL of the data points have the same value

So in the scatter plot demo, if we redefine data2 to be the following:

let data2 = filledArr(series, v => [
  filledArr(points, i => randInt(100,100)),
  filledArr(points, i => randInt(100,100)),
  filledArr(points, i => randInt(1,10000)),  // bubble size, population
  filledArr(points, i => (Math.random() + 1).toString(36).substring(7)), // label / country name
]);

If there is enough of a variance of about 0.00001 between the data points, then the rendering works ok. Wondering if there is logic around calculating ranges where it's causing a division by 0 somewhere (e.g. (value - min) / (max - min) and where max === min)...

This is what happens in the scatter demo:

In our use case it causes the browser to crash hard without console logs:

const data = [
  null,
  [
    [ 32, 32 ],
    [ 0.5, 0.6 ],
    null,  // bubble size (slight modification to handle a single size when null)
    null,  // bubble color (slight modification to handle a single color when null)
    [ 'test a', 'test a' ],
  ],
];

When the data is slightly tweaked to:

const data = [
  null,
  [
    [ 32, 32.0000001 ],
    [ 0.5, 0.6 ],
    null,  // bubble size (slight modification to handle a single size when null)
    null,  // bubble color (slight modification to handle a single color when null)
    [ 'test a', 'test a' ],
  ],
];

then it renders somewhat ok, except that the x-axis labels lack precision:

Any clues or hints on what can be done to handle 1 data point case? Please let me know if there's anything else I can provide, thanks.

leeoniya · 2022-01-12T03:16:09Z

@hkang1 this is essentially a duplicate of #620.

any custom-supplied scale ranging functions must return a non-zero range. i've updated the demo to handle this in cb1e371
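
A guard of this kind could look like the following sketch. It is not the code from cb1e371. The padding rule is an arbitrary choice, and the `(u, dataMin, dataMax) => [min, max]` callback shape is an assumption about how the range function is wired up.

```javascript
// Hypothetical guard for a custom range function: widen a zero-width range.
// Padding rule (10% of the value, or 1 around zero) is arbitrary.
function nonZeroRange(min, max) {
  if (min == null || max == null) return [0, 1]; // no data at all
  if (min !== max) return [min, max];
  const pad = min === 0 ? 1 : Math.abs(min) / 10;
  return [min - pad, max + pad];
}

// Assumed shape of a scale range callback: (chart, dataMin, dataMax) => [min, max].
const scales = {
  x: { range: (u, dataMin, dataMax) => nonZeroRange(dataMin, dataMax) },
  y: { range: (u, dataMin, dataMax) => nonZeroRange(dataMin, dataMax) },
};

// The single-x case from the report above: both points at x = 32.
const xRange = scales.x.range(null, 32, 32);
```

With a non-zero width, any later `(value - min) / (max - min)` style computation no longer divides by zero.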

leeoniya added the feature and future labels · Jan 28, 2020
leeoniya mentioned this issue · Jan 28, 2020 · Available graph types and callbacks (documentation) #103 (Closed)
leeoniya mentioned this issue · Mar 10, 2020 · getIncrSpace errors if graph height 67px or less #146 (Closed)
leeoniya removed the future label · Mar 13, 2020
leeoniya mentioned this issue · Mar 16, 2020 · Unexpected cursor snapping behavior with non-aligned series #155 (Closed)
This was referenced · May 7, 2020 · better chart performance liquidlabsio/fluidity#65 (Open); scattergl lines sometimes draws random lines on some hardware plotly/plotly.js#3522 (Closed)
sausin mentioned this issue · May 25, 2020 · Pie / Donut #236 (Closed)
leeoniya mentioned this issue · Nov 22, 2020 · GraphNG: support x != time from the UI grafana/grafana#29288 (Closed)
leeoniya mentioned this issue · Dec 12, 2020 · Three "HelloWorld" Tutorials #399 (Closed)
milahu mentioned this issue · Dec 17, 2020 · feature: multiple x axis to combine different time precisions #405 (Closed)
leeoniya mentioned this issue · Dec 20, 2020 · hightlight when hovered over #409 (Closed)
leeoniya mentioned this issue · Feb 28, 2021 · X-Y Scatter Plots? #463 (Open)
jgehrcke mentioned this issue · Apr 11, 2023 · Add proof-of-concept / starting point for "list conceptual benchmarks" UI section conbench/conbench#1086 (Merged)
