scatter plots #107
Comments
I don't understand why this would need a different data format/layout from the line variant? I would expect something like:
or something |
if you look at the linked example, each point needs to encode at least x,y and the size of the point (i'm calling it "v"alue). i guess the per-point label can be left out, but i feel like it's pretty fundamental to scatter plots. in addition to this, my gut feeling is that scatterplots are less likely to be easily alignable than line charts (the majority of which are time series). if they cannot be aligned, then i cannot do a simple binary search as i do with uPlot.Line, and need a spatial index (quadtree, kd tree, etc.). if your a1, a2 and a3 are objects, then 100k of these will take up a lot of memory. to avoid this, uPlot sticks to using flat arrays. fundamentally, i think scatter plots are different enough to justify a different data format for performance reasons. a lot of charting libs fall for the temptation of complete uniformity across chart types, or more human-friendly formats and pay for it with performance; there's a reason why uPlot.Line is as fast as it is. |
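For context, the "simple binary search" that aligned, x-sorted data allows can be sketched roughly as follows. This is only an illustration: `closestIdx` is a hypothetical helper name, not uPlot's internal function, and it assumes `xs` is sorted in ascending order.

```javascript
// Hypothetical helper for illustration; not uPlot's internal implementation.
// Returns the index of the value in a sorted (ascending) array that is
// closest to `target`, or -1 for an empty array.
function closestIdx(xs, target) {
  if (xs.length === 0) return -1;
  let lo = 0;
  let hi = xs.length - 1;
  // Narrow [lo, hi] until the two candidates are adjacent (or identical).
  while (hi - lo > 1) {
    const mid = (lo + hi) >> 1;
    if (xs[mid] < target) lo = mid;
    else hi = mid;
  }
  // Pick whichever of the two remaining candidates is nearer.
  return target - xs[lo] <= xs[hi] - target ? lo : hi;
}

const sortedXs = [1, 2, 4, 8, 16];
console.log(closestIdx(sortedXs, 5)); // index of the x value nearest to 5
```

Because every series shares the same x array in the aligned layout, one such lookup serves all series at once; unaligned scatter data loses that property.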
Obviously many ways to skin this ;) I like your columnar approach in the line chart. Plotly uses something worth looking at -- essentially a vector for each of the attributes: x,y,size,text, etc: https://plot.ly/javascript/line-and-scatter/#data-labels-on-the-plot
|
yea, that feels right... perhaps with a tweaked uPlot take:
[
[ // series 1
[1,2,3], // x
[10,20,30], // y
[2.2,1.5,6.5], // v
["a","b","c"], // l
],
[ // series 2
[1,2,3], // x
[10,20,30], // y
[2.2,1.5,6.5], // v
["a","b","c"], // l
]
]
nice benefit is we can use typed arrays too, and the value and label arrays can be optional in a series. |
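A minimal sketch of the per-series columnar layout proposed above, using typed arrays for the numeric columns. `makeSeries` is a hypothetical helper (not part of uPlot's API), and representing an omitted optional column as `null` is an assumption made for illustration.

```javascript
// Hypothetical helper for illustration; not part of uPlot's API.
// Builds one series as [xs, ys, sizes, labels], where sizes and labels
// are optional and stored as null when omitted (an assumption).
function makeSeries(xs, ys, sizes, labels) {
  const n = xs.length;
  if (ys.length !== n) throw new Error("x and y must have the same length");
  if (sizes != null && sizes.length !== n) throw new Error("v must match x in length");
  if (labels != null && labels.length !== n) throw new Error("l must match x in length");
  return [
    Float64Array.from(xs),                          // x
    Float64Array.from(ys),                          // y
    sizes != null ? Float64Array.from(sizes) : null, // v (optional)
    labels != null ? Array.from(labels) : null,      // l (optional)
  ];
}

// Series do not need to share x values or even lengths.
const scatterData = [
  makeSeries([1, 2, 3], [10, 20, 30], [2.2, 1.5, 6.5], ["a", "b", "c"]),
  makeSeries([4, 5], [40, 50]),
];
console.log(scatterData[0][0], scatterData[1][2]);
```

Flat numeric columns avoid allocating one object per point, which is the memory concern raised earlier in the thread.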
FYI, what you list above is essentially the native grafana data format -- that is optionally backed by apache arrow tables |
Not to be a grumpy old man (which I am!), I'd favor scatter plots only if they do not significantly increase the size of the library. I say this as someone currently using chartjs and desperately need a smaller library like yours! How in the world do you keep it so small! Sounds like a good medium article. |
scatter should not add much code (definitely still within 30K). also, scatter would be feature-gated like many of uPlot's features, which can be compiled out: Lines 36 to 43 in c912bec
i'm not sure if i'll even need a spatial index, where the indexing costs may outweigh the querying costs. most scatter plots are within 1k points, so a dumb linear scan might be quite sufficient. if i do ingest a spatial index, i'm able to get flatbush down to 3.66 KB [1] and kdbush [2] is even smaller if i don't account for variable point diameters when testing cursor proximity. the main issue here is one of different data layout. a lot of internal util functions & loops assume aligned data across series. i would need to create an additional branch in every place that references
i have the opposite question: how are the other libraries so huge!? some include data parsing & statistical aggregation, animations, declarative options for every possible combo of desires and styles. some have complex area fillers for stacked series and include radar, donut, pie, and other chart types. if they do timezone or DST handling, many include Luxon or Moment (which are huge), while uPlot relies on a neat hack i found [4] (but does not support IE11). another reason is that uPlot is monolithic and not prototype- or class-based: many things live inside the uPlot constructor closure, so most variables can be mangled and minified without issue; uPlot only publicly exposes what the user-facing API needs. the drawback to this of course is that a lot of the code has to live in one giant 2K LOC file [5], and there is some difficulty around adding an additional data format like scatter. Chart.js, for example, is more Java-esque, where every component is a derived class but must expose all its non-minifyable innards as private APIs. as with everything, trade-offs abound. [1] mourner/flatbush#27 (comment) |
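The "dumb linear scan" mentioned above, including the variable point diameters that complicate cursor proximity testing, could look roughly like this. `hitTest` is a hypothetical name and signature, and all coordinates and radii are assumed to be in canvas pixels.

```javascript
// Hypothetical helper for illustration; not uPlot code.
// xs, ys: point centers in pixels; radii: per-point radius in pixels.
// Returns the index of the point whose circle contains the cursor (cx, cy)
// and whose center is nearest to it, or -1 if no point is hit.
function hitTest(xs, ys, radii, cx, cy) {
  let best = -1;
  let bestDist2 = Infinity;
  for (let i = 0; i < xs.length; i++) {
    const dx = xs[i] - cx;
    const dy = ys[i] - cy;
    const d2 = dx * dx + dy * dy; // squared distance avoids Math.sqrt
    const r = radii[i];
    if (d2 <= r * r && d2 < bestDist2) {
      best = i;
      bestDist2 = d2;
    }
  }
  return best;
}

const px = [10, 50, 52];
const py = [10, 50, 50];
const pr = [5, 5, 10];
console.log(hitTest(px, py, pr, 58, 50)); // only the large third circle covers this cursor
```

The scan is O(n) per cursor move, which is the trade-off against paying an up-front indexing cost for a quadtree or k-d tree.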
sneak peek 30,000 scatter points (10k per series) in 100ms: this is without building a spatial point index, which turns out to be a fairly expensive task (~40ms for kdbush). if the points were square (instead of circles) and pixel-aligned, this would likely run 30-50% faster since there'd be no need for anti-aliasing. |
rendering solid circles instead of hollow circles shaves 20-25ms (since hollow circles are stroked and filled by separate Path2D objects). |
ok, so i optimized point rendering in 12b6beb to use a single Path2D. since i had no baseline for whether 75ms was slow or fast for a 30k point scatter plot, i tried the same with the next-fastest lib (Chart.js 3.0 alpha). spoiler: it was never actually 75ms (that's just JS exec time): uPlot:
Chart.js 3.0 alpha:
so about 2x as fast for init (but still without building a spatial index). and much faster for series toggle, even with rebuilding all paths. once i cache the Path2D objects, toggle will be faster still (by a lot). another question is whether the design should accommodate multi-scale scatter plots. these are not very common, but they do exist: |
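The idea of drawing all points of a series through a single Path2D, as mentioned above, can be sketched like this. It is not the code from the referenced commit; `addCircles` is a hypothetical helper, and it takes the path object as a parameter so the same function works with a real `Path2D` in the browser or any object exposing `moveTo()` and `arc()`.

```javascript
// Hypothetical helper for illustration; not the code from the commit.
// Appends one circle per point to a single path object.
// `path` is anything with moveTo() and arc(), e.g. a Path2D in the browser.
function addCircles(path, xs, ys, radius) {
  const TAU = 2 * Math.PI;
  for (let i = 0; i < xs.length; i++) {
    // moveTo starts a new subpath so consecutive circles
    // are not joined by a connecting line.
    path.moveTo(xs[i] + radius, ys[i]);
    path.arc(xs[i], ys[i], radius, 0, TAU);
  }
  return path;
}

// Browser usage (assumes a 2D canvas context `ctx` and pixel coordinates):
//   const path = addCircles(new Path2D(), xs, ys, 3);
//   ctx.fill(path);   // one fill call for every point in the series
//   ctx.stroke(path); // one stroke call for every point in the series
```

Keeping the returned path around is also what would make a later Path2D cache possible, since toggling a series would then only need to re-issue the fill and stroke calls.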
I discovered uPlot lately on Hacker News, and was really impressed by the speed. I have a project where speed has been important, a web page for creating scatter plots from CSV files. I'm currently using Plotly.js, with webgl option, and it's been quite performant, but not anything near uPlot. Plotly also has a lot more bells and whistles than I need. I just wanted to let everyone know that there exists an excited user who is waiting for scatter plots! If people reading this are shaking their heads at CSV and performance in the same sentence, they'd be correct, as CSV parsing is currently the slowest part of the process. |
@tboerstad csvplot looks cool :) i paused on scatter when i ran into various questions about how and if y auto-scaling should work and some other internals that assume aligned data, etc. - quite a lot of the core needs to be i'd like to figure out #184 before moving forward here. |
Just a clarification: we may want to include "points" in this: I needed a points graph, i.e. sorting x,y pairs. By simply sorting by x, then extracting the x, y arrays, it worked fine. And adjacent x values can be the same, no issue. So the distinction between "points" and "scatter" graphs might be useful to make. |
well, mostly. you'll run into issues with being unable to hover a point properly and zooming will get wonky if either edge ends up in a same-xs territory since there's a lot internally that relies on a binary search which will fail to converge to one value. for scatter, you also expect the cursor to work by cursor proximity / one point at a time, which it obviously doesn't do in the current x-oriented mode. also, the x-auto-range won't add a nice extra buffer as it does when auto-scaling y. for static / non-interactive charts i think it'll work fine, though. |
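The "points" workaround described above (sort the x,y pairs by x, then extract the x and y arrays) might be sketched as follows. `sortByX` is a hypothetical helper; it relies on `Array.prototype.sort` being stable (ES2019+), so points with equal x keep their input order.

```javascript
// Hypothetical helper for illustration.
// Takes [[x, y], ...] pairs and returns [xs, ys] sorted by x.
// Array.prototype.sort is stable in ES2019+, so pairs with equal x
// keep their original relative order.
function sortByX(points) {
  const sorted = points.slice().sort((a, b) => a[0] - b[0]);
  return [sorted.map((p) => p[0]), sorted.map((p) => p[1])];
}

const [sortedX, sortedY] = sortByX([[3, 30], [1, 10], [2, 20], [1, 11]]);
console.log(sortedX, sortedY);
```

As noted in the reply, repeated x values survive this transformation, but cursor hover and zoom still rely on a binary search over x, so this is best suited to static charts.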
hey @leeoniya, big fan of uPlot! we incorporated it into Determined - open source deep learning platform for visualizing large datasets and it's been very performant! We are just looking into scatter plots as well, and could really use it if and when it becomes available. We're able to replicate a scatter plot with the current form of uPlot as seen below: But we don't have the ability to render the points differently via size or color (fill) depending on the point value. Here is an example of what we are trying to achieve (apologize for the really poor resolution on this image!): One thought is to update the
Let me know what you think |
hey @hkang1 , there's actually quite a bit more work here than meets the eye. hover points would also need to take this into account during interaction, not just draw. point size is used to determine whether or not to show or hide points based on data density, so that needs to be tweaked. once you start getting into larger points (like a bubble scatter chart), you need to actually have hover detection for the circle boundaries. and a lot more stuff. i'd like to avoid making a partial, non-holistic api adjustment that only solves a small part of the issue, and maybe not in an optimal way. if it's sufficient for you to simply adjust sizes, you can just implement a custom points renderer that handles size variation: https://leeoniya.github.io/uPlot/demos/draw-hooks.html. but unless you only need this for static charts, i think you'll find that this alone leaves a lot to be desired. i'm fairly confident that proper scatter and bubble support will land in the next 6 months, but cannot give a definitive timeline for it yet. |
Any news on the scatterplot roadmap? We've been able to make do with dygraphs for line graphs and Chart.js for scatterplots, but migrating to uPlot would greatly improve the user experience for our tool! |
COSMOS is looking forward to adding X/Y support to our graphing tool. Thanks for such an awesome graphing library! |
there is some support for this now (still being refined) via here's a demo, for early testing: https://leeoniya.github.io/uPlot/demos/scatter.html a lot of the implementation is still done in userland (such as quadtree construction and path renderer). |
This demo has been great to work off of! So we ran into a weird issue where the scatter plot doesn't render properly when:
If there is enough of a variance of about 0.00001 between the data points, then the rendering works ok. Wondering if there is logic around calculating ranges where it's causing a division by 0 somewhere (e.g. (value - min) / (max - min) and where max === min)... This is what happens in the scatter demo: In our use case it causes the browser to crash hard without console logs:
When the data is slightly tweaked to:
then it renders somewhat ok, with the exception of the x-axis with the lack of precision: Any clues or hints on what can be done to handle 1 data point case? Please let me know if there's anything else I can provide, thanks. |
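To illustrate the suspected failure mode, here is a sketch of how a `(value - min) / (max - min)` normalization divides by zero when `max === min`, along with one possible guard. Both `safeRange` and `toUnit` are hypothetical names, and the 10% padding rule is an arbitrary assumption; this is not uPlot's actual ranging logic and may not be the real cause of the crash.

```javascript
// Hypothetical guard for illustration; not uPlot's actual ranging logic.
// Widens a zero-width range so later divisions by (max - min) stay finite.
function safeRange(min, max) {
  if (min === max) {
    // Assumption: pad by 10% of the magnitude, or by 1 when the value is 0.
    const pad = min === 0 ? 1 : Math.abs(min) * 0.1;
    return [min - pad, max + pad];
  }
  return [min, max];
}

// Maps a value to a 0..1 position within the (guarded) range.
function toUnit(value, min, max) {
  const [lo, hi] = safeRange(min, max);
  return (value - lo) / (hi - lo);
}

console.log((5 - 5) / (5 - 5)); // NaN: the unguarded single-value case
console.log(toUnit(5, 5, 5));   // finite: the point lands mid-range
```

In userland, a similar effect could be had by supplying an explicit range for the affected scale whenever the data's min and max coincide.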
uPlot.Scatter
e.g. https://academy.datawrapper.de/article/65-how-to-create-a-scatter-plot
spatial index via e.g. https://github.com/mourner/kdbush or https://github.com/mourner/flatbush
data format e.g.:
v: value (size of point/shape)
l: label
should use a Path2D shape cache