Early data parsing, stacking by value and support object data #6576

kurkle · 2019-10-17T21:08:18Z

#6106 done a bit further.

Leading thoughts

data should be parsed only once, not each animation cycle or interaction update
if data is added, only new data needs to be parsed
core.datasetController implements common data parses that can be overridden
a dataset controller knows how to parse its (custom) data
dataset controller asks assigned scale's to parse their input
- a dataset does not need to directly support the format a scale supports
internally data is stored by scale id, so we know what scale the data belongs to (scale decides the format)
a _custom entry is available to contain any custom data (not related to scales) a dataset needs.

dataset controllers

parsing is split to functions for easy overriding: _parse, _parsePlainData, _parseArrayData, _parseObjectData and _parseCustomObjectData
doughnut (and descendants) override the whole _parse, because they are not using scales.
bar overrides _parseArrayData, for floatBar support
bubble overrides _parseCustomObjectData to parse r

Data formats

plain array: [1,2,3...]
- each value is parsed by "value scale" and each index is parsed by "index scale".
- usually index is the index of a label
object array: [{x: 1, y: 2}, ...]
- depending on dataset, but usually x is parsed by x-scale and y by y-scale.
- t for example is parsed by time scale. if both x and y axes are time scales, both get the same value.
- r for bubble is an example of _custom data that is not related to any scale.
array of arrays: [[1,2], [3,4], ...]
- By default, [[x,y],[x,y]]. For floatBar, [[start,end],[start,end]]
mixed array of array / plain
object
- by default is converted to array, where key is x and value is y

Pen
Pen - far far away
Pen - master

Fixes: #6103
Fixes: #5657
Fixes: #5405
Fixes: #5072
Fixes: #6437
Fixes: #6455
Closes: #6136
Closes: #6461
(probably some more)

src/controllers/controller.line.js

src/core/core.scale.js

src/scales/scale.logarithmic.js

benmccann

Can you update the PR description to explain the high-level idea / organization of this PR? (e.g. what parse in scale vs controller are each used for). And also why each of the changed tests are different?

Should this replace all usage of getRightValue (if not in this PR then eventually)? I think there's still about half a dozen left

Are there any performance impacts (positive or negative) on large charts? e.g. I wonder how it would affect the uPlot benchmarks

src/scales/scale.linear.js

src/scales/scale.logarithmic.js

src/controllers/controller.bar.js

src/core/core.datasetController.js

src/controllers/controller.line.js

src/core/core.datasetController.js

src/core/core.scale.js

benmccann · 2019-10-21T15:20:05Z

src/core/core.datasetController.js

@@ -75,6 +75,74 @@ function unlistenArrayEvents(array, listener) {
 	delete array._chartjs;
 }

+function storeCrossRef(scale, datasetIndex, scaleValue, datasetValue) {


Can you add a comment explaining what the cross-ref is?

benmccann · 2019-10-21T17:58:24Z

I just took a cursory glance at this PR in the Chrome debugger. I see _parse being called a bunch of times from addElements within initialize. And then a bunch from resyncElements within buildOrUpdateElements. I didn't look into it, but I wouldn't think we'd need to do it twice in a row

This PR seems about 10% slower overall on the uPlot benchmark, but it seems like that should be addressable

src/core/core.datasetController.js

kurkle · 2019-10-22T13:43:00Z

This PR goes way too far now, but I wanted to see where it could go.

There are some things that can be extracted to separate PR's.

Speed: first update is somewhat similar to current master (from chartjs.org - it does not seem to include all commit). So in reality its slower. Subsequent updates are a lot faster.

Try hiding / showing datasets in those two;
Line/time performance, this PR
Line/time performance, master

src/controllers/controller.line.js

src/core/core.datasetController.js

kurkle · 2019-10-23T16:58:38Z

Some benchmarks done in my laptop, using uPlot benchmark + plugin to measure between events:

master (3cb308d)
{
"2 update/total": 723.6,
"3 render/total": 140.68,
"x from page load": 868.38
}
early parse (1dc078e)
{
"2 update/total": 674.98,
"3 render/total": 153.54,
"x from page load": 834.67
}

chart.update()

master
{
"2 update/total": 552.65,
"3 render/total": 163.05
}
early parse
{
"2 update/total": 366.79,
"3 render/total": 167.91
}

So, first update is about the same and subsequent updates are faster (~30%).
Single runs, but gives a rough idea.

Same chart, draw + 50x update(). master takes 29..30 seconds. Early parse 18..19 seconds. More than 30% faster.

The speed improvement comes from caching the parsed data, and can surely be done without "early parsing" too.

benmccann

It seems like it should be possible to store the data internally as {x: 0, y: 1} instead of {index: 1, xScale0: 0, yScale0: 1}. Then in the base _parseObjectData case we wouldn't actually have to do any parsing at all, which should be a speedup (though maybe behind a skipParsing flag for 2.x, which says you've passed in only numbers)

src/controllers/controller.line.js

src/core/core.datasetController.js

benmccann · 2019-10-24T04:16:39Z

In 3.0 I'd really like to see us get rid of parsing altogether. It really slows things down. We could perhaps provide some data transformer utilities to help users get their data into the format we expect if we want to support multiple formats. @etimberg was asking about making the next version 3.0 (#6555 (comment)). Perhaps we should think about how we'd approach this if we didn't have to worry about backwards compatibility

benmccann · 2019-10-26T18:35:26Z

A couple thoughts about how we might approach this:

Remove notion of parsing. E.g. time scale input should be millis since epoch (not moment objects) and line scale input should be numbers (not strings)
Don't create _meta.data. It's a lot of copying and object creation that negatively impacts performance

kurkle · 2019-10-26T19:29:34Z

@benmccann from a performance point of view, you are correct. That approach however does not solve the issues I'm trying to solve here. Removing parsing would make this lib a bit harder to use in many cases - and IMO ease of use is one of the things making this lib so popular.

benmccann · 2019-10-26T21:21:44Z

We might be able to make parsing optional though. If you want to pass in data in our ideal format then we would not need to do parsing and if you want to be loose with the data then we'll massage it for you

I think the main thing you'd need to do is change it back so that it stores data as {x: 0, y: 1} instead of {xScale0: 0, yScale0: 1}

kurkle · 2019-10-27T07:35:22Z

I think the main thing you'd need to do is change it back so that it stores data as {x: 0, y: 1} instead of {xScale0: 0, yScale0: 1}

It helps avoiding isHorizontal() / getRightValue() calls by storing that way. We could default the axis id:s to 'x' and 'y' though.

benmccann · 2019-10-27T12:57:55Z

Another thing I was thinking is that I don't think the scales really need to be aware of parsing at all and we might be able to keep it all in the controllers. E.g. the linear and logarithmic scales right now need to know about floating bar charts in order to determineDataLimits. However, if we move determineDataLimits to the controller then I think things become a lot cleaner because we can contain all the parsing purely in the controllers. Then bar chart is the only one that needs to know about floating bars and we can keep the logic contained in a much smaller area

It helps avoiding isHorizontal() / getRightValue() calls by storing that way. We could default the axis id:s to 'x' and 'y' though.

We still call _getParsedValue, which is essentially the new getRightValue, so I don't think it'd make much different on that one. You would need to call isHorizontal, but I think it would end up being a lot faster because you would really only need to do that once per dataset, which would be a lot cheaper than having to construct a new dataset if the user passes in data in the desired format

kurkle · 2019-11-03T18:52:51Z

I refactored linkScales a bit too (CC was complaining about it). The tests done were more expensive than the assignment (and linkScales is not critical anyway), so changed it to always assign. And extraced getFirstScaleId (should be easier to change for #6626)

benmccann

Great initiative! Thanks so much for driving this!

I really like it overall. I think there are a couple more things we might be able to do based off of this or a couple things I would tweak, but I'd rather do those in follow up PRs to be able to get this in and build off it

docs/getting-started/v3-migration.md

src/controllers/controller.line.js

src/core/core.scale.js

src/scales/scale.category.js

kurkle · 2019-11-04T05:54:14Z

Benchmarks with latest version:

{
  "2 update/total": 620.79,
  "3 render/total": 177.54,
  "x from page load": 802.21
}

2nd update:

{
  "2 update/total": 284.52,
  "3 render/total": 124.38
}

50x update: ~17s

master

{
  "2 update/total": 703.42,
  "3 render/total": 143.66,
  "x from page load": 850.76
}

2nd update:

{
  "2 update/total": 521.75,
  "3 render/total": 127.67
}
Code 
}

50x update: ~27s

benmccann reviewed Oct 18, 2019

View reviewed changes

src/controllers/controller.line.js Outdated Show resolved Hide resolved

benmccann reviewed Oct 18, 2019

View reviewed changes

src/core/core.scale.js Outdated Show resolved Hide resolved

benmccann reviewed Oct 18, 2019

View reviewed changes

src/core/core.scale.js Outdated Show resolved Hide resolved

benmccann reviewed Oct 18, 2019

View reviewed changes

src/scales/scale.logarithmic.js Outdated Show resolved Hide resolved

benmccann reviewed Oct 18, 2019

View reviewed changes

kurkle commented Oct 18, 2019

View reviewed changes

src/scales/scale.linear.js Outdated Show resolved Hide resolved

kurkle commented Oct 18, 2019

View reviewed changes

src/scales/scale.logarithmic.js Outdated Show resolved Hide resolved

kurkle commented Oct 18, 2019

View reviewed changes

src/scales/scale.logarithmic.js Outdated Show resolved Hide resolved

This was referenced Oct 19, 2019

Fix logarithmic test to use correct scale #6580

Merged

Return correct index/value id in radar/polarArea #6581

Merged

kurkle force-pushed the early-parse-retry branch from e68d290 to e3a3a71 Compare October 20, 2019 05:29

benmccann reviewed Oct 21, 2019

View reviewed changes

src/controllers/controller.bar.js Outdated Show resolved Hide resolved

src/core/core.datasetController.js Show resolved Hide resolved

src/controllers/controller.line.js Outdated Show resolved Hide resolved

src/core/core.datasetController.js Outdated Show resolved Hide resolved

benmccann reviewed Oct 21, 2019

View reviewed changes

src/core/core.datasetController.js Outdated Show resolved Hide resolved

benmccann reviewed Oct 22, 2019

View reviewed changes

src/controllers/controller.line.js Show resolved Hide resolved

benmccann reviewed Oct 22, 2019

View reviewed changes

src/core/core.datasetController.js Outdated Show resolved Hide resolved

src/core/core.datasetController.js Show resolved Hide resolved

kurkle force-pushed the early-parse-retry branch from 9205cd6 to 8066422 Compare October 23, 2019 07:42

benmccann reviewed Oct 24, 2019

View reviewed changes

src/controllers/controller.line.js Outdated Show resolved Hide resolved

src/controllers/controller.line.js Outdated Show resolved Hide resolved

src/core/core.datasetController.js Outdated Show resolved Hide resolved

src/core/core.datasetController.js Outdated Show resolved Hide resolved

kurkle force-pushed the early-parse-retry branch from 1dc078e to bc3807b Compare October 25, 2019 05:36

kurkle force-pushed the early-parse-retry branch from bc3807b to a564802 Compare October 27, 2019 05:45

kurkle force-pushed the early-parse-retry branch from a564802 to 8b50c33 Compare October 27, 2019 18:01

benmccann mentioned this pull request Nov 1, 2019

[2.9] Stop unnecessary line calculations #6671

Merged

kurkle added 4 commits November 3, 2019 20:12

Early data parsing + stacking by value

0d4e841

Review comments

2781f3a

review comments

163f813

Remove reduntant parsing

9ea67ce

kurkle force-pushed the early-parse-retry branch from 8b7acd0 to 9ea67ce Compare November 3, 2019 18:18

Couple CC warnings

d12b00b

Optimize filterBetween

2e7d65e

kurkle force-pushed the early-parse-retry branch from b926551 to 2e7d65e Compare November 3, 2019 19:31

benmccann previously approved these changes Nov 3, 2019

View reviewed changes

benmccann reviewed Nov 4, 2019

View reviewed changes

docs/getting-started/v3-migration.md Outdated Show resolved Hide resolved

benmccann reviewed Nov 4, 2019

View reviewed changes

src/controllers/controller.line.js Show resolved Hide resolved

benmccann reviewed Nov 4, 2019

View reviewed changes

src/core/core.scale.js Show resolved Hide resolved

benmccann reviewed Nov 4, 2019

View reviewed changes

src/scales/scale.category.js Show resolved Hide resolved

More migration info

227b8f9

kurkle dismissed benmccann’s stale review via 227b8f9 November 4, 2019 05:28

etimberg approved these changes Nov 4, 2019

View reviewed changes

benmccann approved these changes Nov 4, 2019

View reviewed changes

benmccann mentioned this pull request Nov 4, 2019

Remove index and datasetIndex from Element #6687

Closed

etimberg merged commit 72df272 into chartjs:master Nov 4, 2019

kurkle mentioned this pull request Nov 5, 2019

Remove duplicate array equality function #6693

Merged

benmccann mentioned this pull request Nov 5, 2019

Consider new format for passing data in #6696

Closed

kurkle deleted the early-parse-retry branch November 13, 2019 06:12

This was referenced Nov 13, 2019

Time scale - points and bars not lining up with gridlines #5918

Closed

Tooltip titles have the wrong value when datapoints are a subset of the labels array #6539

Closed

This was referenced Jan 2, 2020

Stacked chart issue for time scale #5484

Closed

Bars do not appear at correct X axis position when specified in {x, y} format #5289

Closed

sdehm mentioned this pull request Feb 24, 2021

v3 Gaps in stacked bar chart when datasets removed #8519

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Early data parsing, stacking by value and support object data #6576

Early data parsing, stacking by value and support object data #6576

kurkle commented Oct 17, 2019 •

edited

Loading

benmccann left a comment

benmccann Oct 21, 2019

benmccann commented Oct 21, 2019 •

edited

Loading

kurkle commented Oct 22, 2019

kurkle commented Oct 23, 2019

benmccann left a comment •

edited

Loading

benmccann commented Oct 24, 2019

benmccann commented Oct 26, 2019

kurkle commented Oct 26, 2019

benmccann commented Oct 26, 2019

kurkle commented Oct 27, 2019

benmccann commented Oct 27, 2019

kurkle commented Nov 3, 2019

benmccann left a comment

kurkle commented Nov 4, 2019

Early data parsing, stacking by value and support object data #6576

Early data parsing, stacking by value and support object data #6576

Conversation

kurkle commented Oct 17, 2019 • edited Loading

benmccann left a comment

Choose a reason for hiding this comment

benmccann Oct 21, 2019

Choose a reason for hiding this comment

benmccann commented Oct 21, 2019 • edited Loading

kurkle commented Oct 22, 2019

kurkle commented Oct 23, 2019

chart.update()

benmccann left a comment • edited Loading

Choose a reason for hiding this comment

benmccann commented Oct 24, 2019

benmccann commented Oct 26, 2019

kurkle commented Oct 26, 2019

benmccann commented Oct 26, 2019

kurkle commented Oct 27, 2019

benmccann commented Oct 27, 2019

kurkle commented Nov 3, 2019

benmccann left a comment

Choose a reason for hiding this comment

kurkle commented Nov 4, 2019

kurkle commented Oct 17, 2019 •

edited

Loading

benmccann commented Oct 21, 2019 •

edited

Loading

benmccann left a comment •

edited

Loading