Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Early data parsing, stacking by value and support object data #6576

Merged
merged 7 commits into from
Nov 4, 2019

Conversation

kurkle
Copy link
Member

@kurkle kurkle commented Oct 17, 2019

#6106 done a bit further.

Leading thoughts

  • data should be parsed only once, not each animation cycle or interaction update
  • if data is added, only new data needs to be parsed
  • core.datasetController implements common data parses that can be overridden
  • a dataset controller knows how to parse its (custom) data
  • dataset controller asks assigned scale's to parse their input
    • a dataset does not need to directly support the format a scale supports
  • internally data is stored by scale id, so we know what scale the data belongs to (scale decides the format)
  • a _custom entry is available to contain any custom data (not related to scales) a dataset needs.

dataset controllers

  • parsing is split to functions for easy overriding: _parse, _parsePlainData, _parseArrayData, _parseObjectData and _parseCustomObjectData
  • doughnut (and descendants) override the whole _parse, because they are not using scales.
  • bar overrides _parseArrayData, for floatBar support
  • bubble overrides _parseCustomObjectData to parse r

Data formats

  • plain array: [1,2,3...]
    • each value is parsed by "value scale" and each index is parsed by "index scale".
    • usually index is the index of a label
  • object array: [{x: 1, y: 2}, ...]
    • depending on dataset, but usually x is parsed by x-scale and y by y-scale.
    • t for example is parsed by time scale. if both x and y axes are time scales, both get the same value.
    • r for bubble is an example of _custom data that is not related to any scale.
  • array of arrays: [[1,2], [3,4], ...]
    • By default, [[x,y],[x,y]]. For floatBar, [[start,end],[start,end]]
  • mixed array of array / plain
  • object
    • by default is converted to array, where key is x and value is y

Pen
Pen - far far away
Pen - master

Fixes: #6103
Fixes: #5657
Fixes: #5405
Fixes: #5072
Fixes: #6437
Fixes: #6455
Closes: #6136
Closes: #6461
(probably some more)

src/core/core.scale.js Outdated Show resolved Hide resolved
src/core/core.scale.js Outdated Show resolved Hide resolved
Copy link
Contributor

@benmccann benmccann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you update the PR description to explain the high-level idea / organization of this PR? (e.g. what parse in scale vs controller are each used for). And also why each of the changed tests are different?

Should this replace all usage of getRightValue (if not in this PR then eventually)? I think there's still about half a dozen left

Are there any performance impacts (positive or negative) on large charts? e.g. I wonder how it would affect the uPlot benchmarks

src/scales/scale.linear.js Outdated Show resolved Hide resolved
src/controllers/controller.bar.js Outdated Show resolved Hide resolved
src/core/core.datasetController.js Show resolved Hide resolved
src/controllers/controller.line.js Outdated Show resolved Hide resolved
src/core/core.datasetController.js Outdated Show resolved Hide resolved
src/core/core.datasetController.js Outdated Show resolved Hide resolved
src/core/core.datasetController.js Outdated Show resolved Hide resolved
src/core/core.scale.js Outdated Show resolved Hide resolved
@@ -75,6 +75,74 @@ function unlistenArrayEvents(array, listener) {
delete array._chartjs;
}

function storeCrossRef(scale, datasetIndex, scaleValue, datasetValue) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment explaining what the cross-ref is?

@benmccann
Copy link
Contributor

benmccann commented Oct 21, 2019

I just took a cursory glance at this PR in the Chrome debugger. I see _parse being called a bunch of times from addElements within initialize. And then a bunch from resyncElements within buildOrUpdateElements. I didn't look into it, but I wouldn't think we'd need to do it twice in a row

This PR seems about 10% slower overall on the uPlot benchmark, but it seems like that should be addressable

@kurkle
Copy link
Member Author

kurkle commented Oct 22, 2019

This PR goes way too far now, but I wanted to see where it could go.

There are some things that can be extracted to separate PR's.

Speed: first update is somewhat similar to current master (from chartjs.org - it does not seem to include all commit). So in reality its slower. Subsequent updates are a lot faster.

Try hiding / showing datasets in those two;
Line/time performance, this PR
Line/time performance, master

src/core/core.datasetController.js Outdated Show resolved Hide resolved
src/core/core.datasetController.js Show resolved Hide resolved
@kurkle kurkle force-pushed the early-parse-retry branch from 9205cd6 to 8066422 Compare October 23, 2019 07:42
@kurkle
Copy link
Member Author

kurkle commented Oct 23, 2019

Some benchmarks done in my laptop, using uPlot benchmark + plugin to measure between events:

master (3cb308d)
{
"2 update/total": 723.6,
"3 render/total": 140.68,
"x from page load": 868.38
}
early parse (1dc078e)
{
"2 update/total": 674.98,
"3 render/total": 153.54,
"x from page load": 834.67
}

chart.update()

master
{
"2 update/total": 552.65,
"3 render/total": 163.05
}
early parse
{
"2 update/total": 366.79,
"3 render/total": 167.91
}

So, first update is about the same and subsequent updates are faster (~30%).
Single runs, but gives a rough idea.

Same chart, draw + 50x update(). master takes 29..30 seconds. Early parse 18..19 seconds. More than 30% faster.

The speed improvement comes from caching the parsed data, and can surely be done without "early parsing" too.

Copy link
Contributor

@benmccann benmccann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like it should be possible to store the data internally as {x: 0, y: 1} instead of {index: 1, xScale0: 0, yScale0: 1}. Then in the base _parseObjectData case we wouldn't actually have to do any parsing at all, which should be a speedup (though maybe behind a skipParsing flag for 2.x, which says you've passed in only numbers)

src/controllers/controller.line.js Outdated Show resolved Hide resolved
src/controllers/controller.line.js Outdated Show resolved Hide resolved
src/core/core.datasetController.js Outdated Show resolved Hide resolved
src/core/core.datasetController.js Outdated Show resolved Hide resolved
@benmccann
Copy link
Contributor

In 3.0 I'd really like to see us get rid of parsing altogether. It really slows things down. We could perhaps provide some data transformer utilities to help users get their data into the format we expect if we want to support multiple formats. @etimberg was asking about making the next version 3.0 (#6555 (comment)). Perhaps we should think about how we'd approach this if we didn't have to worry about backwards compatibility

@kurkle kurkle force-pushed the early-parse-retry branch from 1dc078e to bc3807b Compare October 25, 2019 05:36
@benmccann
Copy link
Contributor

A couple thoughts about how we might approach this:

  • Remove notion of parsing. E.g. time scale input should be millis since epoch (not moment objects) and line scale input should be numbers (not strings)
  • Don't create _meta.data. It's a lot of copying and object creation that negatively impacts performance

@kurkle
Copy link
Member Author

kurkle commented Oct 26, 2019

@benmccann from a performance point of view, you are correct. That approach however does not solve the issues I'm trying to solve here. Removing parsing would make this lib a bit harder to use in many cases - and IMO ease of use is one of the things making this lib so popular.

@benmccann
Copy link
Contributor

We might be able to make parsing optional though. If you want to pass in data in our ideal format then we would not need to do parsing and if you want to be loose with the data then we'll massage it for you

I think the main thing you'd need to do is change it back so that it stores data as {x: 0, y: 1} instead of {xScale0: 0, yScale0: 1}

@kurkle kurkle force-pushed the early-parse-retry branch from bc3807b to a564802 Compare October 27, 2019 05:45
@kurkle
Copy link
Member Author

kurkle commented Oct 27, 2019

I think the main thing you'd need to do is change it back so that it stores data as {x: 0, y: 1} instead of {xScale0: 0, yScale0: 1}

It helps avoiding isHorizontal() / getRightValue() calls by storing that way. We could default the axis id:s to 'x' and 'y' though.

@benmccann
Copy link
Contributor

Another thing I was thinking is that I don't think the scales really need to be aware of parsing at all and we might be able to keep it all in the controllers. E.g. the linear and logarithmic scales right now need to know about floating bar charts in order to determineDataLimits. However, if we move determineDataLimits to the controller then I think things become a lot cleaner because we can contain all the parsing purely in the controllers. Then bar chart is the only one that needs to know about floating bars and we can keep the logic contained in a much smaller area

It helps avoiding isHorizontal() / getRightValue() calls by storing that way. We could default the axis id:s to 'x' and 'y' though.

We still call _getParsedValue, which is essentially the new getRightValue, so I don't think it'd make much different on that one. You would need to call isHorizontal, but I think it would end up being a lot faster because you would really only need to do that once per dataset, which would be a lot cheaper than having to construct a new dataset if the user passes in data in the desired format

@kurkle kurkle force-pushed the early-parse-retry branch from a564802 to 8b50c33 Compare October 27, 2019 18:01
@kurkle kurkle force-pushed the early-parse-retry branch from 8b7acd0 to 9ea67ce Compare November 3, 2019 18:18
@kurkle
Copy link
Member Author

kurkle commented Nov 3, 2019

I refactored linkScales a bit too (CC was complaining about it). The tests done were more expensive than the assignment (and linkScales is not critical anyway), so changed it to always assign. And extraced getFirstScaleId (should be easier to change for #6626)

@kurkle kurkle force-pushed the early-parse-retry branch from b926551 to 2e7d65e Compare November 3, 2019 19:31
benmccann
benmccann previously approved these changes Nov 3, 2019
Copy link
Contributor

@benmccann benmccann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great initiative! Thanks so much for driving this!

I really like it overall. I think there are a couple more things we might be able to do based off of this or a couple things I would tweak, but I'd rather do those in follow up PRs to be able to get this in and build off it

@kurkle
Copy link
Member Author

kurkle commented Nov 4, 2019

Benchmarks with latest version:

{
  "2 update/total": 620.79,
  "3 render/total": 177.54,
  "x from page load": 802.21
}

2nd update:

{
  "2 update/total": 284.52,
  "3 render/total": 124.38
}

50x update: ~17s

master

{
  "2 update/total": 703.42,
  "3 render/total": 143.66,
  "x from page load": 850.76
}

2nd update:

{
  "2 update/total": 521.75,
  "3 render/total": 127.67
}
Code 
}

50x update: ~27s

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment