Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PERF] evaluation: (much) faster array formulas #4589

Closed
wants to merge 4 commits into from

Conversation

LucasLefevre
Copy link
Collaborator

Description:

description of this task, what is implemented and why it is implemented that way.

Task: : TASK_ID

review checklist

  • feature is organized in plugin, or UI components
  • support of duplicate sheet (deep copy)
  • in model/core: ranges are Range object, and can be adapted (adaptRanges)
  • in model/UI: ranges are strings (to show the user)
  • undo-able commands (uses this.history.update)
  • multiuser-able commands (has inverse commands and transformations where needed)
  • new/updated/removed commands are documented
  • exportable in excel
  • translations (_t("qmsdf %s", abc))
  • unit tested
  • clean commented code
  • track breaking changes
  • doc is rebuild (npm run doc)
  • status is correct in Odoo

@robodoo
Copy link
Collaborator

robodoo commented Jul 4, 2024

Pull request status dashboard

@LucasLefevre LucasLefevre force-pushed the 17.0-perf-spread-lul branch 2 times, most recently from 0ac833a to a9cb13c Compare July 5, 2024 06:38
@LucasLefevre LucasLefevre force-pushed the 17.0-perf-spread-lul branch 3 times, most recently from aa44902 to 59e4205 Compare July 5, 2024 06:56
@LucasLefevre LucasLefevre changed the title 17.0 perf spread lul [PERF] (much) faster array formulas Jul 5, 2024
@LucasLefevre LucasLefevre changed the title [PERF] (much) faster array formulas [PERF] evaluation: (much) faster array formulas Jul 5, 2024
@LucasLefevre LucasLefevre force-pushed the 17.0-perf-spread-lul branch 2 times, most recently from 2477b09 to 5b14256 Compare July 5, 2024 07:01
`JetSet.deleteMany` only deletes the first element found
in the set, instead of deleting all elements.

The reason is `wasDeleted ||= super.delete(element);`.

It's equivalent to `wasDeleted = wasDeleted || super.delete(element);`
and as soon as `wasDeleted` is true (when the first element is deleted)
the right-hand part (`|| super.delete(element)`) is no longer executed.

I simplified the method by removing the return value. It's never used
and this data structure has been removed in future versions anyway.

Note that it didn't cause any actual functional bug; only some useless
computation in some cases.

Task: 4036899
@LucasLefevre LucasLefevre force-pushed the 17.0-perf-spread-lul branch from 5b14256 to df737cd Compare July 5, 2024 07:04
@rrahir
Copy link
Collaborator

rrahir commented Jul 5, 2024

small question about the third commit message:

we look for the dependencies one position in the spreadsheet zone at a time.

Did you mean spread/spreaded zone?

When spreading the result of an array formula, we invalidate formulas
depending on the spread zone. Currently, we look for the dependencies
one position in the spread zone at a time.

It's much more efficient to group all those individual positions in zones
and call `this.getCellsDependingOn(...)` with those zones.

This commit reduces the time to evaluate the large array formula
dataset from 60+ seconds to ~36 seconds.

Task: 4036899
Let's say we have the following array formulas:

in B1: =transpose(transpose(A1:A2))
in C1: =transpose(transpose(B1:B2))
in D1: =transpose(transpose(C1:C2))

Each columns depends on the array formula in the previous column.

Initially, all cells are evaluated in the order B1,C1,D1

When B1 is evaluated, all cells that depends on its result (B1 and B2)
are marked to be re-evaluated in another iteration.
=> C1 and D1 are added to `this.nextPositionsToUpdate`

At the next iteration (which evaluated C1 and D1), when C1 is evaluated
it goes again: D1 is added to `this.nextPositionsToUpdate` for a third
iteration.

Coming back to the initial iteration: even if C1 and D1 are in
`this.nextPositionsToUpdate`, they are later computed in the *current*
iteration (remember the initial iteration evaluates B1,C1,D1).
It's useless to re-compute them since they are already computed (and
no other result invalidated their result)

The evaluation of the large array formula dataset goes from ~36s
to ~750ms

Task: 4036899
@LucasLefevre LucasLefevre force-pushed the 17.0-perf-spread-lul branch from df737cd to ddd324e Compare July 5, 2024 09:06
@LucasLefevre
Copy link
Collaborator Author

small question about the third commit message:

we look for the dependencies one position in the spreadsheet zone at a time.

Did you mean spread/spreaded zone?

Yes, my fingers are typing themselves mechanically. I'm writing the word "spreadsheet" too much 😄
Fixed

@VincentSchippefilt
Copy link
Collaborator

robodoo r+

@robodoo
Copy link
Collaborator

robodoo commented Jul 5, 2024

@LucasLefevre @VincentSchippefilt because this PR has multiple commits, I need to know how to merge it:

  • merge to merge directly, using the PR as merge commit message
  • rebase-merge to rebase and merge, using the PR as merge commit message
  • rebase-ff to rebase and fast-forward

@VincentSchippefilt
Copy link
Collaborator

robodoo rebase-ff

robodoo pushed a commit that referenced this pull request Jul 5, 2024
Task: 4036899
Part-of: #4589
Signed-off-by: Vincent Schippefilt (vsc) <[email protected]>
robodoo pushed a commit that referenced this pull request Jul 5, 2024
`JetSet.deleteMany` only deletes the first element found
in the set, instead of deleting all elements.

The reason is `wasDeleted ||= super.delete(element);`.

It's equivalent to `wasDeleted = wasDeleted || super.delete(element);`
and as soon as `wasDeleted` is true (when the first element is deleted)
the right-hand part (`|| super.delete(element)`) is no longer executed.

I simplified the method by removing the return value. It's never used
and this data structure has been removed in future versions anyway.

Note that it didn't cause any actual functional bug; only some useless
computation in some cases.

Task: 4036899
Part-of: #4589
Signed-off-by: Vincent Schippefilt (vsc) <[email protected]>
robodoo pushed a commit that referenced this pull request Jul 5, 2024
When spreading the result of an array formula, we invalidate formulas
depending on the spread zone. Currently, we look for the dependencies
one position in the spread zone at a time.

It's much more efficient to group all those individual positions in zones
and call `this.getCellsDependingOn(...)` with those zones.

This commit reduces the time to evaluate the large array formula
dataset from 60+ seconds to ~36 seconds.

Task: 4036899
Part-of: #4589
Signed-off-by: Vincent Schippefilt (vsc) <[email protected]>
robodoo pushed a commit that referenced this pull request Jul 5, 2024
Let's say we have the following array formulas:

in B1: =transpose(transpose(A1:A2))
in C1: =transpose(transpose(B1:B2))
in D1: =transpose(transpose(C1:C2))

Each columns depends on the array formula in the previous column.

Initially, all cells are evaluated in the order B1,C1,D1

When B1 is evaluated, all cells that depends on its result (B1 and B2)
are marked to be re-evaluated in another iteration.
=> C1 and D1 are added to `this.nextPositionsToUpdate`

At the next iteration (which evaluated C1 and D1), when C1 is evaluated
it goes again: D1 is added to `this.nextPositionsToUpdate` for a third
iteration.

Coming back to the initial iteration: even if C1 and D1 are in
`this.nextPositionsToUpdate`, they are later computed in the *current*
iteration (remember the initial iteration evaluates B1,C1,D1).
It's useless to re-compute them since they are already computed (and
no other result invalidated their result)

The evaluation of the large array formula dataset goes from ~36s
to ~750ms

closes #4589

Task: 4036899
Signed-off-by: Vincent Schippefilt (vsc) <[email protected]>
@robodoo
Copy link
Collaborator

robodoo commented Jul 5, 2024

Merge method set to rebase and fast-forward.

@fw-bot fw-bot deleted the 17.0-perf-spread-lul branch July 19, 2024 09:47
rrahir added a commit that referenced this pull request Nov 14, 2024
How to reproduce:
-> see the attached test

What is happening?
During the evaluation of SUMIFS, it will fetch its ranges.
First it will first fetch an empty range (E4:E7)  since MUNIT hasn't
been evaluated yet.
Afterwards, it will fetch H4, which in turn will request C4
and therefore force the evaluation of the spreaded function MUNIT.

AT this point, there is a disparity of values because the values
of the range (E4:E7) have changed.
To address those situations, the evaluator invalidates the dependencies
of all the spreaded zone. Which means marking A1 to be recomputed.

Unfortunately, A1 is still being evaluated and since #4589 (specifically
commit 8d22e86) we remove the cell from the queue after we evaluted it.

This revision changes the approach taken in 8d22e86 to allow
legitimate invalidations take place while avoiding useless
reevaluations.

Task: 4252800
rrahir added a commit that referenced this pull request Nov 14, 2024
How to reproduce:
-> see the attached test

What is happening?
During the evaluation of SUMIFS, it will fetch its ranges.
First it will first fetch an empty range (E4:E7)  since MUNIT hasn't
been evaluated yet.
Afterwards, it will fetch H4, which in turn will request C4
and therefore force the evaluation of the spreaded function MUNIT.

AT this point, there is a disparity of values because the values
of the range (E4:E7) have changed.
To address those situations, the evaluator invalidates the dependencies
of all the spreaded zone. Which means marking A1 to be recomputed.

Unfortunately, A1 is still being evaluated and since #4589 (specifically
commit 8d22e86) we remove the cell from the queue after we evaluted it.

This revision changes the approach taken in 8d22e86 to allow
legitimate invalidations take place while avoiding useless
reevaluations.

Task: 4252800
LucasLefevre pushed a commit that referenced this pull request Nov 14, 2024
How to reproduce:
-> see the attached test

What is happening?
During the evaluation of SUMIFS, it will fetch its ranges.
First it will first fetch an empty range (E4:E7)  since MUNIT hasn't
been evaluated yet.
Afterwards, it will fetch H4, which in turn will request C4
and therefore force the evaluation of the spreaded function MUNIT.

AT this point, there is a disparity of values because the values
of the range (E4:E7) have changed.
To address those situations, the evaluator invalidates the dependencies
of all the spreaded zone. Which means marking A1 to be recomputed.

Unfortunately, A1 is still being evaluated and since #4589 (specifically
commit 8d22e86) we remove the cell from the queue after we evaluted it.

This revision changes the approach taken in 8d22e86 to allow
legitimate invalidations take place while avoiding useless
reevaluations.

Task: 4252800
rrahir added a commit that referenced this pull request Nov 14, 2024
How to reproduce:
-> see the attached test

What is happening?
During the evaluation of SUMIFS, it will fetch its ranges.
First it will first fetch an empty range (E4:E7)  since MUNIT hasn't
been evaluated yet.
Afterwards, it will fetch H4, which in turn will request C4
and therefore force the evaluation of the spreaded function MUNIT.

AT this point, there is a disparity of values because the values
of the range (E4:E7) have changed.
To address those situations, the evaluator invalidates the dependencies
of all the spreaded zone. Which means marking A1 to be recomputed.

Unfortunately, A1 is still being evaluated and since #4589 (specifically
commit 8d22e86) we remove the cell from the queue after we evaluted it.

This revision changes the approach taken in 8d22e86 to allow
legitimate invalidations take place while avoiding useless
reevaluations.

Task: 4252800
robodoo pushed a commit that referenced this pull request Nov 15, 2024
How to reproduce:
-> see the attached test

What is happening?
During the evaluation of SUMIFS, it will fetch its ranges.
First it will first fetch an empty range (E4:E7)  since MUNIT hasn't
been evaluated yet.
Afterwards, it will fetch H4, which in turn will request C4
and therefore force the evaluation of the spreaded function MUNIT.

AT this point, there is a disparity of values because the values
of the range (E4:E7) have changed.
To address those situations, the evaluator invalidates the dependencies
of all the spreaded zone. Which means marking A1 to be recomputed.

Unfortunately, A1 is still being evaluated and since #4589 (specifically
commit 8d22e86) we remove the cell from the queue after we evaluted it.

This revision changes the approach taken in 8d22e86 to allow
legitimate invalidations take place while avoiding useless
reevaluations.

Task: 4252800
Part-of: #5209
Signed-off-by: Lucas Lefèvre (lul) <[email protected]>
robodoo pushed a commit that referenced this pull request Nov 15, 2024
How to reproduce:
-> see the attached test

What is happening?
During the evaluation of SUMIFS, it will fetch its ranges.
First it will first fetch an empty range (E4:E7)  since MUNIT hasn't
been evaluated yet.
Afterwards, it will fetch H4, which in turn will request C4
and therefore force the evaluation of the spreaded function MUNIT.

AT this point, there is a disparity of values because the values
of the range (E4:E7) have changed.
To address those situations, the evaluator invalidates the dependencies
of all the spreaded zone. Which means marking A1 to be recomputed.

Unfortunately, A1 is still being evaluated and since #4589 (specifically
commit 8d22e86) we remove the cell from the queue after we evaluted it.

This revision changes the approach taken in 8d22e86 to allow
legitimate invalidations take place while avoiding useless
reevaluations.

Task: 4252800
Part-of: #5209
Signed-off-by: Lucas Lefèvre (lul) <[email protected]>
rrahir added a commit that referenced this pull request Nov 15, 2024
How to reproduce:
-> see the attached test

What is happening?
During the evaluation of SUMIFS, it will fetch its ranges.
First it will first fetch an empty range (E4:E7)  since MUNIT hasn't
been evaluated yet.
Afterwards, it will fetch H4, which in turn will request C4
and therefore force the evaluation of the spreaded function MUNIT.

AT this point, there is a disparity of values because the values
of the range (E4:E7) have changed.
To address those situations, the evaluator invalidates the dependencies
of all the spreaded zone. Which means marking A1 to be recomputed.

Unfortunately, A1 is still being evaluated and since #4589 (specifically
commit 8d22e86) we remove the cell from the queue after we evaluted it.

This revision changes the approach taken in 8d22e86 to allow
legitimate invalidations take place while avoiding useless
reevaluations.

Task: 4252800
robodoo pushed a commit that referenced this pull request Nov 15, 2024
How to reproduce:
-> see the attached test

What is happening?
During the evaluation of SUMIFS, it will fetch its ranges.
First it will first fetch an empty range (E4:E7)  since MUNIT hasn't
been evaluated yet.
Afterwards, it will fetch H4, which in turn will request C4
and therefore force the evaluation of the spreaded function MUNIT.

AT this point, there is a disparity of values because the values
of the range (E4:E7) have changed.
To address those situations, the evaluator invalidates the dependencies
of all the spreaded zone. Which means marking A1 to be recomputed.

Unfortunately, A1 is still being evaluated and since #4589 (specifically
commit 8d22e86) we remove the cell from the queue after we evaluted it.

This revision changes the approach taken in 8d22e86 to allow
legitimate invalidations take place while avoiding useless
reevaluations.

Task: 4252800
Part-of: #5211
Signed-off-by: Lucas Lefèvre (lul) <[email protected]>
Signed-off-by: Rémi Rahir (rar) <[email protected]>
rrahir added a commit that referenced this pull request Nov 15, 2024
How to reproduce:
-> see the attached test

What is happening?
During the evaluation of SUMIFS, it will fetch its ranges.
First it will first fetch an empty range (E4:E7)  since MUNIT hasn't
been evaluated yet.
Afterwards, it will fetch H4, which in turn will request C4
and therefore force the evaluation of the spreaded function MUNIT.

AT this point, there is a disparity of values because the values
of the range (E4:E7) have changed.
To address those situations, the evaluator invalidates the dependencies
of all the spreaded zone. Which means marking A1 to be recomputed.

Unfortunately, A1 is still being evaluated and since #4589 (specifically
commit 8d22e86) we remove the cell from the queue after we evaluted it.

This revision changes the approach taken in 8d22e86 to allow
legitimate invalidations take place while avoiding useless
reevaluations.

Task: 4252800
robodoo pushed a commit that referenced this pull request Nov 15, 2024
How to reproduce:
-> see the attached test

What is happening?
During the evaluation of SUMIFS, it will fetch its ranges.
First it will first fetch an empty range (E4:E7)  since MUNIT hasn't
been evaluated yet.
Afterwards, it will fetch H4, which in turn will request C4
and therefore force the evaluation of the spreaded function MUNIT.

AT this point, there is a disparity of values because the values
of the range (E4:E7) have changed.
To address those situations, the evaluator invalidates the dependencies
of all the spreaded zone. Which means marking A1 to be recomputed.

Unfortunately, A1 is still being evaluated and since #4589 (specifically
commit 8d22e86) we remove the cell from the queue after we evaluted it.

This revision changes the approach taken in 8d22e86 to allow
legitimate invalidations take place while avoiding useless
reevaluations.

Task: 4252800
Part-of: #5212
Signed-off-by: Lucas Lefèvre (lul) <[email protected]>
Signed-off-by: Rémi Rahir (rar) <[email protected]>
rrahir added a commit that referenced this pull request Nov 15, 2024
How to reproduce:
-> see the attached test

What is happening?
During the evaluation of SUMIFS, it will fetch its ranges.
First it will first fetch an empty range (E4:E7)  since MUNIT hasn't
been evaluated yet.
Afterwards, it will fetch H4, which in turn will request C4
and therefore force the evaluation of the spreaded function MUNIT.

AT this point, there is a disparity of values because the values
of the range (E4:E7) have changed.
To address those situations, the evaluator invalidates the dependencies
of all the spreaded zone. Which means marking A1 to be recomputed.

Unfortunately, A1 is still being evaluated and since #4589 (specifically
commit 8d22e86) we remove the cell from the queue after we evaluted it.

This revision changes the approach taken in 8d22e86 to allow
legitimate invalidations take place while avoiding useless
reevaluations.

Task: 4252800
rrahir added a commit that referenced this pull request Nov 15, 2024
How to reproduce:
-> see the attached test

What is happening?
During the evaluation of SUMIFS, it will fetch its ranges.
First it will first fetch an empty range (E4:E7)  since MUNIT hasn't
been evaluated yet.
Afterwards, it will fetch H4, which in turn will request C4
and therefore force the evaluation of the spreaded function MUNIT.

AT this point, there is a disparity of values because the values
of the range (E4:E7) have changed.
To address those situations, the evaluator invalidates the dependencies
of all the spreaded zone. Which means marking A1 to be recomputed.

Unfortunately, A1 is still being evaluated and since #4589 (specifically
commit 8d22e86) we remove the cell from the queue after we evaluted it.

This revision changes the approach taken in 8d22e86 to allow
legitimate invalidations take place while avoiding useless
reevaluations.

Task: 4252800
robodoo pushed a commit that referenced this pull request Nov 15, 2024
How to reproduce:
-> see the attached test

What is happening?
During the evaluation of SUMIFS, it will fetch its ranges.
First it will first fetch an empty range (E4:E7)  since MUNIT hasn't
been evaluated yet.
Afterwards, it will fetch H4, which in turn will request C4
and therefore force the evaluation of the spreaded function MUNIT.

AT this point, there is a disparity of values because the values
of the range (E4:E7) have changed.
To address those situations, the evaluator invalidates the dependencies
of all the spreaded zone. Which means marking A1 to be recomputed.

Unfortunately, A1 is still being evaluated and since #4589 (specifically
commit 8d22e86) we remove the cell from the queue after we evaluted it.

This revision changes the approach taken in 8d22e86 to allow
legitimate invalidations take place while avoiding useless
reevaluations.

Task: 4252800
Part-of: #5214
Signed-off-by: Lucas Lefèvre (lul) <[email protected]>
Signed-off-by: Rémi Rahir (rar) <[email protected]>
fw-bot pushed a commit that referenced this pull request Nov 15, 2024
How to reproduce:
-> see the attached test

What is happening?
During the evaluation of SUMIFS, it will fetch its ranges.
First it will first fetch an empty range (E4:E7)  since MUNIT hasn't
been evaluated yet.
Afterwards, it will fetch H4, which in turn will request C4
and therefore force the evaluation of the spreaded function MUNIT.

AT this point, there is a disparity of values because the values
of the range (E4:E7) have changed.
To address those situations, the evaluator invalidates the dependencies
of all the spreaded zone. Which means marking A1 to be recomputed.

Unfortunately, A1 is still being evaluated and since #4589 (specifically
commit 8d22e86) we remove the cell from the queue after we evaluted it.

This revision changes the approach taken in 8d22e86 to allow
legitimate invalidations take place while avoiding useless
reevaluations.

Task: 4252800
X-original-commit: 07c48aa
robodoo pushed a commit that referenced this pull request Nov 15, 2024
How to reproduce:
-> see the attached test

What is happening?
During the evaluation of SUMIFS, it will fetch its ranges.
First it will first fetch an empty range (E4:E7)  since MUNIT hasn't
been evaluated yet.
Afterwards, it will fetch H4, which in turn will request C4
and therefore force the evaluation of the spreaded function MUNIT.

AT this point, there is a disparity of values because the values
of the range (E4:E7) have changed.
To address those situations, the evaluator invalidates the dependencies
of all the spreaded zone. Which means marking A1 to be recomputed.

Unfortunately, A1 is still being evaluated and since #4589 (specifically
commit 8d22e86) we remove the cell from the queue after we evaluted it.

This revision changes the approach taken in 8d22e86 to allow
legitimate invalidations take place while avoiding useless
reevaluations.

Task: 4252800
X-original-commit: 07c48aa
Part-of: #5215
Signed-off-by: Lucas Lefèvre (lul) <[email protected]>
Signed-off-by: Rémi Rahir (rar) <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants