-
Notifications
You must be signed in to change notification settings - Fork 47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[PERF] evaluation: (much) faster array formulas #4589
Conversation
0ac833a
to
a9cb13c
Compare
aa44902
to
59e4205
Compare
2477b09
to
5b14256
Compare
`JetSet.deleteMany` only deletes the first element found in the set, instead of deleting all elements. The reason is `wasDeleted ||= super.delete(element);`. It's equivalent to `wasDeleted = wasDeleted || super.delete(element);` and as soon as `wasDeleted` is true (when the first element is deleted) the right-hand part (`|| super.delete(element)`) is no longer executed. I simplified the method by removing the return value. It's never used and this data structure has been removed in future versions anyway. Note that it didn't cause any actual functional bug; only some useless computation in some cases. Task: 4036899
5b14256
to
df737cd
Compare
small question about the third commit message:
Did you mean spread/spreaded zone? |
When spreading the result of an array formula, we invalidate formulas depending on the spread zone. Currently, we look for the dependencies one position in the spread zone at a time. It's much more efficient to group all those individual positions in zones and call `this.getCellsDependingOn(...)` with those zones. This commit reduces the time to evaluate the large array formula dataset from 60+ seconds to ~36 seconds. Task: 4036899
Let's say we have the following array formulas: in B1: =transpose(transpose(A1:A2)) in C1: =transpose(transpose(B1:B2)) in D1: =transpose(transpose(C1:C2)) Each columns depends on the array formula in the previous column. Initially, all cells are evaluated in the order B1,C1,D1 When B1 is evaluated, all cells that depends on its result (B1 and B2) are marked to be re-evaluated in another iteration. => C1 and D1 are added to `this.nextPositionsToUpdate` At the next iteration (which evaluated C1 and D1), when C1 is evaluated it goes again: D1 is added to `this.nextPositionsToUpdate` for a third iteration. Coming back to the initial iteration: even if C1 and D1 are in `this.nextPositionsToUpdate`, they are later computed in the *current* iteration (remember the initial iteration evaluates B1,C1,D1). It's useless to re-compute them since they are already computed (and no other result invalidated their result) The evaluation of the large array formula dataset goes from ~36s to ~750ms Task: 4036899
df737cd
to
ddd324e
Compare
Yes, my fingers are typing themselves mechanically. I'm writing the word "spreadsheet" too much 😄 |
robodoo r+ |
@LucasLefevre @VincentSchippefilt because this PR has multiple commits, I need to know how to merge it:
|
robodoo rebase-ff |
Task: 4036899 Part-of: #4589 Signed-off-by: Vincent Schippefilt (vsc) <[email protected]>
`JetSet.deleteMany` only deletes the first element found in the set, instead of deleting all elements. The reason is `wasDeleted ||= super.delete(element);`. It's equivalent to `wasDeleted = wasDeleted || super.delete(element);` and as soon as `wasDeleted` is true (when the first element is deleted) the right-hand part (`|| super.delete(element)`) is no longer executed. I simplified the method by removing the return value. It's never used and this data structure has been removed in future versions anyway. Note that it didn't cause any actual functional bug; only some useless computation in some cases. Task: 4036899 Part-of: #4589 Signed-off-by: Vincent Schippefilt (vsc) <[email protected]>
When spreading the result of an array formula, we invalidate formulas depending on the spread zone. Currently, we look for the dependencies one position in the spread zone at a time. It's much more efficient to group all those individual positions in zones and call `this.getCellsDependingOn(...)` with those zones. This commit reduces the time to evaluate the large array formula dataset from 60+ seconds to ~36 seconds. Task: 4036899 Part-of: #4589 Signed-off-by: Vincent Schippefilt (vsc) <[email protected]>
Let's say we have the following array formulas: in B1: =transpose(transpose(A1:A2)) in C1: =transpose(transpose(B1:B2)) in D1: =transpose(transpose(C1:C2)) Each columns depends on the array formula in the previous column. Initially, all cells are evaluated in the order B1,C1,D1 When B1 is evaluated, all cells that depends on its result (B1 and B2) are marked to be re-evaluated in another iteration. => C1 and D1 are added to `this.nextPositionsToUpdate` At the next iteration (which evaluated C1 and D1), when C1 is evaluated it goes again: D1 is added to `this.nextPositionsToUpdate` for a third iteration. Coming back to the initial iteration: even if C1 and D1 are in `this.nextPositionsToUpdate`, they are later computed in the *current* iteration (remember the initial iteration evaluates B1,C1,D1). It's useless to re-compute them since they are already computed (and no other result invalidated their result) The evaluation of the large array formula dataset goes from ~36s to ~750ms closes #4589 Task: 4036899 Signed-off-by: Vincent Schippefilt (vsc) <[email protected]>
Merge method set to rebase and fast-forward. |
How to reproduce: -> see the attached test What is happening? During the evaluation of SUMIFS, it will fetch its ranges. First it will first fetch an empty range (E4:E7) since MUNIT hasn't been evaluated yet. Afterwards, it will fetch H4, which in turn will request C4 and therefore force the evaluation of the spreaded function MUNIT. AT this point, there is a disparity of values because the values of the range (E4:E7) have changed. To address those situations, the evaluator invalidates the dependencies of all the spreaded zone. Which means marking A1 to be recomputed. Unfortunately, A1 is still being evaluated and since #4589 (specifically commit 8d22e86) we remove the cell from the queue after we evaluted it. This revision changes the approach taken in 8d22e86 to allow legitimate invalidations take place while avoiding useless reevaluations. Task: 4252800
How to reproduce: -> see the attached test What is happening? During the evaluation of SUMIFS, it will fetch its ranges. First it will first fetch an empty range (E4:E7) since MUNIT hasn't been evaluated yet. Afterwards, it will fetch H4, which in turn will request C4 and therefore force the evaluation of the spreaded function MUNIT. AT this point, there is a disparity of values because the values of the range (E4:E7) have changed. To address those situations, the evaluator invalidates the dependencies of all the spreaded zone. Which means marking A1 to be recomputed. Unfortunately, A1 is still being evaluated and since #4589 (specifically commit 8d22e86) we remove the cell from the queue after we evaluted it. This revision changes the approach taken in 8d22e86 to allow legitimate invalidations take place while avoiding useless reevaluations. Task: 4252800
How to reproduce: -> see the attached test What is happening? During the evaluation of SUMIFS, it will fetch its ranges. First it will first fetch an empty range (E4:E7) since MUNIT hasn't been evaluated yet. Afterwards, it will fetch H4, which in turn will request C4 and therefore force the evaluation of the spreaded function MUNIT. AT this point, there is a disparity of values because the values of the range (E4:E7) have changed. To address those situations, the evaluator invalidates the dependencies of all the spreaded zone. Which means marking A1 to be recomputed. Unfortunately, A1 is still being evaluated and since #4589 (specifically commit 8d22e86) we remove the cell from the queue after we evaluted it. This revision changes the approach taken in 8d22e86 to allow legitimate invalidations take place while avoiding useless reevaluations. Task: 4252800
How to reproduce: -> see the attached test What is happening? During the evaluation of SUMIFS, it will fetch its ranges. First it will first fetch an empty range (E4:E7) since MUNIT hasn't been evaluated yet. Afterwards, it will fetch H4, which in turn will request C4 and therefore force the evaluation of the spreaded function MUNIT. AT this point, there is a disparity of values because the values of the range (E4:E7) have changed. To address those situations, the evaluator invalidates the dependencies of all the spreaded zone. Which means marking A1 to be recomputed. Unfortunately, A1 is still being evaluated and since #4589 (specifically commit 8d22e86) we remove the cell from the queue after we evaluted it. This revision changes the approach taken in 8d22e86 to allow legitimate invalidations take place while avoiding useless reevaluations. Task: 4252800
How to reproduce: -> see the attached test What is happening? During the evaluation of SUMIFS, it will fetch its ranges. First it will first fetch an empty range (E4:E7) since MUNIT hasn't been evaluated yet. Afterwards, it will fetch H4, which in turn will request C4 and therefore force the evaluation of the spreaded function MUNIT. AT this point, there is a disparity of values because the values of the range (E4:E7) have changed. To address those situations, the evaluator invalidates the dependencies of all the spreaded zone. Which means marking A1 to be recomputed. Unfortunately, A1 is still being evaluated and since #4589 (specifically commit 8d22e86) we remove the cell from the queue after we evaluted it. This revision changes the approach taken in 8d22e86 to allow legitimate invalidations take place while avoiding useless reevaluations. Task: 4252800 Part-of: #5209 Signed-off-by: Lucas Lefèvre (lul) <[email protected]>
How to reproduce: -> see the attached test What is happening? During the evaluation of SUMIFS, it will fetch its ranges. First it will first fetch an empty range (E4:E7) since MUNIT hasn't been evaluated yet. Afterwards, it will fetch H4, which in turn will request C4 and therefore force the evaluation of the spreaded function MUNIT. AT this point, there is a disparity of values because the values of the range (E4:E7) have changed. To address those situations, the evaluator invalidates the dependencies of all the spreaded zone. Which means marking A1 to be recomputed. Unfortunately, A1 is still being evaluated and since #4589 (specifically commit 8d22e86) we remove the cell from the queue after we evaluted it. This revision changes the approach taken in 8d22e86 to allow legitimate invalidations take place while avoiding useless reevaluations. Task: 4252800 Part-of: #5209 Signed-off-by: Lucas Lefèvre (lul) <[email protected]>
How to reproduce: -> see the attached test What is happening? During the evaluation of SUMIFS, it will fetch its ranges. First it will first fetch an empty range (E4:E7) since MUNIT hasn't been evaluated yet. Afterwards, it will fetch H4, which in turn will request C4 and therefore force the evaluation of the spreaded function MUNIT. AT this point, there is a disparity of values because the values of the range (E4:E7) have changed. To address those situations, the evaluator invalidates the dependencies of all the spreaded zone. Which means marking A1 to be recomputed. Unfortunately, A1 is still being evaluated and since #4589 (specifically commit 8d22e86) we remove the cell from the queue after we evaluted it. This revision changes the approach taken in 8d22e86 to allow legitimate invalidations take place while avoiding useless reevaluations. Task: 4252800
How to reproduce: -> see the attached test What is happening? During the evaluation of SUMIFS, it will fetch its ranges. First it will first fetch an empty range (E4:E7) since MUNIT hasn't been evaluated yet. Afterwards, it will fetch H4, which in turn will request C4 and therefore force the evaluation of the spreaded function MUNIT. AT this point, there is a disparity of values because the values of the range (E4:E7) have changed. To address those situations, the evaluator invalidates the dependencies of all the spreaded zone. Which means marking A1 to be recomputed. Unfortunately, A1 is still being evaluated and since #4589 (specifically commit 8d22e86) we remove the cell from the queue after we evaluted it. This revision changes the approach taken in 8d22e86 to allow legitimate invalidations take place while avoiding useless reevaluations. Task: 4252800 Part-of: #5211 Signed-off-by: Lucas Lefèvre (lul) <[email protected]> Signed-off-by: Rémi Rahir (rar) <[email protected]>
How to reproduce: -> see the attached test What is happening? During the evaluation of SUMIFS, it will fetch its ranges. First it will first fetch an empty range (E4:E7) since MUNIT hasn't been evaluated yet. Afterwards, it will fetch H4, which in turn will request C4 and therefore force the evaluation of the spreaded function MUNIT. AT this point, there is a disparity of values because the values of the range (E4:E7) have changed. To address those situations, the evaluator invalidates the dependencies of all the spreaded zone. Which means marking A1 to be recomputed. Unfortunately, A1 is still being evaluated and since #4589 (specifically commit 8d22e86) we remove the cell from the queue after we evaluted it. This revision changes the approach taken in 8d22e86 to allow legitimate invalidations take place while avoiding useless reevaluations. Task: 4252800
How to reproduce: -> see the attached test What is happening? During the evaluation of SUMIFS, it will fetch its ranges. First it will first fetch an empty range (E4:E7) since MUNIT hasn't been evaluated yet. Afterwards, it will fetch H4, which in turn will request C4 and therefore force the evaluation of the spreaded function MUNIT. AT this point, there is a disparity of values because the values of the range (E4:E7) have changed. To address those situations, the evaluator invalidates the dependencies of all the spreaded zone. Which means marking A1 to be recomputed. Unfortunately, A1 is still being evaluated and since #4589 (specifically commit 8d22e86) we remove the cell from the queue after we evaluted it. This revision changes the approach taken in 8d22e86 to allow legitimate invalidations take place while avoiding useless reevaluations. Task: 4252800 Part-of: #5212 Signed-off-by: Lucas Lefèvre (lul) <[email protected]> Signed-off-by: Rémi Rahir (rar) <[email protected]>
How to reproduce: -> see the attached test What is happening? During the evaluation of SUMIFS, it will fetch its ranges. First it will first fetch an empty range (E4:E7) since MUNIT hasn't been evaluated yet. Afterwards, it will fetch H4, which in turn will request C4 and therefore force the evaluation of the spreaded function MUNIT. AT this point, there is a disparity of values because the values of the range (E4:E7) have changed. To address those situations, the evaluator invalidates the dependencies of all the spreaded zone. Which means marking A1 to be recomputed. Unfortunately, A1 is still being evaluated and since #4589 (specifically commit 8d22e86) we remove the cell from the queue after we evaluted it. This revision changes the approach taken in 8d22e86 to allow legitimate invalidations take place while avoiding useless reevaluations. Task: 4252800
How to reproduce: -> see the attached test What is happening? During the evaluation of SUMIFS, it will fetch its ranges. First it will first fetch an empty range (E4:E7) since MUNIT hasn't been evaluated yet. Afterwards, it will fetch H4, which in turn will request C4 and therefore force the evaluation of the spreaded function MUNIT. AT this point, there is a disparity of values because the values of the range (E4:E7) have changed. To address those situations, the evaluator invalidates the dependencies of all the spreaded zone. Which means marking A1 to be recomputed. Unfortunately, A1 is still being evaluated and since #4589 (specifically commit 8d22e86) we remove the cell from the queue after we evaluted it. This revision changes the approach taken in 8d22e86 to allow legitimate invalidations take place while avoiding useless reevaluations. Task: 4252800
How to reproduce: -> see the attached test What is happening? During the evaluation of SUMIFS, it will fetch its ranges. First it will first fetch an empty range (E4:E7) since MUNIT hasn't been evaluated yet. Afterwards, it will fetch H4, which in turn will request C4 and therefore force the evaluation of the spreaded function MUNIT. AT this point, there is a disparity of values because the values of the range (E4:E7) have changed. To address those situations, the evaluator invalidates the dependencies of all the spreaded zone. Which means marking A1 to be recomputed. Unfortunately, A1 is still being evaluated and since #4589 (specifically commit 8d22e86) we remove the cell from the queue after we evaluted it. This revision changes the approach taken in 8d22e86 to allow legitimate invalidations take place while avoiding useless reevaluations. Task: 4252800 Part-of: #5214 Signed-off-by: Lucas Lefèvre (lul) <[email protected]> Signed-off-by: Rémi Rahir (rar) <[email protected]>
How to reproduce: -> see the attached test What is happening? During the evaluation of SUMIFS, it will fetch its ranges. First it will first fetch an empty range (E4:E7) since MUNIT hasn't been evaluated yet. Afterwards, it will fetch H4, which in turn will request C4 and therefore force the evaluation of the spreaded function MUNIT. AT this point, there is a disparity of values because the values of the range (E4:E7) have changed. To address those situations, the evaluator invalidates the dependencies of all the spreaded zone. Which means marking A1 to be recomputed. Unfortunately, A1 is still being evaluated and since #4589 (specifically commit 8d22e86) we remove the cell from the queue after we evaluted it. This revision changes the approach taken in 8d22e86 to allow legitimate invalidations take place while avoiding useless reevaluations. Task: 4252800 X-original-commit: 07c48aa
How to reproduce: -> see the attached test What is happening? During the evaluation of SUMIFS, it will fetch its ranges. First it will first fetch an empty range (E4:E7) since MUNIT hasn't been evaluated yet. Afterwards, it will fetch H4, which in turn will request C4 and therefore force the evaluation of the spreaded function MUNIT. AT this point, there is a disparity of values because the values of the range (E4:E7) have changed. To address those situations, the evaluator invalidates the dependencies of all the spreaded zone. Which means marking A1 to be recomputed. Unfortunately, A1 is still being evaluated and since #4589 (specifically commit 8d22e86) we remove the cell from the queue after we evaluted it. This revision changes the approach taken in 8d22e86 to allow legitimate invalidations take place while avoiding useless reevaluations. Task: 4252800 X-original-commit: 07c48aa Part-of: #5215 Signed-off-by: Lucas Lefèvre (lul) <[email protected]> Signed-off-by: Rémi Rahir (rar) <[email protected]>
Description:
description of this task, what is implemented and why it is implemented that way.
Task: : TASK_ID
review checklist