Implement and benchmark `ArrowOperationPlus` node #10150

JaroslavTulach · 2024-06-03T05:29:44Z

Pull Request Description

Prototype of #10056 showing the + operation implemented in the Arrow language.

Checklist

Please ensure that the following checklist has been satisfied before submitting the PR:

All code follows the Java style guide.
Unit tests have been written where possible.
Benchmarks are looking good


Akirathan · 2024-06-03T14:43:46Z

engine/runtime-language-arrow/src/test/java/org/enso/interpreter/arrow/AddArrowTest.java

+
+  @BeforeClass
+  public static void initEnsoContext() {
+    ctx =


Use ContextUtils and declare the dependency runtime-language-arrow/Test --> test-utils. Being able to use the context and project utils from test-utils was the main motivation for moving test-utils into a separate project in #10112.

...untime-language-arrow/src/main/java/org/enso/interpreter/arrow/runtime/ByteBufferDirect.java

JaroslavTulach · 2024-06-04T04:11:48Z

With 68d9306 we are at

[info] Benchmark                              Mode  Cnt   Score    Error  Units
[info] Arrow_Arithmetic_1000000.Plus_Fitting  avgt    3  34.328 ± 52.472  ms/op

There are a bunch of computations related to RoundingUtil.

That is surprising, as they should be constant... but one has to tell the compiler that a static int field is final: 1dbbae1. Then the RoundingUtil part of the IGV graph disappears. Alas, it has little impact, as the computation was done once in the root anyway.
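To illustrate the point about final fields, here is a minimal plain-Java sketch. It is not the Enso RoundingUtil code; the names PaddingSketch, ALIGNMENT and mutableAlignment are made up for the example. It only shows the difference between a static final constant, which javac inlines and a JIT compiler can fold, and a mutable static field, which has to be treated as a value that may change.

```java
// Hypothetical sketch, not the Enso sources: shows why a `static final` field
// can be treated as a constant while a plain `static` field cannot.
final class PaddingSketch {
  // Compile-time constant: javac inlines the value at every use site.
  static final int ALIGNMENT = 8;

  // Not final: the value may change at run time, so every use is a field read.
  static int mutableAlignment = 8;

  // Rounds size up to the next multiple of ALIGNMENT (a power of two).
  static int roundUpConstant(int size) {
    return (size + ALIGNMENT - 1) & -ALIGNMENT;
  }

  // Same arithmetic, but based on the mutable field.
  static int roundUpMutable(int size) {
    return (size + mutableAlignment - 1) & -mutableAlignment;
  }

  public static void main(String[] args) {
    System.out.println(roundUpConstant(5)); // 8
    System.out.println(roundUpMutable(9)); // 16
  }
}
```

Both methods return the same values; the difference is only in what the compiler is allowed to assume about the field.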


JaroslavTulach · 2024-06-10T05:44:17Z

The simplest case, adding one million long numbers (the Plus_Fitting benchmarks), seems to be fine. The Arrow.+ implementation is on par or slightly faster.

JaroslavTulach · 2024-06-10T05:47:20Z

One million long additions, two hundred thousand of them overflowing, seem fine as well.
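For context on the overflowing case, commit 55ee7cc is titled "Math.addExact is intrinsified. Use it! But only until first ArithmeticException". A minimal plain-Java sketch of that idea could look like the following. The class name AddExactSketch and the choice to return null on overflow are assumptions made for the example, not the actual ArrowOperationPlus code.

```java
// Hypothetical sketch of "use Math.addExact until the first ArithmeticException".
final class AddExactSketch {
  // Flips to true after the first overflow; afterwards the exception-free path is used.
  private boolean overflowSeen;

  // Returns a + b, or null when the sum overflows
  // (using null as the overflow result is an assumption of this sketch).
  Long add(long a, long b) {
    if (!overflowSeen) {
      try {
        return Math.addExact(a, b);
      } catch (ArithmeticException ex) {
        overflowSeen = true;
      }
    }
    long sum = a + b;
    // Overflow happened iff both operands differ in sign from the sum.
    boolean overflow = ((a ^ sum) & (b ^ sum)) < 0;
    return overflow ? null : Long.valueOf(sum);
  }

  boolean overflowSeen() {
    return overflowSeen;
  }
}
```

The fast path stays free of explicit overflow checks as long as the data fits; once an overflow is observed, the manual check avoids paying for an exception on every overflowing element.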

...time-language-arrow/src/main/java/org/enso/interpreter/arrow/runtime/ArrowFixedArrayInt.java

radeusgd · 2024-06-10T14:09:26Z

test/Benchmarks/src/Table/Arithmetic.enso

+    Runtime.assert ((column_arithmetic_plus_fitting data . to_vector) == (arrow_arithmetic_plus_fitting data)) "Column and arrow correctness check one"
+    Runtime.assert ((column_arithmetic_plus_overflowing data . to_vector) == (arrow_arithmetic_plus_overflowing data)) "Column and arrow correctness check two"
+    Runtime.assert ((column_arithmetic_plus_nothing data . to_vector) == (arrow_arithmetic_plus_nothing data)) "Column and arrow correctness check three"


Yeah, I really miss .should_equal & co. when writing benchmarks. How are we supposed to know it computes the right values?

I guess we could use should_equal? It will just throw a panic if it fails.

Since #8778 we probably could use should_equal. Still, I am not sure how to write such tests properly to avoid initialization overhead and slowing down the benchmarks.

The current Runtime.assert approach relies on the fact that runEngineDistribution -run test/Benchmarks runs with assertions enabled, so the asserts are checked. When running benchmarks on the CI, or via std-benchmarks/bench, assertions are disabled and this testing code isn't executed at all.

I think it works, but it is a bit fragile.
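The same trade-off can be shown with Java's own assert statement, which is skipped unless the JVM runs with -ea. This is only an analogy to the Enso Runtime.assert behavior described above; the class AssertGatedCheck and its methods are hypothetical.

```java
import java.util.Arrays;

// Java analogy only: a correctness check that costs nothing when assertions are off.
final class AssertGatedCheck {
  // Counts how many times the correctness check actually ran.
  static int checksRun = 0;

  static boolean sameResults(long[] expected, long[] actual) {
    checksRun++;
    return Arrays.equals(expected, actual);
  }

  // The "benchmarked" operation: element-wise addition.
  static long[] plus(long[] a, long[] b) {
    long[] result = new long[a.length];
    for (int i = 0; i < a.length; i++) {
      result[i] = a[i] + b[i];
    }
    return result;
  }

  static long[] measuredBody(long[] a, long[] b, long[] expected) {
    long[] result = plus(a, b);
    // Evaluated only with -ea; with assertions disabled, sameResults is never called.
    assert sameResults(expected, result) : "correctness check";
    return result;
  }
}
```

With assertions enabled, checksRun increases on every call; without them it stays at zero, which is exactly why the check does not slow the measured run but also gives no protection there.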


radeusgd

Enso benchmarks look good.

I don't really understand how the engine part works. I'm not sure if I have to, but maybe the interactive PR review you suggested in some discussion could work well in this kind of PR? As I imagine it would be useful to get a high-level explanation of the decisions here. Without it I'm guessing and I'm not really sure what is happening in this code.

radeusgd

Approving the Enso benchmark changes, assuming @hubertp will look over the engine part.

hubertp

Arrow optimizations look really good. Thanks for doing the investigation.


JaroslavTulach · 2024-06-11T11:06:20Z

...guage-arrow/src/main/java/org/enso/interpreter/arrow/runtime/ArrowFixedSizeArrayBuilder.java

+        put.putNull(builder.buffer, cachedUnit);
+        return;
+      }
+      var number = valueNode.executeAdjust(cachedUnit, value);


Unlike the previous version, which converted the value and called putXyz in one place, executeAdjust just converts the value to the appropriate java.lang.Number, and then we use the PutNode to place that value into the buffer.

Truffle gives us a way to separate concerns for operations without losing speed: the whole switch will be compiled away and in the end will look just like the old one, but in the source it is more structured into individual nodes.
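The split between "adjust" and "put" can be sketched in plain Java without any Truffle nodes. Everything below is hypothetical (the AdjustThenPutSketch class, its Unit enum and the method names are made up for illustration, and null handling is left out); it only shows the shape of converting first and writing second.

```java
import java.nio.ByteBuffer;

// Hypothetical plain-Java analogy of the two steps; the real code uses Truffle nodes.
final class AdjustThenPutSketch {
  enum Unit {
    INT8(1),
    INT64(8);

    final int bytes;

    Unit(int bytes) {
      this.bytes = bytes;
    }
  }

  // Step 1: only convert the value to a Number that fits the unit.
  static Number adjust(Unit unit, long value) {
    switch (unit) {
      case INT8:
        if (value != (byte) value) {
          throw new IllegalArgumentException("Does not fit Int8: " + value);
        }
        return Byte.valueOf((byte) value);
      case INT64:
        return Long.valueOf(value);
      default:
        throw new AssertionError(unit);
    }
  }

  // Step 2: only write an already adjusted Number at the given element index.
  static void put(ByteBuffer buffer, Unit unit, int index, Number number) {
    switch (unit) {
      case INT8:
        buffer.put(index, number.byteValue());
        break;
      case INT64:
        buffer.putLong(index * unit.bytes, number.longValue());
        break;
      default:
        throw new AssertionError(unit);
    }
  }

  // Composition of both steps, analogous to adjusting and then using a put node.
  static void append(ByteBuffer buffer, Unit unit, int index, long value) {
    put(buffer, unit, index, adjust(unit, value));
  }
}
```

In plain Java the two switches stay in the compiled code; the comment above relies on Truffle's partial evaluation to remove them when the unit is cached.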

JaroslavTulach added 3 commits May 31, 2024 11:33

Test for +[Int8] behavior

a56aa05

Generalizing parser to support +

a9e07f1

Trivial implementation of ArrowOperationPlus

f6d7e84

JaroslavTulach added the CI: No changelog needed and -compiler labels Jun 3, 2024

JaroslavTulach self-assigned this Jun 3, 2024

JaroslavTulach requested review from 4e6, hubertp and Akirathan as code owners June 3, 2024 05:29

JaroslavTulach marked this pull request as draft June 3, 2024 05:29

JaroslavTulach commented Jun 3, 2024

...ime-language-arrow/src/main/java/org/enso/interpreter/arrow/node/ArrowCastFixedSizeNode.java

JaroslavTulach added 2 commits June 3, 2024 08:52

Merge remote-tracking branch 'origin/develop' into wip/jtulach/PocArrowPlus10056

2f78ea9

Benchmark Arrow + implementation against Table one

a044cb0

JaroslavTulach commented Jun 3, 2024

test/Benchmarks/src/Table/Arithmetic.enso

JaroslavTulach added 5 commits June 3, 2024 11:25

Invoke foreign function with arguments

9c70d17

Allow Int64 value to be null

a008b3d

Support isNull arguments

a6248a1

Measure just arrow_plus performance

2f16308

Speeding up by using different interop library for different element

2bfba6e

JaroslavTulach linked an issue Jun 3, 2024 that may be closed by this pull request

Benchmark Truffle "add" for arrow-language and Column.+ #10056

Closed

JaroslavTulach added 2 commits June 3, 2024 16:02

Tests for Int64 buffer

760a106

For some reasons there are nulls in the array

f9a802e

Akirathan reviewed Jun 3, 2024


Fix byte mask used for index calculation

68d9306

hubertp force-pushed the wip/jtulach/PocArrowPlus10056 branch from a157891 to 68d9306 June 3, 2024 20:13

enso-bot bot mentioned this pull request Jun 4, 2024

Benchmark Truffle "add" for arrow-language and Column.+ #10056

Closed

JaroslavTulach commented Jun 4, 2024

...untime-language-arrow/src/main/java/org/enso/interpreter/arrow/runtime/ByteBufferDirect.java

Tell the compiler static values are final constants

1dbbae1

Prefer final fields and hope they will be constant

ea36864

JaroslavTulach added 2 commits June 8, 2024 10:51

Making the graph smaller by eliminating handling of exceptional state

3f49737

Math.addExact is intrinsified. Use it! But only until first ArithmeticException

55ee7cc

JaroslavTulach changed the title ~~PoC: Implement and benchmark + operation in the _Arrow language_~~ Implement and benchmark + operation in the _Arrow language_ Jun 8, 2024

JaroslavTulach marked this pull request as ready for review June 8, 2024 10:02

JaroslavTulach requested review from jdunkerley, radeusgd, GregoryTravis, AdRiley and marthasharkey as code owners June 8, 2024 10:02

JaroslavTulach requested a review from Akirathan June 8, 2024 10:03

JaroslavTulach changed the title ~~Implement and benchmark + operation in the _Arrow language_~~ Implement and benchmark ArrowOperationPlus node Jun 8, 2024

Benchmark work with an array with occational Nothing values

2bb6ebc

JaroslavTulach added the CI: Clean build required and CI: Keep up to date labels Jun 9, 2024

Merge branch 'develop' into wip/jtulach/PocArrowPlus10056

bce1892

JaroslavTulach removed the CI: Keep up to date label Jun 10, 2024

radeusgd reviewed Jun 10, 2024

...time-language-arrow/src/main/java/org/enso/interpreter/arrow/runtime/ArrowFixedArrayInt.java

radeusgd reviewed Jun 10, 2024

test/Benchmarks/src/Table/Arithmetic.enso

radeusgd reviewed Jun 10, 2024

radeusgd approved these changes Jun 10, 2024

hubertp approved these changes Jun 11, 2024

...ime-language-arrow/src/main/java/org/enso/interpreter/arrow/node/ArrowCastFixedSizeNode.java

...time-language-arrow/src/main/java/org/enso/interpreter/arrow/runtime/ArrowFixedArrayInt.java

create_arrow_columns function

807fd6d

Co-authored-by: Radosław Waśko

JaroslavTulach force-pushed the wip/jtulach/PocArrowPlus10056 branch from 37343e4 to 807fd6d June 11, 2024 09:14

JaroslavTulach commented Jun 11, 2024


JaroslavTulach added the CI: Ready to merge label Jun 11, 2024

mergify bot merged commit aaaebca into develop Jun 11, 2024
37 checks passed

mergify bot deleted the wip/jtulach/PocArrowPlus10056 branch June 11, 2024 12:51
