Moving Aggregation to Java #3364

jdunkerley · 2022-03-25T18:27:07Z

Pull Request Description

Important Notes

Checklist

Please include the following checklist in your PR:

The documentation has been updated if necessary.
All code conforms to the Scala, Java, and Rust style guides.
All code has been tested:
- Unit tests have been written where possible.
- If GUI codebase was changed: Enso GUI was tested when built using BOTH ./run dist and ./run watch.

jdunkerley · 2022-03-28T14:24:08Z

Test	Ref	New
Count table	122.4	6.1
Max table	166.3	5.0
Sum table	120.6	4.0
StDev table	50.2	2.6
Count grouped	191.4	5.5
Max table	166.3	5.0
Sum table	120.6	4.0
StDev grouped	118.9	4.0
Count 2 level groups	468.5	3.7
Max table	166.3	5.0
Sum table	120.6	4.0
StDev 2 level groups	254.8	5.1
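
To read the figures above as speedups, a minimal sketch follows. It assumes that Ref and New are timings in the same (unstated) unit and that lower is better; the class and method names are made up for illustration and are not part of this PR.

```java
// Illustrative only: figures are copied from the table above; the unit is not stated there.
final class SpeedupSketch {
  // Ratio of reference time to new time; values above 1 mean the new code is faster.
  static double speedup(double ref, double updated) {
    return ref / updated;
  }

  public static void main(String[] args) {
    String[] tests = {"Count table", "Count grouped", "Count 2 level groups"};
    double[] ref = {122.4, 191.4, 468.5};
    double[] updated = {6.1, 5.5, 3.7};
    for (int i = 0; i < tests.length; i++) {
      System.out.printf("%s: %.1fx%n", tests[i], speedup(ref[i], updated[i]));
    }
  }
}
```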

radeusgd

The shape looks good in general, but I have a few comments, and I still need to take another look as I didn't manage to review everything.

One thing I'm a bit worried about is whether we are doing too much boxing for integer/double columns. This can introduce some performance cost.

We have the "old" aggregates in our Table using the group by=... function, which I think use streams to potentially avoid boxing. Only the most basic operations are implemented there (sum, min, max, mean), but it may be good to create a benchmark comparing the performance of the old and new aggregations to see where we are.

I'm not sure if this should be done as part of this particular PR, but I think this is something we should check at some point.
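
To make the boxing concern concrete, here is a rough sketch contrasting a sum over boxed `Long` values with a sum over a primitive `long[]`. It is not code from this PR: the class and method names are hypothetical, and it only illustrates the two access patterns. Any real cost difference would need to be measured with a proper benchmark.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch, not code from this PR: contrasts a sum over boxed
// values with a sum over a primitive array.
final class BoxingSketch {
  // Boxed path: every element is a Long object that is unboxed on each read.
  static long sumBoxed(List<Long> values) {
    long total = 0;
    for (Long value : values) {
      if (value != null) { // null stands in for a missing value here
        total += value;
      }
    }
    return total;
  }

  // Primitive path: reads directly from a long[] with no per-element objects.
  static long sumPrimitive(long[] values) {
    long total = 0;
    for (long value : values) {
      total += value;
    }
    return total;
  }

  public static void main(String[] args) {
    long[] primitive = {1L, 2L, 3L, 4L};
    List<Long> boxed = Arrays.stream(primitive).boxed().collect(Collectors.toList());
    System.out.println(sumBoxed(boxed));
    System.out.println(sumPrimitive(primitive));
  }
}
```

Both methods return the same total; the difference is that the boxed version allocates and dereferences one object per element, which is the overhead a comparative benchmark would try to quantify.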

distribution/lib/Standard/Table/0.0.0-dev/src/Data/Column.enso

distribution/lib/Standard/Table/0.0.0-dev/src/Internal/Aggregate_Column_Aggregator.enso

distribution/lib/Standard/Table/0.0.0-dev/src/Data/Table.enso

std-bits/table/src/main/java/org/enso/table/data/index/MultiValueIndex.java

app/ide-desktop/package-lock.json

std-bits/table/src/main/java/org/enso/table/aggregations/Concatenate.java

std-bits/table/src/main/java/org/enso/table/aggregations/AggregateColumn.java

radeusgd · 2022-03-31T16:33:18Z

std-bits/table/src/main/java/org/enso/table/data/table/problems/AggregatedProblems.java

+import java.util.stream.Stream;
+
+public class AggregatedProblems {
+  private final Problem[] problems;


Why not use some ArrayList?
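
For illustration, a list-backed variant might look roughly like the sketch below. The nested `Problem` interface is a hypothetical stand-in, since the real type's shape is not shown in this hunk, and the `add`, `getProblems`, and `merge` methods are assumptions rather than the API in the PR.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch of an ArrayList-backed problem collector; not the class from this PR.
final class AggregatedProblemsSketch {
  // Hypothetical stand-in for the Problem type referenced in the diff.
  interface Problem {
    String message();
  }

  private final List<Problem> problems = new ArrayList<>();

  // Appending grows the list as needed; no manual array copying.
  void add(Problem problem) {
    problems.add(problem);
  }

  // Read-only view so callers cannot modify the internal list.
  List<Problem> getProblems() {
    return Collections.unmodifiableList(problems);
  }

  // Combines several collectors into one, preserving order.
  static AggregatedProblemsSketch merge(AggregatedProblemsSketch... groups) {
    AggregatedProblemsSketch merged = new AggregatedProblemsSketch();
    for (AggregatedProblemsSketch group : groups) {
      merged.problems.addAll(group.problems);
    }
    return merged;
  }

  public static void main(String[] args) {
    AggregatedProblemsSketch first = new AggregatedProblemsSketch();
    first.add(() -> "Floating point grouping");
    AggregatedProblemsSketch second = new AggregatedProblemsSketch();
    second.add(() -> "Invalid aggregation");
    for (Problem problem : merge(first, second).getProblems()) {
      System.out.println(problem.message());
    }
  }
}
```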

std-bits/table/src/main/java/org/enso/table/data/table/aggregate/AggregateColumnDefinition.java

distribution/lib/Standard/Table/0.0.0-dev/src/Internal/Aggregate_Column_Helper.enso

distribution/lib/Standard/Table/0.0.0-dev/src/Data/Table.enso

distribution/lib/Standard/Table/0.0.0-dev/src/Error.enso

distribution/lib/Standard/Table/0.0.0-dev/src/Internal/Aggregate_Column_Helper.enso

std-bits/table/src/main/java/org/enso/table/aggregations/Last.java

std-bits/table/src/main/java/org/enso/table/aggregations/MinOrMax.java

std-bits/table/src/main/java/org/enso/table/aggregations/ShortestOrLongest.java

radeusgd

Looks good overall, although as noted, some things like comparing performance or ensuring handling of Unicode strings need to be done at some point. As agreed, it will be more efficient to do these separately.

(still would appreciate addressing the earlier comments in one way or another)

radeusgd · 2022-04-01T14:40:35Z

distribution/lib/Standard/Table/0.0.0-dev/src/Data/Table.enso

+         - If grouping on or computing the `Mode` on a floating point number, a `Floating_Point_Grouping`.
+         - If an aggregation fails, an `Invalid_Aggregation_Method`.
+         - If when concatenating values there is an unquoted delimiter, an `Unquoted_Delimiter`.
+         - If there are more than 10 issues with a single column, an `Additional_Warnings`.


If each column can issue this warning, then I think it should contain the origin column as payload.

But I still think we should aggregate these issues in the same way as we do for inputs.
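
As a sketch of what carrying the origin column could look like, the snippet below caps the reported issues per column and emits one summary entry naming that column. The limit of 10 comes from the doc comment above; everything else (the class name, the string-based messages, the exact summary format) is an assumption for illustration, not the behaviour implemented in this PR.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: per-column warning collector with a cap and a summary
// entry that carries the originating column name.
final class CappedWarningsSketch {
  static final int MAX_REPORTED = 10; // limit taken from the doc comment above

  private final String columnName;
  private final List<String> reported = new ArrayList<>();
  private int total = 0;

  CappedWarningsSketch(String columnName) {
    this.columnName = columnName;
  }

  // Counts every issue, but keeps only the first MAX_REPORTED messages.
  void report(String message) {
    total++;
    if (reported.size() < MAX_REPORTED) {
      reported.add(message);
    }
  }

  // Number of issues that were counted but not kept.
  int additionalCount() {
    return total - reported.size();
  }

  // Kept messages, plus one summary entry if anything was dropped.
  List<String> summary() {
    List<String> out = new ArrayList<>(reported);
    if (additionalCount() > 0) {
      out.add("Additional_Warnings(column=" + columnName + ", count=" + additionalCount() + ")");
    }
    return out;
  }

  public static void main(String[] args) {
    CappedWarningsSketch warnings = new CappedWarningsSketch("price");
    for (int row = 0; row < 13; row++) {
      warnings.report("Invalid value at row " + row);
    }
    warnings.summary().forEach(System.out::println);
  }
}
```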

std-bits/table/src/main/java/org/enso/table/aggregations/CountDistinct.java

test/Benchmarks/src/Table/Aggregate.enso

jdunkerley force-pushed the wip/jd/multivalueindex branch from 1e851e0 to 3e6ab7b March 31, 2022 13:41

jdunkerley requested a review from radeusgd March 31, 2022 16:18

radeusgd reviewed Mar 31, 2022

jdunkerley force-pushed the wip/jd/multivalueindex branch 2 times, most recently from 5979ca7 to d6a5e0e April 1, 2022 07:41

jdunkerley and others added 21 commits April 1, 2022 08:42

First pass at Java

944b55c

Functioning initial Java version

0bfdf3c

Restructure to make faster

226d67d

Average and Sum working

de0660a

More Java based aggregate work

fc2f8a3

Empty table support and problems

2b05311

Concatenate and CountDistinct

a22cac2

Work on tables

037267c

License fixes

c4b1b78

Merge into aggregate function

46272c8

All aggregates working via Java

4b2e185

Fold problems and attach to table

4b98411

Break aggregation into classes for easier reading.

1c9d83d

Make it work again...

11ea968

Rebase work

21dfce3

Restructuring warning set up

b59e1fd

Additional warnings

3e32c50

Update std-bits/table/src/main/java/org/enso/table/data/index/MultiVa…

6771309

…lueIndex.java Co-authored-by: Radosław Waśko

Some PR comments

de6913a

Revert

98a8b16

Revert

7ef268b

jdunkerley force-pushed the wip/jd/multivalueindex branch from d6a5e0e to 7ef268b April 1, 2022 07:42

jdunkerley added 2 commits April 1, 2022 08:59

Legal review.

44c3095

Use constructors

0891538

jdunkerley added 3 commits April 1, 2022 11:58

Failing test fixes

13ea6de

Warnings being passed back correctly

6796b9e

Tests for warnings

bc89df5

jdunkerley marked this pull request as ready for review April 1, 2022 13:10

jdunkerley requested a review from 4e6 as a code owner April 1, 2022 13:10

jdunkerley requested a review from radeusgd April 1, 2022 13:10

Truncated warning test

0c8988b

radeusgd reviewed Apr 1, 2022

jdunkerley added 2 commits April 1, 2022 15:05

PR comments

28113ed

PR comments

10ffa0f

radeusgd approved these changes Apr 1, 2022

jdunkerley added 3 commits April 1, 2022 17:28

PR comments

cdde37e

Doc comments

b007545

Adjusted warnings so single warning for each failure type in each col…

f586f2e

…umn.

jdunkerley added the CI: Ready to merge label Apr 1, 2022

4e6 approved these changes Apr 4, 2022

mergify bot merged commit a4dbc9a into develop Apr 4, 2022

mergify bot deleted the wip/jd/multivalueindex branch April 4, 2022 09:12

jdunkerley added a commit that referenced this pull request Apr 5, 2022

Moving Aggregation to Java (#3364)

5b47795
