Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Moving Aggregation to Java #3364

Merged
merged 32 commits into from
Apr 4, 2022
Merged

Moving Aggregation to Java #3364

merged 32 commits into from
Apr 4, 2022

Conversation

jdunkerley
Copy link
Member

@jdunkerley jdunkerley commented Mar 25, 2022

Pull Request Description

Important Notes

Checklist

Please include the following checklist in your PR:

  • The documentation has been updated if necessary.
  • All code conforms to the Scala, Java, and Rust style guides.
  • All code has been tested:
    • Unit tests have been written where possible.
    • If GUI codebase was changed: Enso GUI was tested when built using BOTH ./run dist and ./run watch.

@jdunkerley
Copy link
Member Author

Test Ref New
Count table 122.4 6.1
Max table 166.3 5.0
Sum table 120.6 4.0
StDev table 50.2 2.6
Count grouped 191.4 5.5
Max table 166.3 5.0
Sum table 120.6 4.0
StDev grouped 118.9 4.0
Count 2 level groups 468.5 3.7
Max table 166.3 5.0
Sum table 120.6 4.0
StDev 2 level groups 254.8 5.1

@jdunkerley jdunkerley force-pushed the wip/jd/multivalueindex branch from 1e851e0 to 3e6ab7b Compare March 31, 2022 13:41
@jdunkerley jdunkerley requested a review from radeusgd March 31, 2022 16:18
Copy link
Member

@radeusgd radeusgd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The shape looks good in general, but have a few comments and I still need to have a look as I didn't manage to review everything.

One thing I'm a bit worried about is if we are not doing too much boxing for integer/double columns. This can introduce some performance cost.

We have the "old" aggregates in our Table using the group by=... function which I think are using streams to potentially avoid boxing. There are only most basic operations implemented there (sum, min, max, mean), but it may be good to create a comparative benchmark comparing the performance of old and new aggregations to see where we are at.

I'm not sure if this should be done as part of this particular PR, but I think this is something we should check at some point.

app/ide-desktop/package-lock.json Outdated Show resolved Hide resolved
import java.util.stream.Stream;

public class AggregatedProblems {
private final Problem[] problems;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use some ArrayList?

@jdunkerley jdunkerley force-pushed the wip/jd/multivalueindex branch 2 times, most recently from 5979ca7 to d6a5e0e Compare April 1, 2022 07:41
@jdunkerley jdunkerley force-pushed the wip/jd/multivalueindex branch from d6a5e0e to 7ef268b Compare April 1, 2022 07:42
@jdunkerley jdunkerley marked this pull request as ready for review April 1, 2022 13:10
@jdunkerley jdunkerley requested a review from 4e6 as a code owner April 1, 2022 13:10
@jdunkerley jdunkerley requested a review from radeusgd April 1, 2022 13:10
Copy link
Member

@radeusgd radeusgd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall, although as noted some stuff like comparing performance or ensuring handling of Unicode strings need to be done at some point - but as agreed - will be more efficient to do these separately.

(still would appreciate addressing the earlier comments in one way or another)

- If grouping on or computing the `Mode` on a floating point number, a `Floating_Point_Grouping`.
- If an aggregation fails, an `Invalid_Aggregation_Method`.
- If when concatenating values there is an quoted delimited, an `Unquoted_Delimiter`
- If there are more than 10 issues with a single column, an `Additional_Warnings`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If each column can issue this thing, then I think it should contain the origin column as payload.

But I still think we should aggregate these issues in the same way as we do for inputs.

test/Benchmarks/src/Table/Aggregate.enso Outdated Show resolved Hide resolved
@jdunkerley jdunkerley added the CI: Ready to merge This PR is eligible for automatic merge label Apr 1, 2022
@mergify mergify bot merged commit a4dbc9a into develop Apr 4, 2022
@mergify mergify bot deleted the wip/jd/multivalueindex branch April 4, 2022 09:12
jdunkerley added a commit that referenced this pull request Apr 5, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI: Ready to merge This PR is eligible for automatic merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants