Moving Aggregation to Java #3364
1e851e0 to 3e6ab7b
The shape looks good in general, but I have a few comments, and I still need to take another look because I didn't manage to review everything.
One thing I'm a bit worried about is whether we are doing too much boxing for integer/double columns. This can introduce some performance cost.
We have the "old" aggregates in our Table, using the `group by=...` function, which I think use streams to potentially avoid boxing. Only the most basic operations are implemented there (sum, min, max, mean), but it may be good to create a benchmark comparing the performance of the old and new aggregations to see where we are.
I'm not sure this should be done as part of this particular PR, but I think it is something we should check at some point.
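To make the boxing concern concrete, here is a minimal sketch of the two paths such a benchmark would compare. The class and method names are hypothetical and do not come from this PR; `null` standing in for a missing value is also an assumption. It only shows that both paths compute the same sum, not how they perform.

```java
import java.util.Arrays;
import java.util.stream.DoubleStream;

// Hypothetical sketch: not code from this PR.
final class BoxingSketch {
  // Boxed path: every element is a Double object that is unboxed on each read.
  static double sumBoxed(Double[] values) {
    double total = 0.0;
    for (Double v : values) {
      if (v != null) { // assumption: null marks a missing value
        total += v;
      }
    }
    return total;
  }

  // Primitive path: DoubleStream operates on raw doubles without wrapper objects.
  static double sumPrimitive(double[] values) {
    return DoubleStream.of(values).sum();
  }

  public static void main(String[] args) {
    double[] raw = {1.5, 2.5, 4.0};
    Double[] boxed = Arrays.stream(raw).boxed().toArray(Double[]::new);
    System.out.println(sumBoxed(boxed));
    System.out.println(sumPrimitive(raw));
  }
}
```

A real comparison would run both paths over large columns under a benchmark harness rather than a single call.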
distribution/lib/Standard/Table/0.0.0-dev/src/Internal/Aggregate_Column_Aggregator.enso
std-bits/table/src/main/java/org/enso/table/data/index/MultiValueIndex.java
std-bits/table/src/main/java/org/enso/table/aggregations/Concatenate.java
std-bits/table/src/main/java/org/enso/table/aggregations/AggregateColumn.java
import java.util.stream.Stream;

public class AggregatedProblems {
  private final Problem[] problems;
Why not use an ArrayList?
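A minimal sketch of what the list-backed variant could look like. The `Problem` class here is a hypothetical stand-in for the PR's own type, and `ProblemListSketch` is not the actual `AggregatedProblems` class.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: not the PR's AggregatedProblems implementation.
final class ProblemListSketch {
  // Stand-in for the PR's Problem type (assumption).
  static final class Problem {
    final String message;

    Problem(String message) {
      this.message = message;
    }
  }

  // An ArrayList grows on demand, so no manual array resizing or index bookkeeping is needed.
  private final List<Problem> problems = new ArrayList<>();

  void add(Problem problem) {
    problems.add(problem);
  }

  // Expose a read-only view so callers cannot modify the collected problems.
  List<Problem> problems() {
    return Collections.unmodifiableList(problems);
  }
}
```

Whether a fixed array is preferable for interop with the Enso side is something only the PR author can answer; the sketch only shows the bookkeeping an `ArrayList` removes.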
std-bits/table/src/main/java/org/enso/table/data/table/aggregate/AggregateColumnDefinition.java
distribution/lib/Standard/Table/0.0.0-dev/src/Internal/Aggregate_Column_Helper.enso
5979ca7 to d6a5e0e
…lueIndex.java Co-authored-by: Radosław Waśko <[email protected]>
d6a5e0e to 7ef268b
distribution/lib/Standard/Table/0.0.0-dev/src/Internal/Aggregate_Column_Helper.enso
std-bits/table/src/main/java/org/enso/table/aggregations/Last.java
std-bits/table/src/main/java/org/enso/table/aggregations/Last.java
std-bits/table/src/main/java/org/enso/table/aggregations/MinOrMax.java
std-bits/table/src/main/java/org/enso/table/aggregations/ShortestOrLongest.java
Looks good overall. As noted, some things, such as comparing performance and ensuring correct handling of Unicode strings, still need to be done at some point, but as agreed it will be more efficient to do these separately.
(I would still appreciate the earlier comments being addressed in one way or another.)
- If grouping on or computing the `Mode` on a floating point number, a `Floating_Point_Grouping`.
- If an aggregation fails, an `Invalid_Aggregation_Method`.
- If when concatenating values there is an unquoted delimiter, an `Unquoted_Delimiter`.
- If there are more than 10 issues with a single column, an `Additional_Warnings`.
If each column can raise this warning, then I think it should carry the originating column as its payload.
But I still think we should aggregate these issues in the same way as we do for inputs.
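One way to read the suggestion, as a rough sketch: keep at most 10 issues per column and emit one summary entry per overflowing column that names the column it came from. Every name below is hypothetical, and the string form of the summary is only for illustration; the actual `Additional_Warnings` payload is up to the PR.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: not the PR's warning-handling code.
final class ColumnWarningsSketch {
  // The cap of 10 issues per column is taken from the documentation text under review.
  static final int MAX_PER_COLUMN = 10;

  private final Map<String, List<String>> reported = new LinkedHashMap<>();
  private final Map<String, Integer> suppressed = new LinkedHashMap<>();

  // Keep the first MAX_PER_COLUMN messages for a column and count the rest.
  void report(String column, String message) {
    List<String> kept = reported.computeIfAbsent(column, k -> new ArrayList<>());
    if (kept.size() < MAX_PER_COLUMN) {
      kept.add(message);
    } else {
      suppressed.merge(column, 1, Integer::sum);
    }
  }

  // Messages retained for one column (empty if none were reported).
  List<String> warningsFor(String column) {
    return reported.getOrDefault(column, List.of());
  }

  // One summary per overflowing column, with the originating column as payload.
  List<String> additionalWarnings() {
    List<String> out = new ArrayList<>();
    suppressed.forEach((column, count) ->
        out.add("Additional_Warnings(column=" + column + ", count=" + count + ")"));
    return out;
  }
}
```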