PR comments from apache#34311 #1

Merged
merged 5 commits into from
Mar 3, 2023

icexelloss
Collaborator

Some comments from apache#34311

@icexelloss icexelloss changed the base branch from main to GH-32884 March 2, 2023 22:04
@github-actions
github-actions bot commented Mar 2, 2023

Thanks for opening a pull request!

If this is not a minor PR, could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose

Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename the pull request title in the following format?

GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}

In the case of PARQUET issues on JIRA, the title also supports:

PARQUET-${JIRA_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

See also:

@icexelloss icexelloss force-pushed the GH-32884-ordered-agg branch from db68040 to 27ef9ba March 2, 2023 22:06
@icexelloss icexelloss changed the title "Gh 32884 ordered agg" to "PR comments from apache#34311" Mar 2, 2023
@rtpsw
Owner

rtpsw commented Mar 3, 2023

@icexelloss, the proposed changes are mostly OK. There is a conflict to resolve. I also suspect some automated conflict-resolutions did not go well, because I see both ResetAggregates and ResetKernelStates. Do you want to resolve the conflict(s) or should I? As for renaming symbols, I'm fine with RowSegmenter but SegmentPiece is odd, as a segment already means a piece. How about RowSegment for consistency?

@icexelloss
Collaborator Author

I'm fine with RowSegmenter but SegmentPiece is odd, as a segment already means a piece. How about RowSegment for consistency?

I found this confusing when first reading the code, so I renamed it. A segment and a segment piece are different concepts. In my mind, a segment is a set of rows with the same key and can span multiple batches. A segment piece is a slice of a batch that belongs to a segment. I don't have strong opinions about the names, but these are two different things and we should be explicit about that. If what I describe as a "segment piece" is what you call a "segment", we need another term to represent "a collection of contiguous segments that have the same key".

@icexelloss icexelloss force-pushed the GH-32884-ordered-agg branch from 27ef9ba to 7d972a0 March 3, 2023 16:18
@icexelloss
Collaborator Author

@rtpsw I renamed "Segment Piece" back to "Segment" and tentatively called "a collection of contiguous segments that have the same key" a "segment group". I also resolved the conflict.

@icexelloss icexelloss force-pushed the GH-32884-ordered-agg branch from 7d972a0 to e96ac77 March 3, 2023 16:24
@rtpsw
Owner

rtpsw commented Mar 3, 2023

A segment piece is a slice of a batch that belongs to a segment.

I see. OK, then I guess the name is fine; it just needs a docstring.

/// example, in ordered time series processing, segment key can be "date", and a segment
/// group can be all the rows that belong to the same date.) A segment group can span
/// across multiple exec batches. A segment is a chunk of continous rows that has the same
/// segment key within a given batch. When a ? span cross batches, it will have multiple
Owner

Looks like the question mark here is a typo?

Collaborator Author

Fixed.

Comment on lines 33 to 34
/// \brief A segment.
/// A segment group is a chunk of continous rows that has the same segment key. (For
Owner

It's a bit confusing to start with a description of a segment group when the docstring is for segment. I'd say here something like "A segment is a chunk of contiguous rows that have the same segment key. Multiple contiguous segments having a common segment key form a segment group.".

Note that there is a typo: "that has" should be "that have".

Collaborator Author

I think it doesn't really make sense to talk about a segment without first talking about a segment group: the segment group is the top-level concept, and a segment is how we track batches within a segment group.

/// group can be all the rows that belong to the same date.) A segment group can span
/// across multiple exec batches. A segment is a chunk of continous rows that has the same
/// segment key within a given batch. When a ? span cross batches, it will have multiple
/// segments. A segment never span cross batches. The segment data structure only makes
Owner

Typo: "span" should be "spans".

Collaborator Author

Good catch, fixed.

@rtpsw rtpsw merged commit 5ac8c65 into rtpsw:GH-32884 Mar 3, 2023