PR comments from apache#34311 #1
Thanks for opening a pull request! If this is not a minor PR, could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project. Then could you also rename the pull request title in the following format?
or
In the case of PARQUET issues on JIRA the title also supports:
See also: |
db68040 to 27ef9ba
@icexelloss, the proposed changes are mostly OK. There is a conflict to resolve. I also suspect some automated conflict-resolutions did not go well, because I see both |
I found this confusing when first reading the code, so I renamed it. Segment and segment piece are different concepts. In my mind, a segment is a set of rows with the same key and can span multiple batches. A segment piece is a slice of a batch that belongs to a segment. I don't have strong opinions about the names, but these are two different things and we should be explicit about it. If what I describe as "segment piece" is what you call "segment", we need another term to represent "a collection of contiguous segments that have the same key".
27ef9ba to 7d972a0
@rtpsw I renamed "Segment Piece" back to "Segment" and tentatively called "a collection of contiguous segments that have the same key" a "segment group". I also resolved the conflict.
7d972a0 to e96ac77
I see. OK, then I guess the name is fine, just needs a docstring.
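To make the terminology settled above concrete, here is a minimal sketch of what a "segment" means within a single batch: a maximal run of contiguous rows sharing one segment key. The `Segment` struct and `SplitIntoSegments` function are hypothetical names chosen for illustration; they are not the declarations in grouper.h, and the batch is modeled as a plain vector of key values.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical illustration type; not the struct defined in grouper.h.
struct Segment {
  int64_t offset;  // index of the first row of the run within the batch
  int64_t length;  // number of rows in the run
};

// Splits one batch (modeled only by its segment-key column) into segments:
// maximal runs of contiguous rows that have the same segment key.
std::vector<Segment> SplitIntoSegments(const std::vector<std::string>& keys) {
  std::vector<Segment> segments;
  std::size_t start = 0;
  while (start < keys.size()) {
    // Advance `end` until the key changes or the batch ends.
    std::size_t end = start + 1;
    while (end < keys.size() && keys[end] == keys[start]) ++end;
    segments.push_back({static_cast<int64_t>(start),
                        static_cast<int64_t>(end - start)});
    start = end;
  }
  return segments;
}
```

For a batch whose key column is `{"d1", "d1", "d2"}`, this sketch yields two segments: rows 0 to 1 and row 2. A segment produced this way never crosses a batch boundary, which is the property the docstring under review is trying to state.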
cpp/src/arrow/compute/row/grouper.h
Outdated
/// example, in ordered time series processing, segment key can be "date", and a segment
/// group can be all the rows that belong to the same date.) A segment group can span
/// across multiple exec batches. A segment is a chunk of continous rows that has the same
/// segment key within a given batch. When a ? span cross batches, it will have multiple
Looks like the question mark here is a typo?
Fixed.
cpp/src/arrow/compute/row/grouper.h
Outdated
/// \brief A segment.
/// A segment group is a chunk of continous rows that has the same segment key. (For
It's a bit confusing to start with a description of a segment group when the docstring is for segment. I'd say here something like "A segment is a chunk of contiguous rows that have the same segment key. Multiple contiguous segments having a common segment key form a segment group.".
Note that there is a typo: "that has" should be "that have".
I think it doesn't really make sense to talk about segment without talking about segment group first - segment group is the top-level concept, segment is how we track batches within a segment group
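The relationship described here (a segment group as the top-level concept, with one segment per batch it touches) can be sketched as follows. This is only an illustration under stated assumptions: `BatchSegment`, `SegmentGroup`, `KeyBatch`, and `CollectSegmentGroups` are hypothetical names, batches are modeled as plain vectors of key values, and nothing here is taken from the actual grouper.h implementation.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical illustration types; not the declarations in grouper.h.
struct BatchSegment {
  std::size_t batch_index;  // which batch the segment lives in
  int64_t offset;           // first row of the segment within that batch
  int64_t length;           // number of rows in the segment
};

struct SegmentGroup {
  std::string key;                     // the shared segment key
  std::vector<BatchSegment> segments;  // one entry per batch the group touches
};

using KeyBatch = std::vector<std::string>;

// Walks the batches in order and collects segment groups. A segment extends
// the previous group only if it is the first segment of its batch and has the
// same key as the last segment seen so far.
std::vector<SegmentGroup> CollectSegmentGroups(const std::vector<KeyBatch>& batches) {
  std::vector<SegmentGroup> groups;
  for (std::size_t b = 0; b < batches.size(); ++b) {
    const KeyBatch& keys = batches[b];
    std::size_t start = 0;
    while (start < keys.size()) {
      std::size_t end = start + 1;
      while (end < keys.size() && keys[end] == keys[start]) ++end;
      BatchSegment segment{b, static_cast<int64_t>(start),
                           static_cast<int64_t>(end - start)};
      if (start == 0 && !groups.empty() && groups.back().key == keys[start]) {
        groups.back().segments.push_back(segment);  // group spans batches
      } else {
        SegmentGroup group{keys[start], {segment}};  // a new group begins
        groups.push_back(group);
      }
      start = end;
    }
  }
  return groups;
}
```

With batches `{"d1", "d1", "d2"}`, `{"d2", "d2"}`, and `{"d2", "d3"}`, the sketch produces three segment groups, and the group for `"d2"` holds three segments, one in each batch. That is the sense in which a segment group can span batches while an individual segment cannot.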
cpp/src/arrow/compute/row/grouper.h
Outdated
/// group can be all the rows that belong to the same date.) A segment group can span
/// across multiple exec batches. A segment is a chunk of continous rows that has the same
/// segment key within a given batch. When a ? span cross batches, it will have multiple
/// segments. A segment never span cross batches. The segment data structure only makes
Typo: "span" should be "spans".
Good catch, fixed.
Some comments from apache#34311