Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-38462: [Go][Parquet] Handle Boolean RLE encoding/decoding #38367
GH-38462: [Go][Parquet] Handle Boolean RLE encoding/decoding #38367
Changes from 2 commits
c121de0
e1ac231
a7bfc0f
80d1eef
0d86243
b9fad93
cd2fc77
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should
dec.nvals
dec in this branch?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in theory the return of the error would indicate that the decoder should not be used any further, so I don't know if we necessarily need to decode
dec.nvals
but it also wouldn't be wrong to do so.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a point in returning
max - n
here? What is the caller supposed to do with it?(note you aren't actually writing out the decoded values, so I assume the decoded bytes are lost, which means the decoder can't be used anymore afterwards)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we may meet the case if user use
Decoder
directly.However, during arrow reading it, I guess the upper reader will keep non-null value count and didn't hit the branch here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are writing out the decoded values via the loop at line 165. The input to this function is an output slice to write to and we populate it after the call to
GetBatch
. So returningmax - n
here informs the caller of how many values were populated into that slice before the error was hit.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But
dec.rleDec.GetBatch
has consumed some input and decoded some bytes that are just lost inbuf
, right?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct. After receiving an error, a given decoder should not continue to be used, but we return the number of successfully output values so that a caller knows what values are there before the error was encountered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This wouldn't matter without using raw-decode api. Because RecordReader will handing
Decode
well. But it's still make the syntax a bit inconsistentThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, you have a point though I think that would be a greater issue to think about attempting to fix.
For DataPageV2Header it contains the number of nulls so we can easily handle changing
dec.nvals
to the correct number, but for DataPageV1 you have a point that technically this might be slightly off or is at least a bug waiting to happen that should be addressed.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, lets create an issue for that. I think user would not easily touch it because we always suggest using RecordReader, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or going through the ColumnChunkReader which has similar correct handling. We don't actually expose the raw decoder api for users to access at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aha that's right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, what is the rationale for using
xerrors
here rather than the go stdlib as in other places?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
at the time I originally wrote the parquet code,
xerrors
was used for particular benefits of wrapping and otherwise that weren't available in the stdlib. Since then, all the features ofxerrors
have been folded into the go stdlib and there really isn't a reason to use it anymore, I intend to phase it out as I make changes to the code. So I'm going to fix this to just useerrors
, it was my mistake to propagatexerrors
here so thanks for catching it.Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.