GH-38399: [Go][Parquet] DeltaBitPack decoder reset usedFirst after SetData #38413
@zeroshade I've investigated and submitted a fix. It's a bit late in UTC-8, so I'll add tests tomorrow.
		return xerrors.New("parquet: eof exception")
	}
	if d.miniBlocksPerBlock == 0 {
		return xerrors.New("parquet: cannot have zero miniblock per block")
`xerrors` is no longer necessary as it has been absorbed into the stdlib, so for new code please just use `errors` instead.
Should I replace the remaining `xerrors` usages in this file? Mixing `xerrors` and the standard library looks weird :-(
	if err != nil {
		return 0, err
	}
	d.currentMiniBlockVals = 0
isn't this redundant due to the above if statement?
Yeah. Previously I thought a problem might arise from data in the last block (already-read + d.currentBlockVals could be greater than max). However, I eventually found that the read is limited by `out` and `max` here, so this line can be removed.
This looks great to me @mapleFU, thanks! I'll review again once you add the tests.
I've finished testing and resolved the comments @zeroshade. I'm not familiar enough with Go, so I don't know why CI failed...
@mapleFU The failing appveyor test doesn't appear related at all so you're fine there. Are either of the tests you added able to reproduce the initial failure that was reported in the issue (without your fix)? If not, can you add a test that does reproduce the initial failure?
// Reuse the same Decoder to decode the data several times.
dec := encoding.NewDecoder(parquet.Types.Int32, parquet.Encodings.DeltaBinaryPacked, column, memory.DefaultAllocator)
for round := 0; round < 5; round++ {
	// Each SetData call simulates the arrival of a new page.
	dec.(encoding.Int32Decoder).SetData(len(values), buf.Bytes())
	valueBuf := make([]int32, 100)
	for i, j := 0, len(valueBuf); j <= len(values); i, j = i+len(valueBuf), j+len(valueBuf) {
		dec.(encoding.Int32Decoder).Decode(valueBuf)
		assert.Equalf(t, values[i:j], valueBuf, "indexes %d:%d", i, j)
	}
}
I've added the same test for int32 and int64; please refer to "test decoding multiple pages". @zeroshade In the case,
After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit 7b1281a. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 3 possible false positives for unstable benchmarks that are known to sometimes produce them.
…ter SetData (apache#38413)

### Rationale for this change

As apache#38399 says, DeltaBitPack produces corrupted values when a column chunk has more than one page. Decoding the first page works well, but when the second page arrives, `d.usedFirst` has not been reset, which causes the bug.

### What changes are included in this PR?

1. Some style enhancement
2. Bug fix

### Are these changes tested?

Currently not

### Are there any user-facing changes?

bugfix

* Closes: apache#38399

Authored-by: mwish <[email protected]>
Signed-off-by: Matt Topol <[email protected]>
Rationale for this change
As #38399 says, DeltaBitPack produces corrupted values when a column chunk has more than one page. Decoding the first page works well, but when the second page arrives, `d.usedFirst` has not been reset, which causes the bug.

What changes are included in this PR?
1. Some style enhancement
2. Bug fix

Are these changes tested?
Currently not
Are there any user-facing changes?
bugfix