[BUG] Data corruption while reading an ORC file with StructColumn
#9395
Comments
This issue has been labeled |
Based on Enable struct columns for the OrcReader:
Enable struct children in the dataset generator:
and the following test script:
For now I'm satisfied that there aren't significant issues with struct support in cuIO read_orc. If we find a minimal reproducer for an ORC reader failure, please open a new issue.
After more investigation, I believe there is something non-compliant or slightly off-spec with the original file. Re-writing the file with
Note the zeros in the first dataframe and the non-zero values in the second dataframe.
Describe the bug
There appears to be a data corruption issue while reading an ORC file generated in fuzz-testing.
Pyarrow is able to read the file correctly but cudf is not.
Steps/Code to reproduce bug
'bug.orc' is made available internally to the cuio team as it is too large for a GitHub upload.
Expected behavior
Return same results as pyarrow.
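One way to check this expectation is to read the same file with both libraries and compare the columns value by value. The sketch below is illustrative and not the original reproducer: `diff_columns` and `compare_orc_readers` are hypothetical helper names, and the path passed in is whatever ORC file is under test. It assumes both readers return the same number of rows and that struct values come back as Python dicts from `to_pydict()`.

```python
import math


def _same(a, b):
    # Treat two float NaNs as equal; everything else uses plain equality.
    # NaNs nested inside struct dicts are NOT special-cased here.
    if isinstance(a, float) and isinstance(b, float):
        if math.isnan(a) and math.isnan(b):
            return True
    return a == b


def diff_columns(expected, actual):
    """Return {column name: [row indices that differ]} for two column dicts.

    A column missing from `actual` is treated as empty, so every row of
    the expected column is reported as a mismatch.
    """
    mismatches = {}
    for name, exp in expected.items():
        act = actual.get(name, [])
        n = max(len(exp), len(act))
        bad = [
            i for i in range(n)
            if i >= len(exp) or i >= len(act) or not _same(exp[i], act[i])
        ]
        if bad:
            mismatches[name] = bad
    return mismatches


def compare_orc_readers(path):
    """Read `path` with pyarrow and cudf and report differing rows.

    Hypothetical helper: imports are local so that diff_columns can be
    used without a GPU environment.
    """
    import pyarrow.orc as orc
    import cudf

    expected = orc.ORCFile(path).read().to_pydict()
    actual = cudf.read_orc(path).to_arrow().to_pydict()
    return diff_columns(expected, actual)
```

An empty dict from `compare_orc_readers` would mean the two readers agree; a non-empty result names the columns and row indices to inspect, which may help narrow a large fuzz-generated file down to a smaller reproducer.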
Environment overview (please complete the following information)
Environment details
Please run and paste the output of the cudf/print_env.sh script here, to gather any other relevant environment details.
Additional context
Data is randomly generated in fuzz-testing.