[Parquet] Add exhaustive integration testing with all possible Parquet types #37943
Hi dane, I'd like to do this with fuzzing, but there are still lots of types that we cannot support :-(
An older issue about this: #22325
jduo added five commits to jduo/arrow that referenced this issue (three on Oct 12, 2023, and one each on Oct 13 and Oct 19, 2023), all with the same message:
Add a reference file with all supported types and corresponding test case to validate that the Dataset API generates this consistently.
lidavidm pushed a commit that referenced this issue on Oct 20, 2023:
### Rationale for this change
Validate the types the Dataset APIs support when generating Parquet files.
### What changes are included in this PR?
Add a reference file with all supported types and corresponding test case to validate that the Dataset API generates this consistently.
### Are these changes tested?
Yes.
### Are there any user-facing changes?
No.
* Closes: #37943
Authored-by: James Duong <[email protected]>
Signed-off-by: David Li <[email protected]>
The same commit ("…che#38249)") was subsequently pushed to forks that referenced this issue: JerAguilon/arrow (Oct 23 and Oct 25, 2023), loicalleyne/arrow (Nov 13, 2023), and dgreiss/arrow (Feb 19, 2024).
Describe the enhancement requested
Arrow and Parquet do not have exhaustive integration testing for all possible Parquet data types.
For example, it would be useful to have a single, simple sample Parquet file that has only 1 or 2 rows of data but covers as much of the type feature space as possible. This would also be useful for testing backwards compatibility across versions, e.g. to help catch issues like these [1].
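As a rough illustration of the kind of sample file meant here, the sketch below builds a two-row table with one column per type and writes it with pyarrow. The column names, the `SAMPLE_COLUMNS` table, and the `write_sample_file` helper are all hypothetical, and the selection of types is only a small subset, not the reference file that was eventually added.

```python
import datetime
import decimal

# Hypothetical column spec: name -> (pyarrow type factory name, factory args, sample values).
# Each column has two rows; the second is None so null handling is exercised too.
SAMPLE_COLUMNS = {
    "bool_col": ("bool_", (), [True, None]),
    "int32_col": ("int32", (), [-1, None]),
    "int64_col": ("int64", (), [2**40, None]),
    "float_col": ("float32", (), [1.5, None]),
    "double_col": ("float64", (), [2.5, None]),
    "string_col": ("string", (), ["abc", None]),
    "binary_col": ("binary", (), [b"\x00\x01", None]),
    "date_col": ("date32", (), [datetime.date(2023, 10, 12), None]),
    "timestamp_col": ("timestamp", ("us",),
                      [datetime.datetime(2023, 10, 12, 1, 2, 3), None]),
    "decimal_col": ("decimal128", (10, 2), [decimal.Decimal("123.45"), None]),
}


def write_sample_file(path):
    """Write the two-row sample table to `path` and return the table read back."""
    # Imported lazily so the column spec can be inspected without pyarrow installed.
    import pyarrow as pa
    import pyarrow.parquet as pq

    fields, arrays = [], []
    for name, (factory, args, values) in SAMPLE_COLUMNS.items():
        arrow_type = getattr(pa, factory)(*args)  # e.g. pa.decimal128(10, 2)
        fields.append(pa.field(name, arrow_type))
        arrays.append(pa.array(values, type=arrow_type))

    table = pa.Table.from_arrays(arrays, schema=pa.schema(fields))
    pq.write_table(table, path)
    return pq.read_table(path)
```

A test could call `write_sample_file("all_types.parquet")` (the file name is only an example) and compare the returned table against the expected schema and values; nested, dictionary, and other logical types would need to be added to approach exhaustive coverage.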
The Arrow testing data currently lives in a separate repo [2].
We should:
[1] https://lists.apache.org/thread/4sw2vfmdx60kl2psolwvch8h2297zdkb
[2] https://github.com/apache/arrow-testing/tree/47f7b56b25683202c1fd957668e13f2abafc0f12
Component(s)
Parquet