This repository has been archived by the owner on Feb 18, 2024. It is now read-only.
This uses `sample-test` and `sample-arrow2`, which we built specifically for this purpose. See the README for why we felt `quickcheck` and `proptest` were unsuitable.

I'm not sure if we'd rather have the libraries be optional `[dependencies]` instead of `[dev-dependencies]` (which cannot be optional). I figured this should be behind a feature flag though.

When run exhaustively (see the commented `TODO` lines), this appears to unearth more errors in the parquet IO code. The failures appear to be triggered by nesting and nullable fields in combination. Some examples:

My prior experience with the def/rep level encoder leads me to suspect that code. I know it was recently rewritten, but it's a very complex subject and I'm not shocked that it may need more work.

Let me know how I can help. In particular, the shrinking behavior in `sample-arrow2` is suboptimal due to chained resampling and some implementation hacks that can probably be improved. I can definitely assist if you're playing around with this and are having trouble shrinking back to useful exemplars.

Setting the chunk length to a small value appears to be generating good counterexamples for now.
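
For anyone less familiar with why nesting plus nullability is delicate, here is a minimal sketch of the definition and repetition levels for a single `optional list<optional int32>` column, assuming Parquet's standard three-level list layout. `list_levels` is a hypothetical helper written for this illustration. It is not the arrow2 encoder and uses no arrow2 APIs.

```rust
/// Definition and repetition levels for one column of type
/// `optional list<optional int32>` (three-level list layout assumed).
/// Hypothetical helper for illustration only; not part of arrow2.
///
/// Definition levels: 0 = list is null, 1 = list is empty,
/// 2 = element is null, 3 = element is present.
/// Repetition levels: 0 = starts a new row, 1 = continues the current list.
pub fn list_levels(rows: &[Option<Vec<Option<i32>>>]) -> (Vec<u8>, Vec<u8>, Vec<i32>) {
    let mut def = Vec::new();
    let mut rep = Vec::new();
    let mut values = Vec::new();
    for row in rows {
        match row {
            // Null list: nothing along the path is defined.
            None => {
                def.push(0);
                rep.push(0);
            }
            // Present but empty list: only the outer optional group is defined.
            Some(items) if items.is_empty() => {
                def.push(1);
                rep.push(0);
            }
            Some(items) => {
                for (i, item) in items.iter().enumerate() {
                    // The first element opens a new row; later ones repeat.
                    rep.push(if i == 0 { 0 } else { 1 });
                    match item {
                        // Null element: defined down to the repeated group.
                        None => def.push(2),
                        // Present element: fully defined, so a value is stored.
                        Some(v) => {
                            def.push(3);
                            values.push(*v);
                        }
                    }
                }
            }
        }
    }
    (def, rep, values)
}
```

Even in this one-level case, a null list, an empty list, and a null element each produce a level entry without a stored value, so the level vectors and the value buffer have different lengths. That is the kind of bookkeeping I suspect goes wrong once more nesting is involved.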