Use highest available pickle protocol when serializing #737
The change looks good to me. Can you please update docs/release-notes.rst and accept the CLA? Once we land it, I can cut a new release. |
Codecov Report
@@           Coverage Diff           @@
##           master     #737   +/-   ##
=======================================
  Coverage   86.27%   86.27%
=======================================
  Files          85       85
  Lines        5084     5084
  Branches      787      787
=======================================
  Hits         4386     4386
  Misses        559      559
  Partials      139      139
Continue to review full report at Codecov.
|
bump... Is there something I need to do to get the unit tests to run? |
Sorry, I did not notice you were blocked on the tests. I had to approve the run because it is your first contribution. It should be OK going forward. Also, note that there is a flaky test (one that has to do with reading from a partitioned dataset). Feel free to rerun the build if you bump into it. |
@selitvin I think I need permission to run the tests again. They did all pass, except for the ones you mentioned were flaky. |
That's weird. I thought I was expected to approve running the tests only the first time. I will see if I can reconfigure this. |
Thanks for your help. Looks like the tests passed this time, so it's ready to merge :) |
Beautiful! Thank you for the PR. |
Pickle protocol 5 allows for zero-copy pickling of numpy arrays, but protocol version 4 remains the default for Python 3.8 onwards. When loading datasets with large numpy arrays and `reader_pool_type="process"`, using protocol 5 significantly improves serialization time and overall performance.
Protocol 5 is only available in Python 3.8 and onwards, so for backwards and forwards compatibility with future pickle protocol improvements we specify `pickle.HIGHEST_PROTOCOL`.
Small test case
master: 11.40 sec
This PR: 8.78 sec
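The idea behind the change can be sketched with the standard library alone. This is a minimal illustration, not the project's actual serializer: the `payload` dictionary and the `protocol_of` helper are hypothetical names introduced here, and a `bytearray` stands in for a large numpy array. It shows that passing `protocol=pickle.HIGHEST_PROTOCOL` selects the newest protocol the running interpreter supports, which is protocol 5 on Python 3.8 and later.

```python
import pickle

# Hypothetical payload standing in for a decoded row holding a large buffer.
payload = {"id": 7, "data": bytearray(b"\x00" * 1_000_000)}

# Without an explicit protocol, pickle uses pickle.DEFAULT_PROTOCOL.
default_bytes = pickle.dumps(payload)

# Explicitly request the newest protocol this interpreter supports.
highest_bytes = pickle.dumps(payload, protocol=pickle.HIGHEST_PROTOCOL)


def protocol_of(data):
    """Return the protocol number recorded in a pickle stream.

    For protocol 2 and later, the stream starts with the PROTO opcode
    (0x80) followed by one byte holding the protocol version.
    """
    if data[0] != 0x80:
        raise ValueError("stream does not start with a PROTO opcode")
    return data[1]


# The round trip is lossless regardless of the protocol chosen.
restored = pickle.loads(highest_bytes)

print("default protocol:", protocol_of(default_bytes))
print("highest protocol:", protocol_of(highest_bytes))
print("round trip ok:", restored == payload)
```

Because the protocol number is written into the stream itself, the reading side needs no matching setting; it only needs an interpreter that supports the protocol that was used to write.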