Streaming add remfile #1761
# Conflicts:
#   docs/gallery/advanced_io/streaming.py
#   environment-ros3.yml
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files

@@            Coverage Diff             @@
##              dev    #1761      +/-   ##
==========================================
- Coverage   91.99%   83.68%   -8.32%
==========================================
  Files          27       27
  Lines        2623     2623
  Branches      685      685
==========================================
- Hits         2413     2195     -218
- Misses        138      344     +206
- Partials       72       84      +12

Flags with carried forward coverage won't be shown.
@bendichter can we move this out of draft mode?
* rmv addition of RemFile as an allowed type for NWBHDF5IO
@rly ok, ready for review
Thanks. Could we make remfile Option 2, or even Option 1, on the streaming docs? See the related discussion in #1791.
Co-authored-by: Ryan Ly <[email protected]>
Thanks @rly! Here's a very rough benchmark I put together to compare timings of fsspec and remfile: https://github.com/scratchrealm/pynwb_streaming_benchmark

Pasting a snapshot of the readme:

The script main.py was run repeatedly in two settings: on dandihub and on a laptop on a home network. Below is a summary of the average timings.

My assessment is that remfile appears to be faster for initial load time, while the fsspec method appears to be faster for reading a 30-second sample of ephys data. On the home network, the fsspec method appears to be substantially slower than the remfile method for the initial load time.

Some limitations:

On dandihub
Average Initial Load Time:
Average 30sec Sample Read Time:

On Jeremy's Home Network
Average Initial Load Time:
Average 30sec Sample Read Time:
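For anyone wanting to reproduce this kind of comparison, here is a minimal sketch of how the two averages (initial load time and sample read time) could be collected. This is not the benchmark repository's main.py; the names `time_call` and `average_timings` are hypothetical, and `load_fn` / `read_fn` stand in for whatever fsspec or remfile code is being measured.

```python
import time
from statistics import mean


def time_call(fn):
    """Run fn() once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


def average_timings(load_fn, read_fn, repeats=3):
    """Average the load and read timings over several repeats.

    load_fn() should open the remote file and return a handle;
    read_fn(handle) should read the data sample from that handle.
    Both callables are assumptions supplied by the caller.
    """
    load_times, read_times = [], []
    for _ in range(repeats):
        handle, t_load = time_call(load_fn)
        _, t_read = time_call(lambda: read_fn(handle))
        load_times.append(t_load)
        read_times.append(t_read)
    return {
        "initial_load_s": mean(load_times),
        "sample_read_s": mean(read_times),
    }
```

Timing the load and the read separately matters here because the two methods appear to trade places between the two measurements.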
Motivation
remfile is a new package that has some advantages for streaming NWB HDF5 files from S3, so I want to document it as an option.
Depends on hdmf-dev/hdmf#946
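A minimal sketch of what streaming with remfile might look like, assuming `remfile`, `h5py`, and `pynwb` are installed. The wrapper name `open_nwb_with_remfile` is hypothetical and the URL is left to the caller; the imports are done lazily inside the function so the optional dependencies are only needed when a file is actually opened.

```python
from contextlib import contextmanager


@contextmanager
def open_nwb_with_remfile(url):
    """Yield an NWBFile read lazily from a remote HDF5 file.

    `url` is assumed to be an HTTP(S) URL for an NWB HDF5 file,
    for example the S3 URL of a DANDI asset.
    """
    # Optional dependencies, imported only when the file is opened.
    import h5py
    import remfile
    from pynwb import NWBHDF5IO

    # remfile.File exposes the remote file as a file-like object
    # that h5py can read from.
    rem_file = remfile.File(url)
    with h5py.File(rem_file, "r") as h5_file:
        with NWBHDF5IO(file=h5_file, mode="r", load_namespaces=True) as io:
            yield io.read()
```

Usage would be along the lines of `with open_nwb_with_remfile(s3_url) as nwbfile: ...`, keeping all data access inside the `with` block because the datasets are read lazily from the remote file.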
Checklist
flake8
from the source directory.