Parallel zarr with Queue instance attributes #118
Conversation
ReadTheDocs shows the following error due to the added
Maybe the
I do believe this is an issue with the version of Python being used to compile the docs - do you know what that is? You can also specify it explicitly in the config file like here: https://github.com/catalystneuro/neuroconv/blob/main/.readthedocs.yaml#L10-L11 An alternative would, I suppose, be to lower the exact version pin for the CI, but I defer to how y'all prefer to have all that set up.
Some tests define a new
Ahh good catch: hdmf-zarr/src/hdmf_zarr/backend.py, lines 839 to 840 (commit 6c13e14)
Since the method is not marked as private, I'll just instantiate a standard non-parallel Queue at that point then.
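A minimal sketch of that fallback, assuming the queue class lives at hdmf_zarr.utils.ZarrIODataChunkIteratorQueue and is stored on a private instance attribute (both names are assumptions here, not necessarily the final API):

```python
# Sketch of the fallback described above; the queue class, its import path,
# and the attribute name are assumptions, not necessarily the final API.
from hdmf_zarr.utils import ZarrIODataChunkIteratorQueue  # assumed import path


def write_dataset(self, builder, **kwargs):  # excerpt-style sketch, not the real signature
    # write() normally creates the (possibly parallel) queue; when write_dataset
    # is called on its own, fall back to a plain serial queue so it still works.
    if self._dci_queue is None:  # assumed attribute name
        self._dci_queue = ZarrIODataChunkIteratorQueue()  # non-parallel default
    # ... existing dataset-writing logic would continue here ...
```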
Those should be unit tests, since write_dataset is not a function that a user should ever call, but is used internally. The tests should be adjusted to manually set the dci_queue variable before calling write_dataset. Alternatively, we could also add
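A rough sketch of that test setup (the helper and attribute names are assumptions for illustration, not the actual hdmf-zarr test suite):

```python
# Rough sketch of the suggested test setup; the helper and attribute names are
# assumptions for illustration, not the actual hdmf-zarr test suite.
from hdmf.common import get_manager
from hdmf_zarr import ZarrIO
from hdmf_zarr.utils import ZarrIODataChunkIteratorQueue  # assumed import path


def make_io_with_queue(path):
    """Return a ZarrIO whose data-chunk-iterator queue is set manually, so a
    unit test can exercise write_dataset directly instead of going through write()."""
    io = ZarrIO(path, manager=get_manager(), mode="w")
    io._dci_queue = ZarrIODataChunkIteratorQueue()  # assumed attribute name
    return io
```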
Sorry, I didn't see your comment until after I posted my other response. I agree, "instantiate a standard non-parallel Queue at that point then" is the way to go.
Surprised that Python 3.7 is still supported here - is there a timeline for when that will be dropped?
Otherwise, the currently failing CI is, I believe, due to the version pin of
This is due to the pin on the HDMF version. Once we have the issue with references on export resolved, we'll update the HDMF version and then we can also update the tests. @mavaylon1 is working on the issue.
To get the tests to run for now, you could just increase the hdmf version on this PR so we can see that the CI is running. There will be a couple of tests that fail for export, but at least we can then see that everything is working in the CI. Aside from the bug on export, I believe you can safely use the current version of HDMF without having to change anything else in the code.
@CodyCBakerPhD with #120 merged, the dev branch now supports the latest HDMF. Could you sync this PR with the dev branch to see if that fixes the failing tests, so that we can move forward with merging this PR as well?
Codecov Report
Attention:
Additional details and impacted files

```
@@            Coverage Diff            @@
##              dev     #118     +/-   ##
=========================================
+ Coverage   84.73%   85.66%   +0.92%
=========================================
  Files          12       13       +1
  Lines        2903     3139     +236
=========================================
+ Hits         2460     2689     +229
- Misses        443      450       +7
```

☔ View full report in Codecov by Sentry.
@oruebel Done, not sure what's up with the coverage workflows though.
Looks good to me
fix #101
replace #111
Motivation
Zarr supports efficient parallelization, but enabling it seamlessly with only a single argument (number_of_jobs at io.write) took a bit of effort.
Currently seeing progressive speedups with the attached dummy script as the number of jobs increases; on the DANDI Hub, ~160s for 1 CPU, …
Will make a full averaged plot over the number of jobs to use for reference.
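A rough usage sketch of the new argument (the container setup below uses hdmf.common types purely for illustration; number_of_jobs is the only piece specific to this PR):

```python
# Illustrative only: build a dummy container and write it with several jobs.
from hdmf.common import DynamicTable, VectorData, get_manager
from hdmf_zarr import ZarrIO

table = DynamicTable(
    name="example",
    description="dummy table for the parallel write demo",
    columns=[VectorData(name="values", description="dummy data", data=list(range(100_000)))],
)

with ZarrIO("parallel_demo.zarr", manager=get_manager(), mode="w") as io:
    # number_of_jobs is the single new argument; values > 1 request parallel
    # dataset writes (actual speedup depends on how the datasets are wrapped).
    io.write(table, number_of_jobs=4)
```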
Opening in draft while I assess what all is still necessary and what can still be optimized in terms of worker/job initialization
Also will have to think on how to add tests; I suppose just adding some that use 2 jobs and making sure it works should be enough
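A 2-job smoke test along those lines might look roughly like this (pytest-style sketch; everything except number_of_jobs is ordinary round-trip boilerplate and the names are illustrative):

```python
# Sketch of a round-trip test with two jobs; names besides number_of_jobs are
# illustrative assumptions rather than the actual hdmf-zarr test suite.
from hdmf.common import DynamicTable, VectorData, get_manager
from hdmf_zarr import ZarrIO


def test_write_with_two_jobs(tmp_path):
    path = str(tmp_path / "two_jobs.zarr")
    table = DynamicTable(
        name="example",
        description="dummy table",
        columns=[VectorData(name="values", description="dummy data", data=list(range(1000)))],
    )
    with ZarrIO(path, manager=get_manager(), mode="w") as io:
        io.write(table, number_of_jobs=2)

    # Read back and check the values survived the parallel write unchanged.
    with ZarrIO(path, manager=get_manager(), mode="r") as io:
        read_table = io.read()
        assert list(read_table["values"][:]) == list(range(1000))
```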
How to test the behavior?
Checklist
ruff from the source directory.