Regression with Zarr: ReadOnlyError #135
Shoot, I'm still getting the read_only errors with 0.5.1:
I think you may be hitting a version of zarr-developers/zarr-python#1353, because you are calling `m = fs.get_mapper("")`. Try updating to the latest zarr version, or else create an `FSStore` instead.
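To illustrate the difference, a minimal sketch (the S3 filesystem and bucket path here are hypothetical, purely for illustration):

```python
import fsspec
import zarr

# The pattern that can trigger zarr-developers/zarr-python#1353: a bare
# fsspec mapper carries no explicit mode, and some zarr releases then
# treat the store as read-only when writing.
fs = fsspec.filesystem("s3", anon=False)
m = fs.get_mapper("my-bucket/target.zarr")  # hypothetical bucket/path

# The suggested alternative: an FSStore with an explicit write mode.
store = zarr.storage.FSStore("s3://my-bucket/target.zarr", mode="w")
```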
Okay, will do!
Would be helpful to confirm which Zarr version you had installed.
Hmm,
Okay, with the latest zarr the original error is gone, but the workflow fails near the end of the rechunking process:
The logs from those workers are not available on the dashboard, I guess because the workers died, right? This rechunker workflow was working in December. Should I revert to the zarr and rechunker versions from that era?
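For live workers, dask can pull logs back to the client; once a worker has died, the scheduler logs (or the cluster manager's logs) are usually the only trace left. A rough sketch, with a placeholder scheduler address:

```python
from dask.distributed import Client

client = Client("tcp://scheduler:8786")  # placeholder address

# Only reaches workers that are still alive; dead workers return nothing.
print(client.get_worker_logs())

# The scheduler log often records why a worker was removed.
print(client.get_scheduler_logs())
```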
Ideally you would figure out what is going wrong and help us fix it, rather than rolling back to an earlier version. After all, you're a rechunker maintainer now! 😉 Are you sure that all your package versions match on your workers?
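One way to verify that the versions actually match, again with a placeholder scheduler address and a hypothetical `env_versions` helper:

```python
from dask.distributed import Client

client = Client("tcp://scheduler:8786")  # placeholder address

# Built-in check: raises if client, scheduler, and workers disagree on
# the versions of dask's core dependencies.
client.get_versions(check=True)

# For packages the built-in check does not cover, ask each worker directly.
def env_versions():
    from importlib.metadata import version
    return {pkg: version(pkg) for pkg in ("zarr", "rechunker", "dask")}

print(env_versions())            # client/notebook environment
print(client.run(env_versions))  # one result per worker, keyed by address
```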
I'm certainly willing to try to help debug it, but I don't really know where to start. If you have ideas, I'm game to try them. One of the nice things about nebari/conda-store is that the notebook and workers see the same environment (accessed from the conda-store pod), so the versions always match. I added you to the ESIP Nebari deployment if you are interested in checking it out. https://nebari.esipfed.org/hub/user-redirect/lab/tree/shared/users/Welcome.ipynb
I won't be able to log into the ESIP cluster to debug your failing computation. If you think there has been a regression in rechunker in the new release, I strongly encourage you to develop a minimal reproducible example and share it via the issue tracker.
My first idea would be to freeze every package version except rechunker in your environment, and then try running the exact same workflow with only different rechunker versions (say 0.5.0 vs 0.5.1). Your example has a million moving pieces: Dask, Zarr, kerchunk, xarray, and so on. It's impossible to say whether your problem is caused by a change in rechunker unless you can isolate it. There have been extremely few changes to rechunker over the past year, nothing that would obviously cause your dask workers to start running out of memory.
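For what it's worth, a skeleton of that kind of isolated test might look like the following (shapes, chunk sizes, and local store paths are made up; the point is to take S3, kerchunk, and xarray out of the picture):

```python
import zarr
from rechunker import rechunk

# Purely local source array, so only rechunker, zarr, and dask are in play.
source = zarr.ones((4000, 4000), chunks=(4000, 100), store="source.zarr",
                   overwrite=True)

plan = rechunk(
    source,
    target_chunks=(100, 4000),
    max_mem="100MB",
    target_store="target.zarr",
    temp_store="temp.zarr",
)
plan.execute()  # run once under rechunker 0.5.0 and once under 0.5.1
```

If 0.5.1 fails where 0.5.0 succeeds on something this small, that is a concrete bug report; if both pass, the interaction with the rest of the stack becomes the next suspect.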
I've confirmed that my rechunking workflow runs successfully if I pin the earlier versions.
@gzt5142 has a minimal reproducible example he will post shortly. But should this be raised as a zarr issue?
Thanks a lot for looking into this, Rich!
How minimal is it? Can you decouple it from the dask and rechunker issues? Can you say more about what you think the root problem is?
Unfortunately, it turns out the minimal example we created works fine and does not trigger the problem described here. :(
I'm going to reopen this issue. If there is a bug somewhere in our stack that is preventing rechunker from working properly, we really need to get to the bottom of it.
Tests with the latest dev environment are failing with errors like this:
This is the cause of the test failures in #134.