-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Proposal] ENH: Add context manager for zarr store cleanup #113
base: master
Are you sure you want to change the base?
[Proposal] ENH: Add context manager for zarr store cleanup #113
Conversation
The context manager adds: - simple fsspec filesystem handling - checking of existing `target_store` and raises a FileExistsError if needed - cleanup of `temp_store` using fsspec filesystem - the possibility of removing target_store as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @bzah - thanks so much for this contribution and so sorry that no one has reviewed your changes so far.
Your PR looks great to me. I would gently ping @martindurant on the fsspec-related issues, but I don't see any problems. This seems very useful.
My only question is on the API. Maybe we just want to make this the new default?
rechunker/api.py
Outdated
|
||
|
||
@contextlib.contextmanager | ||
def rechunk_cm( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be nice if we could just call rechunk
directly as a context manager and get the right behavior, rather than introducing a new function to the API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, I renamed rechunk
to _unsafe_rechunk
and rechunk_cm
to rechunk
. Only the later is exposed in init
rechunker/api.py
Outdated
) -> Rechunked: | ||
try: | ||
if isinstance(filesystem, str): | ||
filesystem = fsspec.filesystem(filesystem) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should take the same options as whatever filesystem we are going to operate on (target_options or temp_options?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Martin if you can turn this into a suggested change, I will merge it in. We let @bzah's PR sit for a long time, so I'm not expecting him to be super responsive right now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am confused here why there are temp_options and target_options, but only one filesystem - that's stopping me from knowing which to use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. In general we support using different filesystems for temp vs. target. This new function assumes that they are both paths within the same filesystem.
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are now 2 filesystems, one for each store.
I tried to add a unit test using a MemoryFileSystem but zarr fails to create the group saying |
I made a small context manager in my project to wrap rechunker.rechunk so I thought it could be nice to share it here.
I did that because I was tired of deleting the stores after each unsuccessful test.
The context manager
rechunk_cm
adds:target_store
exist and raising of a FileExistsError if true (move that torechunk()
?)temp_store
using fsspec filesystemkeep_target_store
flag)The last point might be useful when within the context, the target_store is opened, a computation is done and only the
resulting computation must be saved. In that case,
rechunker
is only used to pre-optimize the computation through a better but temporary chunking on disk.Notes:
MemoryFileSystem
of fsspec but without success.rechunk_cm
could probably be improved.I don't mind discarding this PR if you don't think this feature should be in rechunker.
Thanks for this awesome project anyway.