Refactoring the io and in-memory lazy vs non-lazy representation #293

Open
LucaMarconato opened this issue Jun 9, 2023 · 6 comments

@LucaMarconato
Member

LucaMarconato commented Jun 9, 2023

Here are the linked issues.

Incremental IO:
#186
#222
#138 (old PR containing discussion)

Lazy vs non-lazy:
#243
#153
#297

giovp self-assigned this on Jun 13, 2023

@LucaMarconato
Member Author

@lopollar reported in #311 that saving extra columns in shapes layers is currently not working. I think it used to work, but we didn't have a test for it.

@giovp, when working on this PR, please also add a test for this.

@LucaMarconato
Member Author

Refactoring the geometry argument of the shapes model: #315

@giovp
Member

giovp commented Jul 30, 2023

A question regarding incremental IO: is the plan to remove the write/read steps entirely when __setitem__ is called, unless overwrite=True? In that case a call to write follows, so it would make sense to reload the data lazily.

@LucaMarconato
Member Author

LucaMarconato commented Jul 30, 2023

I'd maybe remove add_image(), etc., and leave only __setitem__. That way there is no option to pass overwrite=True. I would then add a method to re-save a specific element to disk. So the workflow would be:

sdata.images['my_image'] = im
# or even the following, which is already implemented and uses single dispatch to forward
# to the right `__setitem__`
sdata['my_image'] = im
sdata.write('my_data.zarr')
# example of in-place operation (can be both lazy or in-memory)
sdata['my_image'] = sdata['my_image'] + 1
sdata.write(element='my_image')

We could add a parameter reload=True to write() that does the lazy re-reading, and let the user disable it for specific cases, such as when the data is fully in memory and the user wants to keep it there.
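
A minimal sketch of this workflow, using toy stand-ins rather than the real API: ToyContainer and LazyElement are hypothetical names, JSON files stand in for the Zarr store, and only write(element=...) and reload=True are taken from the proposal above.

```python
import json
import tempfile
from pathlib import Path


class LazyElement:
    """Toy lazy handle (hypothetical): data is read from disk only on compute()."""

    def __init__(self, path):
        self.path = Path(path)

    def compute(self):
        return json.loads(self.path.read_text())


class ToyContainer:
    """Hypothetical stand-in for the container discussed here, not the real API."""

    def __init__(self):
        self._elements = {}
        self._path = None  # set by the first write that receives a path

    def __setitem__(self, name, value):
        # In-memory assignment only: no implicit write/read round trip.
        self._elements[name] = value

    def __getitem__(self, name):
        return self._elements[name]

    def _write_one(self, name, reload):
        value = self._elements[name]
        if isinstance(value, LazyElement):
            value = value.compute()  # read before overwriting the same file
        target = self._path / f"{name}.json"
        target.write_text(json.dumps(value))
        if reload:
            # Replace the in-memory object with a lazy handle to what was saved.
            self._elements[name] = LazyElement(target)

    def write(self, path=None, element=None, reload=True):
        if path is not None:
            self._path = Path(path)
            self._path.mkdir(parents=True, exist_ok=True)
        if self._path is None:
            raise ValueError("no backing store yet; pass a path on the first write")
        # Write a single element if requested, otherwise all of them.
        names = [element] if element is not None else list(self._elements)
        for name in names:
            self._write_one(name, reload)


with tempfile.TemporaryDirectory() as tmp:
    sdata = ToyContainer()
    sdata["my_image"] = [1, 2, 3]
    sdata.write(Path(tmp) / "my_data")  # element is now a lazy handle
    sdata["my_image"] = [v + 1 for v in sdata["my_image"].compute()]
    sdata.write(element="my_image", reload=False)  # re-saved, kept in memory
```

With reload=False the re-saved element stays as the in-memory list; with the default it would be swapped for a lazy handle again.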

@LucaMarconato
Member Author

I'd then have something like this to load an element into memory.

# everything is read lazily by default
sdata = read_zarr('my_data.zarr')
sdata.load_in_memory(element='my_image')
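
To make the intended behavior concrete, here is a small sketch with toy stand-ins. LazyRef, ToyStore and toy_read are hypothetical names, JSON files stand in for the Zarr store, and loading every element when no element is given is an assumption; only load_in_memory(element=...) comes from the snippet above.

```python
import json
import tempfile
from pathlib import Path


class LazyRef:
    """Toy lazy handle (hypothetical): remembers the file, reads only on demand."""

    def __init__(self, path):
        self.path = Path(path)

    def compute(self):
        return json.loads(self.path.read_text())


class ToyStore:
    """Hypothetical container whose elements start out as LazyRef handles."""

    def __init__(self, elements):
        self.elements = elements

    def __getitem__(self, name):
        return self.elements[name]

    def load_in_memory(self, element=None):
        # Materialize one element, or all of them when element is None (assumption).
        names = [element] if element is not None else list(self.elements)
        for name in names:
            value = self.elements[name]
            if isinstance(value, LazyRef):
                self.elements[name] = value.compute()


def toy_read(path):
    """Stand-in for a lazy reader: one LazyRef per JSON file, nothing is loaded."""
    return ToyStore({p.stem: LazyRef(p) for p in sorted(Path(path).glob("*.json"))})


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "my_image.json").write_text(json.dumps([1, 2, 3]))
    (Path(tmp) / "my_labels.json").write_text(json.dumps([0, 1]))
    sdata = toy_read(tmp)
    sdata.load_in_memory(element="my_image")  # my_labels stays lazy
```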

@LucaMarconato
Member Author

We should also consider adding the option to keep data lazy but make it persistent (.persist()).
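
Assuming .persist() here means Dask-style semantics (the work is done once and the result is held in memory, but the object keeps its lazy interface), the difference from a plain compute can be sketched with a toy class. Lazy and load are hypothetical names for illustration, not part of any library.

```python
class Lazy:
    """Toy lazy value (hypothetical): wraps a zero-argument function."""

    def __init__(self, func):
        self._func = func

    def map(self, f):
        # Build a new lazy value; nothing is evaluated yet.
        return Lazy(lambda: f(self._func()))

    def compute(self):
        # Evaluate and return a plain in-memory result.
        return self._func()

    def persist(self):
        # Evaluate once now, but return an object that is still lazy in interface.
        value = self._func()
        return Lazy(lambda: value)


calls = []


def load():
    # Stand-in for an expensive read from disk; records each invocation.
    calls.append(1)
    return [1, 2, 3]


lazy = Lazy(load)
kept = lazy.persist()           # load() runs once here
total = kept.map(sum).compute()  # reuses the held data, no second load()
```

The persisted object can still be chained like a lazy one, while the underlying read is not repeated.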

2 participants