
spec v3: progressive encoding #80

Open
davidbrochart opened this issue Jun 15, 2020 · 6 comments
Labels
protocol-extension Protocol extension related issue

Comments

@davidbrochart
Contributor

I'm wondering if progressive encoding could be supported in Zarr. It is a technique often used on the web, where a low-resolution image is downloaded and displayed first, then refined as the download continues (see e.g. https://cloudinary.com/blog/progressive_jpegs_and_green_martians).
Zarr currently supports only full-resolution contiguous chunks, so if you want a global view of the data, even one you are going to coarsen afterwards, you first have to fetch all the data. Progressive encoding would save a lot of bandwidth in this case, which is particularly useful for e.g. visualization.
But I'm not sure whether it would fit easily into the current architecture, or whether there is interest in it.

@davidbrochart
Contributor Author

Or maybe this could be handled by a special compressor, provided that we can request a given resolution from the Zarr store. Each chunk would then contain all the resolutions, which could look like:

0/0/0_res0
0/0/0_res1
0/0/0_res2

For full resolution, you would have to read (and combine) all three files; for low resolution, only 0/0/0_res0.
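A minimal sketch of how such a layered chunk could be decoded, assuming a hypothetical layout where `0/0/0_res0` holds a 2x-downsampled base and each later `_resN` file holds the residuals needed to restore the next finer resolution (the names, the nearest-neighbour upsampling, and the residual scheme are all illustrative, not part of any spec):

```python
import numpy as np

def upsample(a, factor=2):
    # Nearest-neighbour upsampling: repeat each value along both axes.
    return a.repeat(factor, axis=0).repeat(factor, axis=1)

def decode_chunk(layers, max_res):
    # layers[0] is the coarse base (the hypothetical "_res0" file);
    # layers[1:] hold residuals at successively doubled resolution.
    out = layers[0]
    for residual in layers[1:max_res + 1]:
        out = upsample(out) + residual
    return out

# Encode a toy 4x4 chunk into a base layer plus one residual layer.
data = np.arange(16, dtype=float).reshape(4, 4)
base = data.reshape(2, 2, 2, 2).mean(axis=(1, 3))  # 2x2 block averages
residual = data - upsample(base)                   # fine-grained details
layers = [base, residual]
```

Requesting resolution 0 returns the 2x2 base without touching the other files; requesting resolution 1 reads and combines everything back into the original chunk.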

@davidbrochart
Contributor Author

Might be a duplicate of #23.

@Carreau
Contributor

Carreau commented Jun 15, 2020

> Might be a duplicate of #23.

I'm not sure it's the same, in the sense that in #23 you completely store multiple resolutions, while what you are requesting here is basically having the "lower frequencies" early in the chunks, if I understand correctly.

I would put that under the "partial read" or "partial decompress" category of use cases.

@davidbrochart
Contributor Author

Yes, the point is not to duplicate data, even a shrunk version of it, but to decompose the data into progressive layers of detail, a kind of Fourier transform, to pick up your analogy with frequencies.
It might make a good compressor anyway because of the correlation between successive layers: e.g. the first layer is a coarsened average, the next is the finer-grained differences on top of it, etc.
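To illustrate why the layers compress well, here is a sketch of a recursive coarsened-average decomposition (a Haar-pyramid-style scheme chosen for illustration, not a Zarr codec): for smooth data, the residual layers are close to zero and therefore highly compressible.

```python
import numpy as np

def encode_layers(data, levels):
    # Recursively split a 2D array into a coarse base plus per-level
    # residuals: each pass replaces the data with its 2x2 block means
    # and records the differences needed to undo that coarsening.
    layers = []
    for _ in range(levels):
        h, w = data.shape
        base = data.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
        up = base.repeat(2, axis=0).repeat(2, axis=1)
        layers.append(data - up)   # fine-grained differences
        data = base
    layers.append(data)            # coarsest average
    return layers[::-1]            # base first, finest residuals last

# A smooth 64x64 ramp: successive residual layers are nearly zero.
smooth = np.linspace(0, 1, 64)[:, None] + np.linspace(0, 1, 64)[None, :]
layers = encode_layers(smooth, 3)
```

Because each residual only stores what the coarser layer missed, the original array is reconstructed exactly by upsampling and adding the layers back in order.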

@joshmoore
Member

@davidbrochart : my hope would have been to find a way with v3 to have jpeg2000-like compression across the multiscales. However, that was moved out as a "convention" rather than as part of the spec. I'd certainly be interested in having the two interact reasonably together.

@davidbrochart
Contributor Author

Thinking about it, it might already be possible to implement progressive encoding by adding a new dimension representing the resolution. When reading a Zarr array, decoding would proceed according to the requested resolution value (i.e. reading all the chunks corresponding to the lower resolutions up to the requested one and combining them), and the resolution dimension would be removed from the result.
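A minimal sketch of that read path, assuming a hypothetical scheme where the array carries a leading "resolution" axis, layer 0 is a coarse approximation stored at full shape, and later layers are additive refinements (the function name and the sum-based combining rule are illustrative assumptions):

```python
import numpy as np

def read_progressive(arr, res):
    # arr has a leading "resolution" axis: combine layers 0..res by
    # summation, which also drops the resolution dimension from the
    # result, as described above.
    return arr[: res + 1].sum(axis=0)

# Toy 2-layer encoding: a global-average preview plus one refinement.
data = np.arange(16.0).reshape(4, 4)
coarse = np.full_like(data, data.mean())     # layer 0: global average
stacked = np.stack([coarse, data - coarse])  # shape (2, 4, 4)
```

Here `read_progressive(stacked, 0)` yields the cheap preview from the first layer alone, while `read_progressive(stacked, 1)` combines both layers to recover the full-resolution data.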
