HDF5 filter and plugin based on c-blosc2? #29
Comments
That would be great. The new API for C-Blosc2 is backward compatible with C-Blosc, so this should be easy. Just remember that the C-Blosc2 binary format is backward compatible, but not forward compatible.
Does that mean that an HDF5 plugin based on c-blosc1 would not be able to read chunks compressed by an HDF5 plugin based on c-blosc2?
That's correct.
Is there any way to have c-blosc2 produce a backwards-compatible binary format?
I don't think so. At first I tried to keep the format forward compatible, but it was too much hassle, so I decided not to do it.
Hi, what is the guideline (is there any?) regarding compatibility of registered HDF5 filters, and when should a filter use a different ID? The bottom line is being able to read old compressed data with new versions of the filter, and I would expect passing parameters to the filter through HDF5 to stay compatible as well.
By the way, Blosc2 is registered with a new ID, 32026:
Question: In the event of using
We would need to try, but I am pretty sure that an error will be raised when using C-Blosc1 with chunks generated with C-Blosc2. On the other hand, the way we currently use the registered Blosc2 filter (ID 32026) is through a CFrame. The CFrame is more flexible than a regular chunk and allows the use of multidimensional metalayers, which should be useful for optimizing dataset reads. After reflecting more on this, one path we could follow is to add a check to the current HDF5 Blosc1 filter so that, when an error is detected, it checks whether what we are decompressing is a CFrame and, if so, calls the actual Blosc2 filter. BTW, we have a preliminary version of an HDF5 Blosc2 filter at https://github.com/PyTables/PyTables/tree/direct-chunking-append/hdf5-blosc2/src, and one can use this for the future standalone hdf5-blosc2 filter.
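The fallback idea in the comment above could be sketched roughly as follows. This is only an illustration, not the code of any existing filter: `blosc1_decode` and `blosc2_decode` are hypothetical callables standing in for the real decompressors, and the magic string and its offset are assumptions taken from my reading of the C-Blosc2 CFrame format description (magic `b2frame\0` starting at byte 2 of the frame header).

```python
# Assumption: a CFrame starts with a msgpack-encoded header whose magic
# string "b2frame\0" sits at byte offset 2 (per the C-Blosc2 CFrame docs).
CFRAME_MAGIC = b"b2frame\x00"
CFRAME_MAGIC_OFFSET = 2


def looks_like_cframe(buf: bytes) -> bool:
    """Return True if the buffer carries the CFrame magic at the expected offset."""
    end = CFRAME_MAGIC_OFFSET + len(CFRAME_MAGIC)
    return len(buf) >= end and bytes(buf[CFRAME_MAGIC_OFFSET:end]) == CFRAME_MAGIC


def decode_chunk(buf: bytes, blosc1_decode, blosc2_decode) -> bytes:
    """Try the Blosc1 path first; on failure, fall back to Blosc2 for CFrames.

    Both decoders are hypothetical callables supplied by the caller.
    """
    try:
        return blosc1_decode(buf)
    except Exception:
        if looks_like_cframe(buf):
            return blosc2_decode(buf)
        raise  # not a CFrame either: surface the original error
```

Trying Blosc1 first keeps the common path unchanged for existing files; the magic check only runs once decompression has already failed.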
I did a very quick test (no shuffle/default compressor) with updating
If you want to take the way of switching to |
Great to see an HDF5 Blosc2 filter coming up!
After pondering a bit more about Blosc/Blosc2 compatibility, I think a better approach is to keep the two filters totally separate. So the current hdf5-blosc will continue supporting just the C-Blosc 1.x series, while the future hdf5-blosc2 will support just the C-Blosc 2.x series. Having separate HDF5 filter IDs will also help with this.
It would be convenient if we could decompress Blosc1 with c-blosc2 though.
From what I tested, the hdf5-blosc1 filter compiled against c-blosc2 can decompress chunks compressed with c-blosc1 (it is backward compatible), but not the other way around.
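To make the one-way compatibility concrete, here is a minimal sketch that reads the format version from a chunk header and guesses whether a C-Blosc 1.x reader could handle it. The header layout (16 bytes: version, versionlz, flags, typesize, then three little-endian 32-bit sizes) and the version numbers (2 as the highest format C-Blosc 1.x writes, higher values for C-Blosc2) are assumptions based on my reading of the Blosc headers, not something stated in this thread; the function names are made up for illustration.

```python
import struct

# Assumption: C-Blosc 1.x writes format version 2 at most; C-Blosc2 writes
# higher values (e.g. 5 for its stable format).
BLOSC1_MAX_FORMAT_VERSION = 2
BLOSC_MIN_HEADER_LENGTH = 16


def parse_blosc_header(chunk: bytes) -> dict:
    """Decode the first 16 bytes of a Blosc chunk into named fields."""
    if len(chunk) < BLOSC_MIN_HEADER_LENGTH:
        raise ValueError("buffer is shorter than a Blosc chunk header")
    version, versionlz, flags, typesize, nbytes, blocksize, cbytes = (
        struct.unpack_from("<BBBBIII", chunk)
    )
    return {
        "version": version,      # Blosc format version
        "versionlz": versionlz,  # version of the internal codec format
        "flags": flags,
        "typesize": typesize,
        "nbytes": nbytes,        # uncompressed size
        "blocksize": blocksize,
        "cbytes": cbytes,        # compressed size, header included
    }


def readable_by_blosc1(chunk: bytes) -> bool:
    """Heuristic: a Blosc1 reader only understands format versions it knows."""
    return parse_blosc_header(chunk)["version"] <= BLOSC1_MAX_FORMAT_VERSION
```

A filter built against c-blosc2 would accept both kinds of chunk, which matches the observation above; only the Blosc1-side check can fail.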
Any updates on this? There seem to be some blosc2 plugins available (e.g. PyTables), but none support arbitrary filters as far as I can tell. I need BYTEDELTA to get a good compression ratio, and I really want to be able to read/write HDF5 datasets and specify the blosc2 filters used to compress.
I've seen that before, how does that answer my question? Is hdf5 going to adopt that proposal?
Have you seen https://github.com/Blosc/b2h5py ? I'm not sure if I understand your question. The Blosc2 filter has been registered as ID 32026. There's nothing more for The HDF Group to do.
b2h5py is mostly out of scope for me. To be clear:
You mention the plugin 32026, where is the authoritative implementation of that plugin?
I guess I could compress data with blosc2 myself and write it with |
Ok, I tried what I suggested above, but I get an error on decompression because |
@froody You are right that, with the current API, we cannot use the full functionality of Blosc2 pipeline inside HDF5. The solution would be to use the Meanwhile, I am glad that you figured out the best workaround, i.e. using direct chunking in HDF5 (via |
It's in PyTables (https://github.com/PyTables/PyTables/tree/master/hdf5-blosc2/src). It's also embedded in hdf5plugin for use with h5py.
I think we can close this.
I would be interested in seeing this plugin updated to work with c-blosc2 code.