[WIP] Add Zarr driver #3411
Force-pushed from 3b47b4b to 3c190f2
gdal/configure.ac
Outdated
@@ -1630,7 +1664,7 @@ OGRFORMATS_ENABLED=
 OGRFORMATS_ENABLED_CFLAGS=
 OGRFORMATS_DISABLED=

-AC_DEFUN([INTERNAL_FORMATS],[aaigrid adrg aigrid airsar arg blx bmp bsb cals ceos ceos2 coasp cosar ctg dimap dted e00grid elas envisat ers esric fit gff gsg gxf hf2 idrisi ignfheightasciigrid ilwis ingr iris iso8211 jaxapalsar jdem kmlsuperoverlay l1b leveller map mrf msgn ngsgeoid nitf northwood pds prf r raw rmf rs2 safe saga sdts sentinel2 sgi sigdem srtmhgt stacta terragen tga til tsx usgsdem xpm xyz zmap])
+AC_DEFUN([INTERNAL_FORMATS],[aaigrid adrg aigrid airsar arg blx bmp bsb cals ceos ceos2 coasp cosar ctg dimap dted e00grid elas envisat ers esric fit gff gsg gxf hf2 idrisi ignfheightasciigrid ilwis ingr iris iso8211 jaxapalsar jdem kmlsuperoverlay l1b leveller map mrf msgn ngsgeoid nitf northwood pds prf r raw rmf rs2 safe saga sdts sentinel2 sgi sigdem srtmhgt stacta terragen tga til tsx usgsdem xpm xyz zarr zmap])
zarr is not an internal driver, as it requires a direct dependency. You should follow the same logic as in e.g. hdf5, kakadu, openjpeg...
Thanks @tbonfort, this should be fixed now.
.. shortname:: Zarr

.. versionadded:: 3.2.2
should be 3.3.0
👍
@@ -51,7 +51,9 @@ CXXFLAGS="-std=c++17 -O1 $ARCH_FLAGS" CFLAGS="-O1 $ARCH_FLAGS" ./configure --pre
     --with-kea=/usr/bin/kea-config \
     --with-tiledb \
     --with-crypto \
-    --with-ecw=/opt/libecwj2-3.3
+    --with-ecw=/opt/libecwj2-3.3 \
+    --with-zarr \
It would be nice if --with-zarr could take a non-default install location.
What do you mean? We can already pass any directory to --with-zarr. Or do you mean in the CI specifically?
Sorry, I missed that part. I did mean that the non-standard path can be selected by the user outside of the CI.
However, to check for the lib/headers in the default location, you should use something similar to https://github.com/OSGeo/gdal/blob/64db832b3e475bb70d5acae99820c5ee405c9838/gdal/configure.ac#L2397 instead of looking up a hard-coded path. Namely, -I/usr/include should never be added manually to the CFLAGS.
-    --with-ecw=/opt/libecwj2-3.3
+    --with-ecw=/opt/libecwj2-3.3 \
+    --with-zarr \
+    --enable-driver-zarr
This should not be necessary with a correct configure script; --with-zarr should be sufficient to activate the driver.
👍
#include <algorithm>

CPL_CVSID("$Id$")
By the way, I'm not sure why we keep this. The recently added STACTA driver also contains such a line, likely a copy-and-paste carry-over. Do we still need such cruft, @rouault?
Do we still need such cruft @rouault ?
They are substituted by the mkgdaldist.sh script. This is probably only useful to spy and know which version of GDAL some binary build embeds.
gdal/frmts/zarr/zarrdataset.cpp
Outdated
template <typename T>
void assign_chunk(void* pImage, xt::zarray& z, int nBlockYSize, int nBlockXSize,
                  int nBlockYOff, int nBlockXOff,
For general hygiene, such utilities in a .cpp file should live in an unnamed namespace:
namespace
{
template <typename T>
void assign_chunk(....
}
gdal/frmts/zarr/zarrdataset.cpp
Outdated
  public:
    ZarrRasterBand( ZarrDataset *, int, xt::xzarr_hierarchy<T>& );
    virtual ~ZarrRasterBand();
virtual is redundant; if anything, it should read ~ZarrRasterBand() override
gdal/frmts/zarr/zarrdataset.cpp
Outdated
  public:
    ZarrDataset();
    ~ZarrDataset();
Could read ~ZarrDataset() override
gdal/frmts/zarr/zarrdataset.cpp
Outdated
    ZarrRasterBand( ZarrDataset *, int, xt::xzarr_hierarchy<T>& );
    virtual ~ZarrRasterBand();

    virtual CPLErr IReadBlock( int, int, void * ) override;
virtual is redundant
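The virtual/override remarks above can be illustrated with a minimal sketch. BandBase and DemoBand are hypothetical stand-ins, not GDAL's GDALRasterBand or this PR's ZarrRasterBand. The point is only that override already implies virtual, and that it lets the compiler reject a declaration that does not match anything in the base class.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical base class with a virtual destructor and one virtual method
// (a stand-in for illustration, not GDAL's real API).
struct BandBase {
    virtual ~BandBase() = default;
    virtual int ReadBlock(int x, int y) {
        assert(x >= 0 && y >= 0);
        return -1;  // base implementation: "not supported"
    }
};

// In the derived class, `override` implies `virtual`, so writing both is redundant.
// If the signature did not match a base method, `override` would be a compile error.
struct DemoBand final : BandBase {
    ~DemoBand() override = default;
    int ReadBlock(int x, int y) override {
        assert(x >= 0 && y >= 0);
        return x + y;  // dummy payload so the dispatch is observable
    }
};

// The destructor stays virtual even though `virtual` is not repeated.
static_assert(std::has_virtual_destructor<DemoBand>::value,
              "destructor is virtual through the base class");

// Calls through a base reference still dispatch to the derived override.
int read_via_base(BandBase& band, int x, int y) { return band.ReadBlock(x, y); }
```

Calling read_via_base on a DemoBand reaches DemoBand::ReadBlock, which is the behaviour the reviewer's suggested spelling preserves.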
gdal/frmts/zarr/zarrdataset.cpp
Outdated
ZarrDataset::ZarrDataset() :
    fp(nullptr)
{
    std::fill_n(abyHeader, CPL_ARRAYSIZE(abyHeader), static_cast<GByte>(0));
std::memset may be better optimised
👍
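For reference, a small sketch of the two spellings side by side. Byte stands in for GByte, and kHeaderSize is an assumed size, since the PR excerpt does not show how abyHeader is declared. Both constructors leave the buffer zeroed; whether std::memset is actually faster depends on the compiler, and optimisers often lower both forms to the same code for byte arrays.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>

using Byte = unsigned char;                // stand-in for GDAL's GByte
constexpr std::size_t kHeaderSize = 1024;  // assumed size, not taken from the PR

// Variant used in the PR excerpt: element-wise fill.
struct HeaderWithFill {
    Byte abyHeader[kHeaderSize];
    HeaderWithFill() { std::fill_n(abyHeader, kHeaderSize, static_cast<Byte>(0)); }
};

// Variant suggested in review: zero the raw bytes in one call.
struct HeaderWithMemset {
    Byte abyHeader[kHeaderSize];
    HeaderWithMemset() { std::memset(abyHeader, 0, sizeof(abyHeader)); }
};

// Helper for checking the result: true if every byte in [p, p + n) is zero.
bool all_zero(const Byte* p, std::size_t n) {
    assert(p != nullptr);
    return std::all_of(p, p + n, [](Byte b) { return b == 0; });
}
```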
gdal/frmts/zarr/zarrdataset.cpp
Outdated
    //xt::xzarr_register_compressor<T, xt::xio_blosc_config>(); // TODO

    // Create a corresponding GDALDataset.
    ZarrDataset *poDS = new ZarrDataset();
Can any of the following zarr_* calls throw? Then this will leak. Wrapping it with std::unique_ptr and .release()ing it at the return may be a slightly safer way.
Yes, they can throw. I wrapped the pointer with std::unique_ptr, thanks.
Yes, they can throw.
If xtensor-zarr can throw exceptions, they must be caught by any of the methods of the GDAL driver API (virtual methods, and methods like Open()), since GDAL core doesn't expect exceptions to propagate to it.
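The two points in this thread (own the dataset with std::unique_ptr, and never let an exception cross the driver boundary) can be combined as in the sketch below. Dataset, load_metadata and OpenDataset are hypothetical stand-ins for ZarrDataset, a throwing xtensor-zarr call and the driver's Open() method; none of them is real GDAL or xtensor-zarr API.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for ZarrDataset.
struct Dataset {
    std::string name;
};

// Hypothetical stand-in for a library call that may throw.
void load_metadata(Dataset& ds, bool fail) {
    if (fail) throw std::runtime_error("cannot read array metadata");
    ds.name = "array";
}

// Driver-style entry point: returns an owning raw pointer on success and
// nullptr on failure, and never lets an exception escape to the caller.
Dataset* OpenDataset(bool fail) noexcept {
    try {
        auto ds = std::make_unique<Dataset>();  // freed automatically if anything below throws
        load_metadata(*ds, fail);
        assert(!ds->name.empty());
        return ds.release();  // hand ownership to the caller only once everything succeeded
    } catch (const std::exception&) {
        // A real driver would report the failure here, e.g. through CPLError().
        return nullptr;
    } catch (...) {
        return nullptr;
    }
}
```

The caller deletes the returned pointer, as it would for a dataset allocated with plain new.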
Force-pushed from c6b72c6 to 0774b5b
Not sure why the
This check says:
Thanks @mloskot, I'm testing only on Ubuntu 20.04 for now, and it seems the driver is found:
That log says
but there is no trace of building anything in I suspect
Honest question: if the current trend in gdal is toward minimising the code base and removing drivers, doesn't that include limiting inclusion of totally new drivers like this?
Force-pushed from 908226b to 6dbf40f
@nyalldawson Feeling it's gonna happen, that is precisely why I long delayed the merge of my JPEG XR (#203) ;-) Seriously though, I think such a question belongs on the gdal-dev ML
@mloskot Do you think it's still worth working on this PR then?
@davidbrochart Honest answer:
Force-pushed from 33971fe to e9cdde2
On the GDAL mailing list, Tamas mentioned the possibility of drivers being implemented as plugins. I was not aware of that; could you point me to examples of such drivers?
The pdf driver would be a good example I think: https://github.com/OSGeo/gdal/tree/master/gdal/frmts/pdf
Force-pushed from 679f708 to 656e76c
@davidbrochart @hobu I'd like to see discussion on gdal-dev before this is merged. In my experience zarr is not so much a format as a library for storing and accessing arrays. What's the scope of this work? How many of zarr's compressors and storage schemes will be supported here? What about xtensor-zarr's warning that it is not ready for general use? Questions like these still need to be explored.
@sgillies I agree that more discussion is needed; in any case, this PR is not ready to be merged yet. There's still work to be done in the xtensor stack for xtensor-zarr to be fully usable. I started this work as a potential application of xtensor-zarr based on a discussion in the Zarr community.
While I agree that Zarr is mostly known for the Python library, there exist implementations in other languages (Zarr.js in JavaScript and now xtensor-zarr in C++) and they all implement the Zarr protocol. Version 3 of the specification is being defined here.
At the moment, xtensor-zarr only supports (uncompressed) raw binary, GZip and Blosc, but we plan to support more compressors. As for the storage schemes, xtensor-zarr supports the local file system, AWS S3, Google Cloud Storage, plus all the ones supported by GDAL's Virtual File System. I understand that GDAL will require RFCs for new format drivers, and I would be happy to submit one for Zarr.
Email sent to gdal-dev: https://lists.osgeo.org/pipermail/gdal-dev/2021-February/053384.html.
Just dropping in on this conversation from the perspective of a zarr developer.
Zarr is a specification, with many implementations in different languages. The Python implementation is the most complete. The V2 spec is currently under review with OGC on the "community standard" track. The V3 spec is under active development and is open for public comment. My understanding is that the xtensor-zarr implementation will support both versions of the spec.
That is correct: in addition to supporting Zarr v3, xtensor-zarr supports a part of Zarr v2. Actually, this PR specifically targets version 2.
Force-pushed from 4c5d573 to 2f397b4
Force-pushed from faa9e9a to 877d81f
I've given a try at building the driver. My initial attempt on Linux led to my computer freezing to death due to lack of available RAM. After rebooting with more RAM available, here are my observations:
So there's a very significant RAM usage required (around 5 GB for a -O2 -g build, which is typical of how a package is built on a Linux distro). I'm afraid this could be a blocker on restricted build environments (let's say I want to build GDAL on some small cloud VM with just 2 GB RAM). Could that be significantly reduced? Given that in a GDAL driver context we just need to access the values in Zarr chunks without any fancy processing, I'm wondering if drawing in xtensor and other dependencies isn't overkill. My initial gut feeling for a Zarr driver was to do it "by hand" without any extra dependencies (apart maybe from a few compression methods) beyond those already used by GDAL
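To give a rough sense of what "by hand" chunk access could involve, here is a hedged sketch of one small step: building the storage key of a chunk. In the Zarr v2 spec a chunk is stored under a key made of its chunk indices joined by a separator, "." by default. The function name chunk_key_v2 is hypothetical, and the mapping from GDAL block offsets assumes a 2-D array in (row, column) order whose chunk shape equals the GDAL block size. It is not code from this PR.

```cpp
#include <cassert>
#include <string>

// Build the Zarr v2 key of the chunk that corresponds to a GDAL block.
// Assumptions (not from the PR): 2-D array, (row, column) axis order,
// chunk shape equal to the GDAL block size.
std::string chunk_key_v2(int nBlockYOff, int nBlockXOff, char sep = '.') {
    assert(nBlockYOff >= 0 && nBlockXOff >= 0);
    return std::to_string(nBlockYOff) + sep + std::to_string(nBlockXOff);
}
```

For example, block row 2, column 5 maps to the key "2.5" next to the array's .zarray metadata file. A real driver would still need to parse that metadata, fetch the key from storage and decompress the chunk.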
Hey @rouault, thanks for checking this out! Yeah it is a bit silly that we are not providing zarray in a precompiled form, which should resolve the RAM consumption issue, and mitigate the final binary size. I will let @JohanMabille comment on this in greater detail.
The different features provided by
Force-pushed from 46a4439 to 4c68fd6
The discussion corteva/rioxarray#246 seems pertinent here: if GDAL can handle fsspec (python) files, then zarr-in-gdal would be more useful when called from python.
Hi, I just wanted to mention that I'll soon have to work on a GDAL Zarr driver. However, my current position would be to start completely from scratch, and not rely on xtensor-zarr. There are several reasons for that:
Hi Even, all the concerns you raised are fair. We need to do a better job of making xtensor-zarr more lightweight when no computational capability is needed, which is the use-case of this driver. We have some ideas about reducing compilation time and resources. About custom data types, it is true that it is not possible today, although definitely feasible.
Another approach at creating a Zarr driver in #3896
What does this PR do?
This is an effort to add a Zarr driver, based on xtensor-zarr.