Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OGRLayer::WriteArrowBatch type fallback implementation breaks if OGRLayer::GetDataset is not overridden #9568

Closed
himikof opened this issue Mar 27, 2024 · 2 comments · Fixed by #9572
Assignees

Comments

@himikof
Copy link
Contributor

himikof commented Mar 27, 2024

What is the bug?

#8517 added a generic type fallback mechanism for writing non-native field types as strings/JSON. The Arrow-based writing API with this fallback is very useful for all layer types when the data is already present in the Arrow format in a driver-agnostic application (for example, through the new WIP Arrow writing path in pyogrio). Unfortunately, the fallback implementation currently relies on the OGRLayer::GetDataset being overridden by a specific layer to fetch the list of field data types supported by its driver, and starts silently creating fields with non-supported data types and passing through such data into CreateFeature if this method is not implemented or returns nullptr.

The documentation on this method seems to be a bit outdated, and states the following:

/** Return the dataset associated with this layer.
 *
 * NOTE: that method is implemented in very few drivers, and cannot generally
 * be relied on. It is currently only used by the GetRecordBatchSchema()
 * method to retrieve the field domain associated with a field, to fill the
 * dictionary field of a struct ArrowSchema.
 *
 * @return dataset, or nullptr when unknown.
 * @since GDAL 3.6
 */
GDALDataset *OGRLayer::GetDataset()
{
    return nullptr;
}

I would argue that at least the type fallback mechanism needs either to be able to rely on this method, in which case it needs to be implemented in all layers, or there should be some other reliable way to access the driver metadata. I stumbled on this issue when trying to write some data with a StringList field into FlatGeoBuf, and it was very confusing to see it fail when the Shapefile write succeeded, while they should be using the same generic implementation.

It seems that indeed only a few drivers implement this method currently, with FlatGeoBuf added in master (#8860, not in 3.8). Maybe it would be possible to at least backport the OGRFlatGeobufLayer::GetDataset implementation from there into 3.8?

Steps to reproduce the issue

Autotest test_ogr_shape_write_arrow_fallback_types should work for all drivers without native datetime/list support.

Versions and provenance

Issue reproduced on GDAL 3.8.3 built into the pyogrio wheels on Linux x86_64.

Additional context

No response

@rouault
Copy link
Member

rouault commented Mar 27, 2024

I would argue that at least the type fallback mechanism needs either to be able to rely on this method, in which case it needs to be implemented in all layers

Help welcome :-)
There are close to 50 drivers with vector creation capabilities...

../ogr/ogrsf_frmts/xlsx/ogrxlsxdatasource.cpp:    if (EQUAL(pszCap, ODsCCreateLayer))
../ogr/ogrsf_frmts/ili/ogrili2datasource.cpp:    if (EQUAL(pszCap, ODsCCreateLayer))
../ogr/ogrsf_frmts/ili/ogrili1datasource.cpp:    if (EQUAL(pszCap, ODsCCreateLayer))
../ogr/ogrsf_frmts/jsonfg/ogrjsonfgdataset.cpp:    if (EQUAL(pszCap, ODsCCreateLayer))
../ogr/ogrsf_frmts/selafin/ogrselafindatasource.cpp:    if (EQUAL(pszCap, ODsCCreateLayer))
../ogr/ogrsf_frmts/csv/ogrcsvdatasource.cpp:    if (EQUAL(pszCap, ODsCCreateLayer))
../ogr/ogrsf_frmts/mysql/ogrmysqldatasource.cpp:    if (EQUAL(pszCap, ODsCCreateLayer))
../ogr/ogrsf_frmts/pgdump/ogrpgdumpdatasource.cpp:    if (EQUAL(pszCap, ODsCCreateLayer))
../ogr/ogrsf_frmts/carto/ogrcartodatasource.cpp:    if (bReadWrite && EQUAL(pszCap, ODsCCreateLayer))
../ogr/ogrsf_frmts/gpx/ogrgpxdatasource.cpp:    if (EQUAL(pszCap, ODsCCreateLayer))
../ogr/ogrsf_frmts/shape/ogrshapedatasource.cpp:    if (EQUAL(pszCap, ODsCCreateLayer))
../ogr/ogrsf_frmts/parquet/ogrparquetwriterdataset.cpp:    if (EQUAL(pszCap, ODsCCreateLayer))
../ogr/ogrsf_frmts/pg/ogrpgdatasource.cpp:    if (EQUAL(pszCap, ODsCCreateLayer) || EQUAL(pszCap, ODsCDeleteLayer) ||
../ogr/ogrsf_frmts/wasp/ogrwaspdatasource.cpp:    if (EQUAL(pszCap, ODsCCreateLayer) && oLayer.get() == nullptr)
../ogr/ogrsf_frmts/vdv/ogrvdvdatasource.cpp:    if (EQUAL(pszCap, ODsCCreateLayer))
../ogr/ogrsf_frmts/dxf/ogrdxfwriterds.cpp:    if (EQUAL(pszCap, ODsCCreateLayer))
../ogr/ogrsf_frmts/openfilegdb/ogropenfilegdbdatasource.cpp:    if (EQUAL(pszCap, ODsCCreateLayer) || EQUAL(pszCap, ODsCDeleteLayer) ||
../ogr/ogrsf_frmts/ngw/gdalngwdataset.cpp:    if (EQUAL(pszCap, ODsCCreateLayer))
../ogr/ogrsf_frmts/mssqlspatial/ogrmssqlspatialdatasource.cpp:    if (EQUAL(pszCap, ODsCCreateLayer) || EQUAL(pszCap, ODsCDeleteLayer))
../ogr/ogrsf_frmts/mitab/mitab_ogr_datasource.cpp:    if (EQUAL(pszCap, ODsCCreateLayer))
../ogr/ogrsf_frmts/libkml/ogrlibkmldatasource.cpp:    if (EQUAL(pszCap, ODsCCreateLayer))
../ogr/ogrsf_frmts/mapml/ogrmapmldataset.cpp:    if (EQUAL(pszCap, ODsCCreateLayer))
../ogr/ogrsf_frmts/oci/ogrocidatasource.cpp:    if (EQUAL(pszCap, ODsCCreateLayer) && bDSUpdate)
../ogr/ogrsf_frmts/arrow/ogrfeatherwriterdataset.cpp:    if (EQUAL(pszCap, ODsCCreateLayer))
../ogr/ogrsf_frmts/kml/ogrkmldatasource.cpp:    if (EQUAL(pszCap, ODsCCreateLayer))
../ogr/ogrsf_frmts/georss/ogrgeorssdatasource.cpp:    if (EQUAL(pszCap, ODsCCreateLayer))
../ogr/ogrsf_frmts/mem/ogrmemdatasource.cpp:    if (EQUAL(pszCap, ODsCCreateLayer))
../ogr/ogrsf_frmts/geojson/ogrgeojsondatasource.cpp:    if (EQUAL(pszCap, ODsCCreateLayer))
../ogr/ogrsf_frmts/geojson/ogrgeojsonseqdriver.cpp:    if (EQUAL(pszCap, ODsCCreateLayer))
../ogr/ogrsf_frmts/flatgeobuf/ogrflatgeobufdataset.cpp:    if (EQUAL(pszCap, ODsCCreateLayer))
../ogr/ogrsf_frmts/gpkg/ogrgeopackagedatasource.cpp:    if (EQUAL(pszCap, ODsCCreateLayer) || EQUAL(pszCap, ODsCDeleteLayer) ||
../ogr/ogrsf_frmts/gmt/ogrgmtdatasource.cpp:    if (EQUAL(pszCap, ODsCCreateLayer))
../ogr/ogrsf_frmts/amigocloud/ogramigocloudtablelayer.cpp:        EQUAL(pszCap, OLCDeleteFeature) || EQUAL(pszCap, ODsCCreateLayer) ||
../ogr/ogrsf_frmts/amigocloud/ogramigoclouddatasource.cpp:    if (bReadWrite && EQUAL(pszCap, ODsCCreateLayer) && nLayers == 0)
../ogr/ogrsf_frmts/ods/ogrodsdatasource.cpp:    if (EQUAL(pszCap, ODsCCreateLayer))
../ogr/ogrsf_frmts/dwg/ogrdgnv8datasource.cpp:    if (EQUAL(pszCap, ODsCCreateLayer))
../ogr/ogrsf_frmts/geoconcept/ogrgeoconceptdatasource.cpp:    if (EQUAL(pszCap, ODsCCreateLayer))
../ogr/ogrsf_frmts/elastic/ogrelasticdatasource.cpp:    if (EQUAL(pszCap, ODsCCreateLayer) || EQUAL(pszCap, ODsCDeleteLayer) ||
../ogr/ogrsf_frmts/filegdb/FGdbDatasource.cpp:    if (EQUAL(pszCap, ODsCCreateLayer))
../ogr/ogrsf_frmts/mvt/ogrmvtdataset.cpp:    if (EQUAL(pszCap, ODsCCreateLayer))
../ogr/ogrsf_frmts/gpsbabel/ogrgpsbabelwritedatasource.cpp:    if (EQUAL(pszCap, ODsCCreateLayer))
../ogr/ogrsf_frmts/dgn/ogrdgndatasource.cpp:    if (EQUAL(pszCap, ODsCCreateLayer))
../ogr/ogrsf_frmts/sqlite/ogrsqlitedatasource.cpp:    if (EQUAL(pszCap, ODsCCreateLayer) || EQUAL(pszCap, ODsCDeleteLayer) ||
../ogr/ogrsf_frmts/cad/gdalcaddataset.cpp:    if (EQUAL(pszCap, ODsCCreateLayer) || EQUAL(pszCap, ODsCDeleteLayer))
../ogr/ogrsf_frmts/idb/ogridbdatasource.cpp:    if (EQUAL(pszCap, ODsCCreateLayer))
../ogr/ogrsf_frmts/gml/ogrgmldatasource.cpp:    if (EQUAL(pszCap, ODsCCreateLayer))
../ogr/ogrsf_frmts/jml/ogrjmldataset.cpp:    if (EQUAL(pszCap, ODsCCreateLayer))
../ogr/ogrsf_frmts/mongodbv3/ogrmongodbv3driver.cpp:    if (EQUAL(pszCap, ODsCCreateLayer) || EQUAL(pszCap, ODsCDeleteLayer) ||

or there should be some other reliable way to access the driver metadata

Unfortunately there's none when you have just the layer object. You need to browser through layer -> dataset -> driver

Maybe it would be possible to at least backport the OGRFlatGeobufLayer::GetDataset implementation from there into 3.8?

done per #9570

@rouault
Copy link
Member

rouault commented Mar 27, 2024

Global pass done in #9572

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants