Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Broken Output from ST_AsText #118

Closed
hd-code opened this issue Aug 21, 2023 · 3 comments · Fixed by #124
Closed

Broken Output from ST_AsText #118

hd-code opened this issue Aug 21, 2023 · 3 comments · Fixed by #124

Comments

@hd-code
Copy link

hd-code commented Aug 21, 2023

I have encountered a very strange bug with the ST_AsText function. This did not happen in v0.7.1, but now in 0.8.1 it does, so it was newly introduced!

I was working on a dataset from opendatasoft, which contains all countries with their administrative borders. The data is provided as a GeoJSON containing a FeatureCollection with a Feature per country. The file can be downloaded here:
https://public.opendatasoft.com/api/explore/v2.1/catalog/datasets/world-administrative-boundaries/exports/geojson?lang=en&timezone=Europe%2FBerlin

When I now run this SQL Script:

INSTALL spatial;
LOAD spatial;

WITH countries AS (
    SELECT
        name,
        ST_AsText(ST_GeomFromWKB(wkb_geometry)) as shape,
    FROM ST_Read('countries.json')
)
SELECT
    name,
    ST_GeomFromText(shape),
FROM countries

I get this error:
duckdb.InvalidInputException: Invalid Input Error: ParseException: Expected ')' or ',' but encountered: '('

Bascially, I want to export the geometries per country into WKT. But the output of ST_AsText is broken, so it cannot be parsed again (e.g. with ST_GeomFromText, but it also does not work when using other python libraries).

Then I dug a little deeper. The culprit is actually 'Italy'. All other countries are parsed and converted to WKT correctly, except Italy.

Now, I exported the WKT for Italy only to a txt file. First with duckdb, then with shapely (using geopandas). Here is the code to reproduce:

DuckDB:

INSTALL spatial;
LOAD spatial;

COPY (
    SELECT
        ST_AsText(ST_GeomFromWKB(wkb_geometry)) as geometry,
    FROM ST_Read('countries.json')
    WHERE name = 'Italy'
) TO 'wkt_italy_duckdb.txt' (SEP ';');

Shapely/Geopandas:

import pathlib
import geopandas

geojson_file = pathlib.Path("countries.json").resolve().read_text()
target_file = pathlib.Path("wkt_italy_shapely.txt").resolve()

countries = geopandas.read_file(geojson_file)
shape = countries[countries["name"] == "Italy"]["geometry"].values[0]

target_file.write_text(shape.wkt)

So, both scripts just parse the original GeoJSON, filter for Italy and dump its geometry as WKT into a txt file.

And here is the error:

Duckdb for some reason added a comma , just after the opening brackets ( in the last two polygons. (See picture)
wrong_wkt_italy

The WKT created by shapely does not have these two commas in the output. It only happens with DuckDB and only for Italy and only in the last two polygons of Italy's geometry. Again, this bug occurs now in v0.8.1. When you run this stuff in v0.7.1 everything works as expected.

So, why does this happen and can it be fixed?

@Maxxen
Copy link
Member

Maxxen commented Aug 21, 2023

Hi! Thanks for filing this issue, and including a reproducible example! The only thing that comes to mind is that we changed the WKT number formatting to behave the same as PostGIS, but there might be an edge case we're missing as it looks like the comma is simply a pair of coordinates that don't get printed. I'll take a look into this later.

@hd-code
Copy link
Author

hd-code commented Aug 22, 2023

I just checked the GeoJSON. All the points are there, none is missing. So, duckdb did not drop the points, but instead inserted these two additional commas at the beginning of the polygon. Maybe that helps your investigation.

Thanks for looking into this ^^

@Maxxen
Copy link
Member

Maxxen commented Sep 3, 2023

Alright turns it this is just me being clumsy with where to put the paren when the innermost polygons contain multiple rings. I've implemented a fix for this on my fork so it will get into the next merge before the upcoming release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants