Improved WKB geometry parsing #16
This is partially solved in georust/geozero#39, where it removes one copy by parsing WKB from the arrow array reference directly instead of a
I noticed that when I read a GeoJSON file into geopolars, the Shapely geometry is missing; it just says [binary data]. Is this the same issue? If so, I think I've built a working way to parse the geometry (see image below). I might need some help creating a pull request, since I'm not a pro with git.
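For context, `[binary data]` is the geometry stored as WKB (Well-Known Binary). As a minimal sketch, assuming a 2D Point and using only the standard library (not Shapely or geopolars), a WKB value can be decoded with `struct`: one byte-order flag, a `uint32` geometry type, then the coordinates as `float64`. `decode_wkb_point` is a hypothetical helper for illustration only.

```python
import struct


def decode_wkb_point(buf, offset=0):
    """Decode a 2D WKB Point starting at `offset` in any bytes-like buffer.

    Hypothetical helper: uses struct.unpack_from so values are read in place
    instead of slicing (copying) the buffer first. Only type 1 (Point) is handled.
    """
    byte_order = buf[offset]
    if byte_order == 1:
        prefix = "<"  # little endian (NDR)
    elif byte_order == 0:
        prefix = ">"  # big endian (XDR)
    else:
        raise ValueError(f"invalid WKB byte-order flag: {byte_order}")
    (geom_type,) = struct.unpack_from(prefix + "I", buf, offset + 1)
    if geom_type != 1:
        raise ValueError(f"expected a Point (type 1), got type {geom_type}")
    x, y = struct.unpack_from(prefix + "dd", buf, offset + 5)
    return x, y


# Little-endian WKB for POINT (1 2), built by hand for the example.
wkb = struct.pack("<BIdd", 1, 1, 1.0, 2.0)
print(len(wkb), decode_wkb_point(wkb))  # 21 (1.0, 2.0)
```

Because it accepts any buffer plus an offset, the same function works on a `memoryview` into a larger block of memory without copying the value out first.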
There's been a lot of progress in https://github.com/geoarrow/geoarrow-rs that hasn't been integrated here yet, including a new WKB parser. So I have most of the groundwork done, but haven't had time to integrate it into geopolars yet.
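The geoarrow-rs parser is written in Rust and its API isn't shown in this thread. As an illustration of the general idea only, GeoArrow's native encoding keeps coordinates in columnar buffers rather than one WKB blob per row. The sketch below assumes point-only input and the separated (x/y) coordinate layout, and converts a list of WKB Points into two `array("d")` buffers; `wkb_points_to_columns` is a hypothetical name, not a geoarrow-rs function.

```python
import struct
from array import array


def wkb_points_to_columns(wkb_values):
    """Convert WKB Points into separated x/y float64 coordinate buffers.

    Hypothetical sketch of a GeoArrow-style "native" point column: one
    contiguous buffer per dimension instead of one WKB blob per row.
    Only 2D Points are supported; nulls are not handled.
    """
    xs, ys = array("d"), array("d")
    for wkb in wkb_values:
        if wkb[0] not in (0, 1):
            raise ValueError(f"invalid WKB byte-order flag: {wkb[0]}")
        prefix = "<" if wkb[0] == 1 else ">"
        (geom_type,) = struct.unpack_from(prefix + "I", wkb, 1)
        if geom_type != 1:
            raise ValueError(f"only Points are supported, got type {geom_type}")
        x, y = struct.unpack_from(prefix + "dd", wkb, 5)
        xs.append(x)
        ys.append(y)
    return xs, ys


rows = [struct.pack("<BIdd", 1, 1, float(i), float(i) * 10) for i in range(3)]
xs, ys = wkb_points_to_columns(rows)
print(list(xs), list(ys))  # [0.0, 1.0, 2.0] [0.0, 10.0, 20.0]
```

Once coordinates live in contiguous buffers, operations such as bounding boxes become simple vectorized passes instead of per-row WKB decoding.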
Glad to hear there's a solution on the way! I know you're planning to implement the GeoArrow parser you've put together, but I was wondering if you could give some feedback on this solution as well. I'm assuming the main reason for the change is that pyogrio is maintained by the GeoPandas project and you want to remove that dependency.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

import polars as pl
from polars import DataFrame

from geopolars.geodataframe import GeoDataFrame

if TYPE_CHECKING:
    from pathlib import Path


def read_file(
    path_or_buffer: Path | str | bytes,
    /,
    layer: int | str | None = None,
    engine: str = "arrow",
    encoding: str | None = None,
    columns=None,
    read_geometry: bool = True,
    force_2d: bool = False,
    skip_features: int = 0,
    max_features: int | None = None,
    where: str | None = None,
    bbox: tuple[float, float, float, float] | None = None,
    fids=None,
    sql=None,
    sql_dialect=None,
    return_fids=False,
    **kwargs,
) -> DataFrame | GeoDataFrame:
    # Previously any other engine silently returned None.
    if engine != "arrow":
        raise ValueError(f"unsupported engine: {engine!r}")

    from pyogrio.raw import read_arrow as _read_arrow
    from shapely.wkb import loads

    metadata, table = _read_arrow(
        path_or_buffer,
        layer=layer,
        encoding=encoding,
        columns=columns,
        read_geometry=read_geometry,
        force_2d=force_2d,
        skip_features=skip_features,
        max_features=max_features,
        where=where,
        bbox=bbox,
        fids=fids,
        sql=sql,
        sql_dialect=sql_dialect,
        return_fids=return_fids,
    )
    # TODO: check that metadata["geometry_type"] is not "Unknown"

    # Parse each WKB value directly: wrapping it in a one-element pyarrow
    # array and round-tripping through hex only adds copies. Nulls stay None.
    geom = [
        loads(wkb) if wkb is not None else None
        for wkb in table["wkb_geometry"].to_pylist()
    ]

    df = pl.from_arrow(table)
    # Overwrite the geometry column (DataFrame.replace is deprecated).
    # Shapely objects can only be stored with the Object dtype.
    df = df.with_columns(pl.Series("wkb_geometry", geom, dtype=pl.Object))
    return GeoDataFrame(df)
```
That's not exactly right
There will most likely be an initial release of the Python bindings to geoarrow-rs in January, and then I can focus on geopolars.
It seems that the geozero API only supports `Vec<u8>` input. It would be ideal to find a way for geozero to read an Arrow u8 array without first copying it into a `Vec<u8>`.
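To make the copy concrete: in the Arrow columnar format, a binary array stores all values back to back in one data buffer plus an offsets buffer of length n + 1, so value i is `data[offsets[i]:offsets[i + 1]]`. In Python, slicing a `memoryview` over that buffer yields views that share memory instead of new `bytes` objects, which is the kind of borrowed access being asked of geozero here. This is a standard-library sketch of the layout only, not geozero's or pyarrow's API; validity (null) bitmaps are omitted, and `build_binary_array` / `iter_values` are hypothetical helpers.

```python
import struct
from array import array


def build_binary_array(values):
    """Pack byte strings into an Arrow-style (offsets, data) pair.

    offsets has len(values) + 1 entries; value i spans
    data[offsets[i]:offsets[i + 1]]. Null bitmaps are omitted.
    """
    offsets = array("i", [0])
    data = bytearray()
    for v in values:
        data += v
        offsets.append(len(data))
    return offsets, bytes(data)


def iter_values(offsets, data):
    """Yield each value as a memoryview into `data`: no per-value copy."""
    view = memoryview(data)
    for start, end in zip(offsets, offsets[1:]):
        yield view[start:end]


wkbs = [struct.pack("<BIdd", 1, 1, 1.0, 2.0), struct.pack("<BIdd", 1, 1, 3.0, 4.0)]
offsets, data = build_binary_array(wkbs)
for value in iter_values(offsets, data):
    # Each slice references the shared buffer; read the coordinates in place.
    print(value.obj is data, struct.unpack_from("<dd", value, 5))
# True (1.0, 2.0)
# True (3.0, 4.0)
```

The equivalent on the Rust side would be a parser that accepts a borrowed byte slice rather than an owned `Vec<u8>`.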