Introduce bounding box column definition #191

| Field Name | Type | Description |
| --- | --- | --- |
| edges | string | Name of the coordinate system for the edges. Must be one of `"planar"` or `"spherical"`. The default value is `"planar"`. |
| bbox | \[number] | Bounding box of the geometries in the file, formatted according to [RFC 7946, section 5](https://tools.ietf.org/html/rfc7946#section-5). |
| epoch | number | Coordinate epoch in case of a dynamic CRS, expressed as a decimal year. |
| covering | object | Object containing bounding box column names to help accelerate spatial data retrieval. |

The `bbox` values are in the same coordinate reference system as the geometry.

#### covering

The `covering` field specifies optional simplified representations of each geometry. The keys of the `covering` object MUST be a supported encoding. Currently, the only supported encoding is `"bbox"`, which specifies the names of [bounding box columns](#bounding-box-columns).

Example:
```
"covering": {
    "bbox": {
        "xmin": ["bbox", "xmin"],
        "ymin": ["bbox", "ymin"],
        "xmax": ["bbox", "xmax"],
        "ymax": ["bbox", "ymax"]
    }
}
```

##### bbox covering encoding

Including a per-row bounding box can be useful for accelerating spatial queries by allowing consumers to inspect row group bounding box summary statistics. Furthermore, a bounding box may be used to avoid complex spatial operations by first checking for bounding box overlaps. This field captures the column name and fields containing the bounding box of the geometry for every row.
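
As a rough illustration of the two uses described above, here is a minimal Python sketch assuming planar coordinates and boxes that do not cross the antimeridian. `BBox`, `bboxes_intersect`, and `row_group_may_match` are hypothetical names, and the `stats` dict is a stand-in for the per-column (min, max) statistics a Parquet reader might expose for the `xmin`, `ymin`, `xmax`, and `ymax` child fields; it is not a real library API. The same pruning logic would apply to page-level statistics.

```python
from __future__ import annotations

from typing import NamedTuple


class BBox(NamedTuple):
    """Axis-aligned bounding box (planar coordinates assumed)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def bboxes_intersect(a: BBox, b: BBox) -> bool:
    """Cheap pre-filter: True if the boxes overlap (touching edges count)."""
    return a.xmin <= b.xmax and a.xmax >= b.xmin and a.ymin <= b.ymax and a.ymax >= b.ymin


def row_group_may_match(stats: dict[str, tuple[float, float]], query: BBox) -> bool:
    """Use (min, max) statistics of the bbox child columns to decide whether a
    row group could contain any row whose bbox overlaps `query`.

    `stats` maps a child field name to its (min, max) over the row group;
    this layout is a hypothetical stand-in for real Parquet statistics.
    """
    # Even the smallest xmin is right of the query: every row lies to the right.
    if stats["xmin"][0] > query.xmax:
        return False
    # Even the largest xmax is left of the query: every row lies to the left.
    if stats["xmax"][1] < query.xmin:
        return False
    # Same reasoning along the y axis.
    if stats["ymin"][0] > query.ymax:
        return False
    if stats["ymax"][1] < query.ymin:
        return False
    return True


query = BBox(0.0, 0.0, 10.0, 10.0)
stats = {"xmin": (20.0, 35.0), "xmax": (21.0, 40.0), "ymin": (0.0, 5.0), "ymax": (1.0, 6.0)}
print(row_group_may_match(stats, query))  # False: every row starts at x >= 20
```

A row group that passes this check still needs a per-row `bboxes_intersect` test, and rows that pass that test still need the exact geometry predicate; the bounding box only rules candidates out.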

> I think pages should also be mentioned besides row groups, as the bbox column also works for page-level indexes. I wrote a longer rant about this in a comment on #188.

> Just updated to reflect this. Thanks!

> @csringhofer Can you take a look at 40ebb37? It's a pretty small change, just to also mention page indexes in addition to row groups. There aren't any other discussions of row groups in the spec, so that's the only change. Let me know if you think that works.

The format of the `bbox` encoding is `{"xmin": ["column_name", "xmin"], "ymin": ["column_name", "ymin"], "xmax": ["column_name", "xmax"], "ymax": ["column_name", "ymax"]}`. The arrays represent Parquet schema paths for nested groups. In this example, `column_name` is a Parquet group with fields `xmin`, `ymin`, `xmax`, and `ymax`. The value in `column_name` MUST exist in the Parquet file and meet the criteria in the [Bounding Box Column](#bounding-box-columns) definition. In order to constrain this value to a single bounding box group, the second item in each element MUST be the field name matching its key (`xmin`, `ymin`, etc.). All values MUST use the same column name.
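
These rules can be checked mechanically. Below is a minimal sketch in Python (standard library only), assuming the `covering` object has already been parsed from the GeoParquet JSON metadata. `bbox_covering_column` is a hypothetical helper, and it only checks the shape of the metadata; it does not verify that the referenced column actually exists in the Parquet schema.

```python
import json

BBOX_KEYS = ("xmin", "ymin", "xmax", "ymax")
SUPPORTED_COVERING_ENCODINGS = {"bbox"}


def bbox_covering_column(covering: dict) -> str:
    """Validate a `covering` object and return the bounding box column name.

    Raises ValueError if any of the rules described above is violated.
    """
    unknown = set(covering) - SUPPORTED_COVERING_ENCODINGS
    if unknown:
        raise ValueError(f"unsupported covering encoding(s): {sorted(unknown)}")
    bbox = covering.get("bbox")
    if not isinstance(bbox, dict) or set(bbox) != set(BBOX_KEYS):
        raise ValueError(f"'bbox' must be an object with exactly the keys {BBOX_KEYS}")

    column_names = set()
    for key in BBOX_KEYS:
        path = bbox[key]
        # Each value is a Parquet schema path: [group column name, child field name].
        if not (isinstance(path, list) and len(path) == 2 and all(isinstance(p, str) for p in path)):
            raise ValueError(f"{key}: expected a two-item path of strings, got {path!r}")
        column, field = path
        # The child field must match the key, e.g. "xmin" -> [..., "xmin"].
        if field != key:
            raise ValueError(f"{key}: second path item must be {key!r}, got {field!r}")
        column_names.add(column)

    # All four paths must point into the same bounding box column.
    if len(column_names) != 1:
        raise ValueError(f"all paths must use the same column, got {sorted(column_names)}")
    return column_names.pop()


example = json.loads(
    '{"bbox": {"xmin": ["bbox", "xmin"], "ymin": ["bbox", "ymin"],'
    ' "xmax": ["bbox", "xmax"], "ymax": ["bbox", "ymax"]}}'
)
print(bbox_covering_column(example))  # prints: bbox
```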

Note: the value specified in this field should not be confused with the top-level [`bbox`](#bbox) field, which contains the single bounding box of this geometry column over the whole GeoParquet file.
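
To make the distinction concrete: for planar coordinates, aggregating the per-row boxes (minimum of the per-row minimums, maximum of the per-row maximums) yields one box for the whole column, in the `[xmin, ymin, xmax, ymax]` order RFC 7946 uses for two dimensions. A minimal Python sketch, assuming each row's bounding box is given as a dict with `xmin`/`ymin`/`xmax`/`ymax` keys, or `None` for rows without a geometry. `file_bbox` is a hypothetical helper, and it ignores the antimeridian-crossing case that RFC 7946 allows for geographic coordinates.

```python
from __future__ import annotations

from typing import Iterable, Optional


def file_bbox(row_bboxes: Iterable[Optional[dict]]) -> Optional[list]:
    """Aggregate per-row bounding boxes into one [xmin, ymin, xmax, ymax] list.

    Rows without a geometry (None) are skipped; returns None if no row has one.
    """
    result = None
    for b in row_bboxes:
        if b is None:
            continue
        if result is None:
            result = [b["xmin"], b["ymin"], b["xmax"], b["ymax"]]
        else:
            result = [
                min(result[0], b["xmin"]),
                min(result[1], b["ymin"]),
                max(result[2], b["xmax"]),
                max(result[3], b["ymax"]),
            ]
    return result


rows = [
    {"xmin": 0, "ymin": 0, "xmax": 2, "ymax": 1},
    None,  # row without a geometry, so no bounding box value either
    {"xmin": -1, "ymin": 3, "xmax": 1, "ymax": 4},
]
print(file_bbox(rows))  # [-1, 0, 2, 4]
```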

### Bounding Box Columns

A bounding box column MUST be a Parquet group field with 4 child fields named `xmin`, `xmax`, `ymin`, and `ymax` representing the geometry's coordinate range. For three dimensions, the additional fields `zmin` and `zmax` MAY be present but are not required. The fields MUST be of Parquet type `FLOAT` or `DOUBLE`. The repetition of a bounding box column MUST match the geometry column's [repetition](#repetition). A row MUST contain a bounding box value if and only if the row contains a geometry value. In cases where the geometry is optional and a row does not contain a geometry value, the row MUST NOT contain a bounding box value.

> Maybe it could be added that the coordinates must have the same type, e.g. all FLOAT or all DOUBLE. About repetition: while the struct's repetition must be the same as the geometry column's, the nested fields' repetition must be "required", right? So they can never be null if their parent is not null.

> I concur with that! This is an assumption I've actually made in my GDAL implementation.

> Updated.

> Made the update in 40ebb37. Let me know if it's okay or not.

The bounding box column MUST be at the root of the schema. The bounding box column MUST NOT be nested in a group.
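
A sketch of how a reader or writer might check the bounding box column rules, in Python. The schema representation here (a list of root-level field dicts with `name`, `repetition`, and either `type` or `children`) is a hypothetical simplification, not any Parquet library's API, and `check_bbox_column` / `row_is_consistent` are illustrative names. Rejecting child fields other than `xmin`, `xmax`, `ymin`, `ymax`, `zmin`, and `zmax` is this sketch's reading of the definition.

```python
from __future__ import annotations

REQUIRED_CHILDREN = {"xmin", "xmax", "ymin", "ymax"}
OPTIONAL_CHILDREN = {"zmin", "zmax"}  # may be present for three dimensions
ALLOWED_TYPES = {"FLOAT", "DOUBLE"}


def check_bbox_column(root_fields: list[dict], bbox_name: str, geometry_name: str) -> list[str]:
    """Return the rule violations for a bounding box column (empty list if valid).

    `root_fields` is a simplified, hypothetical view of the root of a Parquet
    schema: dicts with "name" and "repetition", plus "type" (primitive fields)
    or "children" (group fields).
    """
    by_name = {f["name"]: f for f in root_fields}
    # Only root-level fields are searched, so a group nested elsewhere is rejected.
    bbox = by_name.get(bbox_name)
    if bbox is None:
        return [f"{bbox_name!r} is not a root-level field"]
    geometry = by_name.get(geometry_name)
    if geometry is None:
        return [f"geometry column {geometry_name!r} not found"]
    if "children" not in bbox:
        return [f"{bbox_name!r} must be a group field"]

    errors = []
    names = {child["name"] for child in bbox["children"]}
    missing = REQUIRED_CHILDREN - names
    if missing:
        errors.append(f"missing child fields: {sorted(missing)}")
    unexpected = names - REQUIRED_CHILDREN - OPTIONAL_CHILDREN
    if unexpected:
        errors.append(f"unexpected child fields: {sorted(unexpected)}")
    for child in bbox["children"]:
        if child.get("type") not in ALLOWED_TYPES:
            errors.append(f"{child['name']}: type must be FLOAT or DOUBLE")
    if bbox["repetition"] != geometry["repetition"]:
        errors.append("repetition must match the geometry column's repetition")
    return errors


def row_is_consistent(geometry_value, bbox_value) -> bool:
    """A row has a bounding box value if and only if it has a geometry value."""
    return (geometry_value is None) == (bbox_value is None)
```

For example, an `optional` group named `bbox` whose four children are `DOUBLE` passes next to an `optional` geometry column, while the same group declared `required` would be reported as a repetition mismatch.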

> At first, the column name "bbox" was confusing to me, as it is the same as the JSON struct name. Maybe "bbox_col" would be clearer? Afterwards, it could be added that if there is a single geometry column, the recommended bbox column name is simply "bbox".

> @csringhofer Are you referring to the example, as in:

> I'm hesitant to change it because our recommendation really is to call it "bbox". I agree it's a bit confusing. If anything should be renamed, it might be the "bbox" key under covering. It used to be called just "box" in earlier versions of the PR, but now that it's just the bbox columns, I put it back. I'm open to other ideas though.

> It's indeed a bit confusing here in the example, but for the actual spec I would keep "bbox" both as the recommended column name and as the key here in the metadata.

> Maybe another example could be added with bbox columns for multiple geometry columns. It is also not clear what the recommended name is in that case: there is an example with "any_column", but something like "geom_column_name_bbox" seems clearer to me.

> FWIW, that's the convention I've used in the GDAL writer.