-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
geojson: demarcate collections #35
base: main
Are you sure you want to change the base?
geojson: demarcate collections #35
Conversation
Ok(()) | ||
} | ||
|
||
/// Process top-level GeoJSON items (geometry only) | ||
fn process_geojson_geom<P: GeomProcessor>(gj: &GeoGeoJson, processor: &mut P) -> Result<()> { | ||
match *gj { | ||
GeoGeoJson::FeatureCollection(ref collection) => { | ||
processor.geometrycollection_begin(collection.features.len(), 0)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't look right to me. A GeoJSON feature collection can contain any geometry type, not only GeometryCollections.
@@ -372,7 +365,7 @@ mod test { | |||
let mut wkt_data: Vec<u8> = Vec::new(); | |||
assert!(read_geojson_fc(geojson.as_bytes(), &mut WktWriter::new(&mut wkt_data)).is_ok()); | |||
let wkt = std::str::from_utf8(&wkt_data).unwrap(); | |||
assert_eq!(wkt, "MULTIPOLYGON(((173.020375 -40.919052,173.247234 -41.331999,173.958405 -40.926701,174.247587 -41.349155,174.248517 -41.770008,173.876447 -42.233184,173.22274 -42.970038,172.711246 -43.372288,173.080113 -43.853344,172.308584 -43.865694,171.452925 -44.242519,171.185138 -44.897104,170.616697 -45.908929,169.831422 -46.355775,169.332331 -46.641235,168.411354 -46.619945,167.763745 -46.290197,166.676886 -46.219917,166.509144 -45.852705,167.046424 -45.110941,168.303763 -44.123973,168.949409 -43.935819,169.667815 -43.555326,170.52492 -43.031688,171.12509 -42.512754,171.569714 -41.767424,171.948709 -41.514417,172.097227 -40.956104,172.79858 -40.493962,173.020375 -40.919052)),((174.612009 -36.156397,175.336616 -37.209098,175.357596 -36.526194,175.808887 -36.798942,175.95849 -37.555382,176.763195 -37.881253,177.438813 -37.961248,178.010354 -37.579825,178.517094 -37.695373,178.274731 -38.582813,177.97046 -39.166343,177.206993 -39.145776,176.939981 -39.449736,177.032946 -39.879943,176.885824 -40.065978,176.508017 -40.604808,176.01244 -41.289624,175.239567 -41.688308,175.067898 -41.425895,174.650973 -41.281821,175.22763 -40.459236,174.900157 -39.908933,173.824047 -39.508854,173.852262 -39.146602,174.574802 -38.797683,174.743474 -38.027808,174.697017 -37.381129,174.292028 -36.711092,174.319004 -36.534824,173.840997 -36.121981,173.054171 -35.237125,172.636005 -34.529107,173.007042 -34.450662,173.551298 -35.006183,174.32939 -35.265496,174.612009 -36.156397)))"); | |||
assert_eq!(wkt, "GEOMETRYCOLLECTION(MULTIPOLYGON(((173.020375 -40.919052,173.247234 -41.331999,173.958405 -40.926701,174.247587 -41.349155,174.248517 -41.770008,173.876447 -42.233184,173.22274 -42.970038,172.711246 -43.372288,173.080113 -43.853344,172.308584 -43.865694,171.452925 -44.242519,171.185138 -44.897104,170.616697 -45.908929,169.831422 -46.355775,169.332331 -46.641235,168.411354 -46.619945,167.763745 -46.290197,166.676886 -46.219917,166.509144 -45.852705,167.046424 -45.110941,168.303763 -44.123973,168.949409 -43.935819,169.667815 -43.555326,170.52492 -43.031688,171.12509 -42.512754,171.569714 -41.767424,171.948709 -41.514417,172.097227 -40.956104,172.79858 -40.493962,173.020375 -40.919052)),((174.612009 -36.156397,175.336616 -37.209098,175.357596 -36.526194,175.808887 -36.798942,175.95849 -37.555382,176.763195 -37.881253,177.438813 -37.961248,178.010354 -37.579825,178.517094 -37.695373,178.274731 -38.582813,177.97046 -39.166343,177.206993 -39.145776,176.939981 -39.449736,177.032946 -39.879943,176.885824 -40.065978,176.508017 -40.604808,176.01244 -41.289624,175.239567 -41.688308,175.067898 -41.425895,174.650973 -41.281821,175.22763 -40.459236,174.900157 -39.908933,173.824047 -39.508854,173.852262 -39.146602,174.574802 -38.797683,174.743474 -38.027808,174.697017 -37.381129,174.292028 -36.711092,174.319004 -36.534824,173.840997 -36.121981,173.054171 -35.237125,172.636005 -34.529107,173.007042 -34.450662,173.551298 -35.006183,174.32939 -35.265496,174.612009 -36.156397))))"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would expect a MULTIPOLYGON
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would you expect if the input FeatureCollection had a MultiPolygon and then a LineString?
// Feature Collections may contain multiple geometries, so we need to wrap | ||
// them in a GeometryCollection to ensure we output valid geometry. | ||
fn dataset_begin(&mut self, _name: Option<&str>) -> Result<()> { | ||
self.geometrycollection_begin(0, 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think here's a confusion between FeatureCollections and GeometryCollections. The latter is a special geometry type for one feature (https://datatracker.ietf.org/doc/html/rfc7946#appendix-A.7).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll respond just here - since I think all three of your comments are essentially about the same issue.
I think here's a confusion between FeatureCollections and GeometryCollections
This was intentional - I am proposing that we explicitly interpret a feature collection as a geometry collection in WKT.
You're right that in geojson GeometryCollection
and FeatureCollection
are different things. But WKT, being concerned only with Geometry, has no concept of Feature
or FeatureCollection
.
So then, if we get a FeatureCollection from geojson and want to convert it to WKT, how should we do that?
Our current approach is to output a series of comma separated WKT geometries. Unfortunately, this output cannot be parsed by postgis, gdal, jts, nor georust::wkt (thus not by geozero itself!).
A corollary exists with the geo-types geometry writer. Similar to WKT, geo-types is strictly Geometry - no Feature/FeatureCollections. And in the same way, a geojson FeatureCollection with multiple features will be output as a top level geo_types::GeometryCollection
.
Do you see a practical downside with the approach I'm proposing here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, strange, but if it's fixes WKT, then I can live with a hacky solution. But I would still expect, that my first remark in geojson_reader.rs
would also affect other output formats?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right that this affects other output formats. My goal is to make sure all the "geometry only" formats — those which don't allow feature data: geo-types, WKT, WKB — behave sanely.
For example, here's something I consider a problem with the current to_wkb()
implementation:
As input, consider this geojson feature collection with two Point features.
let geojson = r#"{
"type": "FeatureCollection",
"features": [
{
"type": "Feature",
"properties": {
"population": 100
},
"geometry": {
"type": "Point",
"coordinates": [10.0, -20.0]
}
},
{
"type": "Feature",
"properties": {
"population": 100
},
"geometry": {
"type": "Point",
"coordinates": [30.0, -40.0]
}
},
]
} "#;
Then this contrived use case:
let sql = format!("SELECT ST_Centroid({}::geometry)", geojson.to_wkb());
Does that seem like a reasonable use case? What centroid would you expect to be computed?
Currently it's:
$ SELECT ST_AsText(wkb_geometry) FROM( SELECT
ST_Centroid('0101000000000000000000244000000000000034c001010000000000000000003e4000000000000044c0'::geometry)
AS wkb_geometry
) AS f;
st_astext
---------------
POINT(10 -20) <---- 🚨🚨🚨This is not what I'd expect!
Taking a closer look at the value of `geojson.to_wkb():
0101000000000000000000244000000000000034c001010000000000000000003e4000000000000044c0
To break that down, the first part is:
// WKB of POINT(10.0, -20.0)
0101000000000000000000244000000000000034c0
concatenated directly to:
// WKB of POINT(30.0, -40.0)
01010000000000000000003e4000000000000044c0
I don't know if it's valid WKB to simply concatenate multiple WKB's together, but for better or worse, postgis actually seems to parse it:
$ SELECT ST_AsText(wkb_geometry) FROM ( SELECT
'0101000000000000000000244000000000000034c001010000000000000000003e4000000000000044c0'::geometry
AS wkb_geometry
) AS f;
st_astext
---------------
POINT(10 -20)
The problem is that the second point has apparently just been ignored!
With the changes proposed in this PR, just like with the WKT example, the output of converting a FeatureCollection to WKB has changed to be wrapped in a GeometryCollection.
So now geojson.to_wkb()
:
$ SELECT ST_AsText(wkb_geometry) FROM ( SELECT
'0107000000020000000101000000000000000000244000000000000034C001010000000000000000003E4000000000000044C0'::geometry
AS wkb_geometry
) AS f;
st_astext
-------------------------------------------------
GEOMETRYCOLLECTION(POINT(10 -20),POINT(30 -40))
I think this is much more likely to behave how the user expects. The centroid selected from the example will be the centroid of all the geometries from all the features.
$ SELECT ST_AsText(wkb_geometry) FROM ( SELECT
ST_Centroid('0107000000020000000101000000000000000000244000000000000034C001010000000000000000003E4000000000000044C0'::geometry)
AS wkb_geometry
) AS f;
st_astext
---------------
POINT(20 -30) <--- 👍 that's better
Do you agree that the current situation is problematic, or is my use case overly contrived?
Do you think there's a different solution?
FYI I pushed up another commit with some tests that show the new wkb behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this approach generally makes sense to you, I can add some more thorough testing to the other output formats.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation and the good example. The main problem is IMO, that we currently do not distinct between multi-feature (dataset) capable formats like GeoJSON and single feature formats like WKB. It's a good moment to find a proper solution for that, since I'm replicating the GeozeroDatasource
and FeatureProcessor
traits based on the new GeomEventProcessor
in the coming days. What you're describing is collecting geometries from multiple features into a single feature GeometryCollection. I also think that's a valid use case, but maybe it should be made explict for the user. My current idea for the event API is that we'll have in addtion to to_wkb()
an extended method to_wkb_with(v: dyn GeomVisitor)
, which will allow to pass a processing chain e.g. including a reprojection. Maybe a GeometryCollector
could also be a part of a GeomVisitor
chain, or it could be a visitor flag, or a special method like collect_to_wkb()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
collecting geometries from multiple features into a single feature GeometryCollection. I also think that's a valid use case, but maybe it should be made explict for the user.
It seems like you have some reservations about this - so I want to make sure we're on the same page.
Setting aside implementation details for now, and just focusing on behavior: I think it's a good expectation that all format conversions (e.g. to_wkb()
) should produce valid output by default without needing to pass in any kind of special options. Why would someone want invalid wkb? Do you agree with that part?
Test cases and examples can be helpful. Can you help think of some code that behaves surprisingly with the proposed behavior - I think that'd help me understand your reservations about making this default behavior. And maybe then I could be able to address them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there any questions I can answer or changes I can make to move this forward?
(Also, I just now rebased against latest.)
This allows WKT to emit a single top level item And allows for lossless roundtripping of geojson->fgb->geojson
664dde7
to
f07e33e
Compare
I think, my first review comment is still relevant. Forcing the output of any |
What output are you expecting when mapping a feature collection to a geometry-only format like WKT? Here's the current behavior of geo-types:
It seems reasonable to me. geo-types similarly used to behave incorrectly for multiple geometries. It confused people (#11). For geo-types, the fix was: since multiple geometries cannot exist at the top-level they must be wrapped in a GeometryCollection. I'm proposing the same fix here. |
The question is not whether the output format is geometry only, but whether multiple features are supported. If you e.g. use the GDAL CSV driver with WKT geometries (
If you want to collect the geometries of all features, then a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple years has passed... I keep stumbling into this from time to time. It surprised me again today.
I still think we should address this - it's a real problem that geozero-cli outputs invalid WKT and WKB.
If we're outputting geometry (as opposed to features), we need some kind of container to match the fact that there could be potentially multiple features represented in the output geometry.
@@ -372,7 +365,7 @@ mod test { | |||
let mut wkt_data: Vec<u8> = Vec::new(); | |||
assert!(read_geojson_fc(geojson.as_bytes(), &mut WktWriter::new(&mut wkt_data)).is_ok()); | |||
let wkt = std::str::from_utf8(&wkt_data).unwrap(); | |||
assert_eq!(wkt, "MULTIPOLYGON(((173.020375 -40.919052,173.247234 -41.331999,173.958405 -40.926701,174.247587 -41.349155,174.248517 -41.770008,173.876447 -42.233184,173.22274 -42.970038,172.711246 -43.372288,173.080113 -43.853344,172.308584 -43.865694,171.452925 -44.242519,171.185138 -44.897104,170.616697 -45.908929,169.831422 -46.355775,169.332331 -46.641235,168.411354 -46.619945,167.763745 -46.290197,166.676886 -46.219917,166.509144 -45.852705,167.046424 -45.110941,168.303763 -44.123973,168.949409 -43.935819,169.667815 -43.555326,170.52492 -43.031688,171.12509 -42.512754,171.569714 -41.767424,171.948709 -41.514417,172.097227 -40.956104,172.79858 -40.493962,173.020375 -40.919052)),((174.612009 -36.156397,175.336616 -37.209098,175.357596 -36.526194,175.808887 -36.798942,175.95849 -37.555382,176.763195 -37.881253,177.438813 -37.961248,178.010354 -37.579825,178.517094 -37.695373,178.274731 -38.582813,177.97046 -39.166343,177.206993 -39.145776,176.939981 -39.449736,177.032946 -39.879943,176.885824 -40.065978,176.508017 -40.604808,176.01244 -41.289624,175.239567 -41.688308,175.067898 -41.425895,174.650973 -41.281821,175.22763 -40.459236,174.900157 -39.908933,173.824047 -39.508854,173.852262 -39.146602,174.574802 -38.797683,174.743474 -38.027808,174.697017 -37.381129,174.292028 -36.711092,174.319004 -36.534824,173.840997 -36.121981,173.054171 -35.237125,172.636005 -34.529107,173.007042 -34.450662,173.551298 -35.006183,174.32939 -35.265496,174.612009 -36.156397)))"); | |||
assert_eq!(wkt, "GEOMETRYCOLLECTION(MULTIPOLYGON(((173.020375 -40.919052,173.247234 -41.331999,173.958405 -40.926701,174.247587 -41.349155,174.248517 -41.770008,173.876447 -42.233184,173.22274 -42.970038,172.711246 -43.372288,173.080113 -43.853344,172.308584 -43.865694,171.452925 -44.242519,171.185138 -44.897104,170.616697 -45.908929,169.831422 -46.355775,169.332331 -46.641235,168.411354 -46.619945,167.763745 -46.290197,166.676886 -46.219917,166.509144 -45.852705,167.046424 -45.110941,168.303763 -44.123973,168.949409 -43.935819,169.667815 -43.555326,170.52492 -43.031688,171.12509 -42.512754,171.569714 -41.767424,171.948709 -41.514417,172.097227 -40.956104,172.79858 -40.493962,173.020375 -40.919052)),((174.612009 -36.156397,175.336616 -37.209098,175.357596 -36.526194,175.808887 -36.798942,175.95849 -37.555382,176.763195 -37.881253,177.438813 -37.961248,178.010354 -37.579825,178.517094 -37.695373,178.274731 -38.582813,177.97046 -39.166343,177.206993 -39.145776,176.939981 -39.449736,177.032946 -39.879943,176.885824 -40.065978,176.508017 -40.604808,176.01244 -41.289624,175.239567 -41.688308,175.067898 -41.425895,174.650973 -41.281821,175.22763 -40.459236,174.900157 -39.908933,173.824047 -39.508854,173.852262 -39.146602,174.574802 -38.797683,174.743474 -38.027808,174.697017 -37.381129,174.292028 -36.711092,174.319004 -36.534824,173.840997 -36.121981,173.054171 -35.237125,172.636005 -34.529107,173.007042 -34.450662,173.551298 -35.006183,174.32939 -35.265496,174.612009 -36.156397))))"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would you expect if the input FeatureCollection had a MultiPolygon and then a LineString?
Does the comparison to ogr2ogr make sense here? I'm not talking about compressing multiple features into a single feature in the output. I'm talking about having valid geometry output when converting to format that doesn't support features at all. |
I'm not sure if this is controversial - and it's definitely a bit nuanced. Let me know what you think!
The goal:
Allow WKT to emit a single top level item - multiple top level WKT items like
POINT(1 1), POINT(2 2)
are not parseable by e.g. postgis or georust/wkt (we removed support in georust/wkt#72)And this change also allows for lossless roundtripping of geojson->fgb->geojson, see https://github.com/flatgeobuf/flatgeobuf/compare/master...michaelkirk:mkirk/rust-lossless-geojson-collections?expand=1