Update centroid calculation of geo_shape for geo-centroid #49887
Pinging @elastic/es-analytics-geo (:Analytics/Geo)
Here is a proposed WIP implementation of the above: talevy@a713f5c. Updating the CentroidCalculator seemed pretty straightforward. The one remaining requirement is determining which types of shapes exist in the geometry. I have two ideas there: one requires yet another iteration over the geometry, and the other requires more space (extra variables for each dimension).
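A minimal Java sketch of the second idea (extra variables per dimension), assuming each component has already been reduced to a weighted point and tagged with its dimension (0 = point, 1 = line, 2 = polygon). The class `PerDimensionCentroid` and its methods are hypothetical and are not the `CentroidCalculator` API. The geometry is visited once; the price is three sets of running sums instead of one.

```java
/**
 * Hypothetical single-pass accumulator: running sums are kept separately for
 * points (0), lines (1) and polygons (2). Only the highest non-empty
 * dimension contributes to the final centroid. Not the Elasticsearch
 * CentroidCalculator.
 */
final class PerDimensionCentroid {
    private final double[] sumX = new double[3];  // sum of x * weight, per dimension
    private final double[] sumY = new double[3];  // sum of y * weight, per dimension
    private final double[] sumW = new double[3];  // sum of weights, per dimension
    private final boolean[] seen = new boolean[3];

    /** Adds a weighted contribution (x, y) for a component of the given dimension. */
    void add(double x, double y, double weight, int dimension) {
        if (dimension < 0 || dimension > 2) {
            throw new IllegalArgumentException("dimension must be 0, 1 or 2");
        }
        sumX[dimension] += x * weight;
        sumY[dimension] += y * weight;
        sumW[dimension] += weight;
        seen[dimension] = true;
    }

    /** Highest dimension that received any contribution, or -1 if empty. */
    int highestDimension() {
        for (int d = 2; d >= 0; d--) {
            if (seen[d]) {
                return d;
            }
        }
        return -1;
    }

    /** Weighted mean {x, y} over the highest dimension only (assumes its total weight is non-zero). */
    double[] centroid() {
        int d = highestDimension();
        if (d < 0) {
            throw new IllegalStateException("no geometry added");
        }
        return new double[] {sumX[d] / sumW[d], sumY[d] / sumW[d]};
    }

    public static void main(String[] args) {
        PerDimensionCentroid c = new PerDimensionCentroid();
        c.add(100.0, 100.0, 1.0, 0);  // a stray point: outranked by the lines below
        c.add(0.0, 0.0, 2.0, 1);      // midpoint of a length-2 segment
        c.add(2.0, 0.0, 2.0, 1);      // midpoint of another length-2 segment
        double[] xy = c.centroid();
        System.out.println(c.highestDimension() + " " + xy[0] + " " + xy[1]);  // 1 1.0 0.0
    }
}
```

The alternative (a second pass to find the highest dimension first) would need only one set of sums, at the cost of iterating the geometry twice.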
We discussed this in our analytics/geo team meeting and concluded that:
This commit serializes the ShapeType of the indexed geometry. The ShapeType can be useful for other future features. For one thing, elastic#49887 depends on the ability to determine the highest-dimensional shape for centroid calculations. A GeometryCollection is reduced to its sub-shape of the highest dimension. Relates to elastic#37206.
Information about a shape's dimension for centroid calculation is introduced in #50104.
+1, this is also more consistent with the current behavior of the centroid aggregation on points in the case of a multi-valued field.
Excuse me, I forgot to mention that more than just the shape's type/dimension needs to be serialized: the sum of the weights (which can be negative) does too.
This commit serializes the ShapeType of the indexed geometry. The ShapeType can be useful for other future features. For one thing, #49887 depends on the ability to determine the highest-dimensional shape for centroid calculations. A GeometryCollection is reduced to its sub-shape of the highest dimension. Relates to #37206.
This PR implements proper centroid calculation for geometries according to the definition in #49887. To compute it correctly, an additional variable, an encoded long representing the total weight of the geometry's centroid, is stored in the tree. This weight is always positive. Some tests are fixed because they did not use valid geometries. Closes #49887.
This was implemented in #50297. Closing.
Where things are now
As the geoshape-doc-values branch stands today, the centroid is defined as the centroid of all the geometry's vertices.
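To make this vertex-based definition concrete, here is a minimal Java sketch that averages every vertex, treating lon/lat as planar x/y. `VertexAverageCentroid` is a hypothetical name, not code from the branch.

```java
import java.util.List;

/**
 * Hypothetical sketch of a vertex-average centroid: every vertex counts
 * equally, regardless of shape type or spacing. Planar math on raw lon/lat.
 */
final class VertexAverageCentroid {
    static double[] centroid(List<double[]> vertices) {
        if (vertices.isEmpty()) {
            throw new IllegalArgumentException("no vertices");
        }
        double sumX = 0, sumY = 0;
        for (double[] v : vertices) {
            sumX += v[0];
            sumY += v[1];
        }
        return new double[] {sumX / vertices.size(), sumY / vertices.size()};
    }

    public static void main(String[] args) {
        // A closed ring repeats its first vertex, so the (0, 0) corner is counted twice.
        List<double[]> square = List.of(
            new double[] {0, 0}, new double[] {4, 0}, new double[] {4, 4},
            new double[] {0, 4}, new double[] {0, 0});
        double[] c = centroid(square);
        System.out.println(c[0] + " " + c[1]);  // 1.6 1.6
    }
}
```

For this 4x4 square the center of mass is (2, 2), but the repeated closing vertex pulls the vertex average toward the origin; unevenly spaced vertices skew it in the same way.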
Why the update?
After revisiting this strategy, we decided it would be worth modifying this calculation to bring it closer in line with how popular geospatial data stores compute the centroid.
After exploring several of these implementations, we found a few commonalities, which I list in the detailed definition below.
Proposed detailed definition of a centroid of a geo_shape field
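A minimal Java sketch of a dimension-aware centroid, assuming the rules implied by the comments above: only the highest-dimensional components count; points weigh 1; each line segment contributes its midpoint weighted by its length; and each polygon contributes its area-weighted centroid via the shoelace formula, with holes carrying negative area (consistent with the note that the summed weight can be negative). `ShapeCentroid` and its methods are hypothetical, and the math is planar on raw lon/lat with no geodesic correction.

```java
import java.util.List;

/**
 * Hypothetical dimension-aware centroid. Only components of the highest
 * dimension seen contribute:
 *   0 = points (weight 1), 1 = line segments (weight = length),
 *   2 = polygons (weight = area; holes contribute negative area).
 */
final class ShapeCentroid {
    private final double[] momentX = new double[3];  // sum of x * weight, per dimension
    private final double[] momentY = new double[3];  // sum of y * weight, per dimension
    private final double[] weight = new double[3];   // sum of weights, per dimension
    private int maxDim = -1;                          // highest dimension seen so far

    void addPoint(double x, double y) {
        add(0, x, y, 1.0);
    }

    /** Each segment contributes its midpoint, weighted by its length. */
    void addLine(List<double[]> pts) {
        for (int i = 0; i + 1 < pts.size(); i++) {
            double[] a = pts.get(i), b = pts.get(i + 1);
            double len = Math.hypot(b[0] - a[0], b[1] - a[1]);
            add(1, (a[0] + b[0]) / 2, (a[1] + b[1]) / 2, len);
        }
    }

    /** Rings must be closed (last vertex equals the first). */
    void addPolygon(List<double[]> outer, List<List<double[]>> holes) {
        addRing(outer, 1.0);
        for (List<double[]> hole : holes) {
            addRing(hole, -1.0);
        }
    }

    /** Shoelace formula; sign is +1 for the outer ring and -1 for holes. */
    private void addRing(List<double[]> ring, double sign) {
        double twiceArea = 0, sixAreaCx = 0, sixAreaCy = 0;
        for (int i = 0; i + 1 < ring.size(); i++) {
            double[] a = ring.get(i), b = ring.get(i + 1);
            double cross = a[0] * b[1] - b[0] * a[1];
            twiceArea += cross;
            sixAreaCx += (a[0] + b[0]) * cross;
            sixAreaCy += (a[1] + b[1]) * cross;
        }
        // Flip clockwise rings so the ring's own area is positive, then apply sign.
        double s = Math.signum(twiceArea) * sign;
        accumulate(2, s * sixAreaCx / 6, s * sixAreaCy / 6, sign * Math.abs(twiceArea) / 2);
    }

    private void add(int dim, double x, double y, double w) {
        accumulate(dim, x * w, y * w, w);
    }

    private void accumulate(int dim, double mx, double my, double w) {
        momentX[dim] += mx;
        momentY[dim] += my;
        weight[dim] += w;
        maxDim = Math.max(maxDim, dim);
    }

    double[] centroid() {
        if (maxDim < 0 || weight[maxDim] == 0) {
            throw new IllegalStateException("empty or degenerate geometry");
        }
        return new double[] {momentX[maxDim] / weight[maxDim], momentY[maxDim] / weight[maxDim]};
    }

    public static void main(String[] args) {
        ShapeCentroid c = new ShapeCentroid();
        c.addPoint(100, 100);                                           // no effect: a polygon is present
        c.addLine(List.of(new double[] {0, 0}, new double[] {10, 0}));  // no effect: a polygon is present
        c.addPolygon(
            List.of(new double[] {0, 0}, new double[] {6, 0}, new double[] {6, 6},
                    new double[] {0, 6}, new double[] {0, 0}),
            List.of(List.of(new double[] {1, 1}, new double[] {3, 1}, new double[] {3, 3},
                            new double[] {1, 3}, new double[] {1, 1})));
        double[] xy = c.centroid();
        System.out.println(xy[0] + " " + xy[1]);  // 3.125 3.125
    }
}
```

In the example, the 6x6 square (area 36, centroid (3, 3)) minus the 2x2 hole (area 4, centroid (2, 2)) gives (108 - 8) / 32 = 3.125 on each axis; the point and the line are discarded because a polygon is present.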
Proposed definition of a geo-centroid of a bucket
The un-weighted centroid of the centroids of the individual shapes within the bucket.
Below is a summary of the descriptions from various sources:
The OGC standard does not specify how the centroid should be calculated.
The Oracle definition does not specify how weight should be defined or implemented. Interestingly, it also does not describe the behavior for line-strings.
Similarly, this description does not go into detail about how either the "center of mass" or the "weighted length" is calculated.
Discussion Item
For example, there are multiple possible interpretations of how the centroid should be calculated as an aggregate function. These specs define ST_Centroid as a simple function on one field value, whereas the Geo Centroid aggregation is the centroid of a field across multiple documents. Should all of these field values be treated as a single GeometryCollection from which the centroid is derived, or should each document be treated as a single point (computed as above) with equal weight?
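The two interpretations can be contrasted with a minimal Java sketch, assuming every shape in the bucket has the same dimension and has already been summarised as its own centroid plus its total weight (area, length or point count). `BucketInterpretations` and `Summary` are hypothetical names; mixed-dimension buckets are out of scope here.

```java
import java.util.List;

/** Hypothetical comparison of two ways to aggregate per-document shape centroids. */
final class BucketInterpretations {
    /** A shape reduced to its own centroid and total weight (area, length or count). */
    static final class Summary {
        final double x, y, weight;

        Summary(double x, double y, double weight) {
            this.x = x;
            this.y = y;
            this.weight = weight;
        }
    }

    /** (a) Treat all field values as one GeometryCollection: weight-averaged centroid. */
    static double[] asCollection(List<Summary> shapes) {
        double mx = 0, my = 0, w = 0;
        for (Summary s : shapes) {
            mx += s.x * s.weight;
            my += s.y * s.weight;
            w += s.weight;
        }
        return new double[] {mx / w, my / w};
    }

    /** (b) Treat each document as a single point with equal weight. */
    static double[] equalWeight(List<Summary> shapes) {
        double sx = 0, sy = 0;
        for (Summary s : shapes) {
            sx += s.x;
            sy += s.y;
        }
        return new double[] {sx / shapes.size(), sy / shapes.size()};
    }

    public static void main(String[] args) {
        // A 6x6 square centred at (3, 3) and a 1x1 square centred at (10.5, 0.5).
        List<Summary> docs = List.of(new Summary(3, 3, 36), new Summary(10.5, 0.5, 1));
        double[] a = asCollection(docs);
        double[] b = equalWeight(docs);
        System.out.println(a[0] + " " + a[1]);  // stays close to the large square's centre
        System.out.println(b[0] + " " + b[1]);  // 6.75 1.75
    }
}
```

Option (b) corresponds to the proposed bucket definition above (an un-weighted mean of per-shape centroids); option (a) lets large shapes dominate, so here it lands near (3.2, 2.9) while (b) lands halfway between the two centroids.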