-
Notifications
You must be signed in to change notification settings - Fork 507
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adds GeoHex grid documentation #1761
Changes from 4 commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
@@ -0,0 +1,377 @@ | ||||||
--- | ||||||
layout: default | ||||||
title: GeoHex grid aggregations | ||||||
parent: Aggregations | ||||||
nav_order: 4 | ||||||
has_children: false | ||||||
--- | ||||||
|
||||||
# GeoHex grid aggregations | ||||||
|
||||||
The Hexagonal Hierarchical Geospatial Indexing System (H3) partitions the Earth's areas into identifiable hexagon-shaped cells. | ||||||
|
||||||
The H3 grid system works well for proximity applications, because it overcomes the shortcomings of geohash's non-uniform partitions. Geohash encodes latitude and longitude pairs, leading to significantly smaller partitions near the poles and a degree of longitude near the equator. However, the H3 grid system's distortions are low and limited to five partitions of 122. These five partitions are placed in low-use areas (for example, in the middle of the ocean), leaving the essential areas error-free. Thus, grouping documents based on the H3 grid system provides a better aggregation than the geohash grid. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I would try to avoid "overcomes the shortcomings" (replace one of them with a synonym). Should "geohash" be capitalized in the two instances where it is lowercase? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Replaced with "overcomes the limitations". Geohash is lowercase in general. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Just checked, and Geohash is capitalized. Will change. |
||||||
|
||||||
The GeoHex grid aggregation groups [geopoints]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/geo-point/) into grid cells for geographical analysis. Each grid cell corresponds to an [H3 cell](https://h3geo.org/docs/core-library/h3Indexing/#h3-cell-indexp) and is identified using the [H3Index representation](https://h3geo.org/docs/core-library/h3Indexing/#h3index-representation). | ||||||
|
||||||
## Precision | ||||||
|
||||||
The `precision` parameter controls the level of granularity that determines the grid cell size. The lower the precision, the larger the grid cells. | ||||||
|
||||||
The following example illustrates low-precision and high-precision aggregation requests. | ||||||
|
||||||
To start, create an index and map the `location` field as a `geo_point`: | ||||||
|
||||||
```json | ||||||
PUT national_parks | ||||||
{ | ||||||
"mappings": { | ||||||
"properties": { | ||||||
"location": { | ||||||
"type": "geo_point" | ||||||
} | ||||||
} | ||||||
} | ||||||
} | ||||||
``` | ||||||
|
||||||
Index the following documents into the sample index: | ||||||
|
||||||
```json | ||||||
PUT national_parks/_doc/1 | ||||||
{ | ||||||
"name": "Yellowstone National Park", | ||||||
"location": "44.42, -110.59" | ||||||
} | ||||||
|
||||||
PUT national_parks/_doc/2 | ||||||
{ | ||||||
"name": "Yosemite National Park", | ||||||
"location": "37.87, -119.53" | ||||||
} | ||||||
|
||||||
PUT national_parks/_doc/3 | ||||||
{ | ||||||
"name": "Death Valley National Park", | ||||||
"location": "36.53, -116.93" | ||||||
} | ||||||
``` | ||||||
|
||||||
You can index geopoints in several formats. For a list of all supported formats, see the [geopoint documentation]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/geo-point#formats). | ||||||
{: .note} | ||||||
|
||||||
## Low-precision requests | ||||||
|
||||||
Run a low-precision request that should bucket all three documents together: | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Should this read "..that buckets all..." There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is what's expected to happen after you run this request. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I agree that "should" reads strangely here and would go with @vagimeli suggestion, if accurate. |
||||||
|
||||||
```json | ||||||
GET national_parks/_search | ||||||
{ | ||||||
"aggregations": { | ||||||
"grouped": { | ||||||
"geohex_grid": { | ||||||
"field": "location", | ||||||
"precision": 1 | ||||||
} | ||||||
} | ||||||
} | ||||||
} | ||||||
``` | ||||||
|
||||||
You can use either `GET` or `POST` HTTP method for GeoHex grid aggregation queries. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Should this read "You can use either the |
||||||
{: .note} | ||||||
|
||||||
The response groups documents 2 and 3 together because they are close enough to be bucketed in one grid cell: | ||||||
|
||||||
```json | ||||||
{ | ||||||
"took" : 4, | ||||||
"timed_out" : false, | ||||||
"_shards" : { | ||||||
"total" : 1, | ||||||
"successful" : 1, | ||||||
"skipped" : 0, | ||||||
"failed" : 0 | ||||||
}, | ||||||
"hits" : { | ||||||
"total" : { | ||||||
"value" : 3, | ||||||
"relation" : "eq" | ||||||
}, | ||||||
"max_score" : 1.0, | ||||||
"hits" : [ | ||||||
{ | ||||||
"_index" : "national_parks", | ||||||
"_id" : "1", | ||||||
"_score" : 1.0, | ||||||
"_source" : { | ||||||
"name" : "Yellowstone National Park", | ||||||
"location" : "44.42, -110.59" | ||||||
} | ||||||
}, | ||||||
{ | ||||||
"_index" : "national_parks", | ||||||
"_id" : "2", | ||||||
"_score" : 1.0, | ||||||
"_source" : { | ||||||
"name" : "Yosemite National Park", | ||||||
"location" : "37.87, -119.53" | ||||||
} | ||||||
}, | ||||||
{ | ||||||
"_index" : "national_parks", | ||||||
"_id" : "3", | ||||||
"_score" : 1.0, | ||||||
"_source" : { | ||||||
"name" : "Death Valley National Park", | ||||||
"location" : "36.53, -116.93" | ||||||
} | ||||||
} | ||||||
] | ||||||
}, | ||||||
"aggregations" : { | ||||||
"grouped" : { | ||||||
"buckets" : [ | ||||||
{ | ||||||
"key" : "8129bffffffffff", | ||||||
"doc_count" : 2 | ||||||
}, | ||||||
{ | ||||||
"key" : "8128bffffffffff", | ||||||
"doc_count" : 1 | ||||||
} | ||||||
] | ||||||
} | ||||||
} | ||||||
} | ||||||
``` | ||||||
|
||||||
## High-precision requests | ||||||
|
||||||
Now run a higher precision request: | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
|
||||||
```json | ||||||
GET national_parks/_search | ||||||
{ | ||||||
"aggregations": { | ||||||
"grouped": { | ||||||
"geohex_grid": { | ||||||
"field": "location", | ||||||
"precision": 6 | ||||||
} | ||||||
} | ||||||
} | ||||||
} | ||||||
``` | ||||||
|
||||||
All three documents are bucketed separately because of higher granularity: | ||||||
|
||||||
```json | ||||||
{ | ||||||
"took" : 5, | ||||||
"timed_out" : false, | ||||||
"_shards" : { | ||||||
"total" : 1, | ||||||
"successful" : 1, | ||||||
"skipped" : 0, | ||||||
"failed" : 0 | ||||||
}, | ||||||
"hits" : { | ||||||
"total" : { | ||||||
"value" : 3, | ||||||
"relation" : "eq" | ||||||
}, | ||||||
"max_score" : 1.0, | ||||||
"hits" : [ | ||||||
{ | ||||||
"_index" : "national_parks", | ||||||
"_id" : "1", | ||||||
"_score" : 1.0, | ||||||
"_source" : { | ||||||
"name" : "Yellowstone National Park", | ||||||
"location" : "44.42, -110.59" | ||||||
} | ||||||
}, | ||||||
{ | ||||||
"_index" : "national_parks", | ||||||
"_id" : "2", | ||||||
"_score" : 1.0, | ||||||
"_source" : { | ||||||
"name" : "Yosemite National Park", | ||||||
"location" : "37.87, -119.53" | ||||||
} | ||||||
}, | ||||||
{ | ||||||
"_index" : "national_parks", | ||||||
"_id" : "3", | ||||||
"_score" : 1.0, | ||||||
"_source" : { | ||||||
"name" : "Death Valley National Park", | ||||||
"location" : "36.53, -116.93" | ||||||
} | ||||||
} | ||||||
] | ||||||
}, | ||||||
"aggregations" : { | ||||||
"grouped" : { | ||||||
"buckets" : [ | ||||||
{ | ||||||
"key" : "8629ab6dfffffff", | ||||||
"doc_count" : 1 | ||||||
}, | ||||||
{ | ||||||
"key" : "8629857a7ffffff", | ||||||
"doc_count" : 1 | ||||||
}, | ||||||
{ | ||||||
"key" : "862896017ffffff", | ||||||
"doc_count" : 1 | ||||||
} | ||||||
] | ||||||
} | ||||||
} | ||||||
} | ||||||
``` | ||||||
|
||||||
## Filtering requests | ||||||
|
||||||
High-precision requests are resource-intensive, so we recommend using a filter like `geo_bounding_box` to limit the geographical area. For example, the following query applies a filter to limit the search area: | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Compound modifiers only need to be hyphenated when they precede the noun they modify. So it's a resource-intensive request, but the request is resource intensive. |
||||||
|
||||||
```json | ||||||
GET national_parks/_search | ||||||
{ | ||||||
"size" : 0, | ||||||
"aggregations": { | ||||||
"filtered": { | ||||||
"filter": { | ||||||
"geo_bounding_box": { | ||||||
"location": { | ||||||
"top_left": "38, -120", | ||||||
"bottom_right": "36, -116" | ||||||
} | ||||||
} | ||||||
}, | ||||||
"aggregations": { | ||||||
"grouped": { | ||||||
"geohex_grid": { | ||||||
"field": "location", | ||||||
"precision": 6 | ||||||
} | ||||||
} | ||||||
} | ||||||
} | ||||||
} | ||||||
} | ||||||
``` | ||||||
|
||||||
The response contains the two documents that are within the `geo_bounding_box` bounds: | ||||||
|
||||||
```json | ||||||
{ | ||||||
"took" : 4, | ||||||
"timed_out" : false, | ||||||
"_shards" : { | ||||||
"total" : 1, | ||||||
"successful" : 1, | ||||||
"skipped" : 0, | ||||||
"failed" : 0 | ||||||
}, | ||||||
"hits" : { | ||||||
"total" : { | ||||||
"value" : 3, | ||||||
"relation" : "eq" | ||||||
}, | ||||||
"max_score" : null, | ||||||
"hits" : [ ] | ||||||
}, | ||||||
"aggregations" : { | ||||||
"filtered" : { | ||||||
"doc_count" : 2, | ||||||
"grouped" : { | ||||||
"buckets" : [ | ||||||
{ | ||||||
"key" : "8629ab6dfffffff", | ||||||
"doc_count" : 1 | ||||||
}, | ||||||
{ | ||||||
"key" : "8629857a7ffffff", | ||||||
"doc_count" : 1 | ||||||
} | ||||||
] | ||||||
} | ||||||
} | ||||||
} | ||||||
} | ||||||
``` | ||||||
|
||||||
You can also restrict the geographical area by providing the coordinates of the bounding envelope in the `bounds` parameter. Both `bounds` and `geo_bounding_box` coordinates can be specified in any of the [geopoint formats]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/geo-point#formats). The following query uses the WKT "POINT(`longitude` `latitude`)" format for the `bounds` parameter: | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
|
||||||
```json | ||||||
GET national_parks/_search | ||||||
{ | ||||||
"size": 0, | ||||||
"aggregations": { | ||||||
"grouped": { | ||||||
"geohex_grid": { | ||||||
"field": "location", | ||||||
"precision": 6, | ||||||
"bounds": { | ||||||
"top_left": "POINT (-120 38)", | ||||||
"bottom_right": "POINT (-116 36)" | ||||||
} | ||||||
} | ||||||
} | ||||||
} | ||||||
} | ||||||
``` | ||||||
|
||||||
The response contains only the two results that are within the specified bounds: | ||||||
|
||||||
```json | ||||||
{ | ||||||
"took" : 3, | ||||||
"timed_out" : false, | ||||||
"_shards" : { | ||||||
"total" : 1, | ||||||
"successful" : 1, | ||||||
"skipped" : 0, | ||||||
"failed" : 0 | ||||||
}, | ||||||
"hits" : { | ||||||
"total" : { | ||||||
"value" : 3, | ||||||
"relation" : "eq" | ||||||
}, | ||||||
"max_score" : null, | ||||||
"hits" : [ ] | ||||||
}, | ||||||
"aggregations" : { | ||||||
"grouped" : { | ||||||
"buckets" : [ | ||||||
{ | ||||||
"key" : "8629ab6dfffffff", | ||||||
"doc_count" : 1 | ||||||
}, | ||||||
{ | ||||||
"key" : "8629857a7ffffff", | ||||||
"doc_count" : 1 | ||||||
} | ||||||
] | ||||||
} | ||||||
} | ||||||
} | ||||||
``` | ||||||
|
||||||
The `bounds` parameter can be used with or without the `geo_bounding_box` filter; these two parameters are independent and can have any spatial relationship to each other. | ||||||
|
||||||
## Supported parameters | ||||||
|
||||||
GeoHex grid aggregation requests support the following parameters. | ||||||
|
||||||
Parameter | Data Type | Description | ||||||
:--- | :--- | :--- | ||||||
field | String | The field that contains the geopoints. This field must be mapped as a `geo_point` field. If the field contains an array, all array values are aggregated. Required. | ||||||
precision | Integer | The zoom level used to determine grid cells for bucketing results. Valid values are in the [0, 15] range. Optional. Default is 5. | ||||||
bounds | Object | The bounding box for filtering geopoints. The bounding box is defined by the top left and bottom right vertices. The vertices are specified as geopoints in one of the following formats: <br>- An object with a latitude and longitude<br>- An array in the [`longitude`, `latitude`] format<br>- A string in the "`latitude`,`longitude`" format<br>- A geohash <br>- Well-known text (WKT).<br> See the [geopoint formats]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/geo-point#formats) for format examples. Optional. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Should "geohash" be capitalized? "for formatting examples"? |
||||||
size | Integer | The maximum number of buckets to return. When there are more buckets than `size`, OpenSearch returns buckets with more documents. Optional. Default is 10,000. | ||||||
shard_size | Integer | The maximum number of buckets to return from each shard. Optional. Default is max (10, `size` · number of shards), which gives a more accurate count of higher-prioritized buckets. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. "more highly prioritized buckets"? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.