This plugin can be used to index geo_shape objects in elasticsearch, then aggregate and/or script-simplify them.
This is an Ingest, Search and Script plugin.
bin/elasticsearch-plugin install https://github.com/opendatasoft/elasticsearch-plugin-geoshape/releases/download/v7.17.6.1/elasticsearch-plugin-geoshape-7.17.6.1.zip
Built with Java 17 and gradle 7.3.1 (but you should use the packaged gradlew included in this repo anyway).
A new processor, geo_extension, adds custom fields to the desired geo_shape data object at ingest time.
Processor name: geo_extension.
Name | Required | Default | Description |
---|---|---|---|
field | yes | - | The geo shape field to use. This parameter accepts wildcards to match multiple geo_shape fields |
path | no | - | The field that contains the field to expand. When using a wildcard in field, matching will be done under this path only |
keep_original_shape | no | true | Keep the original unfixed shape in a shape field |
shape_field | no | shape | Name of sub shape field |
fix_shape | no | true | Fix invalid shape. For the moment it only fixes duplicate consecutive coordinates in polygon (elastic/elasticsearch#14014) |
fixed_field | no | fixed_shape | Name of sub fixed_shape field |
wkb | no | true | Compute wkb from shape field |
wkb_field | no | wkb | Name of wkb subfield |
type | no | true | Compute geo shape type (Polygon, Point, LineString, ...) |
type_field | no | type | Name of type subfield |
area | no | true | Compute area of shape |
area_field | no | area | Name of area subfield |
bbox | no | true | Compute geo_point array containing topLeft and bottomRight points of shape envelope |
bbox_field | no | bbox | Name of bbox subfield |
centroid | no | true | Compute geo_point representing shape centroid |
centroid_field | no | centroid | Name of centroid subfield |
hash | no | true | Compute shape digest to perform exact request on shape (in other words: used as a primary key. We may want to use the wkt in the future?) |
hash_field | no | hash | Name of hash subfield |
PUT _ingest/pipeline/geo_extension
{
"description": "Add extra geo fields to geo_shape objects.",
"processors": [
{
"geo_extension": {
"field": "geoshape_*"
}
}
]
}
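The options from the parameter table can be combined in a single processor definition. As a minimal sketch, the Python snippet below builds a pipeline body that overrides a few defaults. The helper name build_geo_extension_pipeline and the subfield name "surface" are hypothetical and only serve the illustration; the option names themselves come from the table.

```python
import json


def build_geo_extension_pipeline(field, path=None, **options):
    """Build the JSON body of a pipeline holding one geo_extension processor.

    Only options passed explicitly are emitted, so the documented defaults
    apply to everything else. (Hypothetical helper, not part of the plugin.)
    """
    processor = {"field": field}
    if path is not None:
        processor["path"] = path
    processor.update(options)
    return {
        "description": "Add extra geo fields to geo_shape objects.",
        "processors": [{"geo_extension": processor}],
    }


# Illustrative variant: skip the hash and rename the area subfield.
body = build_geo_extension_pipeline(
    "geoshape_*",
    hash=False,
    area_field="surface",  # hypothetical subfield name
)
print(json.dumps(body, indent=2))
```

The printed document can be sent as the body of a PUT _ingest/pipeline/<name> request. If a subfield is renamed this way, the index mapping has to use the same name.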
PUT main
{
"mappings": {
"dynamic_templates": [
{
"geoshapes": {
"match": "geoshape_*",
"mapping": {
"properties": {
"geoshape": {"type": "geo_shape"},
"hash": {"type": "keyword"},
"wkb": {"type": "binary", "doc_values": true},
"type": {"type": "keyword"},
"area": {"type": "half_float"},
"bbox": {"type": "geo_point"},
"centroid": {"type": "geo_point"}
}
}
}
}
]
}
}
GET main/_mapping
Result:
{
"main": {
"mappings": {
"_doc": {
"dynamic_templates": [
{
"geoshapes": {
"match": "geoshape_*",
"mapping": {
"properties": {
"geoshape": {
"type": "geo_shape"
},
"hash": {
"type": "keyword"
},
"wkb": {
"type": "binary",
"doc_values": true
},
"type": {
"type": "keyword"
},
"area": {
"type": "half_float"
},
"bbox": {
"type": "geo_point"
},
"centroid": {
"type": "geo_point"
}
}
}
}
}
]
}
}
}
}
Document indexing with shape fixing:
POST main/_doc?pipeline=geo_extension
{
"geoshape_0": {
"type": "Polygon",
"coordinates": [
[
[
1.6809082031249998,
49.05227025601607
],
[
2.021484375,
48.596592251456705
],
[
2.021484375,
48.596592251456705
],
[
3.262939453125,
48.922499263758255
],
[
2.779541015625,
49.196064000723794
],
[
2.0654296875,
49.23194729854559
],
[
1.6809082031249998,
49.05227025601607
]
]
]
}
}
GET main/_search
Result:
"hits": [
{
"_source": {
"geoshape_0": {
"area": 0.594432056845634,
"centroid": {
"lat": 48.95553463671871,
"lon": 2.3829210191713015
},
"bbox": [
{
"lat": 48.596592251456705,
"lon": 1.6809082031249998
},
{
"lat": 49.23194729854559,
"lon": 3.262939453125
}
],
"type": "Polygon",
"geoshape": {
"coordinates": [
[
[
1.6809082031249998,
49.05227025601607
],
[
2.021484375,
48.596592251456705
],
[
3.262939453125,
48.922499263758255
],
[
2.779541015625,
49.196064000723794
],
[
2.0654296875,
49.23194729854559
],
[
1.6809082031249998,
49.05227025601607
]
]
],
"type": "Polygon"
},
"hash": "-5012816342630707936",
"wkb": "AAAAAAMAAAABAAAABkAALAAAAAAAQEhMXSKIhttAChqAAAAAAEBIdhR0tDaAQAY8gAAAAABASJkYoAuEDEAAhgAAAAAAQEidsHL20w4/+uT//////0BIhrDKsBJAQAAsAAAAAABASExdIoiG2w=="
}
}
}
Note that the consecutive duplicate point has been removed from the indexed shape.
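To make the effect of the shape fixing concrete, here is a small Python sketch that drops consecutive duplicate coordinates from the ring indexed above, then computes a planar area and the envelope of the result. It is not the plugin's implementation: the function names are made up for the example, and treating the area as a planar value in squared degrees is an assumption (it gives a value close to the area shown in the response, which suggests but does not confirm that interpretation).

```python
def drop_consecutive_duplicates(ring):
    """Return the ring without points equal to their direct predecessor."""
    cleaned = []
    for point in ring:
        if not cleaned or point != cleaned[-1]:
            cleaned.append(point)
    return cleaned


def planar_area(ring):
    """Shoelace formula on a closed ring, in squared coordinate units."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def envelope(ring):
    """Return (min_lon, min_lat, max_lon, max_lat) of the ring."""
    lons = [p[0] for p in ring]
    lats = [p[1] for p in ring]
    return min(lons), min(lats), max(lons), max(lats)


# Ring from the indexing request above, including the duplicated point.
ring = [
    [1.6809082031249998, 49.05227025601607],
    [2.021484375, 48.596592251456705],
    [2.021484375, 48.596592251456705],  # consecutive duplicate
    [3.262939453125, 48.922499263758255],
    [2.779541015625, 49.196064000723794],
    [2.0654296875, 49.23194729854559],
    [1.6809082031249998, 49.05227025601607],
]

fixed = drop_consecutive_duplicates(ring)
print(len(ring), "->", len(fixed), "points")
print("planar area:", planar_area(fixed))
print("envelope:", envelope(fixed))
```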
This aggregation creates a bucket for each input shape (based on the hash of its WKB representation) and computes a simplified version of the shape in the bucket.
The simplification part is similar to what is done with the simplify script.
The size parameter allows you to retain only the N biggest (longest) shapes.
Moreover, compared to regular search results, results of an aggregation can be cached by Elasticsearch.
- field (mandatory): the field used for aggregating. Must be of wkb type. E.g.: "geoshape_0.wkb".
- output_format: the output format, one of [geojson, wkt, wkb]. Defaults to geojson.
- simplify:
  - zoom: the zoom level in range [0, 20]. 0 is the most simplified and 20 is the least. Defaults to 0.
  - algorithm: the simplification algorithm, one of [DOUGLAS_PEUCKER, TOPOLOGY_PRESERVING]. Defaults to DOUGLAS_PEUCKER.
- size: can be set to define how many buckets should be returned. See the official Elasticsearch terms aggregation documentation for more explanation. Buckets are ordered by the length (perimeter for polygons) of their shape, longer shapes first.
- shard_size: can be used to minimize the extra work that comes with a bigger requested size. See the official Elasticsearch terms aggregation documentation for more explanation.
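The ordering rule attached to size (longer shapes first, keep the top N) can be illustrated with a short Python sketch. It assumes a plain planar length computed from the ring coordinates; how the plugin actually measures length is not stated here, and the shapes and function names are invented for the example.

```python
import math


def ring_length(ring):
    """Planar length of a ring: the sum of its segment lengths."""
    return sum(math.dist(a, b) for a, b in zip(ring, ring[1:]))


def top_n_by_length(shapes, size):
    """Return the names of the `size` longest shapes, longest first."""
    ranked = sorted(
        shapes.items(),
        key=lambda item: ring_length(item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:size]]


# Made-up closed rings with perimeters 4, 12 and 6.
shapes = {
    "small_square": [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)],
    "large_square": [(0, 0), (3, 0), (3, 3), (0, 3), (0, 0)],
    "rectangle": [(0, 0), (2, 0), (2, 1), (0, 1), (0, 0)],
}
print(top_n_by_length(shapes, size=2))
```

With size set to 2, the smallest shape is left out of the result, which mirrors how a small size value hides short shapes from the aggregation response.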
GET main/_search?size=0
{
"aggs": {
"geo_preview": {
"geoshape": {
"field": "geoshape_0.wkb",
"output_format": "wkb",
"simplify": {
"zoom": 8,
"algorithm": "douglas_peucker"
},
"size": 10,
"shard_size": 10
}
}
}
}
Result:
"aggregations": {
"geo_preview": {
"buckets": [
{
"key": "AAAAAAMAAAABAAAABkAALAAAAAAAQEhMXSKIhts/+uT//////0BIhrDKsBJAQACGAAAAAABASJ2wcvbTDkAGPIAAAAAAQEiZGKALhAxAChqAAAAAAEBIdhR0tDaAQAAsAAAAAABASExdIoiG2w==",
"digest": "-5012816342630707936",
"type": "Polygon",
"doc_count": 1
}
]
}
}
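When output_format is wkb, the bucket key is a base64 string. The following Python sketch decodes such a key using only the standard library, assuming it holds a plain 2D WKB polygon (geometry type 3), which is what the key in the result above looks like. The function name is made up for the example, and other geometry types are deliberately not handled.

```python
import base64
import struct


def decode_wkb_polygon(b64_wkb):
    """Decode a base64-encoded 2D WKB polygon into (type_code, rings)."""
    data = base64.b64decode(b64_wkb)
    # Byte 0 is the byte-order flag: 0 = big-endian, 1 = little-endian.
    endian = ">" if data[0] == 0 else "<"
    geom_type, ring_count = struct.unpack_from(endian + "II", data, 1)
    if geom_type != 3:
        raise ValueError("only plain 2D polygons (WKB type 3) are handled")
    offset = 9
    rings = []
    for _ in range(ring_count):
        (point_count,) = struct.unpack_from(endian + "I", data, offset)
        offset += 4
        ring = []
        for _ in range(point_count):
            x, y = struct.unpack_from(endian + "dd", data, offset)
            offset += 16
            ring.append((x, y))
        rings.append(ring)
    return geom_type, rings


# Bucket key copied from the aggregation result above.
key = (
    "AAAAAAMAAAABAAAABkAALAAAAAAAQEhMXSKIhts/+uT//////0BIhrDKsBJAQACGAAAAAABA"
    "SJ2wcvbTDkAGPIAAAAAAQEiZGKALhAxAChqAAAAAAEBIdhR0tDaAQAAsAAAAAABASExdIoiG2w=="
)
geom_type, rings = decode_wkb_polygon(key)
print(geom_type, len(rings), len(rings[0]))
for lon, lat in rings[0]:
    print(lon, lat)
```

For anything beyond a quick inspection, a dedicated geometry library is a better fit than hand-written parsing.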
Search script for simplifying shapes dynamically.
- field: the field to apply the script to.
- zoom: the zoom level in range [0, 20]. 0 is the most simplified and 20 is the least. Defaults to 0.
- algorithm: the simplification algorithm, one of [DOUGLAS_PEUCKER, TOPOLOGY_PRESERVING]. Defaults to DOUGLAS_PEUCKER.
- output_format: the output format, one of [geojson, wkt, wkb]. Defaults to geojson.
GET main/_search
{
"script_fields": {
"simplified_shape": {
"script": {
"lang": "geo_extension_scripts",
"source": "geo_simplify",
"params": {
"field": "geoshape_0",
"zoom": 8,
"output_format": "wkt"
}
}
}
}
}
Result:
"hits": [
{
"fields": {
"simplified_shape": [
{
"real_type": "Polygon",
"geom": "POLYGON ((2.021484375 48.596592251456705, 1.6809082031249998 49.05227025601607, 2.0654296875 49.23194729854559, 2.779541015625 49.196064000723794, 3.262939453125 48.922499263758255, 2.021484375 48.596592251456705))",
"type": "Polygon"
}
]
}
}
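For readers unfamiliar with the DOUGLAS_PEUCKER option, here is a textbook sketch of that algorithm in Python. It is not the plugin's code: it works on an open polyline, takes the distance tolerance directly as a parameter, and makes no attempt to reproduce how the plugin derives a tolerance from zoom, since that mapping is not described here.

```python
import math


def point_segment_distance(p, a, b):
    """Planar distance from point p to the segment [a, b]."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the line through a and b, clamped to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points, tolerance):
    """Simplify an open polyline, dropping points within tolerance."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, max_dist = 0, 0.0
    for i in range(1, len(points) - 1):
        dist = point_segment_distance(points[i], first, last)
        if dist > max_dist:
            index, max_dist = i, dist
    if max_dist <= tolerance:
        return [first, last]
    # Keep the farthest point and recurse on both halves.
    left = douglas_peucker(points[: index + 1], tolerance)
    right = douglas_peucker(points[index:], tolerance)
    return left[:-1] + right


# Made-up polyline: small wiggles near the x axis, then one sharp peak.
line = [(0, 0), (1, 0.05), (2, -0.05), (3, 0), (4, 2), (5, 0)]
print(douglas_peucker(line, tolerance=0.1))
```

A larger tolerance removes more points, which corresponds to the low zoom levels described above; a closed ring would first need to be split into open parts before applying this function.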
The currently supported version is Elasticsearch 7.x (7.17.6). You can find past releases here.
The first 3 digits of the plugin version are the corresponding Elasticsearch version. The last digit is used for plugin versioning.
To install it, run this command in the Elasticsearch directory, replacing the URL with the correct link for your Elasticsearch version (see table):
bin/elasticsearch-plugin install https://github.com/opendatasoft/elasticsearch-plugin-geoshape/releases/download/v7.17.6.1/elasticsearch-plugin-geoshape-7.17.6.1.zip
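The versioning rule can be expressed in a few lines of Python. The sketch below splits a plugin version into its Elasticsearch part and its plugin revision, and rebuilds a download URL from it. The URL pattern is inferred from the single install command shown here, so it is an assumption that other releases follow the same layout; the function names are made up for the example.

```python
def split_plugin_version(version):
    """Split 'A.B.C.D' into (Elasticsearch version, plugin revision)."""
    parts = version.split(".")
    if len(parts) != 4 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected four numeric components, got {version!r}")
    return ".".join(parts[:3]), int(parts[3])


def release_url(version):
    """Build a download URL following the pattern of the install command."""
    base = (
        "https://github.com/opendatasoft/elasticsearch-plugin-geoshape"
        "/releases/download"
    )
    return f"{base}/v{version}/elasticsearch-plugin-geoshape-{version}.zip"


es_version, revision = split_plugin_version("7.17.6.1")
print("Elasticsearch version:", es_version)
print("plugin revision:", revision)
print(release_url("7.17.6.1"))
```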
Build the plugin using gradle:
./gradlew build
or
./gradlew assemble # (to avoid the test suite)
Then the following command will start a dockerized ES and will install the previously built plugin:
docker-compose up
Please be careful during development: you'll need to manually rebuild the .zip using ./gradlew build on each code change before running docker-compose up again.
NOTE: In docker-compose.yml you can uncomment the debug env and attach a REMOTE JVM on *:5005 to debug the plugin.
This software is under The MIT License (MIT).