Add composite aggregation documentation #7666
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,79 @@ | ||
--- | ||
layout: default | ||
title: Composite | ||
parent: Bucket aggregations | ||
grand_parent: Aggregations | ||
nav_order: 20 | ||
has_children: true | ||
--- | ||
|
||
# Composite | ||
|
||
The `composite` aggregation is a multi-bucket aggregation that creates composite buckets from different sources. It is useful for efficiently paginating multi-level aggregations and retrieving all buckets. Composite buckets are built from combinations of values extracted from documents for each specified source field. | ||
|
||
## Syntax | ||
|
||
```json | ||
{ | ||
"composite": { | ||
"sources": [ | ||
{ | ||
"source_field_1": { | ||
"terms": { | ||
"field": "field_name" | ||
} | ||
} | ||
}, | ||
{ | ||
"source_field_2": { | ||
"terms": { | ||
"field": "another_field_name" | ||
} | ||
} | ||
} | ||
] | ||
} | ||
} | ||
``` | ||
{% include copy-curl.html %} | ||
|
||
Property | Description | | ||
---------|------------| | ||
`composite` | The aggregation type. | ||
`sources` | An array of source objects, where each object defines a source field for the composite buckets. | ||
`terms` | The type of value source used to extract the values from the specified field for each source. Other source types, such as `date_histogram`, can be used in the same position. | ||
`field` | The field name in your documents from which the values will be extracted for the corresponding source. | ||
|
||
For example, consider the following document: | ||
|
||
```json | ||
{ | ||
"product": "T-Shirt", | ||
"category": "Clothing", | ||
"brand": "Acme", | ||
"price": 19.99, | ||
"sizes": ["S", "M", "L"], | ||
"colors": ["red", "blue"] | ||
} | ||
``` | ||
{% include copy-curl.html %} | ||
|
||
Using `sizes` and `colors` as source fields for the aggregation results in the following composite buckets: | ||
|
||
```json | ||
{ "sizes": "S", "colors": "red" } | ||
{ "sizes": "S", "colors": "blue" } | ||
{ "sizes": "M", "colors": "red" } | ||
{ "sizes": "M", "colors": "blue" } | ||
{ "sizes": "L", "colors": "red" } | ||
{ "sizes": "L", "colors": "blue" } | ||
``` | ||
{% include copy-curl.html %} | ||
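
A request that produces buckets like these might look like the following sketch. The index name `products` and the aggregation name `product_variants` are hypothetical, and the example assumes that `sizes` and `colors` are mapped as `keyword` fields:

```json
GET /products/_search
{
  "size": 0,
  "aggs": {
    "product_variants": {
      "composite": {
        "sources": [
          { "sizes": { "terms": { "field": "sizes" } } },
          { "colors": { "terms": { "field": "colors" } } }
        ]
      }
    }
  }
}
```
{% include copy-curl.html %}

Because both fields are multi-valued, the single document contributes to one bucket for each combination of a size and a color (3 sizes times 2 colors gives 6 buckets).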
|
||
## Compatibility and limitations | ||
|
||
<SME: What version of OpenSearch is this compatible with? What are the limitations?> | ||
Technical reviewer: Please provide information about compatibility and limitations. |
||
|
||
## Performance considerations | ||
Technical reviewer: Please provide information about performance considerations, if any. |
||
|
||
<What are the performance implications or best practices for using this aggregation?> |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,152 @@ | ||
--- | ||
layout: default | ||
title: Optimizing composite aggregations with early termination | ||
parent: Composite | ||
grand_parent: Bucket aggregations | ||
great_grand_parent: Aggregations | ||
nav_order: 35 | ||
--- | ||
|
||
# Optimizing composite aggregations with early termination | ||
|
||
Composite aggregations can be optimized for better performance by using the early termination feature. With early termination, OpenSearch stops processing the aggregation as soon as it has found all the buckets relevant to the current request instead of visiting every matching document. | ||
|
||
## Setting the index sort | ||
|
||
To enable early termination, set the `sort.field` and `sort.order` settings on your index. These settings define the order in which documents are sorted in the index, which should match the order of the sources in your composite aggregation. Index sort settings must be specified when the index is created. | ||
|
||
The following example request shows how to set the index sort when creating an index, sorting by `username` in ascending order and then by the `timestamp` field in descending order: | ||
|
||
```json | ||
PUT my-index | ||
{ | ||
"settings": { | ||
"index": { | ||
"sort.field": ["username", "timestamp"], | ||
"sort.order": ["asc", "desc"] | ||
} | ||
}, | ||
"mappings": { | ||
"properties": { | ||
"username": { | ||
"type": "keyword", | ||
"doc_values": true | ||
}, | ||
"timestamp": { | ||
"type": "date" | ||
} | ||
} | ||
} | ||
} | ||
``` | ||
{% include copy-curl.html %} | ||
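
To check that the sort settings were applied, you can retrieve the index settings. The following is a minimal sketch that reuses the `my-index` name from the preceding request:

```json
GET /my-index/_settings
```
{% include copy-curl.html %}

The `index` section of the response should contain the sort fields and sort orders that were specified when the index was created.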
|
||
|
||
## Ordering sources | ||
|
||
For optimal early termination, list the composite aggregation sources in the same field order as the index sort. Place higher-cardinality sources first, followed by lower-cardinality sources. | ||
|
||
For example, if the index is sorted by `username` (ascending) and then by `timestamp` (descending), your composite aggregation should list its sources in the same order, as in the following query: | ||
|
||
```json | ||
GET /my-index/_search | ||
{ | ||
"size": 0, | ||
"aggs": { | ||
"my_buckets": { | ||
"composite": { | ||
"sources": [ | ||
{ "user_name": { "terms": { "field": "username" } } }, | ||
{ "date": { "date_histogram": { "field": "timestamp", "calendar_interval": "1d", "order": "desc" } } } | ||
] | ||
} | ||
} | ||
} | ||
} | ||
``` | ||
{% include copy-curl.html %} | ||
|
||
#### Example response | ||
|
||
```json | ||
{ | ||
"took": 10, | ||
"timed_out": false, | ||
"_shards": { | ||
"total": 1, | ||
"successful": 1, | ||
"skipped": 0, | ||
"failed": 0 | ||
}, | ||
"hits": { | ||
"total": { | ||
"value": 0, | ||
"relation": "eq" | ||
}, | ||
"max_score": null, | ||
"hits": [] | ||
}, | ||
"aggregations": { | ||
"my_buckets": { | ||
"buckets": [] | ||
} | ||
} | ||
} | ||
``` | ||
{% include copy-curl.html %} | ||
|
||
## Disabling total hit tracking | ||
|
||
To further optimize performance, you can disable the tracking of total hits by setting `track_total_hits` to `false` in your query. This prevents OpenSearch from calculating the total number of matching documents for every page of results. Note that if you need to know the total number of matching documents, you can retrieve it from the first request and skip the calculation for subsequent requests. See the following example query: | ||
|
||
```json | ||
GET /my-index/_search | ||
{ | ||
"size": 0, | ||
"track_total_hits": false, | ||
"aggs": { | ||
"my_buckets": { | ||
"composite": { | ||
"sources": [ | ||
{ "user_name": { "terms": { "field": "username" } } }, | ||
{ "date": { "date_histogram": { "field": "timestamp", "calendar_interval": "1d", "order": "desc" } } } | ||
] | ||
} | ||
} | ||
} | ||
} | ||
``` | ||
{% include copy-curl.html %} | ||
|
||
#### Example response | ||
|
||
```json | ||
{ | ||
"took": 13, | ||
"timed_out": false, | ||
"_shards": { | ||
"total": 1, | ||
"successful": 1, | ||
"skipped": 0, | ||
"failed": 0 | ||
}, | ||
"hits": { | ||
"max_score": null, | ||
"hits": [] | ||
}, | ||
"aggregations": { | ||
"my_buckets": { | ||
"buckets": [] | ||
} | ||
} | ||
} | ||
``` | ||
{% include copy-curl.html %} | ||
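
One way to apply this when paginating is sketched in the following request: after the first page, request each subsequent page with `track_total_hits` set to `false` and the `after` parameter set to the `after_key` returned by the previous page. The `size` of `100` and the `after` values shown here are hypothetical placeholders, not values returned by the preceding example:

```json
GET /my-index/_search
{
  "size": 0,
  "track_total_hits": false,
  "aggs": {
    "my_buckets": {
      "composite": {
        "size": 100,
        "sources": [
          { "user_name": { "terms": { "field": "username" } } },
          { "date": { "date_histogram": { "field": "timestamp", "calendar_interval": "1d", "order": "desc" } } }
        ],
        "after": { "user_name": "user_a", "date": 1680307200000 }
      }
    }
  }
}
```
{% include copy-curl.html %}

Pagination is complete when a response returns no buckets.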
|
||
## Additional considerations | ||
|
||
Keep the following considerations in mind when working with this feature: | ||
|
||
- Multi-valued fields cannot be used for early termination, so it is recommended to place them last in the `sources` array. | ||
- Index sorting can potentially slow down indexing operations, so it is important to test the impact of index sorting on your specific use case and dataset. | ||
- If the index is not sorted, composite aggregations still attempt early termination when the query matches all documents, for example, when you use a `match_all` query. |
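
As an illustration of the first consideration, the following sketch keeps the sources that match the index sort first and adds a multi-valued source last. The `tags` field and the `tag` source name are hypothetical; the example assumes that `tags` is a multi-valued `keyword` field in `my-index`:

```json
GET /my-index/_search
{
  "size": 0,
  "track_total_hits": false,
  "aggs": {
    "my_buckets": {
      "composite": {
        "sources": [
          { "user_name": { "terms": { "field": "username" } } },
          { "date": { "date_histogram": { "field": "timestamp", "calendar_interval": "1d", "order": "desc" } } },
          { "tag": { "terms": { "field": "tags" } } }
        ]
      }
    }
  }
}
```
{% include copy-curl.html %}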
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,125 @@ | ||
--- | ||
layout: default | ||
title: Handling missing buckets | ||
parent: Composite | ||
grand_parent: Bucket aggregations | ||
great_grand_parent: Aggregations | ||
nav_order: 20 | ||
--- | ||
|
||
# Handling missing buckets | ||
|
||
By default, composite aggregations exclude documents that do not have a value for a particular source. However, you can choose to include these missing values by setting the `missing_bucket` parameter to `true` for the relevant source. | ||
|
||
## Syntax | ||
|
||
To include missing values in a composite aggregation, set the `missing_bucket` parameter to `true` within the relevant source definition, as shown in the following example syntax for the `sources` array: | ||
|
||
```json | ||
"sources": [ | ||
{ | ||
"NAME": { | ||
"AGGREGATION": { | ||
"field": "FIELD", | ||
"missing_bucket": true | ||
} | ||
} | ||
} | ||
] | ||
``` | ||
{% include copy-curl.html %} | ||
|
||
--- | ||
|
||
## Example | ||
|
||
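For example, the following query groups documents by day using a `date_histogram` source and by product name using a `terms` source. Because `missing_bucket` is set to `true` on the `product` source, the results include a bucket for documents that do not have a product name specified: | ||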
|
||
```json | ||
GET /sales/_search | ||
{ | ||
"size": 0, | ||
"aggs": { | ||
"sales_by_day_product": { | ||
"composite": { | ||
"sources": [ | ||
{ | ||
"day": { | ||
"date_histogram": { | ||
"field": "timestamp", | ||
"calendar_interval": "1d", | ||
"order": "desc" | ||
} | ||
} | ||
}, | ||
{ | ||
"product": { | ||
"terms": { | ||
"field": "product.keyword", | ||
"order": "asc", | ||
"missing_bucket": true | ||
} | ||
} | ||
} | ||
] | ||
} | ||
} | ||
} | ||
} | ||
``` | ||
{% include copy-curl.html %} | ||
|
||
#### Example response | ||
|
||
```json | ||
{ | ||
"took": 23, | ||
"timed_out": false, | ||
"_shards": { | ||
"total": 1, | ||
"successful": 1, | ||
"skipped": 0, | ||
"failed": 0 | ||
}, | ||
"hits": { | ||
"total": { | ||
"value": 3, | ||
"relation": "eq" | ||
}, | ||
"max_score": null, | ||
"hits": [] | ||
}, | ||
"aggregations": { | ||
"sales_by_day_product": { | ||
"after_key": { | ||
"day": 1680307200000, | ||
"product": "Product B" | ||
}, | ||
"buckets": [ | ||
{ | ||
"key": { | ||
"day": 1680393600000, | ||
"product": "Product A" | ||
}, | ||
"doc_count": 1 | ||
}, | ||
{ | ||
"key": { | ||
"day": 1680307200000, | ||
"product": "Product A" | ||
}, | ||
"doc_count": 1 | ||
}, | ||
{ | ||
"key": { | ||
"day": 1680307200000, | ||
"product": "Product B" | ||
}, | ||
"doc_count": 1 | ||
} | ||
] | ||
} | ||
} | ||
} | ||
``` | ||
{% include copy-curl.html %} |
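
In the preceding response, all three matching documents have a product name, so no missing bucket appears. If a document had no value for `product.keyword`, it would be counted in a bucket whose `product` key is `null`. The following fragment is illustrative only and is not output from the preceding request:

```json
{
  "key": {
    "day": 1680307200000,
    "product": null
  },
  "doc_count": 1
}
```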
Technical reviewer: Please review this content and confirm the syntax and examples are accurate and relevant to an OpenSearch user. I tested the examples using Dev Tools. If another example is more appropriate, please replace the draft example with your example. Thank you.