Add flat object field type #3714
Changes from all commits
6b0e254
86a1283
22d87e8
dc94b11
3d368da
0dcc1fb
28957b8
4f0d1bf
12556ce
8c9a033
@@ -0,0 +1,218 @@
---
layout: default
title: Flat object
nav_order: 43
has_children: false
parent: Object field types
grand_parent: Supported field types
---

# Flat object field type

In OpenSearch, you don't have to specify a mapping before indexing documents. If you don't specify a mapping, OpenSearch uses [dynamic mapping]({{site.url}}{{site.baseurl}}/field-types/mappings#dynamic-mapping) to map every field and its subfields in the document automatically. When you ingest documents such as logs, you may not know every field's subfield name and type in advance. In this case, dynamically mapping all new subfields can quickly lead to a "mapping explosion," where the growing number of fields may degrade the performance of your cluster.

The flat object field type solves this problem by treating the entire JSON object as a string. Subfields within the JSON object are accessible using standard dot path notation, but they are not indexed for fast lookup.

The maximum field value length in the dot notation is 2<sup>24</sup> − 1.
{: .note}
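
As a rough mental model of what treating a JSON object as a string with dot path access means, the following Python sketch flattens a nested object into pairs of dot path and leaf value. It is an illustration only and does not reproduce OpenSearch's internal implementation; the `flatten` helper and the sample document are hypothetical.

```python
def flatten(obj, prefix=""):
    """Yield (dot_path, leaf_value) pairs for every leaf of a nested JSON-like object."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else key
            yield from flatten(value, path)
    elif isinstance(obj, list):
        # In this model, array elements share the path of the array itself.
        for item in obj:
            yield from flatten(item, prefix)
    else:
        # Every leaf is kept as a string; no type-specific parsing is applied.
        yield prefix, str(obj)


# Hypothetical sample document with one nested object and one array.
doc = {
    "issue": {
        "number": "123456",
        "labels": {"version": "2.1", "backport": ["2.0", "1.3"]},
    }
}

pairs = list(flatten(doc))
for path, value in pairs:
    print(f"{path} = {value}")
# issue.number = 123456
# issue.labels.version = 2.1
# issue.labels.backport = 2.0
# issue.labels.backport = 1.3
```

However deep the object is, this model produces only strings under a single root field, which is why the number of mapped fields does not grow with the number of subfields.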

The flat object field type provides the following benefits:

- Efficient reads: Fetching performance is similar to that of a keyword field.
- Memory efficiency: Storing the entire complex JSON object in one field without indexing all of its subfields reduces the number of fields in an index.
- Space efficiency: OpenSearch does not create an inverted index for subfields in flat objects, thereby saving space.
- Compatibility for migration: You can migrate your data from systems that support similar flat types to OpenSearch.

Map a field as a flat object when the field and its subfields are mostly read and not used as search criteria, because the subfields are not indexed. Flat objects are useful for objects with a large number of fields or when you don't know the keys in advance.

Flat objects support exact match queries with and without dot path notation. For a complete list of supported query types, see [Supported queries](#supported-queries).

Searching for a specific value of a nested field in a document may be inefficient because it may require a full scan of the index, which can be an expensive operation.
{: .note}

Flat objects do not support:

- Type-specific parsing.
- Numerical operations, such as numerical comparison or numerical sorting.
- Text analysis.
- Highlighting.
- Aggregations of subfields using dot notation.
- Filtering by subfields.

## Supported queries

The flat object field type supports the following queries:

- [Term]({{site.url}}{{site.baseurl}}/query-dsl/term#term)
- [Terms]({{site.url}}{{site.baseurl}}/query-dsl/term#terms)
- [Terms set]({{site.url}}{{site.baseurl}}/query-dsl/term#terms-set)
- [Prefix]({{site.url}}{{site.baseurl}}/query-dsl/term#prefix)
- [Range]({{site.url}}{{site.baseurl}}/query-dsl/term#range)
- [Match]({{site.url}}{{site.baseurl}}/query-dsl/full-text/#match)
- [Multi-match]({{site.url}}{{site.baseurl}}/query-dsl/full-text/#multi-match)
- [Query string]({{site.url}}{{site.baseurl}}/query-dsl/full-text/#query-string)
- [Simple query string]({{site.url}}{{site.baseurl}}/query-dsl/full-text/#simple-query-string)
- [Exists]({{site.url}}{{site.baseurl}}/query-dsl/term#exists)

## Limitations

The following limitations apply to flat objects in OpenSearch 2.7:

- Flat objects do not support open parameters.
- Painless scripting and wildcard queries are not supported for retrieving values of subfields.

This functionality is planned for a future release.

Review comment: Do we ever put GH issue links here to get feedback for these features? I understand if documentation isn't the right place, but it could be a really nice way to solicit feedback if someone is this deep into reading the docs.

Reply: @macohen We do for experimental features. For non-experimental features, we have the feedback panel where people can leave comments on the page. However, in this case, I think a link to a GitHub issue to solicit feedback is appropriate. I can add it if you give me the link.

## Using flat object

Review comment: Wasn't this going to be "Using flat objects"?

The following example illustrates mapping a field as a flat object, indexing documents with flat object fields, and searching for leaf values of the flat object in those documents.

Review comment: Confirm that the last instance of "object" is intentionally singular.

Reply: It is.

Only the root field of a document can be defined as a flat object. You cannot define an object that is part of another JSON object as a flat object because when a flat object is flattened to a string, the nested architecture of the leaves is lost.
{: .note}

Review comment: I'm not familiar with "leaves" in this usage. Just wanted to confirm that's right. I see "leaf value" referred to in other places. Could this be "leaf values" for the plural?

Reply: "Leaves" and "leaf values" mean the same thing. The term refers to the leaves of a tree.

First, create a mapping for your index, where `issue` is of type `flat_object`:

```json
PUT /test-index/
{
  "mappings": {
    "properties": {
      "issue": {
        "type": "flat_object"
      }
    }
  }
}
```
{% include copy-curl.html %}

Next, index two documents with flat object fields:

```json
PUT /test-index/_doc/1
{
  "issue": {
    "number": "123456",
    "labels": {
      "version": "2.1",
      "backport": [
        "2.0",
        "1.3"
      ],
      "category": {
        "type": "API",
        "level": "enhancement"
      }
    }
  }
}
```
{% include copy-curl.html %}

```json
PUT /test-index/_doc/2
{
  "issue": {
    "number": "123457",
    "labels": {
      "version": "2.2",
      "category": {
        "type": "API",
        "level": "bug"
      }
    }
  }
}
```
{% include copy-curl.html %}

To search for a leaf value of the flat object, use either a GET or a POST request. Even if you don't know the field names, you can search for a leaf value in the entire flat object. For example, the following request searches for all issues labeled as bugs:

```json
GET /test-index/_search
{
  "query": {
    "match": {"issue": "bug"}
  }
}
```

Review comment: Why change the request command from POST to GET? The HTTP POST command is used for queries that have a request body, such as search queries. The request body can be large because search queries can be complex. When using the HTTP GET command, the URL has a limited length, and if the request body is too large, it may not fit in the URL. By using the HTTP POST command, the request body can be sent separately in the request payload, and there is no limit to its size.

Reply: @mingshl GET is the standard for searching in OpenSearch. It was surprising to me, too, that you can send a payload in a GET like you do with POST. Take a look at https://opensearch.org/docs/latest/aggregations/metric-agg/ and other query docs. The HTTP spec says we're putting meaning where it shouldn't be, but it doesn't disallow it: "A payload within a GET request message has no defined semantics; sending a payload body on a GET request might cause some existing implementations to reject the request." https://datatracker.ietf.org/doc/html/rfc7231#section-4.3.1

Alternatively, if you know the subfield name in which to search, provide the field's path in dot notation:

Review comment: We use the verbiage "dot path notation" in preceding sections. Should we be consistent here?

Reply: This sentence already has the word "path," so I think we don't need to specify it again. Let's ask @natebower.

Reply: I'm fine with it as is as long as both "dot path notation" and "dot notation" are commonly understood as referring to the same thing.

```json
GET /test-index/_search
{
  "query": {
    "match": {"issue.labels.category.level": "bug"}
  }
}
```
{% include copy-curl.html %}

In both cases, the response is the same and contains document 2:

```json
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1.0303539,
    "hits": [
      {
        "_index": "test-index",
        "_id": "2",
        "_score": 1.0303539,
        "_source": {
          "issue": {
            "number": "123457",
            "labels": {
              "version": "2.2",
              "category": {
                "type": "API",
                "level": "bug"
              }
            }
          }
        }
      }
    ]
  }
}
```
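
To see why both request forms return the same document, the two searches can be modeled in a few lines of Python. This is a conceptual sketch under the assumption that a match on a flat object behaves like an exact comparison against leaf values; the `flatten` and `search` helpers are hypothetical and are not part of any OpenSearch API.

```python
def flatten(obj, prefix=""):
    """Yield (dot_path, leaf_value) pairs for every leaf of a nested JSON-like object."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from flatten(value, f"{prefix}.{key}" if prefix else key)
    elif isinstance(obj, list):
        for item in obj:
            yield from flatten(item, prefix)
    else:
        yield prefix, str(obj)


# The two documents indexed in the example above, keyed by document ID.
docs = {
    "1": {"issue": {"number": "123456", "labels": {
        "version": "2.1", "backport": ["2.0", "1.3"],
        "category": {"type": "API", "level": "enhancement"}}}},
    "2": {"issue": {"number": "123457", "labels": {
        "version": "2.2",
        "category": {"type": "API", "level": "bug"}}}},
}


def search(field, value):
    """Return IDs of documents that have a leaf equal to `value` at or below `field`."""
    hits = []
    for doc_id, doc in docs.items():
        for path, leaf in flatten(doc):
            at_or_below = path == field or path.startswith(field + ".")
            if at_or_below and leaf == value:
                hits.append(doc_id)
                break
    return hits


print(search("issue", "bug"))                        # ['2']
print(search("issue.labels.category.level", "bug"))  # ['2']
```

Searching on the root field checks every leaf value, while the dot path form checks only the leaf at that path. In this data set, both find the value `bug` only in document 2.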

Using a prefix query, you can search for all issues for the versions that start with `2.`:

```json
GET /test-index/_search
{
  "query": {
    "prefix": {"issue.labels.version": "2."}
  }
}
```
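
As a quick sanity check on which of the two sample documents this request should match, the prefix test can be modeled as a plain string prefix comparison on the `issue.labels.version` leaf. The Python sketch below is only a model of that comparison, not of how OpenSearch executes the query.

```python
# Values of issue.labels.version in the two sample documents, keyed by document ID.
versions = {"1": "2.1", "2": "2.2"}

# A prefix query on a string value is modeled here as str.startswith.
matches = [doc_id for doc_id, version in versions.items() if version.startswith("2.")]
print(matches)  # ['1', '2']
```

Under this model, both documents match because both version strings start with `2.`.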

With a range query, you can search for all issues for versions 2.0--2.1:

```json
GET /test-index/_search
{
  "query": {
    "range": {
      "issue": {
        "gte": "2.0",
        "lte": "2.1"
      }
    }
  }
}
```
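
Because flat objects do not support numerical comparison, the bounds `"2.0"` and `"2.1"` are compared as strings. The following Python sketch assumes lexicographic string ordering, similar to a keyword field; that ordering is an assumption made for illustration and is not stated on this page. The values `2.10` and `10.0` are hypothetical and do not appear in the sample documents.

```python
# Version strings, including two hypothetical values that expose the difference
# between string ordering and numeric ordering.
versions = ["2.0", "2.1", "2.2", "2.10", "10.0"]

# String (lexicographic) ordering compares character by character.
string_order = sorted(versions)
print(string_order)  # ['10.0', '2.0', '2.1', '2.10', '2.2']

# Numeric ordering, shown for contrast, compares the dotted parts as integers.
numeric_order = sorted(versions, key=lambda v: tuple(int(part) for part in v.split(".")))
print(numeric_order)  # ['2.0', '2.1', '2.2', '2.10', '10.0']

# The range "2.0" <= value <= "2.1" evaluated on strings.
in_range = [v for v in versions if "2.0" <= v <= "2.1"]
print(in_range)  # ['2.0', '2.1']
```

If the assumption holds, a string range works as expected for values with the same number of digits, but values such as `2.10` or `10.0` sort differently than they would numerically.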

Review comment: Expensive in what sense? Literally, it costs a business money? Or does it use a lot of resources?

Reply: This is standard computer science speak. In terms of resources :)