Add flat object field type #3714

Merged
merged 10 commits into from
May 2, 2023

kolchfa-aws
Collaborator

Fixes #2657

Checklist

  • [x] By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and subject to the Developer Certificate of Origin.
    For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Fanit Kolchina <[email protected]>
Signed-off-by: Fanit Kolchina <[email protected]>
@kolchfa-aws added the "3 - Tech review" and "v2.7.0" labels on Apr 7, 2023
@kolchfa-aws self-assigned this on Apr 7, 2023

In OpenSearch, you don't have to specify a mapping before indexing documents. If you don't specify a mapping, OpenSearch uses [dynamic mapping]({{site.url}}{{site.baseurl}}/field-types/mappings#dynamic-mapping) to map every field and its subfields in the document automatically. When you ingest documents such as logs, you may not know every field's subfield name and type in advance. In this case, dynamically mapping all new subfields can quickly lead to a "mapping explosion", where the growing number of fields may degrade the performance of your cluster.

Flat object solves this problem by not indexing the field itself or its subfields. Instead, for any JSON object, the values of all subfields and their paths are stored in two string fields. Subfields within the JSON are accessible using standard dot path notation but are not indexed for fast lookup.

Contributor

"Flat object solves this problem by treating the entire JSON object as a string and not indexing its subfields." The parent field is indexed, but the subfields are not indexable fields.


Flat object solves this problem by not indexing the field itself or its subfields. Instead, for any JSON object, the values of all subfields and their paths are stored in two string fields. Subfields within the JSON are accessible using standard dot path notation but are not indexed for fast lookup.

The maximum field path length in the dot notation is 2<sup>24</sup> &minus; 1.

Contributor

Should be "The maximum field value length "


- Efficient reads: Fetching performance is similar to that of a keyword field.
- Memory efficiency: Storing the entire complex JSON object in one field without indexing all its subfields reduces the number of fields in an index.
- Space efficiency: OpenSearch does not create an inverted index for flat objects, thereby saving space.

Contributor

OpenSearch does not create an inverted index for subfields in flat objects

Flat objects support:

- Exact match queries.
- Textual sorting.

Contributor

Sorting, aggregation, and filtering are not supported.

Collaborator Author

@mingshl So neither textual nor numerical sorting is supported?


Mapping a field as flat object applies when a field and its subfields are mostly read and not used as a search criteria because the subfields are not indexed. Flat objects are useful for objects with a large number of fields or when you don't know the keys in advance.

Flat objects support:

Contributor

Flat objects support querying both with and without dot path notation.
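
To make this comment concrete, here is a hedged sketch of the two query styles. It assumes a flat object field named `issue` in an index named `test-index`; the field name, the subfield path, and the searched value are illustrative and do not come from the diff. Without dot path notation, the query names only the parent field and looks for the value in any leaf:

```json
GET /test-index/_search
{
  "query": {
    "match": {
      "issue": "enhancement"
    }
  }
}
```

With dot path notation, the query names the full path to one leaf:

```json
GET /test-index/_search
{
  "query": {
    "match": {
      "issue.labels.category.level": "enhancement"
    }
  }
}
```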

Signed-off-by: Fanit Kolchina <[email protected]>
Signed-off-by: Fanit Kolchina <[email protected]>
@kolchfa-aws
Collaborator Author

@macohen @macrakis Please review this PR when you get a chance. Thanks!

@mingshl
Contributor

mingshl commented Apr 11, 2023

LGTM


The following example illustrates mapping a field as a flat object, indexing documents with flat object fields, and searching for leaf values of the flat object in those documents.

Only root fields of a document can be defined as a flat objects. You cannot define an object that is part of another JSON object as a flat object because when a flat object it is flattened to a string, the nested architecture of the leaves is lost.

Contributor

Only the root field of a document can be defined as a flat object. Better to keep the singular form.
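
For readers following the thread, a minimal sketch of what mapping a root field as a flat object might look like. The index name `test-index` matches the search request quoted later in this PR. The field name `issue` is an illustrative assumption, and `flat_object` is the type name as released in OpenSearch 2.7.

```json
PUT /test-index/
{
  "mappings": {
    "properties": {
      "issue": {
        "type": "flat_object"
      }
    }
  }
}
```

In this sketch, `issue` sits directly under `properties`, so it is a root field of the document, which is the case the sentence under review describes.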


# Flat object field type

In OpenSearch, you don't have to specify a mapping before indexing documents. If you don't specify a mapping, OpenSearch uses [dynamic mapping]({{site.url}}{{site.baseurl}}/field-types/mappings#dynamic-mapping) to map every field and its subfields in the document automatically. When you ingest documents such as logs, you may not know every field's subfield name and type in advance. In this case, dynamically mapping all new subfields can quickly lead to a "mapping explosion", where the growing number of fields may degrade the performance of your cluster.

Contributor

Not a blocker for this documentation, but it would be good for us to have more documentation on our site that defines "mapping explosion." It is a specific issue where too many distinct fields cause problems on the cluster manager. Does that fit into the current doc site somewhere?

Collaborator Author

@macohen I'm open to having a separate page if we have more information to put on it. If it's only this one definition, then I think the last sentence in this paragraph conveys the same information.


This functionality is planned for a future release.

## Using flat objects

Contributor

Similar to Mingshi's comment, should this be "Using flat object"?

- Flat objects do not support open parameters.
- Painless scripting and wildcard queries are not supported for retrieving values of subfields.

This functionality is planned for a future release.

Contributor

Do we ever put GitHub issue links here to get feedback on these features? I understand if documentation isn't the right place, but it could be a really nice way to solicit feedback if someone is this deep into reading the docs.

Collaborator Author

@kolchfa-aws commented Apr 13, 2023

@macohen We do for experimental features. For non-experimental features, we have the feedback panel where people can leave comments on the page. However, in this case, I think a link to a GitHub issue to solicit feedback is appropriate. I can add it if you give me the link.

Signed-off-by: Fanit Kolchina <[email protected]>

Collaborator

@vagimeli left a comment

@kolchfa-aws Some comments that should be quick to resolve.

The maximum field value length in the dot notation is 2<sup>24</sup> &minus; 1.
{: .note}

Flat objects provide the following advantages:

Collaborator

In the preceding paragraphs, "flat object" is used in singular form. Is there a distinction between the usage above and the usage in lines 19-33? For example, in line 14, are we referring to the field type? If so, I suggest writing "The flat object field type..."

Collaborator Author

Changed. Thanks!

}
```

Alternatively, if you know the subfield name in which to search, provide the field's path in dot notation:

Collaborator

We use the verbiage "dot path notation" in preceding sections. Should we be consistent here?

Collaborator Author

This sentence already has the word "path" so I think we don't need to specify it again. Let's ask @natebower

Collaborator

I'm fine with it as is as long as both "dot path notation" and "dot notation" are commonly understood as referring to the same thing.

kolchfa-aws and others added 2 commits April 18, 2023 09:13
Signed-off-by: Fanit Kolchina <[email protected]>

Collaborator

@natebower left a comment

@kolchfa-aws Approved with comments and changes.

- Space efficiency: OpenSearch does not create an inverted index for subfields in flat objects, thereby saving space.
- Compatibility for migration: You can migrate your data from systems that support similar flat types to OpenSearch.

Mapping a field as a flat object applies when a field and its subfields are mostly read and not used as a search criteria because the subfields are not indexed. Flat objects are useful for objects with a large number of fields or when you don't know the keys in advance.

Collaborator

Either "as search criteria" or "as a search criterion"


This functionality is planned for a future release.

## Using flat object

Collaborator

Wasn't this going to be "Using flat objects"?


## Using flat object

The following example illustrates mapping a field as a flat object, indexing documents with flat object fields, and searching for leaf values of the flat object in those documents.

Collaborator

Confirm that the last instance of "object" is intentionally singular.

Collaborator Author

It is.

}
```

Alternatively, if you know the subfield name in which to search, provide the field's path in dot notation:

Collaborator

I'm fine with it as is as long as both "dot path notation" and "dot notation" are commonly understood as referring to the same thing.

kolchfa-aws and others added 3 commits April 18, 2023 14:15
@hdhalter added the "6 - Done but waiting to merge" label and removed the "3 - Tech review" label on Apr 24, 2023

Contributor

@cwillum left a comment

Looks great.


Flat objects support exact match queries with and without dot path notation. For a complete list of supported query types, see [Supported queries](#supported-queries).

Searching for a specific value of a nested field in a document may be inefficient because it may require a full scan of the index, which can be an expensive operation.

Contributor

Expensive in what sense? Does it literally cost a business money, or does it use a lot of resources?

Collaborator Author

This is standard computer science speak. In terms of resources :)


The following example illustrates mapping a field as a flat object, indexing documents with flat object fields, and searching for leaf values of the flat object in those documents.

Only the root field of a document can be defined as a flat object. You cannot define an object that is part of another JSON object as a flat object because when a flat object is flattened to a string, the nested architecture of the leaves is lost.

Contributor

Suggested change
Original: Only the root field of a document can be defined as a flat object. You cannot define an object that is part of another JSON object as a flat object because when a flat object is flattened to a string, the nested architecture of the leaves is lost.
Proposed: Only the root field of a document can be defined as a flat object. You cannot define an object as a flat object when it is part of another JSON object because when a flat object is flattened to a string, the nested architecture of the leaves is lost.

I'm not familiar with "leaves" in this usage. Just wanted to confirm that's right. I see "leaf value" referred to in other places. Could this be "leaf values" for the plural?

Collaborator Author

"Leaves" and "leaf values" mean the same thing. The term refers to the leaves of a tree.
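
As an illustration of the tree terminology, here is a sketch of a document whose root field `issue` could be mapped as a flat object. All field names and values are hypothetical and are not taken from the diff. The leaves are the values at the end of each path, such as `issue.number` or `issue.labels.category.level`. Intermediate objects such as `labels` and `category` are branches, not leaves.

```json
PUT /test-index/_doc/1
{
  "issue": {
    "number": "123456",
    "labels": {
      "version": "2.1",
      "backport": [
        "2.0",
        "1.3"
      ],
      "category": {
        "type": "API",
        "level": "enhancement"
      }
    }
  }
}
```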


```diff
- POST /test-index/_search
+ GET /test-index/_search
```

Contributor

Why change the request command from POST to GET? The HTTP POST command is used for queries that have a request body, such as search queries.

The request body can be large because search queries can be complex. When using the HTTP GET command, the URL has a limited length, and if the request body is too large, it may not fit in the URL. By using the HTTP POST command, the request body can be sent separately in the request payload, and there is no limit to its size.

Collaborator Author

@mingshl Changed as per @macohen.

Contributor

@mingshl GET is the standard for searching in OpenSearch. It was surprising to me, too, that you can send a payload in a GET request like you do with POST. Take a look at https://opensearch.org/docs/latest/aggregations/metric-agg/ and other query docs. The HTTP spec says we're putting meaning where it shouldn't be, but it doesn't disallow it: "A payload within a GET request message has no defined semantics; sending a payload body on a GET request might cause some existing implementations to reject the request." https://datatracker.ietf.org/doc/html/rfc7231#section-4.3.1

@kolchfa-aws added the "release-notes" label on Apr 25, 2023
@kolchfa-aws merged commit ddbde18 into main on May 2, 2023
vagimeli added a commit that referenced this pull request May 4, 2023
* Add flat object field type

Signed-off-by: Fanit Kolchina <[email protected]>

* Adds more examples and notes

Signed-off-by: Fanit Kolchina <[email protected]>

* Implemented tech review comments

Signed-off-by: Fanit Kolchina <[email protected]>

* Update page order

Signed-off-by: Fanit Kolchina <[email protected]>

* Implemented tech review feedback

Signed-off-by: Fanit Kolchina <[email protected]>

* Apply suggestions from code review

Co-authored-by: Melissa Vagi <[email protected]>

* More doc review comments

Signed-off-by: Fanit Kolchina <[email protected]>

* Apply suggestions from code review

Co-authored-by: Nathan Bower <[email protected]>

* Implemented the last editorial comment

Signed-off-by: Fanit Kolchina <[email protected]>

* Changed POST to GET

Signed-off-by: Fanit Kolchina <[email protected]>

---------

Signed-off-by: Fanit Kolchina <[email protected]>
Co-authored-by: Melissa Vagi <[email protected]>
Co-authored-by: Nathan Bower <[email protected]>
vagimeli added a commit that referenced this pull request May 4, 2023
harshavamsi pushed a commit to harshavamsi/documentation-website that referenced this pull request Oct 31, 2023
@Naarcha-AWS Naarcha-AWS deleted the Fix2657-flat-object branch March 28, 2024 23:23
Labels
6 - Done but waiting to merge, release-notes, v2.7.0
Development

Successfully merging this pull request may close these issues.

[DOC] Flat Object
7 participants