Automatically load unmapped fields from _source? #81357

jpountz · 2021-12-06T09:07:31Z

This relates to #80504 where we would like users to think about field values and make Elasticsearch responsible for figuring out the best way to retrieve them, e.g. doc_values when available with a fallback to _source otherwise. One challenge with this vision is text fields (#81246), another one is unmapped fields that this issue focuses on.

Almost all Elasticsearch APIs only consider fields that exist in the mappings. If a field isn't mapped, queries, aggregations, etc. will treat it the same way as if it didn't exist. This is one of the reasons why scripting users have to worry about doc vs. _source today: unmapped fields are only available in _source.

One way to address this discrepancy would consist of introducing an abstraction layer in our scripting API so that if a field is unmapped, then the script will look up values from _source instead.

Another option would be to address this more broadly by improving mappings so that they would create transient runtime fields whenever a field that is used in a query or aggregation might exist in _source documents but isn't mapped. This would be quite similar to flattened fields, where subfields are not explicitly mapped and Elasticsearch creates transient field entries at runtime that look and feel like regular keyword fields. This way, unmapped fields that exist in _source could not only be used in scripts, but also in any query or aggregation, using the same semantics as fields of the keyword family (e.g. wildcard).


elasticmachine · 2021-12-06T09:07:35Z

Pinging @elastic/es-search (Team:Search)

jdconrad · 2021-12-06T16:32:19Z

My preference would be the second solution presented here, as that creates better consistency between scripting and existing queries. We would need a way for users to opt in or out, to avoid the trappy performance hit of extracting values from source. Either solution will require a change in the abstraction layer for how scripts access values in fields.

javanna · 2022-07-04T12:16:41Z

We have discussed this and, similarly to the recent discussion at #80504 (comment), we concluded that we would like to focus at first on how scripts access unmapped fields through the scripting fields API. We could lean on the existing runtime field implementations that load from _source. A question that we have not answered yet is the expected data type: source access in scripts is currently driven by the JSON type of the field that gets accessed, hence if we want feature parity with the existing mechanism, loading everything as keyword would not be good enough. We will answer these questions once we have implemented loading from _source for all mapped field types.
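To illustrate the data-type question, here is a small Python sketch, not Elasticsearch code. The two helper functions and the sample document are hypothetical; they only contrast "keep the JSON type found in _source" with "treat everything as keyword".

```python
import json

# A sample _source document (made up for illustration).
source = json.loads('{"status": 200, "ratio": 0.5, "ok": true, "tag": "a"}')


def as_json_typed(source: dict, name: str):
    """Return the value with whatever type the JSON parser produced."""
    return source[name]


def as_keyword(source: dict, name: str) -> str:
    """Return the value as a string, the way a keyword-style field would expose it."""
    value = source[name]
    if isinstance(value, bool):
        # Render booleans the JSON way rather than Python's "True"/"False".
        return "true" if value else "false"
    return str(value)


# JSON-typed access keeps numbers numeric, so arithmetic still works.
print(as_json_typed(source, "status") + 1)
# Keyword-style access yields strings, so the same arithmetic would fail.
print(repr(as_keyword(source, "status")))
```

A script that today does arithmetic on a numeric value read from _source would break if the same field were suddenly exposed as a string.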

> Another option would be to address this more broadly by improving mappings so that they would create transient runtime fields whenever a field that is used in a query or aggregation might exist in _source documents but isn't mapped.

This reminds me of the dynamic:runtime behaviour. I do wonder if doing this automatically is required, given that transient runtime fields can be easily added as part of the search request. Maybe this surfaces the need to express that everything that is not mapped, or that matches some pattern, should load from _source (aka dynamic_templates as part of the search request)? Yet if we could load from _source automatically, why ask users to set it up? We will get back to this discussion at a later time.

jpountz · 2022-07-04T16:39:02Z

> Another option would be to address this more broadly by improving mappings so that they would create transient runtime fields whenever a field that is used in a query or aggregation might exist in _source documents but isn't mapped.

Noting that @felixbarny opened #88249 about this earlier today.

jpountz · 2022-07-04T16:43:34Z

> I do wonder if doing this automatically is required given that transient runtime fields can be easily added as part of the search request.

I guess one argument in favor of doing it automatically is that users might not always know whether the field exists in the mappings. For instance, if you run _field_caps against two indices, where one has the field mapped and the other one has it in the _source but dynamic mappings are disabled, _field_caps will pretend that the field exists, so the user wouldn't know that they need a runtime field in the request to be able to see the field in the second index.

javanna · 2022-07-05T09:32:35Z

Another problem I see with adding a transient runtime field as part of the search request is that it is applied globally and will override any indexed field with the same name, hence if some indices do have the field defined, they effectively lose access to it this way.
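For reference, the request-level runtime field discussed here looks roughly like the sketch below, built as a Python dict that would be sent as the search body. A runtime field declared without a script loads its value from _source, and a definition under `runtime_mappings` takes precedence over a mapped field of the same name on every index the search targets. The field name `session_id`, the value, and the helper function are made up for illustration.

```python
import json


def search_body_with_source_fallback(field: str, value: str) -> dict:
    """Build a search body that declares `field` as a script-less keyword runtime field."""
    return {
        # No "script" key: the value is read from _source under the same name.
        "runtime_mappings": {field: {"type": "keyword"}},
        # The field can then be queried like any other keyword field.
        "query": {"term": {field: {"value": value}}},
    }


body = search_body_with_source_fallback("session_id", "abc123")
print(json.dumps(body, indent=2))
```

Because the definition travels with the request, it applies to every targeted index, including those where `session_id` is already an indexed field.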

romseygeek · 2022-07-05T10:19:21Z

> it is applied globally and will override any indexed field with the same name

Not necessarily? We could implement it at the MappingLookup level, so that if you ask for a non-existent field we create a runtime field for you. Then this would all happen at the shard level and wouldn't impact indexes that have a concrete field defined.
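A rough sketch of what such a shard-level fallback could look like, as a toy Python model. `ShardMappingLookup` and `get_field_type` are hypothetical names chosen for this example and do not reflect the real MappingLookup class; the keyword default is likewise an assumption.

```python
class ShardMappingLookup:
    """Toy stand-in for a per-shard mapping lookup (hypothetical API)."""

    def __init__(self, concrete_fields: dict):
        # Field name -> field type, as defined in this index's mappings.
        self.concrete_fields = dict(concrete_fields)

    def get_field_type(self, name: str) -> tuple:
        # A concrete field always wins on the shard that defines it.
        if name in self.concrete_fields:
            return ("indexed", self.concrete_fields[name])
        # Otherwise create a transient, source-backed field on the fly.
        # Defaulting to keyword is an assumption made for this sketch.
        return ("runtime_from_source", "keyword")


shard_with_mapping = ShardMappingLookup({"status": "long"})
shard_without_mapping = ShardMappingLookup({})

print(shard_with_mapping.get_field_type("status"))     # keeps the indexed field
print(shard_without_mapping.get_field_type("status"))  # falls back to _source
```

Since the decision is made per shard, an index that maps the field keeps using its indexed field, while an index that does not map it gets the fallback.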

javanna · 2022-07-05T10:23:02Z

> Not necessarily? We could implement it at the MappingLookup level

I guess that you are talking about what we could do, and the possibilities are almost endless :) but I am talking about how things currently work. Currently, if you declare a runtime field in the search request, it will override any field with the same name in any index, right?

felixbarny · 2022-07-05T10:59:55Z

Here's a related discussion:

Using runtime fields as fallback rather than shadowing mapped fields #86536

javanna · 2022-07-05T15:26:34Z

@felixbarny I was also thinking of #86536, but I think, and please correct me if I am misunderstanding, that there is an important distinction between falling back to _source and falling back to runtime fields as proposed in #86536: the latter needs a script to compute a value that was not present in previously indexed documents and can't lean on source fallback, while the former involves loading an existing value from _source as-is.

felixbarny · 2022-07-05T16:14:31Z

Yes, you're right.

elasticsearchmachine · 2024-07-16T09:27:10Z

Pinging @elastic/es-search-foundations (Team:Search Foundations)

jpountz added >enhancement :Search Foundations/Mapping Index mappings, including merging and defining field types team-discuss labels Dec 6, 2021

elasticmachine added the Team:Search Meta label for search team label Dec 6, 2021

jdconrad mentioned this issue Dec 6, 2021: Add mapped types to scripting fields api #79105 (Open, 49 tasks)

javanna mentioned this issue Jul 5, 2022: Add support for dynamic runtime fields that are not added to the mapping #88249 (Closed)

felixbarny mentioned this issue Jul 25, 2022: Optionally use ECS conventions for dynamic mappings #85692 (Closed)

javanna changed the title ~~How to deal with unmapped fields that exist in _source?~~ Automatically load unmapped fields from _source? Aug 3, 2022

javanna removed the team-discuss label Aug 17, 2022

javanna added Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch and removed Team:Search Meta label for search team labels Jul 16, 2024
