Mappings should fall back to _source when doc values are disabled #80504

Open
jpountz opened this issue Nov 8, 2021 · 7 comments
Labels
>enhancement :Search Foundations/Mapping Index mappings, including merging and defining field types Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch

Comments

@jpountz
Contributor

jpountz commented Nov 8, 2021

People working on scripting have been looking into making scripts fall back automatically to _source when doc values are not enabled, in order to make scripts easier to use. @jdconrad and I discussed it recently, and it would be better to do it at the mapping level. This way, whether doc values or _source is used for scripts would be completely transparent to script engines, and we could also support sorting or aggregations on fields that have doc values disabled. We need to have the ability to read from _source directly for runtime fields anyway, so hopefully we could reuse code?

One question is whether this could be a performance trap at times. Maybe we should look into ways to tell users that their queries could run faster with better mappings?

@jpountz jpountz added >enhancement :Search Foundations/Mapping Index mappings, including merging and defining field types labels Nov 8, 2021
@elasticmachine elasticmachine added the Team:Search Meta label for search team label Nov 8, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

@jdconrad
Contributor

This is an important component of the scripting (Painless) fields API that we would like to get prioritized. See (#79105).

@jpountz
Contributor Author

jpountz commented Nov 30, 2021

cc @giladgal

@jdconrad
Contributor

I discussed this further with @jpountz today, and he suggested that we leverage the existing fielddataBuilder in MappedFieldType to do the fallback when doc values are not enabled. He also suggested that the fallback use the existing runtime fields infrastructure to generate the appropriate doc values from source.

I believe this is a good path forward, but it does require some additional plumbing work to get the source paths for a specific field into scripting and other places that would want to leverage source fallback.

@jpountz also suggested that, for text fields as noted in (#81246), we could possibly follow up with another fielddataBuilder method that would default to source for specific field types, or allow further customization of the values returned even if doc values exist. The scripting fields API could then use this second method instead of the first one.
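The fallback described above can be sketched roughly as follows. This is a toy model under stated assumptions, not Elasticsearch code: `SourceFallbackSketch`, `Doc`, `ValueLoader`, `FieldTypeSketch`, and `loader()` are hypothetical names invented for illustration, and the real `MappedFieldType#fielddataBuilder` has a different signature. It only shows the idea that the field type picks the loader, so callers such as script engines never see whether values came from doc values or from _source.

```java
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model: none of these types are real Elasticsearch classes.
final class SourceFallbackSketch {

    /** Toy document: columnar doc values (a field may be absent) plus the original _source. */
    static final class Doc {
        final Map<String, List<Object>> docValues;
        final Map<String, Object> source;

        Doc(Map<String, List<Object>> docValues, Map<String, Object> source) {
            this.docValues = docValues;
            this.source = source;
        }
    }

    /** Loads all values of one field for one document. */
    interface ValueLoader {
        List<Object> load(Doc doc);
    }

    /** Stand-in for a mapped field type; only the choice of loader is modelled. */
    static final class FieldTypeSketch {
        final String name;
        final boolean hasDocValues;

        FieldTypeSketch(String name, boolean hasDocValues) {
            this.name = name;
            this.hasDocValues = hasDocValues;
        }

        /** Callers get a loader and never learn where the values come from. */
        ValueLoader loader() {
            if (hasDocValues) {
                return doc -> doc.docValues.getOrDefault(name, List.of());
            }
            // Fallback: read the field from _source and normalise it to a list,
            // so the result has the same shape as the doc-values path.
            return doc -> {
                Object raw = doc.source.get(name);
                if (raw == null) {
                    return List.of();
                }
                if (raw instanceof List) {
                    return List.copyOf((List<?>) raw);
                }
                return List.of(raw);
            };
        }
    }

    public static void main(String[] args) {
        Doc doc = new Doc(
            Map.of("price", List.<Object>of(42L)),
            Map.of("price", 42L, "tags", List.of("a", "b")));

        // "price" has doc values; "tags" does not and is served from _source.
        System.out.println(new FieldTypeSketch("price", true).loader().load(doc));
        System.out.println(new FieldTypeSketch("tags", false).loader().load(doc));
    }
}
```

In this sketch the _source path re-reads the whole document per call, which is where the performance concern raised at the top of the issue would come from.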

@javanna
Member

javanna commented Jun 20, 2022

Heya, I think I missed this discussion back when this issue was opened, catching up now. I am a bit surprised by the direction taken and I think this deserves a high-bandwidth discussion to get alignment on, which is why I am marking it team-discuss.

@javanna
Member

javanna commented Jun 24, 2022

We have re-discussed this with the team and decided to decouple the scripting needs (load doc_values or _source transparently, depending on what's available) from the idea of falling back to _source when doc_values are not available for sorting and aggregations.

We briefly discussed whether the script fields API should leverage field fetchers internally or the fielddata builder abstraction. It was mentioned that it's very important that where you load from is transparent, meaning that _source should look and feel like doc_values. This is exactly what runtime fields do today, and we have the chance to reuse the existing fielddata implementations for runtime fields. With this said, we decided to introduce a new fielddataBuilder method variant to MappedFieldType that is specific to scripting for now, and which one day could become the new behaviour for loading field values for sorting/aggregations as well (the existing MappedFieldType#fielddataBuilder method). This is rather easy today for the field types that are supported as runtime fields. We need to figure out the best way to automatically load from _source for all the other field types. With the focus on scripting, we can make incremental steps forward. For consistency's sake, it also feels good not to introduce automatic loading from _source for aggs/sorting for only a selected set of field types. If we make the switch one day, we will make it across the board.
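A rough sketch of that split, assuming a single entry point that takes the caller's purpose: scripting gets the _source fallback, while sorting and aggregations keep today's strict behaviour. Everything here is hypothetical and for illustration only (`ScriptingVariantSketch`, `Purpose`, `loaderFor`, and the error message are invented); it is not the actual MappedFieldType API.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch: the enum, class, and method names are illustrative only.
final class ScriptingVariantSketch {

    /** Who is asking for field values. */
    enum Purpose { SEARCH, SCRIPT }

    /** Loads one field's values from a document's doc values or its _source. */
    interface ValueLoader {
        List<Object> load(Map<String, List<Object>> docValues, Map<String, Object> source);
    }

    /** Stand-in for a mapped field type. */
    static final class FieldTypeSketch {
        final String name;
        final boolean hasDocValues;

        FieldTypeSketch(String name, boolean hasDocValues) {
            this.name = name;
            this.hasDocValues = hasDocValues;
        }

        ValueLoader loaderFor(Purpose purpose) {
            if (hasDocValues) {
                // Same fast path for every caller.
                return (docValues, source) -> docValues.getOrDefault(name, List.of());
            }
            if (purpose == Purpose.SCRIPT) {
                // Scripting-only variant: transparently read from _source.
                return (docValues, source) -> {
                    Object raw = source.get(name);
                    if (raw == null) {
                        return List.of();
                    }
                    if (raw instanceof List) {
                        return List.copyOf((List<?>) raw);
                    }
                    return List.of(raw);
                };
            }
            // Sorting and aggregations are unchanged: no silent fallback.
            throw new IllegalArgumentException(
                "field [" + name + "] has no doc values; cannot sort or aggregate on it");
        }
    }

    public static void main(String[] args) {
        FieldTypeSketch title = new FieldTypeSketch("title", false);
        Map<String, List<Object>> docValues = Map.of();
        Map<String, Object> source = Map.of("title", "hello");

        System.out.println(title.loaderFor(Purpose.SCRIPT).load(docValues, source));
        try {
            title.loaderFor(Purpose.SEARCH);
        } catch (IllegalArgumentException e) {
            System.out.println("search path rejected: " + e.getMessage());
        }
    }
}
```

Keeping the choice in one place means that switching sorting and aggregations to the fallback later would be a change to a single branch, applied to all field types at once.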

A few considerations were discussed around falling back to _source for sorting/aggregations. There are some concerns that the automatic fallback is too subtle and that users would not be aware of the performance implications. Today, defining a runtime field to load from _source is easy enough, can be done in the search request, and is the step that should raise awareness about potential performance implications. On the other hand, disabling doc_values in the mappings can also be seen as the step where users declare that performance is not a concern, given that doc_values are enabled by default whenever possible. Also, we now have a much better answer for slow search requests compared to before we had async search and Kibana used it, so loading from _source should now be less of a concern. On the other other hand, the users defining the mappings, and potentially choosing to disable doc_values, are not necessarily the same as the ones sending queries, but the same is true for runtime fields defined in the mappings. Also, given that we default to doc_values enabled, falling back to _source would effectively come into play mostly for text fields and unmapped fields.

An additional question is around the vision for runtime fields: by making things more automatic, we would also remove the need to define script-less runtime fields, though runtime fields would still be needed for computations performed within the script that defines them.

While we have a way forward for the script fields API, we will discuss again at a later time whether we want to fall back to _source also for sorting and aggregations. There is no immediate urgency, but it would be good to see if we can find agreement on this, which is why I left the team-discuss label.

@javanna javanna added Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch and removed Team:Search Meta label for search team labels Jul 16, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search-foundations (Team:Search Foundations)


5 participants