Mappings should fall back to _source when doc values are disabled #80504
Comments
Pinging @elastic/es-search (Team:Search)
This is an important component of the scripting (Painless) fields API, and we would like to get it prioritized. See #79105.
cc @giladgal
I discussed this further with @jpountz today, and he suggested that we leverage the existing fielddataBuilder in MappedFieldType to do the fallback when doc values are not enabled. He also suggested that the fallback use the existing runtime fields infrastructure to generate the appropriate doc values from _source. I believe this is a good path forward, but it does require some additional plumbing to get the source paths for a specific field into scripting and any other places that want to leverage source fallback. For text fields, as noted in #81246, @jpountz also suggested that we could follow up with a second fielddataBuilder method that would default to _source for specific field types, or allow further customization of the values returned even when doc values exist. The scripting fields API could then use this second method instead of the first one.
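To make the two suggested methods concrete, here is a minimal Java sketch of the fallback idea. Every type and method name in it (`FieldValues`, `DocValuesBacked`, `SourceBacked`, `FieldTypeSketch`, `fielddata`, `scriptFielddata`) is hypothetical and does not reflect Elasticsearch's actual `MappedFieldType` or fielddata APIs. Doc values and `_source` are simulated with in-memory maps keyed by document id, and only top-level (non-dotted) source paths are handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// HYPOTHETICAL types for illustration only; these are not Elasticsearch classes.
// Per-document values for one field, wherever they are loaded from.
interface FieldValues {
    List<Object> values(int docId);
}

// Simulated doc values: values already materialized per document at index time.
final class DocValuesBacked implements FieldValues {
    private final Map<Integer, List<Object>> columns;

    DocValuesBacked(Map<Integer, List<Object>> columns) {
        this.columns = columns;
    }

    @Override
    public List<Object> values(int docId) {
        return columns.getOrDefault(docId, List.of());
    }
}

// Simulated _source reads: look the field up in each document's parsed source.
// Only top-level keys are supported in this sketch.
final class SourceBacked implements FieldValues {
    private final Map<Integer, Map<String, Object>> sources;
    private final String path;

    SourceBacked(Map<Integer, Map<String, Object>> sources, String path) {
        this.sources = sources;
        this.path = path;
    }

    @Override
    public List<Object> values(int docId) {
        Object raw = sources.getOrDefault(docId, Map.of()).get(path);
        if (raw == null) {
            return List.of(); // field missing from this document
        }
        if (raw instanceof List) {
            return new ArrayList<Object>((List<?>) raw); // JSON array
        }
        return List.of(raw); // single value
    }
}

final class FieldTypeSketch {
    private final String name;
    private final boolean hasDocValues;
    private final boolean preferSourceForScripts; // e.g. for text-like fields

    FieldTypeSketch(String name, boolean hasDocValues, boolean preferSourceForScripts) {
        this.name = name;
        this.hasDocValues = hasDocValues;
        this.preferSourceForScripts = preferSourceForScripts;
    }

    // First method: use doc values when enabled, otherwise fall back to _source.
    FieldValues fielddata(Map<Integer, List<Object>> docValues,
                          Map<Integer, Map<String, Object>> sources) {
        return hasDocValues ? new DocValuesBacked(docValues) : new SourceBacked(sources, name);
    }

    // Second method, intended for the scripting fields API: may prefer _source
    // even when doc values exist, for field types configured that way.
    FieldValues scriptFielddata(Map<Integer, List<Object>> docValues,
                                Map<Integer, Map<String, Object>> sources) {
        return preferSourceForScripts ? new SourceBacked(sources, name) : fielddata(docValues, sources);
    }
}
```

In this sketch, sorting or aggregations would call `fielddata`, while the scripting fields API would call `scriptFielddata`; neither caller needs to know where the values came from.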
Heya, I think I missed this discussion back when this issue was opened, and I'm catching up now. I am a bit surprised by the direction taken, and I think this deserves a high-bandwidth discussion to get alignment, which is why I am adding the team-discuss label.
We have re-discussed this with the team and decided to decouple the scripting needs (loading from doc_values or _source transparently, depending on what's available) from the idea of falling back to _source for sorting and aggregations when doc_values are not available. We briefly discussed whether the script fields API should leverage field fetchers internally or the fielddata builder abstraction. It was mentioned that it's very important for the place values are loaded from to be transparent, meaning that _source should look and feel like doc_values. This is exactly what runtime fields do today, and we have the chance to reuse the existing fielddata implementations for runtime fields. With this said, we decided to introduce a new …

A few considerations were discussed around falling back to _source for sorting and aggregations.

There are concerns that the automatic fallback is too subtle and that users are not aware of its performance implications. Today, defining a runtime field that loads from _source is easy, can be done in the search request, and is the step that should raise awareness of the potential performance implications.

On the other hand, disabling doc_values in the mappings can also be seen as the step where users declare that performance is not a concern, given that doc_values are enabled by default whenever possible. Also, now that we have async search and Kibana uses it, we have a much better answer for slow search requests than before, so loading from _source should be less of a concern.

On the other other hand, the users who define the mappings, and who may choose to disable doc_values, are not necessarily the same as the ones sending queries, but the same is true for runtime fields defined in the mappings. Also, given that doc_values are enabled by default, falling back to _source would effectively come into play mostly for text fields and unmapped fields.
An additional question concerns the vision for runtime fields: by making things more automatic, we would also remove the need to define script-less runtime fields, though runtime fields would still be needed for computations performed by the scripts that define them. While we have a way forward for the script fields API, we will discuss again at a later time whether we want to fall back to _source for sorting and aggregations as well. There is no immediate urgency, but it would be good to find agreement on this, which is why I left the team-discuss label.
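The requirement in the comment above that _source should look and feel like doc_values has a concrete consequence: values read from _source need to be reshaped to match doc-values conventions before a script sees them. Below is a minimal sketch, assuming Lucene's conventions that keyword-style (SORTED_SET) doc values are sorted and de-duplicated per document and numeric (SORTED_NUMERIC) doc values are sorted with duplicates kept. The class and method names are hypothetical, and Elasticsearch's own coercion rules are not reproduced.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

// HYPOTHETICAL helper, not Elasticsearch code: reshapes a raw _source value
// (single value, JSON array, or missing) so it resembles doc values.
final class SourceAsDocValues {
    private SourceAsDocValues() {}

    // Flatten a _source entry into a list, dropping nulls; a missing field yields [].
    static List<Object> flatten(Object raw) {
        List<Object> out = new ArrayList<>();
        if (raw instanceof List) {
            for (Object v : (List<?>) raw) {
                if (v != null) {
                    out.add(v);
                }
            }
        } else if (raw != null) {
            out.add(raw);
        }
        return out;
    }

    // Keyword-style doc values: sorted and de-duplicated per document.
    // Java String order compares UTF-16 code units while Lucene compares
    // UTF-8 bytes, so the two orders can disagree for some non-ASCII input.
    static List<String> asKeywordDocValues(Object raw) {
        TreeSet<String> terms = new TreeSet<>();
        for (Object v : flatten(raw)) {
            terms.add(String.valueOf(v));
        }
        return new ArrayList<>(terms);
    }

    // Numeric doc values: sorted, duplicates kept. Accepts numbers (fractions are
    // truncated by longValue()) or numeric strings; anything else throws
    // NumberFormatException.
    static List<Long> asLongDocValues(Object raw) {
        List<Long> out = new ArrayList<>();
        for (Object v : flatten(raw)) {
            if (v instanceof Number) {
                out.add(((Number) v).longValue());
            } else {
                out.add(Long.parseLong(v.toString().trim()));
            }
        }
        Collections.sort(out);
        return out;
    }
}
```

Without a step like this, a script could see `[b, a, b]` for a keyword value read from _source but `[a, b]` for the same document read from doc values, which is exactly the kind of difference the transparency requirement rules out.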
Pinging @elastic/es-search-foundations (Team:Search Foundations) |
People working on scripting have been looking into making scripts fall back automatically to _source when doc values are not enabled, in order to make scripts easier to use. @jdconrad and I discussed it recently, and it would be better to do it at the mapping level. This way, whether doc values or _source is used for scripts would be completely transparent to script engines, and we could also support sorting or aggregations on fields that have doc values disabled. We need to have the ability to read from _source directly for runtime fields anyway, so hopefully we could reuse code?

One question is whether this could be a performance trap at times. Maybe we should look into ways to tell users that their queries could run faster with better mappings?
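The two ideas in the issue description, transparency for script engines and telling users when better mappings would help, could be combined in a single per-field accessor. The following is a minimal sketch, assuming hypothetical names (`ScriptField`, `SourceFallbackHints`) and representing the doc-values and _source loaders as plain `IntFunction`s; the hint wording is illustrative, not an existing Elasticsearch message.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.IntFunction;

// HYPOTHETICAL sketch, not Elasticsearch code: collects the fields a request
// had to read from _source so that a hint can be returned to the user.
final class SourceFallbackHints {
    private final Set<String> fields = new LinkedHashSet<>(); // keeps first-seen order

    void recordSourceRead(String field) {
        fields.add(field);
    }

    List<String> hints() {
        List<String> out = new ArrayList<>();
        for (String f : fields) {
            out.add("[" + f + "] was read from _source; enabling doc_values on it may speed up this request");
        }
        return out;
    }
}

// What a script engine would see: a per-document accessor that hides whether
// values come from doc values or from _source.
final class ScriptField {
    private final String name;
    private final boolean hasDocValues;
    private final IntFunction<List<Object>> docValuesLoader;
    private final IntFunction<List<Object>> sourceLoader;
    private final SourceFallbackHints hints;

    ScriptField(String name, boolean hasDocValues,
                IntFunction<List<Object>> docValuesLoader,
                IntFunction<List<Object>> sourceLoader,
                SourceFallbackHints hints) {
        this.name = name;
        this.hasDocValues = hasDocValues;
        this.docValuesLoader = docValuesLoader;
        this.sourceLoader = sourceLoader;
        this.hints = hints;
    }

    List<Object> get(int docId) {
        if (hasDocValues) {
            return docValuesLoader.apply(docId);
        }
        hints.recordSourceRead(name); // remember the slow path for this field
        return sourceLoader.apply(docId);
    }
}
```

Recording the field name once per request, rather than logging per document, keeps the hint cheap and avoids flooding the response when a script touches many documents.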