-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add the ability to disable the retrieval of the metadata fields #19918
Conversation
@areek this is something I started some time ago so it was easy for me to restore it. IMO this is required for the suggestions in order to completely bypass the retrieval of the stored fields when no fields are explicitly required. |
This is a follow up of theses discussions #11887 (comment), #19877 (comment) By default search operations returns the metadata fields of each search hit. This change allows to disable the retrieval of these fields (_source, _id, _type, _routing, _parent, ...) by adding a new parameter called `fetch_metadata`. This parameter can be set to false only if `_source`, `version` or `stored_fields` are not explicitly requested. This could be a nice speed up for the suggestions now that they use the fetch phase to retrieve the _source. The problem is that even if _source is disabled we visit the stored fields for each suggestion and extract the metadata fields, with this change we could bypass this step.
I am a bit concerned about adding a new parameter to the search API. Could we somehow default suggestions to not look at stored fields at all and allow users to get more fields, including metadata like |
Well the FetchPhase treats each doc_id blindly without knowing if it's a suggestion doc id or a search doc id. I find it really nice, for instance you can do:
... which returns all the suggestions enriched with the docvalue fields. If we start to re-differentiate suggestions and search results we could defect the purpose of the initial refactoring. |
What would this flag do when set to true, in case the request is an ordinary search (with just a query section)? What would be returned for each hit? |
I'm not sure I like the flag itself but I love the idea that you can shut off returning some of these fields. That'd let you just return fields from doc_values and skip stored fields entirely, I believe. If that is the goal I think however we build this it'd be nice to have a test that asserts this by building the node in such a way that it'll crash rather than retrieve stored fields. |
The score, the _index and the docvalue_fields if any.
Yes that's the idea. I am ok to change the flag but I don't see a better way ;). Is it the name or the fact that it's a simple boolean ?
Thanks I'll keep that in mind if we find a consensus. |
Sorry I misread your question. Same as before, the metadata I am talking about are |
you didn't, I mistyped it, it should have been "What would this flag do when set to false" :) so with |
Initially I thought the same, but then I wonder, what is the usecase for this (apart from the suggest one addressed with this PR)? |
Well searches that requires fields from doc_values only. In such case we don't need (want) to decompress the stored fields just to retrieve an _id and a _type that the user will ignore. It's not a common use case, more like an edge case but my initial thought was that it would be cool to solve something for the suggestions that could be reuse for the search. It seems that I was wrong on that point ;) |
New proposal:We don't fetch/visit the document at all iff _source is disabled AND
I think it makes sense since it's the only combination where not fetching the metadata speeds up the process. |
I'm fine with it. I like it better than the original proposal. |
I think it could be trappy for an application that has a dynamic list of stored fields to retrieve, as it would need to be careful that if the list is empty, results will not have metadata fields. Maybe we should use a special value like we do for empty lists for stopwords when configuring analyzers? |
This is a follow up of theses discussions #11887 (comment), #19877 (comment)
By default search operations returns the metadata fields of each search hit.
This change allows to disable the retrieval of these fields (_source, _id, _type, _routing, _parent, ...) by adding a new parameter called
fetch_metadata
. This parameter can be set to false only if_source
,version
orstored_fields
are not explicitly requested.This could be a nice speed up for the suggestions now that they use the fetch phase to retrieve the _source.
The problem is that even if _source is disabled we visit the stored fields for each suggestion and extract the metadata fields, with this change we could bypass this step.