Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add the ability to disable the retrieval of the metadata fields #19918

Closed
wants to merge 2 commits into from
Closed

Add the ability to disable the retrieval of the metadata fields #19918

wants to merge 2 commits into from

Conversation

jimczi
Copy link
Contributor

@jimczi jimczi commented Aug 10, 2016

This is a follow up of theses discussions #11887 (comment), #19877 (comment)

By default search operations returns the metadata fields of each search hit.
This change allows to disable the retrieval of these fields (_source, _id, _type, _routing, _parent, ...) by adding a new parameter called fetch_metadata. This parameter can be set to false only if _source, version or stored_fields are not explicitly requested.

This could be a nice speed up for the suggestions now that they use the fetch phase to retrieve the _source.
The problem is that even if _source is disabled we visit the stored fields for each suggestion and extract the metadata fields, with this change we could bypass this step.

@jimczi jimczi added >feature :Search/Search Search-related issues that do not fall into other categories v5.0.0-beta1 labels Aug 10, 2016
@jimczi
Copy link
Contributor Author

jimczi commented Aug 10, 2016

@areek this is something I started some time ago so it was easy for me to restore it. IMO this is required for the suggestions in order to completely bypass the retrieval of the stored fields when no fields are explicitly required.

This is a follow up of theses discussions #11887 (comment), #19877 (comment)

By default search operations returns the metadata fields of each search hit.
This change allows to disable the retrieval of these fields (_source, _id, _type, _routing, _parent, ...) by adding a new parameter
called `fetch_metadata`. This parameter can be set to false only if `_source`, `version` or `stored_fields` are not explicitly requested.

This could be a nice speed up for the suggestions now that they use the fetch phase to retrieve the _source.
The problem is that even if _source is disabled we visit the stored fields for each suggestion and extract the metadata fields, with this change we could bypass this step.
@jpountz
Copy link
Contributor

jpountz commented Aug 10, 2016

I am a bit concerned about adding a new parameter to the search API. Could we somehow default suggestions to not look at stored fields at all and allow users to get more fields, including metadata like _id, using the stored_fields option?

@jimczi
Copy link
Contributor Author

jimczi commented Aug 10, 2016

Could we somehow default suggestions to not look at stored fields at all and allow users to get more fields, including metadata like _id, using the stored_fields option?

Well the FetchPhase treats each doc_id blindly without knowing if it's a suggestion doc id or a search doc id. I find it really nice, for instance you can do:

POST music/_search
{
   "_source": false,
   "docvalue_fields": ["suggest_id"],
   "suggest": {
      "song-suggest": {
         "prefix": "nir",
         "completion": {
            "field": "suggest"
         }
      }
   }
}

... which returns all the suggestions enriched with the docvalue fields.

If we start to re-differentiate suggestions and search results we could defect the purpose of the initial refactoring.
I agree that adding another parameter is not the ideal solution but it's not additional, you just need to replace _source:false with fetch_metadata:false since _source is a metadata field.

@javanna
Copy link
Member

javanna commented Aug 10, 2016

What would this flag do when set to true, in case the request is an ordinary search (with just a query section)? What would be returned for each hit?

@nik9000
Copy link
Member

nik9000 commented Aug 10, 2016

I'm not sure I like the flag itself but I love the idea that you can shut off returning some of these fields. That'd let you just return fields from doc_values and skip stored fields entirely, I believe.

If that is the goal I think however we build this it'd be nice to have a test that asserts this by building the node in such a way that it'll crash rather than retrieve stored fields.

@jimczi
Copy link
Contributor Author

jimczi commented Aug 10, 2016

What would be returned for each hit?

The score, the _index and the docvalue_fields if any.

I'm not sure I like the flag itself but I love the idea that you can shut off returning some of these fields. That'd let you just return fields from doc_values and skip stored fields entirely, I believe.

Yes that's the idea. I am ok to change the flag but I don't see a better way ;). Is it the name or the fact that it's a simple boolean ?

If that is the goal I think however we build this it'd be nice to have a test that asserts this by building the node in such a way that it'll crash rather than retrieve stored fields.

Thanks I'll keep that in mind if we find a consensus.

@jimczi jimczi added the discuss label Aug 10, 2016
@jimczi
Copy link
Contributor Author

jimczi commented Aug 10, 2016

What would this flag do when set to true, in case the request is an ordinary search (with just a query section)? What would be returned for each hit?

Sorry I misread your question. Same as before, the metadata I am talking about are id, type and the optional ones like routing or parent. The idea is to keep the current behavior by default (which is to return these fields) and if you don't want to retrieve these fields because you know that you don't need them you can disable the retrieval.

@javanna
Copy link
Member

javanna commented Aug 10, 2016

Sorry I misread your question.

you didn't, I mistyped it, it should have been "What would this flag do when set to false" :)

so with fetch_metadata set to false the search api with a simple query would return _index and score only for each hit? I think I do not like the flag because it seems very specific to suggestions although made generic. I don't have a good proposal as for what I would like more though.

@javanna
Copy link
Member

javanna commented Aug 10, 2016

I love the idea that you can shut off returning some of these fields

Initially I thought the same, but then I wonder, what is the usecase for this (apart from the suggest one addressed with this PR)?

@jimczi
Copy link
Contributor Author

jimczi commented Aug 10, 2016

Initially I thought the same, but then I wonder, what is the usecase for this (apart from the suggest one addressed with this PR)?

Well searches that requires fields from doc_values only. In such case we don't need (want) to decompress the stored fields just to retrieve an _id and a _type that the user will ignore. It's not a common use case, more like an edge case but my initial thought was that it would be cool to solve something for the suggestions that could be reuse for the search. It seems that I was wrong on that point ;)

@jimczi
Copy link
Contributor Author

jimczi commented Aug 16, 2016

New proposal:

We don't fetch/visit the document at all iff _source is disabled AND stored_fields is an explicit empty array:

POST _search
{
   "_source": false,
   "stored_fields": []
}

I think it makes sense since it's the only combination where not fetching the metadata speeds up the process.

@jpountz @javanna @nik9000 WDYT ?

@nik9000
Copy link
Member

nik9000 commented Aug 16, 2016

I think it makes sense since it's the only combination where not fetching the metadata speeds up the process.

I'm fine with it. I like it better than the original proposal.

@jpountz
Copy link
Contributor

jpountz commented Aug 16, 2016

I think it could be trappy for an application that has a dynamic list of stored fields to retrieve, as it would need to be careful that if the list is empty, results will not have metadata fields. Maybe we should use a special value like we do for empty lists for stopwords when configuring analyzers?

@jimczi
Copy link
Contributor Author

jimczi commented Aug 17, 2016

I've opened #20026 which is a spin off of @jpountz's idea (using none as the disabler).

@jimczi jimczi closed this Aug 17, 2016
@jimczi jimczi deleted the fetch_metadata branch February 3, 2017 15:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discuss >feature :Search/Search Search-related issues that do not fall into other categories
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants