Memory efficient xcontent filtering (backport of #77154) #77653
Merged
I found myself needing support for something like `filter_path` on
`XContentParser`. It was simple enough to plug in, so I did. Then I
realized that it might offer more memory-efficient source filtering
(#25168), so I put together a quick benchmark comparing the source
filtering that we do in `_search`.
Filtering using the parser is about 33% faster than how we filter now
when you select a single field from a 300 byte document:
The top line is the way we filter now. The middle line is adding a
filter to `XContentBuilder` - something we can do right now without any
of my plumbing work. The bottom line is filtering on the parser,
requiring all the new plumbing.
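To make "filtering on the parser" concrete, here is a toy sketch of the idea. It is not the `XContentParser` API or the code in this PR: `Token`, `Kind`, `filterTopLevel`, and `sampleDocument` are hypothetical names invented for the illustration, and it only handles objects and scalar values. The point it shows is that tokens for unselected fields are dropped as they stream past, so the unwanted parts of the source are never materialized.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of a token stream; NOT Elasticsearch's XContentParser.
final class ParserFilterSketch {
    enum Kind { START_OBJECT, END_OBJECT, FIELD_NAME, VALUE }

    static final class Token {
        final Kind kind;
        final String text;
        Token(Kind kind, String text) { this.kind = kind; this.text = text; }
    }

    // Keeps only the top-level field named `keep`. Every other token is
    // skipped as it is read, without building a map of the whole document.
    static List<Token> filterTopLevel(List<Token> tokens, String keep) {
        List<Token> out = new ArrayList<>();
        int depth = 0;           // how many objects are currently open
        boolean keeping = false; // true while inside the selected field
        for (Token t : tokens) {
            switch (t.kind) {
                case START_OBJECT:
                    if (depth == 0 || keeping) out.add(t);
                    depth++;
                    break;
                case END_OBJECT:
                    depth--;
                    if (depth == 0 || keeping) out.add(t);
                    if (depth == 1) keeping = false; // nested value of a top-level field ended
                    break;
                case FIELD_NAME:
                    if (depth == 1) keeping = t.text.equals(keep);
                    if (keeping) out.add(t);
                    break;
                case VALUE:
                    if (keeping) out.add(t);
                    if (depth == 1) keeping = false; // scalar value of a top-level field ended
                    break;
            }
        }
        return out;
    }

    // Tokens for the made-up document {"a":1,"b":{"c":2},"d":3}.
    static List<Token> sampleDocument() {
        return Arrays.asList(
            new Token(Kind.START_OBJECT, "{"),
            new Token(Kind.FIELD_NAME, "a"), new Token(Kind.VALUE, "1"),
            new Token(Kind.FIELD_NAME, "b"),
            new Token(Kind.START_OBJECT, "{"),
            new Token(Kind.FIELD_NAME, "c"), new Token(Kind.VALUE, "2"),
            new Token(Kind.END_OBJECT, "}"),
            new Token(Kind.FIELD_NAME, "d"), new Token(Kind.VALUE, "3"),
            new Token(Kind.END_OBJECT, "}"));
    }

    public static void main(String[] args) {
        for (Token t : filterTopLevel(sampleDocument(), "b")) {
            System.out.println(t.kind + " " + t.text);
        }
    }
}
```

The way we filter now corresponds to reading every token into a map first and pruning it afterwards, which is where the extra allocation comes from.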
This isn't particularly impressive. 33% sounds great! But 700
nanoseconds per document isn't going to cut into anyone's search times.
If you fetch a thousand documents that's 0.7 milliseconds of savings.
But we mostly advise folks to use source filtering on fetch when the
source is large and you only want a small part of it. So I tried when
the source is about 4.3kb and you want a single field:
That's 45% faster. Put another way, 2.7 microseconds a document. Not
bad!
But have a look at how things come out when you want a single field from
a 4 megabyte document:
These documents are very large. I've encountered documents like them in
real life, but they've always been the outlier for me. But a 6.5
millisecond per document savings ain't anything to sneeze at.
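For a rough sense of scale, the per-document savings quoted above can be multiplied out over a fetch. This is only back-of-the-envelope arithmetic: the class and method names are made up for the illustration, and the thousand-document fetch size is the same assumption used for the 300 byte case, applied here to all three document sizes.

```java
// Back-of-the-envelope arithmetic only; not part of the benchmark.
final class SavingsEstimate {
    // Total time saved for a fetch, in nanoseconds.
    static long totalSavedNanos(long savedNanosPerDoc, long docCount) {
        return savedNanosPerDoc * docCount;
    }

    static double toMillis(long nanos) {
        return nanos / 1_000_000.0;
    }

    public static void main(String[] args) {
        long docs = 1_000; // assumed fetch size
        String[] labels = {"300 byte", "4.3kb", "4 megabyte"};
        // Per-document savings quoted in the text: 700 ns, 2.7 us, 6.5 ms.
        long[] savedNanosPerDoc = {700L, 2_700L, 6_500_000L};
        for (int i = 0; i < labels.length; i++) {
            long total = totalSavedNanos(savedNanosPerDoc[i], docs);
            System.out.printf("%s documents: %.1f ms saved per %d docs%n",
                labels[i], toMillis(total), docs);
        }
    }
}
```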
Take a look at what you get when I turn on gc metrics: