Highlighting can take 50% to 99% of search time #103298
Comments
Testing details
Summary of findings
The customer's results from running the same query under three different conditions (none of which mattered) show that the search finished quickly, but the highlighting took ~850s. My testing results are shown in the rows below that. I ran with and without highlighting, with and without Finally I also did a second large index where the In all cases of my testing, running with highlighting causes the query to run twice as long. The other variations I tried made no obvious difference.
I captured several hot threads snapshots. The highlighting runs in the second half of the query. For the first half, no highlighting activity appears in the hot threads. Below I provide 5 sample hot threads "dumps" that I took.
Hot threads while highlighting phase is running (toggle to view)
Pinging @elastic/es-search (Team:Search)
This change ensures that the matches implementation of the `SourceConfirmedTextQuery` only checks the current document instead of calling advance on the two phase iterator. The latter tries to find the first doc that matches the query instead of restricting the search to the current doc. This can lead to abnormally slow highlighting if the query is very restrictive and the highlight is done on a non-matching document. Closes elastic#103298
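To make the difference described in the change above concrete, here is a toy Python model. It is not the Elasticsearch or Lucene code: `ToyApproximation`, `matches_with_advance` and `matches_current_doc_only` are hypothetical names, and the "confirm" step stands in for the costly source check that `SourceConfirmedTextQuery` performs. It only illustrates why advancing to the next real match is expensive when the highlighted document does not match.

```python
from bisect import bisect_left
from typing import Iterable, Set


class ToyApproximation:
    """Hypothetical stand-in for a two-phase iterator.

    `candidates` are doc IDs that pass the cheap approximation;
    `true_matches` are the doc IDs that also pass the costly confirmation.
    """

    def __init__(self, candidates: Iterable[int], true_matches: Set[int]):
        self.candidates = sorted(candidates)
        self.true_matches = set(true_matches)
        self.confirm_calls = 0  # counts the expensive confirmation checks

    def confirms(self, doc: int) -> bool:
        self.confirm_calls += 1
        return doc in self.true_matches


def matches_with_advance(it: ToyApproximation, doc: int) -> bool:
    """Old behaviour (sketch): advance to the first confirmed doc >= `doc`."""
    start = bisect_left(it.candidates, doc)
    for candidate in it.candidates[start:]:
        if it.confirms(candidate):
            # Found the first real match; it may be far past `doc`.
            return candidate == doc
    return False


def matches_current_doc_only(it: ToyApproximation, doc: int) -> bool:
    """Fixed behaviour (sketch): confirm only the requested doc."""
    pos = bisect_left(it.candidates, doc)
    if pos == len(it.candidates) or it.candidates[pos] != doc:
        return False  # not even an approximate match, nothing to confirm
    return it.confirms(doc)
```

With 1,000 candidates of which only the last one truly matches, asking about doc 0 costs 1,000 confirmations in the first function and exactly one in the second, and both return the same answer.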
Elasticsearch Version
8.10.4 and 8.13.0-SNAPSHOT
Installed Plugins
No response
Java Version
bundled
OS Version
ESS Cloud and MacOS
Problem Description
A customer has reported that highlighting (added by Kibana) is causing their queries in Kibana to run extremely slowly. Profiling of the queries showed that 99% of the search time is spent in the highlighting code paths. In their case, the ConstantScoreQuery took less than 1 second, but the highlighting ran for ~850s (this is reproducible).
I was able to reproduce this to some degree locally (details below), where the highlighting takes ~50% of the total search time.
Root cause is not known. Key points include:
match_only_text.
Steps to Reproduce
Here is how I (partially) reproduced the issue locally.
Use the following mapping to create an index where message has match_only_text and 1 shard is created (I didn't test with more than 1 shard).
Mapping (toggle to view)
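The original mapping is collapsed above and not reproduced here. As a rough illustration only, a minimal create-index body consistent with the description (a `message` field of type `match_only_text`, one shard) could look like the sketch below. The `@timestamp` date field is inferred from the later mention of a @timestamp range, and `build_index_body` is a hypothetical helper, not part of any client library.

```python
import json


def build_index_body(text_field: str = "message") -> dict:
    """Build a minimal create-index request body (illustrative assumption)."""
    return {
        # One primary shard, as in the reproduction description.
        "settings": {"number_of_shards": 1},
        "mappings": {
            "properties": {
                # Assumed date field; the real mapping may contain more fields.
                "@timestamp": {"type": "date"},
                # The field type that the slow highlighting was observed on.
                text_field: {"type": "match_only_text"},
            }
        },
    }


if __name__ == "__main__":
    # Print the JSON that would be sent as the body of the create-index request.
    print(json.dumps(build_index_body(), indent=2))
```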
Populate this index with at least 2.5 million records in scope for the search (see my table in comments below).
Data generation script (toggle to view)
Loading script (toggle to view)
Note: I ran the above two scripts dozens of times, creating data files of 100,000 to 200,000 entries each, so that most of the data fell within a small-ish @timestamp range and millions of documents were indexed.
Run the following query (has profiling and highlighting turned on).
ES Query (toggle to view)
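The actual query is collapsed above. For anyone who wants to repeat the with/without-highlighting comparison, the sketch below builds two otherwise identical search bodies with profiling turned on. The filter clause, the phrase text and the `build_search_body` helper are placeholders I made up for illustration; only `profile` and `highlight` are the parts the report refers to.

```python
def build_search_body(phrase: str, highlight: bool = True) -> dict:
    """Build a profiled search body, optionally with highlighting (sketch)."""
    body = {
        "profile": True,  # ask Elasticsearch for a timing breakdown
        "query": {
            "bool": {
                # Placeholder filter; the customer's real query is not shown here.
                "filter": [{"match_phrase": {"message": phrase}}]
            }
        },
    }
    if highlight:
        # Highlight the field that is mapped as match_only_text.
        body["highlight"] = {"fields": {"message": {}}}
    return body


if __name__ == "__main__":
    with_hl = build_search_body("some phrase")
    without_hl = build_search_body("some phrase", highlight=False)
    # The two bodies differ only in the highlight section.
    print(sorted(set(with_hl) - set(without_hl)))
```

Running both bodies against the same index and comparing the reported times is one way to isolate the cost of highlighting.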
Logs (if relevant)
No response