Highlighters are slow with thousands of fields #36452
Comments
Pinging @elastic/es-search
@melissachang I tried a recreation in 6.4 with the information you gave (7K fields and a few of them per document) and highlighting was fast. We added a shortcut in this version to bypass fields that don't appear in the document when highlighting, so I guess that you're using a version without this enhancement.
Apologies, it didn't occur to me to file a bug report instead of a feature request. (And because I was using the feature request template, I didn't include the Elasticsearch version.) I am using the docker image docker.elastic.co/elasticsearch/elasticsearch-oss:6.2.2. I am trying 6.5.3 now. I'll update this issue afterwards.
(When I was searching for similar issues, I came across #34015, "Leverage the Lucene's Matches API in a new highlighter type", which influenced me to file a feature request for a new highlighter type.) With 6.5.3: Unfortunately my data is private. I'll try to find similar public data and repro. I'll let you know if I do.
Unfortunately I wasn't able to create an index that reproduces this problem. Here are some properties of my index:
Across all documents, there are a total of 6624 fields. ~4k of the fields are string (as opposed to numeric). A single document may have 2k fields, give or take. If anyone comes across a similar index, please try out the above queries. I work on a tool that indexes Google BigQuery tables. If anyone comes across a public BigQuery table with the above properties (> 6k columns, > 120k rows), I'm happy to run my indexer and try to repro this bug.
As explained in this comment, we have a shortcut to bypass highlighting if the field is empty or null in the current document. I tried to reproduce the slow query in >6.4 and it responded in less than a second, so I think that something else is at play in your setup. I am going to close this issue, but we can revisit if you provide a clear reproduction, since the example in the description should be fixed by #32090.
I have a multi_match query that searches across all fields. Without highlighting, it takes 2 seconds. I'd like to use highlight to tell me which fields match. But highlight makes the query take several minutes. I gave up after 2 minutes -- I'm not sure if the query ever finishes. I tried all three highlight types (unified, plain, fvh).
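For readers trying to reproduce this, here is a rough sketch of what such a request body could look like. The actual requests are not shown in this issue, so everything here is an assumption: the helper name build_search_body is hypothetical, "diabetes" is a placeholder search term, and the "*" wildcards are one plausible way to search and highlight across all fields.

```python
import json


def build_search_body(text, highlighter_type=None):
    """Build a search body that runs multi_match over all fields.

    If highlighter_type is given ("unified", "plain" or "fvh"), a highlight
    section covering every field is added. Assumption: the issue does not
    show the real request, so this only illustrates the general shape.
    """
    body = {"query": {"multi_match": {"query": text, "fields": ["*"]}}}
    if highlighter_type is not None:
        body["highlight"] = {"type": highlighter_type, "fields": {"*": {}}}
    return body


if __name__ == "__main__":
    # "diabetes" is a placeholder search term, not taken from the issue.
    print(json.dumps(build_search_body("diabetes", "unified"), indent=2))
```

The printed JSON can be sent as the body of a _search request; leaving out highlighter_type gives the query-only variant for timing comparisons.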
So instead of using highlight, I'm manually iterating through the source documents and finding the matching field. This only takes maybe 0.2 seconds.
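A minimal sketch of that kind of manual scan, not the actual code from my tool. It assumes each hit's _source is a flat dict and that a case-insensitive substring check on string fields is good enough; it does not reproduce Elasticsearch's analyzers, so results can differ from what the query really matched. The function name find_matching_fields and the sample field names are hypothetical.

```python
def find_matching_fields(source, query):
    """Return names of string fields in `source` that contain any query term.

    Assumptions: `source` is a flat dict (a hit's _source), matching is a
    case-insensitive substring test, and non-string values are skipped.
    """
    terms = query.lower().split()
    matches = []
    for field, value in source.items():
        if not isinstance(value, str):
            continue  # numeric and other non-text fields are ignored
        text = value.lower()
        if any(term in text for term in terms):
            matches.append(field)
    return matches


if __name__ == "__main__":
    # Hypothetical document; field names are made up for illustration.
    doc = {"diagnosis": "Type 2 Diabetes", "age": 54, "notes": "no issues"}
    print(find_matching_fields(doc, "diabetes"))
```

Because each document only carries a subset of the fields, the loop touches only fields that actually exist in that document.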
(Curious about how Elasticsearch queries work -- as part of the query process, does Elasticsearch know which field matches? Say document D contains field F that matches query Q. When Elasticsearch determines that D is a result for Q, does Elasticsearch know that F contains Q, as part of that process?)
Would it be possible to create a highlight type that is fast? It would either return only the field that matched, or the field name and entire contents of the field.
Here are some timing stats. My index has 121k documents. Across all documents there are 7k fields; a particular document will have a small subset of the 7k fields.
~/data-explorer (master): curl "localhost:9200/_cat/indices?v&s=index"
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open nurse_s_health_study kMyBnxPVTLC7W5ufetg3UQ 5 1 121701 0 5.6gb 5.6gb
yellow open nurse_s_health_study_fvh snoFPeGQSWqXZL8rGG-1qg 5 1 121701 0 8.9gb 8.9gb
no highlighter
2.2s
unified highlighter
Gave up after 2 mins.
plain highlighter
Gave up after 4 minutes.
fvh highlighter
I reindexed with term_vector = with_positions_offsets. Gave up after 2 minutes.
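For reference, a sketch of how per-field mappings with that term_vector setting could be generated. The helper names and the sample field names are hypothetical, and how the resulting properties dict is wrapped in the index mapping depends on the Elasticsearch version, so this only shows the per-field part.

```python
def fvh_text_field():
    """Mapping for one text field that stores term vectors for the fvh highlighter."""
    return {"type": "text", "term_vector": "with_positions_offsets"}


def build_properties(field_names):
    """Build a `properties` dict giving every named field the same mapping.

    Assumption: all listed fields are text fields; numeric fields would need
    their own mapping and are not handled here.
    """
    return {name: fvh_text_field() for name in field_names}


if __name__ == "__main__":
    # Hypothetical field names, for illustration only.
    print(build_properties(["diagnosis", "notes"]))
```

Storing term vectors with positions and offsets increases index size, which is consistent with the larger store.size of the _fvh index listed above.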