Highlighters are slow with thousands of fields #36452

melissachang · 2018-12-10T20:35:52Z

I have a multi_match query that searches across all fields. Without highlighting, it takes 2 seconds.

I'd like to use highlighting to tell me which fields match. But highlighting makes the query take several minutes. I gave up after 2 minutes; I'm not sure whether the query ever finishes. I tried all three highlighter types (unified, plain, fvh).

So instead of using highlighting, I'm manually iterating through the source documents and finding the matching fields. This takes only about 0.2 seconds.
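A minimal Python sketch of that manual approach. It assumes each hit's _source has already been parsed into a dict, and it approximates the phrase_prefix query "pre" as "some word in a string value starts with pre" (case-insensitive). The function names and the sample document are hypothetical, and the word splitting is a crude stand-in for Elasticsearch's analyzer, so results can differ from what the query actually matched.

```python
import re
from typing import Any, Iterator, List, Tuple

WORD_RE = re.compile(r"\w+")


def iter_leaves(value: Any, path: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield (dotted_field_path, leaf_value) pairs from a parsed _source."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from iter_leaves(child, f"{path}.{key}" if path else key)
    elif isinstance(value, list):
        # Array elements share the field path of the array itself.
        for item in value:
            yield from iter_leaves(item, path)
    else:
        yield path, value


def matching_fields(source: dict, prefix: str) -> List[str]:
    """Return field paths whose string value contains a word starting with prefix."""
    prefix = prefix.lower()
    found: List[str] = []
    for path, leaf in iter_leaves(source):
        if not isinstance(leaf, str):
            continue  # numbers, booleans and nulls cannot match a text prefix
        words = WORD_RE.findall(leaf)
        if any(w.lower().startswith(prefix) for w in words) and path not in found:
            found.append(path)
    return found


doc = {"diet": {"notes": "Pre-menopausal"}, "age": 54, "status": "present"}
print(matching_fields(doc, "pre"))  # ['diet.notes', 'status']
```

Because this runs only over the 10 returned hits, its cost depends on the fields those documents contain, not on the size of the mapping.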

(Curious about how Elasticsearch queries work: as part of the query process, does Elasticsearch know which field matches? Say document D contains field F that matches query Q. When Elasticsearch determines that D is a result for Q, does Elasticsearch know, as part of that process, that F contains Q?)

Would it be possible to create a highlighter type that is fast? It would return either only the name of the field that matched, or the field name and the entire contents of the field.
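Something close to the second variant can already be requested with existing highlight options: setting number_of_fragments to 0 asks for the whole field value instead of snippets, and the keys of each hit's highlight object name the fields that matched. A Python sketch, assuming the response has been parsed from JSON into a dict; the helper names and the sample response are hypothetical, and this does nothing about the highlighter's speed.

```python
import json
from typing import Any, Dict, List


def build_search_body(text: str, size: int = 10) -> Dict[str, Any]:
    """The phrase_prefix query from this issue, highlighting every field.

    number_of_fragments=0 requests the whole field value (with the matches
    marked up) instead of short snippets.
    """
    return {
        "query": {"multi_match": {"query": text, "type": "phrase_prefix"}},
        "highlight": {"fields": {"*": {"number_of_fragments": 0}}},
        "size": size,
    }


def matched_fields_per_hit(response: Dict[str, Any]) -> Dict[str, List[str]]:
    """Map each hit's _id to the sorted names of its highlighted fields."""
    return {
        hit["_id"]: sorted(hit.get("highlight", {}))
        for hit in response["hits"]["hits"]
    }


# Hypothetical, trimmed-down response for illustration only.
sample = {
    "hits": {
        "hits": [
            {"_id": "1", "highlight": {"status": ["<em>present</em>"],
                                       "diet.notes": ["<em>Pre</em>-menopausal"]}},
            {"_id": "2"},
        ]
    }
}

print(json.dumps(build_search_body("pre"), indent=2))
print(matched_fields_per_hit(sample))  # {'1': ['diet.notes', 'status'], '2': []}
```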

Here are some timing stats. My index has 121k documents. Across all documents there are 7k fields; a particular document will have a small subset of the 7k fields.

~/data-explorer (master): curl "localhost:9200/_cat/indices?v&s=index"
health status index                    uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   nurse_s_health_study     kMyBnxPVTLC7W5ufetg3UQ   5   1     121701            0      5.6gb          5.6gb
yellow open   nurse_s_health_study_fvh snoFPeGQSWqXZL8rGG-1qg   5   1     121701            0      8.9gb          8.9gb

no highlighter

GET /nurse_s_health_study/_search
{
  "query": {
    "multi_match": {
      "query": "pre",
      "type": "phrase_prefix"
    }
  },
  "size": 10
}

2.2s

unified highlighter

GET /nurse_s_health_study/_search
{
  "query": {
    "multi_match": {
      "query": "pre",
      "type": "phrase_prefix"
    }
  },
  "highlight": {
    "fields": {
      "*": {
        "type": "unified"
      }
    }
  },
  "size": 10
}

Gave up after 2 mins.

plain highlighter

GET /nurse_s_health_study/_search
{
  "query": {
    "multi_match": {
      "query": "pre",
      "type": "phrase_prefix"
    }
  },
  "highlight": {
    "fields": {
      "*": {
        "type": "plain"
      }
    }
  },
  "size": 10
}

Gave up after 4 minutes.

fvh highlighter
I reindexed with term_vector = with_positions_offsets.
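One way the fvh copy of the index might be created, assuming its string fields are mapped dynamically and using the 6.x mapping format with a single mapping type (the type name _doc and the template name strings_with_offsets are assumptions, not taken from this setup). The sketch only builds and prints the request body for PUT /nurse_s_health_study_fvh.

```python
import json

# Hypothetical index-creation body (6.x format, one mapping type).
# Every dynamically mapped string field becomes a text field that stores
# term vectors with positions and offsets, which the fvh highlighter needs.
FVH_INDEX_BODY = {
    "mappings": {
        "_doc": {
            "dynamic_templates": [
                {
                    "strings_with_offsets": {
                        "match_mapping_type": "string",
                        "mapping": {
                            "type": "text",
                            "term_vector": "with_positions_offsets",
                        },
                    }
                }
            ]
        }
    }
}

print(json.dumps(FVH_INDEX_BODY, indent=2))
```

Unlike the default dynamic string mapping, this template adds no .keyword subfield. The stored term vectors also take disk space, which is consistent with the _cat/indices output above: 8.9gb for nurse_s_health_study_fvh versus 5.6gb for nurse_s_health_study.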

GET /nurse_s_health_study_fvh/_search
{
  "query": {
    "multi_match": {
      "query": "pre",
      "type": "phrase_prefix"
    }
  },
  "highlight": {
    "fields": {
      "*": {
        "type": "fvh"
      }
    }
  },
  "size": 10
}

Gave up after 2 minutes.


elasticmachine · 2018-12-11T07:29:33Z

Pinging @elastic/es-search

jimczi · 2018-12-11T08:55:22Z

@melissachang I tried a recreation in 6.4 with the information you gave (7K fields and a few of them per document), and highlighting was fast. We added a shortcut in this version that bypasses fields that don't appear in the document when highlighting, so I guess you're using a version without this enhancement.
However, the issue with a large number of fields should not affect the plain and fvh highlighters, so I am not sure this is what caused the slow queries in your case. Are you sure the timings you reported are not all linked to the unified highlighter? I tried the same recreation in 6.3.2: the query took several minutes to complete, but the other highlighters responded in less than a second.
Could you please tell us which version you used in your tests and check whether the new version (6.5) solves your issue?
I am also a bit sad when I see the title of your issue. We try our best to provide tools that work for various use cases, so when something isn't working as you expect, it is probably a bug or something wrong in your configuration (using 7k fields doesn't help here ;) ). Can you change the title to reflect the real issue? Highlighting is slow on mappings with thousands of fields. This is a bug that we hope we fixed in 6.4 for the unified highlighter, but since it shouldn't affect the other highlighters, I suspect that something else is at play here, so we'll need more information.
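A toy Python model of the shortcut described in this comment (not Elasticsearch's actual implementation, which is Java code working on Lucene): with a "*" pattern, the highlighter considers every mapped field that matches, unless fields that are missing, null, or empty in the current document are skipped. It assumes a flat _source keyed by full field name; the function names and the 7,000-field mapping are hypothetical.

```python
from fnmatch import fnmatch
from typing import Any, Dict, Iterable, List


def has_value(value: Any) -> bool:
    """True unless the field is missing, null, or empty in the document."""
    return value is not None and value != "" and value != []


def fields_to_highlight(mapped_fields: Iterable[str], pattern: str,
                        source: Dict[str, Any], skip_absent: bool) -> List[str]:
    """Fields a highlight request on `pattern` would visit for one hit."""
    candidates = [f for f in mapped_fields if fnmatch(f, pattern)]
    if not skip_absent:
        # Without the shortcut, the work per hit grows with the mapping size.
        return candidates
    # With the shortcut, only fields this document actually has are visited.
    return [f for f in candidates if has_value(source.get(f))]


mapping = [f"col_{i}" for i in range(7000)]      # wide mapping
doc = {f"col_{i}": "present" for i in range(5)}   # sparse document
print(len(fields_to_highlight(mapping, "*", doc, skip_absent=False)))  # 7000
print(len(fields_to_highlight(mapping, "*", doc, skip_absent=True)))   # 5
```

In this model, the per-hit cost drops from the size of the mapping to the number of fields the document actually contains, which is why a wide but sparse index benefits most.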

melissachang · 2018-12-11T20:02:59Z

Apologies, it didn't occur to me to file a bug report instead of a feature request. (And because I was using the feature request template, I didn't include the Elasticsearch version.) I am using the Docker image docker.elastic.co/elasticsearch/elasticsearch-oss:6.2.2. I am trying 6.5.3 now and will update this issue afterwards.

melissachang · 2018-12-12T00:29:11Z

(For context: while searching for similar issues, I came across #34015 "Leverage the Lucene's Matches API in a new highlighter type", which is what led me to file a feature request for a new highlighter type.)

With 6.5.3:
unified - Gave up after 4 mins
plain - Returned after 4 mins 15 sec. No logs from Elasticsearch.
fvh - Haven't tried yet, haven't reindexed with necessary flags.

Unfortunately my data is private. I'll try to find similar public data and repro. I'll let you know if I do.

melissachang · 2019-01-10T01:31:16Z

Unfortunately, I wasn't able to create an index that reproduces this problem.

Here are some properties of my index:

bash-4.4# curl "localhost:9200/_cat/indices?v&s=index"
health status index                       uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   nurse_s_health_study        FxMn0ugWQ9G3vhM4agzfHw   5   1     121701            0     11.2gb          5.3gb

Across all documents, there are a total of 6624 fields. ~4k of the fields are string (as opposed to numeric). A single document may have 2k fields, give or take.

If anyone comes across a similar index, please try out the above queries.

I work on a tool that indexes Google BigQuery tables. If anyone comes across a public BigQuery table with the above properties (> 6k columns, > 120k rows), I'm happy to run my indexer and try to repro this bug.

jimczi · 2019-04-04T09:51:38Z

As explained in my earlier comment, we have a shortcut that bypasses highlighting if the field is empty or null in the current document. I tried to reproduce the slow query in >6.4 and it responded in less than a second, so I think something else is at play in your setup. I am going to close this issue, but we can revisit it if you provide a clear reproduction, since the example in the description should be fixed by #32090.

matriv added the :Search Relevance/Highlighting How a query matched a document label Dec 11, 2018

jimczi added the feedback_needed label Dec 11, 2018

melissachang changed the title from "New highlight type that isn't slow" to "Highlighters are slow with thousands of fields" Dec 11, 2018

melissachang mentioned this issue Dec 13, 2018

Upgrade Elasticsearch DataBiosphere/data-explorer#250 (closed)

melissachang mentioned this issue Dec 21, 2018

Kibaba /_search query causes Chrome tab to hang elastic/kibana#26743 (closed)

colings86 removed the feedback_needed label Jan 14, 2019

jimczi closed this as completed Apr 4, 2019

javanna added the Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch label Jul 12, 2024
