Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unified highlighter fails by default in hybrid search with knn #101141

Closed
benwtrent opened this issue Oct 19, 2023 · 8 comments
Closed

Unified highlighter fails by default in hybrid search with knn #101141

benwtrent opened this issue Oct 19, 2023 · 8 comments
Assignees
Labels
>bug :Search Relevance/Highlighting How a query matched a document Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch v8.10.0

Comments

@benwtrent
Copy link
Member

Elasticsearch Version

8.10+

Installed Plugins

No response

Java Version

bundled

OS Version

any

Problem Description

Unified highlighter fails with new setting

index.highlight.weight_matches_mode.enabled: "true"

This KnnScoreDocQuery was created by a different reader is the log line for the failure.

Relevant PR for the cause: #96068

Steps to Reproduce

This yaml rest test reproduces the bug.

---
"Test hybrid search with knn":
  - skip:
      version: ' - 8.3.99'
      reason: 'kNN added to search endpoint in 8.4'

  - do:
      indices.create:
        index: test-highlighting-knn
        body:
          mappings:
            "properties":
              "vectors":
                "type": "dense_vector"
                "dims": 2
                "index": true
                "similarity": "l2_norm"
              "text":
                "type": "text"
                "fields":
                  "fvh":
                    "type": "text"
                    "term_vector": "with_positions_offsets"
                  "postings":
                    "type": "text"
                    "index_options": "offsets"
  - do:
      index:
        index: test-highlighting-knn
        id:    "1"
        body:
          "text" : "The quick brown fox is brown."
          "vectors": [1, 2]
  - do:
      indices.refresh: {}

  - do:
      search:
        index: test-highlighting-knn
        body: {
          "query": { "multi_match": { "query": "quick brown fox", "type": "phrase", "fields": [ "text*" ] } },
          "highlight": { "type": "unified", "fields": { "*": { } } },
          "knn": { "field": "vectors", "query_vector": [1, 2], "k": 10, "num_candidates": 10 } }

  - match: { hits.hits.0.highlight.text.0: "The <em>quick brown fox</em> is brown." }
  - match: { hits.hits.0.highlight.text\.fvh.0: "The <em>quick brown fox</em> is brown." }
  - match: { hits.hits.0.highlight.text\.postings.0: "The <em>quick brown fox</em> is brown." }

  - do:
      indices.put_settings:
        index: test-highlighting-knn
        body:
          index.highlight.weight_matches_mode.enabled: "false"

  - do:
      search:
        index: test-highlighting-knn
        body: {
          "query" : { "multi_match" : { "query" : "quick brown fox", "type": "phrase", "fields" : [ "text*"] } },
          "highlight" : { "type" : "unified", "fields" : { "*" : {} } },
          "knn": { "field": "vectors", "query_vector": [1, 2], "k": 10, "num_candidates": 10 } }

  - match: { hits.hits.0.highlight.text.0: "The <em>quick brown fox</em> is brown." }
  - match: { hits.hits.0.highlight.text\.fvh.0: "The <em>quick brown fox</em> is brown." }
  - match: { hits.hits.0.highlight.text\.postings.0: "The <em>quick brown fox</em> is brown." }

Logs (if relevant)

The failure stack trace looks as follows:

  1> Caused by: java.lang.IllegalStateException: This KnnScoreDocQuery was created by a different reader
  1>    at [email protected]/org.elasticsearch.search.vectors.KnnScoreDocQuery.createWeight(KnnScoreDocQuery.java:71)
  1>    at [email protected]/org.apache.lucene.search.IndexSearcher.createWeight(IndexSearcher.java:896)
  1>    at [email protected]/org.apache.lucene.search.BooleanWeight.<init>(BooleanWeight.java:59)
  1>    at [email protected]/org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:245)
  1>    at [email protected]/org.apache.lucene.search.uhighlight.FieldOffsetStrategy.createOffsetsEnumsWeightMatcher(FieldOffsetStrategy.java:146)
  1>    at [email protected]/org.apache.lucene.search.uhighlight.FieldOffsetStrategy.createOffsetsEnumFromReader(FieldOffsetStrategy.java:74)
  1>    at [email protected]/org.apache.lucene.search.uhighlight.MemoryIndexOffsetStrategy.getOffsetsEnum(MemoryIndexOffsetStrategy.java:119)
  1>    at [email protected]/org.apache.lucene.search.uhighlight.FieldHighlighter.highlightFieldForDoc(FieldHighlighter.java:80)
  1>    at [email protected]/org.elasticsearch.lucene.search.uhighlight.CustomFieldHighlighter.highlightFieldForDoc(CustomFieldHighlighter.java:63)
  1>    at org.elasticsearch.s  1> [email protected]/org.elasticsearch.lucene.search.uhighlight.CustomUnifiedHighlighter.highlightField(CustomUnifiedHighlighter.java:148)
  1>    at [email protected]/org.elasticsearch.search.fetch.subphase.highlight.DefaultHighlighter.highlight(DefaultHighlighter.java:81)
  1>    at [email protected]/org.elasticsearch.search.fetch.subphase.highlight.HighlightPhase$1.process(HighlightPhase.java:69)
  1>    at [email protected]/org.elasticsearch.search.fetch.FetchPhase$1.nextDoc(FetchPhase.java:164)
  1>    at [email protected]/org.elasticsearch.search.fetch.FetchPhaseDocsIterator.iterate(FetchPhaseDocsIterator.java:70)
  1>    at [email protected]/org.elasticsearch.search.fetch.FetchPhase.buildSearchHits(FetchPhase.java:170)
  1>    at [email protected]/org.elasticsearch.search.fetch.FetchPhase.execute(FetchPhase.java:79)
  1>    at [email protected]/org.elasticsearch.search.SearchService.lambda$executeFetchPhase$9(SearchService.java:863)
  1>    at [email protected]/org.elasticsearch.action.ActionRunnable$2.accept(ActionRunnable.java:51)
  1>    at [email protected]/org.elasticsearch.action.ActionRunnable$2.accept(ActionRunnable.java:48)
  1>    at [email protected]/org.elasticsearch.action.ActionRunnable$3.doRun(ActionRunnable.java:73)
  1>    at [email protected]/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
  1>    ... 6 more
@elasticsearchmachine elasticsearchmachine added the Team:Search Meta label for search team label Oct 19, 2023
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search (Team:Search)

@benwtrent
Copy link
Member Author

To get around this bug, the following setting should be applied to the index:

PUT /<INDEX_NAME>/_settings
{
  "index" : {
    "highlight.weight_matches_mode.enabled" : "false"
  }
}

@benwtrent
Copy link
Member Author

MemoryIndexOffsetStrategy creates a new OverlaySingleDocTermsLeafReader with the provided reader.

That reader is a FilterLeafReader, whenever LeafReader is constructed a NEW context object is created:

  private final LeafReaderContext readerContext = new LeafReaderContext(this);

Since we compare readerContext.id(), it seems to me that even when using a new FilterLeafReader that encapsulates a previously used LeafReader, we will throw here:

if (searcher.getIndexReader().getContext().id() != contextIdentity) {
            throw new IllegalStateException("This KnnScoreDocQuery was created by a different reader");
        }

@benwtrent
Copy link
Member Author

            throw new IllegalStateException("This KnnScoreDocQuery was created by a different reader");
        }

Should we FilterLeafReader.unwrap(searcher.getIndexReader()).getContext().id?

The main idea is that we don't want to fail due to segment offsets being different between the query rewrite and the query scoring.

@benwtrent
Copy link
Member Author

well, unwrapping doesn't help. Still trying to find out why we are using a different reader when highlighting vs. building the query.

@benwtrent
Copy link
Member Author

OK, I think the issue is this:

Doing Weight#matches against KnnScoreDocQuery with a sub-reader instead of the top-level will fail every time from what I can tell.

@jimczi @javanna ^ What do y'all think?

benwtrent added a commit to benwtrent/elasticsearch that referenced this issue Oct 19, 2023
@benwtrent
Copy link
Member Author

FYI, here is a branch with some tests that SHOULD pass, but don't. https://github.com/elastic/elasticsearch/compare/main...benwtrent:elasticsearch:bugfix/issue-101141?expand=1

@benwtrent
Copy link
Member Author

I am gonna open a PR to auto-disable weight_mode highlighting when a knn query is present. I think truly fixing it when kNN is around will take more work

@javanna javanna added Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch and removed Team:Search Meta label for search team labels Jul 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Search Relevance/Highlighting How a query matched a document Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch v8.10.0
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants