Highlighting Error with span_field_masking Requires Indexing Offsets Unexpectedly #101804

Open
ahoogol opened this issue Nov 4, 2023 · 11 comments · Fixed by #103490
Labels
>enhancement :Search Relevance/Highlighting How a query matched a document Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch

Comments

@ahoogol

ahoogol commented Nov 4, 2023

Elasticsearch Version

8.10.4

Installed Plugins

No response

Java Version

bundled

OS Version

Elastic Cloud - GCP - Iowa (us-central1)

Problem Description

I encountered an issue when using the span_field_masking feature in Elasticsearch. When attempting to use the highlighter with this feature, the following error is thrown:

{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "field 'text' was indexed without offsets, cannot highlight"
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "test_mask",
        "node": "jUZ9p0ZtR6-xYevegW6O_Q",
        "reason": {
          "type": "illegal_argument_exception",
          "reason": "field 'text' was indexed without offsets, cannot highlight"
        }
      }
    ],
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "field 'text' was indexed without offsets, cannot highlight",
      "caused_by": {
        "type": "illegal_argument_exception",
        "reason": "field 'text' was indexed without offsets, cannot highlight"
      }
    }
  },
  "status": 400
}

If I set "index_options": "offsets" in the mapping of the masked field 'stem', highlighting works as expected. However, I'm puzzled as to why the highlighter requires indexing offsets. I'd like to understand why the highlighter doesn't re-analyze the text to calculate offsets dynamically. My concern is that indexing offsets increases the index size, which I want to avoid.
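The idea of recomputing offsets at highlight time can be pictured with a small Python sketch. It is only an illustration of the question, assuming a whitespace analyzer like the one in the mapping below; `whitespace_tokens` and `highlight_positions` are hypothetical helper names, not Elasticsearch or Lucene APIs.

```python
import re

def whitespace_tokens(text):
    """Yield (position, start_offset, end_offset, token), mimicking a whitespace analyzer."""
    for position, match in enumerate(re.finditer(r"\S+", text)):
        yield position, match.start(), match.end(), match.group()

def highlight_positions(text, positions, pre="(", post=")"):
    """Wrap the tokens at the given positions, using offsets recomputed from the text."""
    pieces = []
    last = 0
    for position, start, end, token in whitespace_tokens(text):
        if position in positions:
            pieces.append(text[last:start])  # untouched text before the token
            pieces.append(pre + token + post)
            last = end
    pieces.append(text[last:])               # untouched tail
    return "".join(pieces)

tokens = list(whitespace_tokens("a _ a b"))
highlighted = highlight_positions("a _ a b", {0, 1})
print(tokens)
print(highlighted)
```

If the matching positions are known (here 0 and 1), the character offsets follow from the stored text alone, which is why indexing offsets looks avoidable from the outside.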

Steps to Reproduce

PUT test_mask
{
  "mappings": {
    "properties": {
      "text": {
        "type": "text",
        "analyzer": "whitespace"
      },
      "stem": {
        "type": "text",
        "analyzer": "whitespace"
      }
    }
  }
}

PUT test_mask/_doc/1
{
  "text": "a _ a b",
  "stem": "_ b _ _"
}

GET test_mask/_search
{
  "query": {
    "span_near": {
      "clauses": [
        {
          "span_term": {
            "text": {
              "value": "a"
            }
          }
        },
        {
          "span_field_masking": {
            "field": "text", 
            "query": {
              "span_term": {
                "stem": {
                  "value": "b"
                }
              }
            }
          }
        }
      ],
      "slop": 0,
      "in_order": true
    }
  },
  "highlight": {
    "pre_tags": "(", 
    "post_tags": ")", 
    "fields": {
      "*": {}
    },
    "type": "unified"
  }
}

Expected result

I was expecting the highlight to look like this:

"highlight": {
  "text": [
    "(a) (_) a b"
  ]
}
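
For reference, a toy Python model of why this is the output I expect. It assumes that span_field_masking makes the `stem` clause report its positions as if they belonged to `text`, so positions from the two fields can be compared directly. The helper names are hypothetical and this is not how Elasticsearch implements span matching.

```python
def positions_of(field_text, term):
    """Token positions of `term` in a whitespace-analyzed field."""
    return [i for i, token in enumerate(field_text.split()) if token == term]

def span_near_in_order_slop0(first_positions, second_positions):
    """Return half-open position spans [start, end) where the second clause
    directly follows the first (slop 0, in order)."""
    second = set(second_positions)
    return [(p, p + 2) for p in first_positions if p + 1 in second]

doc = {"text": "a _ a b", "stem": "_ b _ _"}

a_in_text = positions_of(doc["text"], "a")
b_in_stem = positions_of(doc["stem"], "b")  # treated as positions in "text" by the mask

spans = span_near_in_order_slop0(a_in_text, b_in_stem)
covered = [doc["text"].split()[start:end] for start, end in spans]
print(spans, covered)
```

Only the first `a` is directly followed by a `b` in `stem`, so the single span covers positions 0 and 1 of `text`, that is the tokens `a` and `_`.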
@ahoogol ahoogol added >bug needs:triage Requires assignment of a team area label labels Nov 4, 2023
@ahoogol ahoogol changed the title Unexpected Need to index offsets when using span_field_masking Highlighting Error with span_field_masking Requires Indexing Offsets Unexpectedly Nov 4, 2023
@pxsalehi pxsalehi added :Search Relevance/Highlighting How a query matched a document and removed needs:triage Requires assignment of a team area label labels Nov 6, 2023
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

@elasticsearchmachine elasticsearchmachine added the Team:Search Meta label for search team label Nov 6, 2023
@benwtrent
Member

This is due to highlight.weight_matches_mode.enabled. I am not 100% sure why we are trying to get the offsets here.

To work around this bug, disable the setting:

PUT test_mask/_settings
{
  "index" : {
    "highlight.weight_matches_mode.enabled" : "false"
  }
}

I still need to dig into the correct fix here.

@benwtrent
Member

error-trace:

java.lang.IllegalArgumentException: field 'text' was indexed without offsets, cannot highlight
  at org.apache.lucene.search.uhighlight.FieldHighlighter.highlightOffsetsEnums(FieldHighlighter.java:157)
  at org.elasticsearch.lucene.search.uhighlight.CustomFieldHighlighter.highlightOffsetsEnums(CustomFieldHighlighter.java:106)
  at org.apache.lucene.search.uhighlight.FieldHighlighter.highlightFieldForDoc(FieldHighlighter.java:83)
  at org.elasticsearch.lucene.search.uhighlight.CustomFieldHighlighter.highlightFieldForDoc(CustomFieldHighlighter.java:63)
  at org.elasticsearch.lucene.search.uhighlight.CustomUnifiedHighlighter.highlightField(CustomUnifiedHighlighter.java:148)
  at org.elasticsearch.search.fetch.subphase.highlight.DefaultHighlighter.highlight(DefaultHighlighter.java:81)
  at org.elasticsearch.search.fetch.subphase.highlight.HighlightPhase$1.process(HighlightPhase.java:69)
  at org.elasticsearch.search.fetch.FetchPhase$1.nextDoc(FetchPhase.java:163)
  at org.elasticsearch.search.fetch.FetchPhaseDocsIterator.iterate(FetchPhaseDocsIterator.java:70)
  at org.elasticsearch.search.fetch.FetchPhase.buildSearchHits(FetchPhase.java:169)
  at org.elasticsearch.search.fetch.FetchPhase.execute(FetchPhase.java:78)
  at org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:711)
  at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:682)
  at org.elasticsearch.search.SearchService.lambda$executeQueryPhase$2(SearchService.java:543)
  at org.elasticsearch.action.ActionRunnable$2.accept(ActionRunnable.java:51)
  at org.elasticsearch.action.ActionRunnable$2.accept(ActionRunnable.java:48)
  at org.elasticsearch.action.ActionRunnable$3.doRun(ActionRunnable.java:73)
  at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
  at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:33)
  at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:983)
  at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
  at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
  at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
  at java.base/java.lang.Thread.run(Thread.java:1583)

@ahoogol
Author

ahoogol commented Nov 7, 2023

This is due to highlight.weight_matches_mode.enabled. I am not 100% sure why we are trying to get the offsets here.

To work around this bug, disable the setting:

PUT test_mask/_settings
{
  "index" : {
    "highlight.weight_matches_mode.enabled" : "false"
  }
}

I still need to dig into the correct fix here.

@benwtrent Thank you for your suggestion. After running the suggested command, the error no longer occurs. However, the generated highlight doesn't match my expected output.

With your command:

"highlight": {
   "text": [
     "(a) _ (a) b"
   ],
   "stem": [
     "_ (b) _ _"
   ]
 }

I was expecting the highlight to look like this:

"highlight": {
  "text": [
    "(a) (_) a b"
  ]
}

Is there a way to achieve this expected result while avoiding the error?
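
The output I get with the setting disabled is consistent with plain term-by-term highlighting, where each field is highlighted only with the terms that target it. A toy Python sketch of that model follows. It is an assumption based on the observed output, not a description of the highlighter's internals, and `highlight_terms` is a hypothetical helper.

```python
def highlight_terms(field_text, terms, pre="(", post=")"):
    """Wrap every token that equals one of the query terms.
    Splits and rejoins on single spaces, which is enough for this document."""
    return " ".join(
        pre + token + post if token in terms else token
        for token in field_text.split()
    )

doc = {"text": "a _ a b", "stem": "_ b _ _"}

# Query terms grouped by the field each span_term targets.
terms_by_field = {"text": {"a"}, "stem": {"b"}}

result = {
    field: highlight_terms(doc[field], terms)
    for field, terms in terms_by_field.items()
}
print(result)
```

This reproduces `(a) _ (a) b` for `text` and `_ (b) _ _` for `stem`: every occurrence of a term is marked, regardless of whether it takes part in the span match.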

@benwtrent
Member

@ahoogol turn on offsets for the fields and use "highlight.weight_matches_mode.enabled" : "true"

@ahoogol
Author

ahoogol commented Nov 8, 2023

Thank you for your suggestion, @benwtrent. Yes, it highlights correctly when offsets are enabled. However, my concern about the increase in index size remains. I'm still exploring alternative approaches that achieve the desired highlight without turning on offsets, to keep the index size manageable. Any further insights or suggestions would be greatly appreciated.

@mayya-sharipova
Contributor

mayya-sharipova commented Dec 14, 2023

@ahoogol If you use "require_field_match" : false as a highlighter option, you will get expected results without enabling offsets.

"highlight": {
    "require_field_match" : false,
    "pre_tags": "(", 
    "post_tags": ")", 
    "fields": {
      "*": {}
    },
    "type": "unified"
  }

It breaks because internally we check that the field we highlight on ("text") is the same as the field that has the matches ("stem"), but in this case they are different. That's the failure.

@mayya-sharipova
Contributor

I will add this to the documentation for the span_field_masking query and close this issue.

mayya-sharipova added a commit to mayya-sharipova/elasticsearch that referenced this issue Dec 15, 2023
Improvement includes:
1. Remove reference to Lucene queries (this information is not necessary
for Elastic users, and can be outdated)
2. For `span_field_masking` include a note to use
"require_field_match" : false parameter for highlighters to work.

Closes elastic#101804
@ahoogol
Author

ahoogol commented Dec 16, 2023

@mayya-sharipova I included "require_field_match": false in the highlighter options, but the resulting output remains different from what I expected:

Output with your suggestion (I tested it on 8.10.0 and 8.11.3):

"highlight": {
  "text": [
    "a _ (a) (b)"
  ]
}

Expected output:

"highlight": {
  "text": [
    "(a) (_) a b"
  ]
}

@mayya-sharipova mayya-sharipova self-assigned this Dec 18, 2023
@mayya-sharipova
Contributor

mayya-sharipova commented Dec 19, 2023

@ahoogol You are right about the expected behaviour, but it is not supported for the span_field_masking query, and supporting it would not be easy without indexing offsets.

The highlighting behaviour that you expect is based on Matches and was added in 8.10. It relies on the highlighted field containing the query terms, which is not the case here.

I have added documentation clarifying that the span_field_masking query has unexpected highlighting behaviour and should be used with require_field_match = false.

I also changed the type of this issue to "feature", which we may tackle sometime in the future.

mayya-sharipova added a commit that referenced this issue Dec 19, 2023
@mayya-sharipova mayya-sharipova removed their assignment Dec 20, 2023
navarone-feekery pushed a commit to navarone-feekery/elasticsearch that referenced this issue Dec 22, 2023
@javanna javanna added Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch and removed Team:Search Meta label for search team labels Jul 12, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

6 participants