Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix issue with returning incomplete fragment for plain highlighter. #110707

Merged
merged 2 commits into from
Jul 11, 2024

Conversation

ioanatia
Copy link
Contributor

@ioanatia ioanatia commented Jul 10, 2024

This is a bug that was noticed in the context of App Search which uses the plain highlighter.
It seems that when the text field value is ending in special characters and no_match_size is equal or greater than the size of the text, we are not returning the full text.
This is because in findGoodEndForNoHighlightExcerpt we are consuming the token stream and we just return the position of where the last token ends, thus ignoring any special characters the original text might end with that are not part of a token.

Steps to replicate:


POST my-index/_doc/1?refresh=true
{
  "my_text_field": ")hey)"
}

GET my-index/_search
{
  "query": {
    "bool": {
      "must": {
        "match_all": {}
      }
    }
  },
  "highlight": {
    "type": "plain",
    "fields": {
      "my_text_field": {
        "fragment_size": 100,
        "no_match_size": 100
      }
    },
    "highlight_query": {
      "match_all": {}
    }
  }
}

response - the highlight is )hey instead of )hey):

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "my-index",
        "_id": "1",
        "_score": 1,
        "_source": {
          "my_text_field": ")hey)"
        },
        "highlight": {
          "my_text_field": [
            ")hey"
          ]
        }
      }
    ]
  }
}

@ioanatia ioanatia added :Search Relevance/Highlighting How a query matched a document Team:Search Meta label for search team auto-backport Automatically create backport pull requests when merged auto-backport-and-merge v7.17.23 v8.15.0 and removed auto-backport Automatically create backport pull requests when merged labels Jul 10, 2024
@ioanatia ioanatia marked this pull request as ready for review July 10, 2024 13:11
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search (Team:Search)

@ioanatia ioanatia added the >bug label Jul 10, 2024
@elasticsearchmachine
Copy link
Collaborator

Hi @ioanatia, I've created a changelog YAML for you.

@@ -218,6 +218,9 @@ private static int findGoodEndForNoHighlightExcerpt(int noMatchSize, Analyzer an
// Can't split on term boundaries without offsets
return -1;
}
if (contents.length() <= noMatchSize) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we move this to like 216, even before analyzing contents string?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was asking myself the same, but I think the behaviour would not be consistent which could cause confusion - when no offsets are provided we would some return highlights for some document fields, but not for others and the reason why that happens might not be apparent.
If that's not actually a problem I am okay to change it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking more about it, you are right. It is better to have the consistent behaviour. Let's keep your changes as they are.

Copy link
Contributor

@mayya-sharipova mayya-sharipova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ioanatia Thanks for fixing, +1 to merge after addressing my small. comment

@ioanatia ioanatia merged commit 0f6c01a into elastic:main Jul 11, 2024
15 checks passed
@ioanatia ioanatia deleted the plain_highlighter_incomplete_fragment branch July 11, 2024 13:36
ioanatia added a commit to ioanatia/elasticsearch that referenced this pull request Jul 11, 2024
…lastic#110707)

* Fix issue with noMatchSize for plain highlighter

* Update docs/changelog/110707.yaml
@elasticsearchmachine
Copy link
Collaborator

💔 Backport failed

Status Branch Result
8.15
7.17 Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 110707

elasticsearchmachine pushed a commit that referenced this pull request Jul 11, 2024
…110707) (#110756)

* Fix issue with noMatchSize for plain highlighter

* Update docs/changelog/110707.yaml
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants