Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FVH Highlighter does not work with the inner hits properly #65533

Closed
babayotakun opened this issue Nov 26, 2020 · 2 comments · Fixed by #65641
Closed

FVH Highlighter does not work with the inner hits properly #65533

babayotakun opened this issue Nov 26, 2020 · 2 comments · Fixed by #65641
Assignees
Labels
>bug :Search Relevance/Highlighting How a query matched a document Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch

Comments

@babayotakun
Copy link

Elasticsearch version: 7.10.0
Running in docker from image docker.elastic.co/elasticsearch/elasticsearch:7.10.0

Fvh highligher does not highlight actual content of the inner hit. It seems to alway returning highlight for the first relevant inner hit but with the highlight offsets of the current inner hit. Other highlighters ("plain" and "unified") are working correctly.

Steps:

  1. Create index with the nested field:
curl -H 'Content-type: application/json' -XPUT 'http://localhost:9200/test' -d '{
 "mappings": {
  "properties": {
   "obj": {
    "type": "nested",
    "properties": {
     "content": {
      "type": "text",
      "norms": true,
      "term_vector": "with_positions_offsets"
     }
    }
   }
  }
 }
}'
  1. Put a document:
curl -H 'Content-type: application/json' -XPOST 'http://localhost:9200/test/_doc/' -d '{
 "obj": [
  {
   "content": "hello world"
  },
  {
   "content": "world hello"
  }
 ]
}'
  1. Run a nested search query with the highlight:
curl -H 'Content-type: application/json' -XPOST 'http://localhost:9200/test/_search' -d '{
 "query": {
  "bool": {
   "should": [
    {
     "nested": {
      "query": {
       "match": {
        "obj.content": {
         "query": "hello",
         "operator": "OR",
         "prefix_length": 0,
         "max_expansions": 50,
         "minimum_should_match": "1<-1%",
         "fuzzy_transpositions": true,
         "lenient": true,
         "zero_terms_query": "NONE",
         "auto_generate_synonyms_phrase_query": true,
         "boost": 1
        }
       }
      },
      "path": "obj",
      "ignore_unmapped": false,
      "score_mode": "none",
      "boost": 1,
      "inner_hits": {
       "name": "obj|116088734",
       "ignore_unmapped": false,
       "from": 0,
       "size": 10,
       "version": false,
       "seq_no_primary_term": false,
       "explain": false,
       "track_scores": false,
       "highlight": {
        "fields": {
         "obj.content": {
          "fragment_size": 350,
          "number_of_fragments": 0,
          "type": "fvh",
          "order": "score",
          "no_match_size": 0,
          "matched_fields": [
           "obj.content"
          ]
         }
        }
       }
      }
     }
    }
   ],
   "adjust_pure_negative": true,
   "boost": 1
  }
 },
 "_source": false
}'

Result:

{ - 
  "took": 5,
  "timed_out": false,
  "_shards": { - 
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": { - 
    "total": { - 
      "value": 1,
      "relation": "eq"
    },
    "max_score": 0,
    "hits": [ - 
      { - 
        "_index": "test",
        "_type": "_doc",
        "_id": "h4HGA3YBAHjK9WUfKSvs",
        "_score": 0,
        "inner_hits": { - 
          "obj|116088734": { - 
            "hits": { - 
              "total": { - 
                "value": 2,
                "relation": "eq"
              },
              "max_score": 0.18232156,
              "hits": [ - 
                { - 
                  "_index": "test",
                  "_type": "_doc",
                  "_id": "h4HGA3YBAHjK9WUfKSvs",
                  "_nested": { - 
                    "field": "obj",
                    "offset": 0
                  },
                  "_score": 0.18232156,
                  "_source": { - 
                    "content": "hello world"
                  },
                  "highlight": { - 
                    "obj.content": [ - 
                      "<em>hello</em> world"
                    ]
                  }
                },
                { - 
                  "_index": "test",
                  "_type": "_doc",
                  "_id": "h4HGA3YBAHjK9WUfKSvs",
                  "_nested": { - 
                    "field": "obj",
                    "offset": 1
                  },
                  "_score": 0.18232156,
                  "_source": { - 
                    "content": "world hello"
                  },
                  "highlight": { - 
                    "obj.content": [ - 
                      "hello <em>world</em>"
                    ]
                  }
                }
              ]
            }
          }
        }
      }
    ]
  }
}

Highlights are returned with the correct offsets and indents but with the wrong contents.
Should be world hello for the second inner hit.

@babayotakun babayotakun added >bug needs:triage Requires assignment of a team area label labels Nov 26, 2020
@henningandersen henningandersen added the :Search Relevance/Highlighting How a query matched a document label Nov 30, 2020
@elasticmachine elasticmachine added the Team:Search Meta label for search team label Nov 30, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search (Team:Search)

@jtibshirani jtibshirani self-assigned this Nov 30, 2020
@jtibshirani jtibshirani removed the needs:triage Requires assignment of a team area label label Nov 30, 2020
jtibshirani added a commit that referenced this issue Dec 9, 2020
This PR fixes a regression where fvh fragments could be loaded from the wrong
document _source.

Some `FragmentsBuilder` implementations contain a `SourceLookup` to load from
_source. The lookup should be positioned to load from the current hit document.
However, since `FragmentsBuilder` are cached and shared across hits, the lookup
is never updated to load from the new documents. This means we accidentally
load _source from a different document.

The regression was introduced in #60179, which started storing `SourceLookup`
on `FragmentsBuilder`.

Fixes #65533.
@jtibshirani
Copy link
Contributor

Thanks a lot @babayotakun for raising this. A fix will be available in the upcoming releases 7.10.2 and 7.11.

jtibshirani added a commit to jtibshirani/elasticsearch that referenced this issue Dec 9, 2020
…5641)

This PR fixes a regression where fvh fragments could be loaded from the wrong
document _source.

Some `FragmentsBuilder` implementations contain a `SourceLookup` to load from
_source. The lookup should be positioned to load from the current hit document.
However, since `FragmentsBuilder` are cached and shared across hits, the lookup
is never updated to load from the new documents. This means we accidentally
load _source from a different document.

The regression was introduced in elastic#60179, which started storing `SourceLookup`
on `FragmentsBuilder`.

Fixes elastic#65533.
jtibshirani added a commit that referenced this issue Dec 10, 2020
This PR fixes a regression where fvh fragments could be loaded from the wrong
document _source.

Some `FragmentsBuilder` implementations contain a `SourceLookup` to load from
_source. The lookup should be positioned to load from the current hit document.
However, since `FragmentsBuilder` are cached and shared across hits, the lookup
is never updated to load from the new documents. This means we accidentally
load _source from a different document.

The regression was introduced in #60179, which started storing `SourceLookup`
on `FragmentsBuilder`.

Fixes #65533.
jtibshirani added a commit to jtibshirani/elasticsearch that referenced this issue Dec 10, 2020
…5641)

This PR fixes a regression where fvh fragments could be loaded from the wrong
document _source.

Some `FragmentsBuilder` implementations contain a `SourceLookup` to load from
_source. The lookup should be positioned to load from the current hit document.
However, since `FragmentsBuilder` are cached and shared across hits, the lookup
is never updated to load from the new documents. This means we accidentally
load _source from a different document.

The regression was introduced in elastic#60179, which started storing `SourceLookup`
on `FragmentsBuilder`.

Fixes elastic#65533.
jtibshirani added a commit that referenced this issue Dec 10, 2020
This PR fixes a regression where fvh fragments could be loaded from the wrong
document _source.

Some `FragmentsBuilder` implementations contain a `SourceLookup` to load from
_source. The lookup should be positioned to load from the current hit document.
However, since `FragmentsBuilder` are cached and shared across hits, the lookup
is never updated to load from the new documents. This means we accidentally
load _source from a different document.

The regression was introduced in #60179, which started storing `SourceLookup`
on `FragmentsBuilder`.

Fixes #65533.
@javanna javanna added Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch and removed Team:Search Meta label for search team labels Jul 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Search Relevance/Highlighting How a query matched a document Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants