Using multiple highlight_query-es within multiple fields only uses the first highlight_query for all fields with Fast Vector Highlighter #25904

Closed
kori0129 opened this issue Jul 26, 2017 · 1 comment

Comments


kori0129 commented Jul 26, 2017

Elasticsearch version: 5.2.0

Plugins installed: []

JVM version (java -version): 1.8.0_131

OS version (uname -a if on a Unix-like system):
Linux 3.10.0-514.6.1.el7.x86_64 #1 SMP Wed Jan 18 13:06:36 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Description of the problem including expected versus actual behavior:
Action: Highlight multiple fields separately with the Fast Vector Highlighter, each field having its own highlight_query definition.
Expected behaviour: Each highlight_query is used only for the field it is specified on.
Actual behaviour: Only the first field's highlight_query is used, and it is applied to all fields.

Steps to reproduce:

  1. Create an index with two fields, both mapped with "term_vector" : "with_positions_offsets" so that the Fast Vector Highlighter is used
PUT highlighttest
{
    "settings" : {
        "index" : {
            "number_of_shards" : 3,
            "number_of_replicas" : 2
        }
    },
    "mappings": {
        "doc" : {
            "properties": {
                "title" : {
                    "type": "text",
                    "term_vector" : "with_positions_offsets"
                },
                "body" : {
                    "type": "text",
                    "term_vector" : "with_positions_offsets"
                }
            }
        }
    }
}
  2. Add a document that shares content between the two fields
PUT highlighttest/doc/1
{
    "title": "I love cake",
    "body": "I love cake because it's amazing"
}
  3. Search both fields for a term that occurs in both, and highlight both fields, each with its own highlight_query targeting the field it highlights
GET highlighttest/doc/_search
{
    "query": {
        "bool": {
            "should" :[
                {
                    "match": {
                        "title":{
                            "query":"cake"
                        }
                    }
                },
                {
                    "match": {
                        "body":{
                            "query":"cake"
                        }
                    }
                }
            ]
        }
    },
    "highlight": {
        "fields": {
            "title":{
                "highlight_query": {
                    "match": {
                        "title":{
                            "query":"cake"
                        }
                    }
                }
            },
            "body":{
                "highlight_query": {
                    "match": {
                        "body":{
                            "query":"cake"
                        }
                    }
                }
            }
        }
    }
}
  4. The search returns highlights, but the first highlight_query is applied to every field. It matches nothing in the second field, so no highlight is returned for body:
{
   "took": 3,
   "timed_out": false,
   "_shards": {
      "total": 3,
      "successful": 3,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 0.52058303,
      "hits": [
         {
            "_index": "highlighttest",
            "_type": "doc",
            "_id": "1",
            "_score": 0.52058303,
            "_source": {
               "title": "I love cake",
               "body": "I love cake because it's amazing"
            },
            "highlight": {
               "title": [
                  "I love <em>cake</em>"
               ]
            }
         }
      ]
   }
}
  5. This becomes even more obvious if you swap the highlight_query definitions so that each "match" expression targets the other field:
"highlight": {
    "fields": {
        "title":{
            "highlight_query": {
                "match": {
                    "body":{
                        "query":"cake"
                    }
                }
            }
        },
        "body":{
            "highlight_query": {
                "match": {
                    "title":{
                        "query":"cake"
                    }
                }
            }
        }
    }
}

In that case the result is the following, which shows that only the first highlight_query (the match on body) is used and the other highlight_query definition is ignored.

{
   "took": 3,
   "timed_out": false,
   "_shards": {
      "total": 3,
      "successful": 3,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 0.52058303,
      "hits": [
         {
            "_index": "highlighttest",
            "_type": "doc",
            "_id": "1",
            "_score": 0.52058303,
            "_source": {
               "title": "I love cake",
               "body": "I love cake because it's amazing"
            },
            "highlight": {
               "body": [
                  "I love <em>cake</em> because it's amazing"
               ]
            }
         }
      ]
   }
}
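
The symptom in both responses above (a field that was requested under "highlight" but comes back with no fragments) can be checked for mechanically. The following is only a minimal sketch in plain Python, not part of Elasticsearch or its clients: the helper name fields_missing_highlights is hypothetical, it assumes the object form of "fields" used in this report, and the sample responses are trimmed copies of the ones shown in steps 4 and 5.

```python
def fields_missing_highlights(highlight_spec, response):
    """Map each hit's _id to the requested highlight fields that have no fragments.

    Hypothetical helper for illustration only. Assumes "fields" is a JSON
    object keyed by field name, as in the requests in this report.
    """
    requested = list(highlight_spec.get("fields", {}))
    missing = {}
    for hit in response.get("hits", {}).get("hits", []):
        returned = hit.get("highlight", {})
        # A field counts as missing when it is absent or has an empty fragment list.
        absent = [field for field in requested if not returned.get(field)]
        if absent:
            missing[hit["_id"]] = absent
    return missing


# Highlight section of the request from step 3.
highlight_spec = {
    "fields": {
        "title": {"highlight_query": {"match": {"title": {"query": "cake"}}}},
        "body": {"highlight_query": {"match": {"body": {"query": "cake"}}}},
    }
}

# Trimmed copies of the responses from step 4 and step 5.
step4_response = {
    "hits": {"hits": [
        {"_id": "1", "highlight": {"title": ["I love <em>cake</em>"]}}
    ]}
}
step5_response = {
    "hits": {"hits": [
        {"_id": "1",
         "highlight": {"body": ["I love <em>cake</em> because it's amazing"]}}
    ]}
}

print(fields_missing_highlights(highlight_spec, step4_response))
print(fields_missing_highlights(highlight_spec, step5_response))
```

Note that a missing field is not proof of this bug on its own: a field is also left out of "highlight" when its own highlight_query genuinely matches nothing in that field.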

Please note that this is a pared-down, contrived example. At our company we have a valid use case for this, with a complex highlight_query for each field, and we need each highlight_query to apply only to the field it is defined for.

@kori0129 kori0129 changed the title Using highlight_query per field only uses the first highlight_query for all fields with Fast Vector Highlighter Using highlight_query for multiple fields only uses the first highlight_query for all fields with Fast Vector Highlighter Jul 26, 2017
@kori0129 kori0129 changed the title Using highlight_query for multiple fields only uses the first highlight_query for all fields with Fast Vector Highlighter Using multiple highlight_query-es within multiple fields only uses the first highlight_query for all fields with Fast Vector Highlighter Jul 26, 2017
Contributor

jimczi commented Jul 26, 2017

This bug is fixed by #25197; the fix will be available in the next minor release for 5.x (5.6).
Thanks for reporting, @kori0129.

@jimczi jimczi closed this as completed Jul 26, 2017