Bug: Unified Highlighter does not care about fragment_size #28462

Closed
aslamy opened this issue Jan 31, 2018 · 1 comment

Comments

@aslamy

aslamy commented Jan 31, 2018

I have:

Elasticsearch 6.1.2

Mapping:

PUT my_index
{  
   "mappings":{  
      "my_type":{  
         "properties":{  
            "title":{  
               "term_vector":"with_positions_offsets",
               "type":"text"
            }
         }
      }
   }
}
PUT my_index/my_type/1
{  
   "title":"Hello. This is a test."
}

Search:

GET my_index/_search
{  
   "query":{  
      "match":{  
         "title":"Hello"
      }
   },
   "highlight":{  

      "fields":{  
         "title":{  
            "fragment_size":1000,
            "number_of_fragments":1,
            "type":"unified"
         }
      }
   }
}

Result:

{  
   "took":0,
   "timed_out":false,
   "_shards":{  
      "total":5,
      "successful":5,
      "skipped":0,
      "failed":0
   },
   "hits":{  
      "total":1,
      "max_score":0.2876821,
      "hits":[  
         {  
            "_index":"my_index",
            "_type":"my_type",
            "_id":"1",
            "_score":0.2876821,
            "_source":{  
               "title":"Hello. This is a test."
            },
            "highlight":{  
               "title":[  
                  "<em>Hello</em>."
               ]
            }
         }
      ]
   }
}

Problem:
It seems that the unified highlighter ignores fragment_size. With fragment_size set to 1000 it should not split the text, but it splits at the first period (dot). I know that fragment_size = 0 disables splitting, but that is not what I want.
I have fields containing many sentences that I want to split into fragments of around 256 characters, but the highlighter always splits at the first period it hits.

@jimczi
Contributor

jimczi commented Jan 31, 2018

This is expected. In this version the unified highlighter splits the text on sentence boundaries and creates one fragment per sentence, or multiple fragments if a sentence is longer than fragment_size.
We changed this behavior in 6.2, though: the highlighter can now select all sentences that fit into fragment_size:
#28132
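
To make the difference between the two behaviors concrete, here is a toy model in Python. It is only a sketch, not Elasticsearch's or Lucene's actual code: the regex sentence splitter, the hard cut of oversized sentences, and the function names `fragments_per_sentence` and `fragments_grouped` are all assumptions made for illustration, and it assumes fragment_size > 0.

```python
import re


def split_sentences(text):
    # Naive splitter (assumption): a sentence runs up to '.', '!' or '?'
    # plus any trailing whitespace. The real highlighter uses a proper
    # sentence break iterator instead of a regex.
    return [s for s in re.findall(r"[^.!?]+[.!?]*\s*", text) if s.strip()]


def fragments_per_sentence(text, fragment_size):
    # Models the behavior described for 6.1: one fragment per sentence,
    # and a sentence longer than fragment_size yields several fragments.
    # The hard character cut is a simplification of this sketch.
    fragments = []
    for sentence in split_sentences(text):
        sentence = sentence.strip()
        for start in range(0, len(sentence), fragment_size):
            fragments.append(sentence[start:start + fragment_size])
    return fragments


def fragments_grouped(text, fragment_size):
    # Models the behavior described for 6.2: consecutive sentences are
    # collected into one fragment as long as they fit into fragment_size.
    # A single oversized sentence is kept whole here for simplicity.
    fragments, current = [], ""
    for sentence in split_sentences(text):
        if current and len(current) + len(sentence) > fragment_size:
            fragments.append(current.strip())
            current = ""
        current += sentence
    if current.strip():
        fragments.append(current.strip())
    return fragments


title = "Hello. This is a test."
print(fragments_per_sentence(title, 1000))
print(fragments_grouped(title, 1000))
```

With the document from this issue, the first function returns two fragments, one per sentence, which matches the `<em>Hello</em>.` result above when number_of_fragments is 1. The second returns the whole title as a single fragment because both sentences fit into 1000 characters.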

jimczi closed this as completed Jan 31, 2018