
Rescorer does wrong reorder for tanked hits #75363

Open
rudibatt opened this issue Jul 15, 2021 · 6 comments
Labels
>docs, :Search Relevance/Ranking, Team:Docs, Team:Search Relevance

Comments

@rudibatt

Elasticsearch version: 7.13.2
Plugins installed: [elasticsearch-learning-to-rank]
JVM version: AdoptOpenJDK (build 16+36)
OS version: Linux 82865c3b5df8 5.8.0-59-generic #66~20.04.1-Ubuntu SMP Thu Jun 17 11:14:10 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
(This is the official Docker image "elasticsearch:7.13.2".)

Description:
When a rescorer tanks (sharply lowers) the scores of documents within the window_size, requests that fetch results beyond the window_size always get the same documents.

Cause:
The QueryRescorer only reorders the top hits (from 0 to from+size); see org.elasticsearch.search.rescore.QueryRescorer.combine(TopDocs, TopDocs, QueryRescoreContext).
However, if the rescorer lowers the scores of the first N hits (the rescore window), those hits are only reordered within that top-hits frame, so they end up at its bottom.

Steps to reproduce:

PUT my_index/
{
  "mappings": {
    "dynamic_templates": [
      {
        "full_text": {
          "mapping": {
            "type": "text",
            "analyzer": "whitespace"
          }
        }
      }
    ]
  }
}

POST /_bulk
{"index": {"_index":"my_index", "_id":"1"} }
{ "full_text" : "quick red fox"}
{"index": {"_index":"my_index", "_id":"2"} }
{ "full_text" : "quick green fox"}
{"index": {"_index":"my_index", "_id":"3"} }
{ "full_text" : "quick blue fox"}
{"index": {"_index":"my_index", "_id":"4"} }
{ "full_text" : "lazzy red dog"}
{"index": {"_index":"my_index", "_id":"5"} }
{ "full_text" : "lazzy green dog"}
{"index": {"_index":"my_index", "_id":"6"} }
{ "full_text" : "lazzy blue dog"}
{"index": {"_index":"my_index", "_id":"7"} }
{ "full_text" : "quick red dog"}
{"index": {"_index":"my_index", "_id":"8"} }
{ "full_text" : "quick green dog"}
{"index": {"_index":"my_index", "_id":"9"} }
{ "full_text" : "quick blue dog"}

GET my_index/_search
{
  "from": 4,
  "size": 2,
  "query": {
    "match": {
      "full_text": "green fox jumps over the blue dog"
    }
  },
  "rescore": {
    "query": {
      "rescore_query": {
        "constant_score": {
          "filter": {
            "term": { "full_text": "quick" }
          },
          "boost": 0.1
        }
      },
      "score_mode": "multiply"
    },
    "window_size": 4
  }
}

For any from >= 2, the two returned results are always the same!
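A small Python model can make the mechanism described under Cause concrete. It is a simplified sketch, not the actual QueryRescorer code: the first-pass scores are hypothetical (chosen so that, as in the report, the top 4 holds two "quick" documents), `tank_quick` stands in for the constant_score rescore with score_mode multiply, and we assume the shard collects max(from + size, window_size) hits so the window is always filled. `page_current` rescores the window and then re-sorts the whole collected frame.

```python
from typing import Callable, List, Tuple

Hit = Tuple[str, float]  # (doc_id, score)

# Hypothetical first-pass scores for the nine documents indexed above.
# Illustrative only: these are NOT the scores Elasticsearch computes.
HITS: List[Hit] = [
    ("2", 1.9), ("3", 1.8), ("5", 1.5), ("6", 1.4), ("8", 1.3),
    ("9", 1.2), ("1", 0.9), ("4", 0.5), ("7", 0.4),
]

# Documents whose full_text contains "quick".
QUICK_DOCS = {"1", "2", "3", "7", "8", "9"}


def tank_quick(doc_id: str, score: float) -> float:
    """Model of constant_score(boost=0.1) with score_mode=multiply (weights = 1)."""
    return score * 0.1 if doc_id in QUICK_DOCS else score


def page_current(hits: List[Hit], from_: int, size: int, window_size: int,
                 rescore: Callable[[str, float], float]) -> List[str]:
    """Simplified model of the reported behavior: rescore the top
    window_size hits, then re-sort the whole collected frame by score."""
    ranked = sorted(hits, key=lambda h: h[1], reverse=True)
    # Assumption: at least window_size hits are collected.
    frame = ranked[: max(from_ + size, window_size)]
    window = [(doc, rescore(doc, score)) for doc, score in frame[:window_size]]
    resorted = sorted(window + frame[window_size:], key=lambda h: h[1], reverse=True)
    return [doc for doc, _ in resorted[from_: from_ + size]]


if __name__ == "__main__":
    for from_ in range(8):
        print(from_, page_current(HITS, from_, 2, 4, tank_quick))
```

With these hypothetical scores, from = 0 returns docs 5 and 6, from = 1 returns 6 and 2, and every from between 2 and 7 returns docs 2 and 3. That is the same pattern as in the report: the tanked window documents always sink to the end of the frame, which is exactly the requested page.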

Expected result:
Either the whole result is reordered, or the reordering only takes place within the window_size. (Which of the two applies must be defined.)

Related issue: o19s/elasticsearch-learning-to-rank#369

rudibatt added the >bug and needs:triage labels Jul 15, 2021
jtibshirani added the :Search Relevance/Ranking label and removed the >bug and needs:triage labels Jul 16, 2021
elasticmachine added the Team:Search label Jul 16, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

dnhatn self-assigned this Jul 20, 2021
@dnhatn
Member

dnhatn commented Jul 21, 2021

Thanks for reporting the issue.

I can reproduce it. The problem is that the requested documents (i.e., from + size) exceed the window size. Here, rescoring reduces the scores of two docs (those containing quick) in the top 4, and these docs are moved to the bottom after rescoring. That explains why these two docs are always returned when from >= 2.
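For reference, a worked form of that score change, assuming the documented defaults query_weight = rescore_query_weight = 1 and assuming that a window document not matched by the rescore query keeps its weighted first-pass score. Here s is the first-pass score, w_q and w_r are the two weights, and r = 0.1 is the constant_score boost:

$$
s' =
\begin{cases}
(w_q\,s)\,(w_r\,r) = 0.1\,s & \text{if the document contains ``quick''} \\
w_q\,s = s & \text{otherwise}
\end{cases}
\qquad w_q = w_r = 1,\; r = 0.1
$$

So each "quick" document in the top 4 keeps only a tenth of its score, which can push it below hits that were never rescored.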

I think we should reject such a request. I opened #75556.
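One possible shape for such a check is sketched below in Python. It is hypothetical: `validate_rescore_page` is an illustrative name, and this is not the validation actually proposed or merged in #75556. It rejects any page whose last requested position lies beyond the rescore window.

```python
def validate_rescore_page(from_: int, size: int, window_size: int) -> None:
    """Hypothetical request check (not the Elasticsearch implementation):
    reject pages that reach beyond the rescore window."""
    if from_ < 0 or size < 0 or window_size < 0:
        raise ValueError("from, size and window_size must be non-negative")
    if from_ + size > window_size:
        raise ValueError(
            f"from + size [{from_ + size}] must be <= rescore window_size [{window_size}]"
        )


if __name__ == "__main__":
    validate_rescore_page(2, 2, 4)  # accepted: page ends inside the window
    try:
        validate_rescore_page(4, 2, 4)  # the request from the report
    except ValueError as err:
        print("rejected:", err)
```

Under this rule the request from the report (from 4, size 2, window_size 4) would be refused instead of silently returning the tanked documents.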

@rudibatt
Author

I suggest re-sorting only the documents within the window size. The scores would then no longer be continuous, but the result order would be consistent, even for pages beyond window_size.
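The suggestion can be sketched in Python as follows. This is an illustrative model under assumptions, not Elasticsearch code: the scores are hypothetical, `tank_quick` models the constant_score rescore with score_mode multiply, and `page_window_only` re-sorts only the rescored window, keeping it ahead of the remaining hits, which stay in first-pass order.

```python
from typing import Callable, List, Tuple

Hit = Tuple[str, float]  # (doc_id, score)

# Hypothetical first-pass scores; illustrative only, not real Elasticsearch output.
HITS: List[Hit] = [
    ("2", 1.9), ("3", 1.8), ("5", 1.5), ("6", 1.4), ("8", 1.3),
    ("9", 1.2), ("1", 0.9), ("4", 0.5), ("7", 0.4),
]

# Documents whose full_text contains "quick".
QUICK_DOCS = {"1", "2", "3", "7", "8", "9"}


def tank_quick(doc_id: str, score: float) -> float:
    """Model of constant_score(boost=0.1) with score_mode=multiply (weights = 1)."""
    return score * 0.1 if doc_id in QUICK_DOCS else score


def page_window_only(hits: List[Hit], from_: int, size: int, window_size: int,
                     rescore: Callable[[str, float], float]) -> List[str]:
    """Re-sort only the rescored window; everything after it keeps its
    first-pass order, so pages never overlap or skip documents."""
    ranked = sorted(hits, key=lambda h: h[1], reverse=True)
    window = sorted(
        ((doc, rescore(doc, score)) for doc, score in ranked[:window_size]),
        key=lambda h: h[1],
        reverse=True,
    )
    ordered = window + ranked[window_size:]
    return [doc for doc, _ in ordered[from_: from_ + size]]


if __name__ == "__main__":
    for from_ in range(0, 9, 2):
        print(from_, page_window_only(HITS, from_, 2, 4, tank_quick))
```

With these hypothetical scores the global order becomes 5, 6, 2, 3, 8, 9, 1, 4, 7, so paging with size 2 visits every document exactly once. The trade-off mentioned above is visible: docs 2 and 3 (rescored to roughly 0.19 and 0.18) are ranked ahead of doc 8 (1.3), so the scores are not monotonic across the window boundary.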

javanna added the >docs label May 3, 2023
javanna assigned abdonpijpelink and unassigned dnhatn May 3, 2023
@elasticsearchmachine
Collaborator

Pinging @elastic/es-docs (Team:Docs)

elasticsearchmachine added the Team:Docs label May 3, 2023
@djstrong

I agree with @rudibatt

abdonpijpelink removed their assignment Jan 26, 2024
javanna added the Team:Search Relevance label and removed the Team:Search label Jul 12, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)
