Correct boost in script_score query and error on negative scores #52478

mayya-sharipova · 2020-02-18T14:27:51Z

Previously, the boost in a script_score query was wrongly applied only to the subquery.
This commit makes sure that the boost is applied to the whole score
that comes out of the script.

It also returns a 400 error with a clear error message when a script_score query produces a negative score.

Closes #48465
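A minimal sketch of the difference, assuming the script can be modeled as a plain function of the subquery score (`_score`). This is not the Elasticsearch code; the class and method names are hypothetical. With a script that ignores `_score`, the old placement silently drops the boost, while the fixed placement scales whatever the script returns.

```java
import java.util.function.DoubleUnaryOperator;

public class BoostPlacement {

    // Old (buggy) placement: the boost only scales the subquery score that the
    // script sees as _score, so a script that ignores _score loses the boost.
    static double boostSubqueryOnly(DoubleUnaryOperator script, double subScore, double boost) {
        return script.applyAsDouble(subScore * boost);
    }

    // Fixed placement: the boost multiplies the whole score the script returns.
    static double boostWholeScore(DoubleUnaryOperator script, double subScore, double boost) {
        return script.applyAsDouble(subScore) * boost;
    }

    public static void main(String[] args) {
        double fieldValue = 3.0;                            // stands in for doc['i'].value
        DoubleUnaryOperator ignoresScore = s -> fieldValue; // a script that never reads _score
        System.out.println(boostSubqueryOnly(ignoresScore, 1.0, 10.0)); // 3.0
        System.out.println(boostWholeScore(ignoresScore, 1.0, 10.0));   // 30.0
    }
}
```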

elasticmachine · 2020-02-18T14:28:03Z

Pinging @elastic/es-search (:Search/Search)

matriv

Looks good. I'm completely inexperienced in this area, but I left some comments, especially the one regarding the explanation.

matriv · 2020-02-20T16:14:58Z

docs/reference/query-dsl/script-score-query.asciidoc

@@ -48,9 +48,12 @@ scores be positive or `0`.
 --

 `min_score`::
-(Optional, float) Documents with a <<relevance-scores,relevance score>> lower
+(Optional, float) Documents with a score lower


Why did you remove the reference to the score docs?

Before, we were referencing relevance scores, but I think the goal of the script_score query is to calculate custom scores through scripts, not traditional textual relevance scores.

matriv · 2020-02-20T16:18:45Z

server/src/main/java/org/elasticsearch/common/lucene/search/function/ScriptScoreQuery.java

@@ -143,7 +148,12 @@ public Explanation explain(LeafReaderContext context, int doc) throws IOExceptio
                        explanation = Explanation.match(score, desc);
                    }
                }
-
+                if (boost != 1.0f) {


This seems to be executed also when explanation == null? Or am I missing something?
Maybe it's worth checking for a test that when the boost is != 1 and explanation is false there is no explanation returned regarding the boost?

Also, why not return an explanation if the boost is 1, since it's asked by the user?

The way we handle the explanation object differs from other queries, which may be confusing: in a script_score query, a user can provide their own custom explanation.

This seems to be executed also when explanation == null?

Not exactly. We start with the user-provided explanation for the script; if it is null, it is replaced by our standard explanation (line 143), so by line 151 explanation is never null.

Maybe it's worth checking for a test that when the boost is != 1 and explanation is false there is no explanation returned regarding the boost?

When an explanation is not requested (explain = false in a search request), we never even reach this method. It is executed only when a user requests an explanation.

Also, why not return an explanation if the boost is 1, since it's asked by the user?

My thinking was that since boost is an optional parameter, there is no need to explain it when the user hasn't provided it.

matriv · 2020-02-20T16:19:43Z

modules/lang-painless/src/test/resources/rest-api-spec/test/painless/110_script_score_boost.yml

+              query: {match_all: {boost: 10}}
+              script:
+                source: "doc['i'].value * _score"
+              boost: 10


Maybe use a different value here, e.g. 5, to make it clearer.

mayya-sharipova · 2020-02-20T21:13:20Z

@matriv Thanks for the review. I have tried to address your comments; please continue reviewing when you have time.

matriv

LGTM. Thanks a lot for responding to my questions and providing some more context!

jpountz

I left a minor suggestion, otherwise LGTM.

jpountz · 2020-02-21T09:16:56Z

server/src/main/java/org/elasticsearch/common/lucene/search/function/ScriptScoreQuery.java

            this.explanation = explanation;
        }

        @Override
        public float score() throws IOException {
            int docId = docID();
            scoreScript.setDocument(docId);
-            float score = (float) scoreScript.execute(explanation);
+            float score = (float) scoreScript.execute(explanation) * boost;
            if (score == Float.NEGATIVE_INFINITY || Float.isNaN(score)) {
                throw new ElasticsearchException(
                    "script_score query returned an invalid score [" + score + "] for doc [" + docId + "].");


I think it would be better if this error message returned the value produced by the script without the boost.
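A minimal sketch of this suggestion, assuming the script result arrives as a `double` as in the diff above. `boostedScore` is a hypothetical helper, not the actual method in `ScriptScoreQuery`, and it throws `IllegalArgumentException` on the assumption that this is what surfaces as a 400; the thread does not show which exception the final commit uses. The validity condition mirrors `score < 0f || Float.isNaN(score)` from the later diff. The raw value is validated before the boost is applied, so the message reports what the script actually produced.

```java
public class BoostedScoreCheck {

    // Validate the raw script output first so the error message shows the value
    // the script returned; apply the boost only after validation succeeds.
    static float boostedScore(double scriptValue, float boost, int docId) {
        float raw = (float) scriptValue;
        if (raw < 0f || Float.isNaN(raw)) {
            throw new IllegalArgumentException(
                "script_score query returned an invalid score [" + raw + "] for doc [" + docId + "].");
        }
        return raw * boost;
    }

    public static void main(String[] args) {
        System.out.println(boostedScore(2.5, 4f, 7)); // 10.0
        try {
            boostedScore(-1.5, 4f, 7);
        } catch (IllegalArgumentException e) {
            // The message contains [-1.5], the unboosted value, not [-6.0].
            System.out.println(e.getMessage());
        }
    }
}
```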

jpountz · 2020-02-21T09:24:29Z

server/src/main/java/org/elasticsearch/common/lucene/search/function/ScriptScoreQuery.java

+                    subs.addAll(Arrays.asList(explanation.getDetails()));
+                    subs.add(Explanation.match(boost, "boost"));
+                    explanation = Explanation.match(explanation.getValue(), explanation.getDescription(), subs);
+                }


I'd suggest wrapping the explanation instead of modifying it in-place: create the scorer with boost=1f a couple lines above, and then here:

if (boost != 1f) {
    explanation = Explanation.match(boost * explanation.getValue().floatValue(), "Boosted score, product of:",
        Explanation.match(boost, "boost"), explanation);
}
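A self-contained sketch of the wrapping approach. `Expl` is a hypothetical stand-in for Lucene's `Explanation` (a value, a description and nested details), used only so the example runs without Lucene on the classpath. The script's explanation is kept intact as a child node, and a boost of 1 returns it unchanged, in line with the earlier decision not to explain a boost the user did not set.

```java
import java.util.Arrays;
import java.util.List;

public class BoostedExplanation {

    // Hypothetical, minimal stand-in for Lucene's Explanation.
    static final class Expl {
        final float value;
        final String description;
        final List<Expl> details;

        Expl(float value, String description, Expl... details) {
            this.value = value;
            this.description = description;
            this.details = Arrays.asList(details);
        }
    }

    // Wrap the (already non-null) script explanation in a new parent node
    // instead of modifying it in place; a boost of 1 leaves it untouched.
    static Expl applyBoost(Expl scriptExplanation, float boost) {
        if (boost == 1f) {
            return scriptExplanation;
        }
        return new Expl(boost * scriptExplanation.value, "Boosted score, product of:",
            new Expl(boost, "boost"), scriptExplanation);
    }

    public static void main(String[] args) {
        Expl script = new Expl(3f, "script score");
        Expl boosted = applyBoost(script, 10f);
        System.out.println(boosted.value + " " + boosted.description); // 30.0 Boosted score, product of:
        System.out.println(boosted.details.size());                    // 2
    }
}
```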

mayya-sharipova · 2020-02-24T13:54:24Z

@jpountz Thanks for the feedback, I have addressed it.
In the last commit, I have also corrected the exception to make sure a 400 error is returned when a score is negative.

jtibshirani · 2020-03-03T18:14:49Z

server/src/main/java/org/elasticsearch/common/lucene/search/function/ScriptScoreQuery.java

-            if (score == Float.NEGATIVE_INFINITY || Float.isNaN(score)) {
-                throw new ElasticsearchException(
-                    "script_score query returned an invalid score [" + score + "] for doc [" + docId + "].");
+            if (score < 0f || Float.isNaN(score)) {


@mayya-sharipova it looks like this PR also fixed a bug in script_score queries where we allowed negative scores. I think we should add a note to the breaking changes docs and also update the PR description to make it clear we included this change.

@jtibshirani Thanks. Even without this change, a user would get an error if their script_score query produced a negative score; they would just get it from a different place, for example from Lucene here.

So from a user's perspective, the only things that changed are the error message and the error status code (previously 500, now 400). Do you think it warrants a breaking change notice?
+1 for including this in the PR description.

I tried this out using an Elasticsearch 7.6 build, and didn't receive an error:

PUT my_index/_doc/1?refresh
{ "field": "value" }

GET my_index/_search
{
  "query": {
    "script_score": {
      "query": { "match_all": {} },
      "script": { "source": "-1000" }
    }
  }
}

The line you linked to is an assert, so perhaps these Lucene checks didn't always catch the issue in non-test environments.
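The regression is easy to see by comparing the two conditions from the diff above: the pre-fix check only rejects `-Infinity` and `NaN`, so a finite negative value such as `-1000` passes. The class and method names below are hypothetical; only the two boolean conditions come from the diff. Per this thread, the only other guard was a Lucene `assert`, and the JVM skips `assert` statements unless assertions are enabled (for example with `-ea`), so nothing stopped the value in production.

```java
public class NegativeScoreRegression {

    // Condition used before this PR: only -Infinity and NaN were rejected.
    static boolean rejectedBefore(float score) {
        return score == Float.NEGATIVE_INFINITY || Float.isNaN(score);
    }

    // Condition introduced by this PR: every negative score, and NaN, is rejected.
    static boolean rejectedAfter(float score) {
        return score < 0f || Float.isNaN(score);
    }

    public static void main(String[] args) {
        float fromScript = -1000f; // the value returned by the "-1000" script above
        System.out.println(rejectedBefore(fromScript)); // false: the score slipped through
        System.out.println(rejectedAfter(fromScript));  // true
    }
}
```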

@jtibshirani Thanks for uncovering this. I understood what happened:

Before 7.5, the script_score query used ScriptScoreFunction, which returned a 400 error for a negative score.

Starting with 7.5, we changed it to no longer use ScriptScoreFunction, but forgot to add a check for negative scores. A negative score trips an assertion in TopScoreDocCollector, causing a fatal error in dev mode, but I guess these assertions are disabled in production mode, as we don't see any visible errors or error log messages.

So negative scores were wrongly allowed only in versions 7.5 and 7.6, and to me this doesn't look like a real breaking change. Still, I think it is worth adding an explanatory note to the release notes. I will do that. WDYT?

Adding an explanation to the release notes makes sense to me. I agree it shouldn't be presented as a typical 'breaking change'; it is more like a regression that we fixed. Perhaps we could add a unit test along with the release notes update to prevent a future regression?

7.5 and 7.6 had a regression that allowed for script_score queries to have negative scores. We have corrected this regression in elastic#52478. This is an addition to elastic#52478 that adds a test and release notes.

7.5 and 7.6 had a regression that allowed for script_score queries to have negative scores. We have corrected this regression in #52478. This is an addition to #52478 that adds a test for this. Related to #53133

Correct boost calculation in script_score query

7c9f5f8

mayya-sharipova added the :Search/Search Search-related issues that do not fall into other categories label Feb 18, 2020

mayya-sharipova added >bug v7.7.0 v8.0.0 labels Feb 18, 2020

matriv reviewed Feb 20, 2020

Address feedback

1d8313e

matriv approved these changes Feb 20, 2020

jpountz approved these changes Feb 21, 2020

Address Feedback

96db563

mayya-sharipova added 2 commits February 24, 2020 09:04

Remove unused imports

c16c444

Merge remote-tracking branch 'upstream/master' into script_score_boost

8882f39

mayya-sharipova merged commit 556ee9a into elastic:master Feb 24, 2020

mayya-sharipova deleted the script_score_boost branch February 24, 2020 15:46

jtibshirani reviewed Mar 3, 2020

mayya-sharipova changed the title ~~Correct boost calculation in script_score query~~ Correct boost in script_score query and error on negative scores Mar 3, 2020

mayya-sharipova added >breaking and removed >breaking labels Mar 4, 2020

mayya-sharipova mentioned this pull request Mar 4, 2020

script_score query errors on negative scores #53133

Merged

This was referenced Apr 1, 2020

7.7.0 meta ticket elastic/elasticsearch-net#4525

Closed

7.7.0 meta ticket (Part 3) elastic/elasticsearch-net#4534

Closed

jakelandis removed the v8.0.0 label Jul 26, 2021

jakelandis added the v8.0.0-alpha1 label Jul 26, 2021
