Strict Type Checking on Import (or strict type usage for scripts) #30211

OlivierPetri · 2018-04-27T16:53:35Z

Describe the feature:

Currently ES will check the format of input data against the field type, but it does not stick to the specified type when supplying field content to scripts (see example below).

This leads to confusing results and hard to track bugs in client applications.

Desired feature:
Implement different level of type checking:
STRICT : Do not accept "123" for numeric input, only 123.
BEST_EFFORT: Convert "123" to 123 (desired default behaviour).
LEGACY: Current behaviour.

Alternatively controle (and cast!) document fields when handed to scripts (see below).

Example for current behaviour.

Given the following mapping:

PUT sumtest
{
	"mappings" : {
	"doc" : {
	"properties" : {
		"NRA" : { "type" : "integer" },
		"NRB" : { "type" : "integer" }
	}
	}
	}
}

and the following input

PUT sumtest/doc/1
{
"NRA" : 12,
"NRB" : 34
}

PUT sumtest/doc/2
{
"NRA" : "12",
"NRB" : "34"
}

The script expression
doc['NRA'] + doc['NRB']
will return 46 for item 1 and "1234" for item 2. The second result ist quite surprising for unsuspecting users.

The expression
ctx._source.NRA + ctx._source.NRB
yields the same result.

The text was updated successfully, but these errors were encountered:

talevy · 2018-04-27T18:29:05Z

@OlivierPetri, which version of Elasticsearch are you running? Knowing the script context
you are running against would also be helpful.

In Elasticsearch 6.2.2, I see the expected behavior. Assuming this is for search-response-time scripting, I've re-created your example with what I see. Let me know if this does not align with your use-case.

PUT sumtest
{
	"mappings" : {
  	"doc" : {
  	"properties" : {
  		"NRA" : { "type" : "integer" },
  		"NRB" : { "type" : "integer" }
  	}
  	}
	}
}

PUT sumtest/doc/1
{
"NRA" : 12,
"NRB" : 34
}

PUT sumtest/doc/2
{
"NRA" : "12",
"NRB" : "34"
}


GET sumtest/_search
{
    "query" : {
        "match_all": {}
    },
    "script_fields": {
      "sum": {
        "script": """doc["NRA"].value + doc["NRB"].value"""
      }
    }
}

returns:

...
    "hits": [
      {
        "_index": "sumtest",
        "_type": "doc",
        "_id": "2",
        "_score": 1,
        "fields": {
          "sum": [
            46
          ]
        }
      },
      {
        "_index": "sumtest",
        "_type": "doc",
        "_id": "1",
        "_score": 1,
        "fields": {
          "sum": [
            46
          ]
        }
      }
    ]
...

Two things to note about the above example that differs from the two other expressions that were shared.

the doc object holds the value of the fields that are stored in the doc-values for the index. This means that these values are post-coercion and mapping evaluation. doc['NRA'] references a wrapper object around the actual value, so doc['NRA'].value is necessary to retrieve the long-value to add together.
ctx._source.NRA retrieves the original value in the source that was indexed. This is pre-coercion, meaning that this will stay a string for doc 2 and a long for doc 1. You can find some more explanation about this concept at the bottom of the documentation on this page: https://www.elastic.co/guide/en/elasticsearch/reference/6.2/search-request-script-fields.html

elasticmachine · 2018-04-27T18:29:24Z

Pinging @elastic/es-core-infra

talevy · 2018-04-27T18:34:56Z

Regarding your feature request for stricter validation, I forgot to mention that we do have this for numeric mapping types.

Please check out https://www.elastic.co/guide/en/elasticsearch/reference/6.2/coerce.html for examples

OlivierPetri · 2018-05-02T13:38:15Z

Hi there, and thanks for your kind reply.

I confirm that
doc['NRA'] + doc['NRB']
works and
ctx._source.NRA + ctx._source.NRB
does not.

However the doc notation cannot be used in update or update by query statements as stated here and here. This means, that you have to do it manualy
Integer.parseInt(ctx._source.CNTA) + Integer.parseInt(ctx._source.CNTB)
as suggested here, and I do not feel this is a viable solution in the long run.

I will rerun my tests and come back here, but from what I observed, coercion did not work by default.

I'll be back later today.

talevy · 2018-05-02T15:34:07Z

@OlivierPetri ah, I see. Since this is an update-script, it does not have access doc-values. I believe that the real issue here is that errors like this should come up at compilation-time, rather than runtime. I believe the explicit integer parsing is the only option at the moment

OlivierPetri · 2018-05-02T15:40:03Z

Allright and hello again. So from what I found, "coerce = false" just enforces what I named "STRICT" in my proposal, so an integer field has to be inserted as a json integer, and "coerce = true" does what I dubbed "LEGACY": It accepts a quoted number as an integer, but it will not cast it to an integer (which is what I'd understand as "coercion"), but retain the string value.

So with "coerce=true" and the input I gave above, I'll end up with:

            {
                "_index": "sumtest",
                "_type": "doc",
                "_id": "1",
                "_score": 1,
                "_source": {
                    "NRA": 12,
                    "NRB": 34,
                }
            },
            {
                "_index": "sumtest",
                "_type": "doc",
                "_id": "2",
                "_score": 1,
                "_source": {
                    "NRA": "12",
                    "NRB": "34",
                }
            }

With this,
ctx._source.NRA + ctx._source.NRB
will work for item #1 but not for #2. Or rather, #1 will yield 46 and #2 will yield "1234" in an update query.

And sorry for describing this behaviour for
doc['NRA'] + doc['NRB']
witch is wrong, as doc[] seemingly does coerce on the fly. But since it is not available in update queries, it's not a general solution to use the doc notation. I'm afraid I caused some confustion mentioning doc at all, and I appologize for posting on Friday afternoon...

Hey, thank you all for the excellent work and your lightening fast responsiveness!

talevy · 2018-05-02T17:13:03Z

you're welcome @OlivierPetri, thanks for bearing with the lack of compile-time checking around this! This will be fixed over time, as was mentioned in #30210 (comment).

talevy added the :Core/Infra/Scripting Scripting abstractions, Painless, and Mustache label Apr 27, 2018

talevy self-assigned this Apr 27, 2018

colings86 added the >feature label May 1, 2018

talevy closed this as completed May 2, 2018

ChrisHegarty unassigned talevy Oct 29, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Strict Type Checking on Import (or strict type usage for scripts) #30211

Strict Type Checking on Import (or strict type usage for scripts) #30211

OlivierPetri commented Apr 27, 2018

talevy commented Apr 27, 2018

elasticmachine commented Apr 27, 2018

talevy commented Apr 27, 2018

OlivierPetri commented May 2, 2018 •

edited

Loading

talevy commented May 2, 2018 •

edited

Loading

OlivierPetri commented May 2, 2018

talevy commented May 2, 2018

Strict Type Checking on Import (or strict type usage for scripts) #30211

Strict Type Checking on Import (or strict type usage for scripts) #30211

Comments

OlivierPetri commented Apr 27, 2018

Example for current behaviour.

talevy commented Apr 27, 2018

elasticmachine commented Apr 27, 2018

talevy commented Apr 27, 2018

OlivierPetri commented May 2, 2018 • edited Loading

talevy commented May 2, 2018 • edited Loading

OlivierPetri commented May 2, 2018

talevy commented May 2, 2018

OlivierPetri commented May 2, 2018 •

edited

Loading

talevy commented May 2, 2018 •

edited

Loading