SQL: Queries with script filter result in a linear scan through all documents even if `LIMIT` is used #80523

Luegg · 2021-11-09T08:41:07Z

For example, the query `SELECT station.name FROM "weather-data-2016" WHERE LENGTH(station.name) > 10 LIMIT 10` takes more than 60s on the NOAA benchmark dataset. The equivalent search request, on the other hand, takes only a few ms:

POST http://localhost:39200/weather-data-2016/_search
Content-Type: application/json

{
  "size": 10,
  "query": {
    "script": {
      "script": {
        "source": "InternalQlScriptUtils.nullSafeFilter(InternalQlScriptUtils.gt(InternalSqlScriptUtils.length(InternalQlScriptUtils.docValue(doc,params.v0)),params.v1))",
        "lang": "painless",
        "params": {
          "v0": "station.name",
          "v1": 5
        }
      },
      "boost": 1.0
    }
  },
  "_source": false,
  "fields": [
    {
      "field": "station.name"
    }
  ],
  "sort": [
    {
      "_doc": {
        "order": "asc"
      }
    }
  ]
}

Note that the search request does not open a scroll context, but SQL does. Opening the scroll context causes ES to count all matching documents, which requires evaluating the script on every doc.

This issue could be addressed by using PIT instead of scroll contexts (see #61873).
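For illustration, a PIT-based page request might be assembled roughly as below. This is only a sketch under stated assumptions, not what SQL actually sends: the helper name `pit_search_body` is hypothetical, and the choice of a `_shard_doc` sort plus `track_total_hits: false` is an assumption about how one would avoid the total count. The `pit`, `search_after`, and `sort` fields themselves are regular search API fields.

```python
import json


def pit_search_body(pit_id, query, fields, size, search_after=None):
    """Build a _search request body that pages over a point-in-time (PIT).

    Hypothetical helper for illustration only; it sends nothing.
    """
    body = {
        "size": size,
        "query": query,
        "_source": False,
        "fields": [{"field": f} for f in fields],
        # The PIT id comes from a prior `POST /<index>/_pit?keep_alive=1m` call.
        "pit": {"id": pit_id, "keep_alive": "1m"},
        # Assumption: a cheap, stable sort so search_after can resume.
        "sort": [{"_shard_doc": "asc"}],
        # Assumption: the total hit count is not needed for paging.
        "track_total_hits": False,
    }
    if search_after is not None:
        # Sort values of the last hit on the previous page.
        body["search_after"] = search_after
    return body


if __name__ == "__main__":
    first = pit_search_body("<pit-id>", {"match_all": {}}, ["station.name"], 10)
    print(json.dumps(first, indent=2))
```

A follow-up page would pass the `sort` values of the last hit as `search_after`, so no scroll context is involved.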

elasticmachine · 2021-11-09T08:41:09Z

Pinging @elastic/es-ql (Team:QL)

bpintea · 2021-11-10T10:02:54Z

> Opening the scroll context causes ES to count all matching documents, which requires evaluating the script on every doc.

Curious: how is the result evaluated when no scroll context is opened (or with PIT), if not by going through every doc?
Is the size considered in one case (scroll/PIT), but not the other?

Luegg · 2021-12-06T16:18:32Z

Sorry, only saw your question today @bpintea. The script filter only needs to be applied until you have 10 matches. If the script is not too selective, you might only have to go through a few docs.

It seems to be a peculiarity of the scroll functionality that it includes a match count. Maybe that's not even necessary and a bug. PIT does not have this issue.
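To make the difference concrete, here is a toy model in plain Python (not Elasticsearch code). The function names and the in-memory `docs` list are made up for illustration; the predicate just counts how often the "script" runs when we stop after `limit` matches versus when a total match count is also required.

```python
from itertools import islice


def make_counting_filter(threshold):
    """Return a predicate like LENGTH(name) > threshold plus a call counter."""
    calls = {"n": 0}

    def script_filter(name):
        calls["n"] += 1
        return len(name) > threshold

    return script_filter, calls


def first_matches(docs, predicate, limit):
    """Lazy evaluation: stop as soon as `limit` matches have been found."""
    return list(islice((d for d in docs if predicate(d)), limit))


def matches_with_total(docs, predicate, limit):
    """Counting all matches forces the predicate to run on every doc."""
    matched = [d for d in docs if predicate(d)]
    return matched[:limit], len(matched)


# Toy data: every second "station name" is long enough to match.
docs = [f"station-{i:06d}" if i % 2 == 0 else "st" for i in range(100_000)]

if __name__ == "__main__":
    lazy_filter, lazy_calls = make_counting_filter(10)
    first_matches(docs, lazy_filter, 10)
    print("evaluations with early termination:", lazy_calls["n"])

    full_filter, full_calls = make_counting_filter(10)
    matches_with_total(docs, full_filter, 10)
    print("evaluations when counting all matches:", full_calls["n"])
```

In this model, the early-terminating variant touches only a handful of docs, while the counting variant runs the filter once per doc regardless of the limit.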

Resolves #61873

The goal of this PR is to remove the use of the deprecated scroll cursors in SQL. Functionality and APIs should remain the same, with one notable difference: the last page of a search hit query used to always include a scroll cursor if it was non-empty. This is no longer the case: if a result set is exhausted, the PIT is closed and the last page does not include a cursor.

Note that PIT can also be used for aggregation and PIVOT queries, but this is out of scope for this PR and will be implemented in a follow-up.

Additionally, this PR resolves #80523 because the total doc count is no longer required.
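From a client's point of view, the cursor change could be modelled like this. It is a toy sketch in plain Python with made-up names (`fetch_page`, `fetch_all`) and an integer offset standing in for the opaque cursor. How the server detects exhaustion is not stated in this thread; the sketch simply assumes a page shorter than the requested size signals it.

```python
def fetch_page(rows, page_size, cursor=None):
    """Return (page, next_cursor); next_cursor is None once exhausted.

    `rows` stands in for the full result set and `cursor` is a plain
    offset here, purely for illustration.
    """
    start = 0 if cursor is None else cursor
    page = rows[start:start + page_size]
    # Assumption: a short page means the result set is exhausted, so the
    # (simulated) PIT is closed and no cursor is returned.
    exhausted = len(page) < page_size
    next_cursor = None if exhausted else start + len(page)
    return page, next_cursor


def fetch_all(rows, page_size):
    """Client loop: keep paging until a response carries no cursor."""
    collected, cursor = [], None
    while True:
        page, cursor = fetch_page(rows, page_size, cursor)
        collected.extend(page)
        if cursor is None:
            break
    return collected


if __name__ == "__main__":
    rows = list(range(25))
    print(fetch_page(rows, 10, cursor=20))
    print(len(fetch_all(rows, 10)))
```

Clients that loop until the response has no cursor keep working; clients that expected a cursor on every non-empty page would need to handle its absence.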

Luegg added the >bug and :Analytics/SQL (SQL querying) labels on Nov 9, 2021

elasticmachine added the Team:QL (Deprecated) label (Meta label for query languages team) on Nov 9, 2021

Luegg linked a pull request Feb 8, 2022 that will close this issue

SQL: Replace scroll cursors with point-in-time and search_after #83381

Merged

elasticsearchmachine closed this as completed in #83381 Feb 15, 2022
