Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQL: Queries with script filter result in a linear scan through all documents even if LIMIT is used #80523

Closed
Luegg opened this issue Nov 9, 2021 · 3 comments · Fixed by #83381
Labels
:Analytics/SQL SQL querying >bug Team:QL (Deprecated) Meta label for query languages team

Comments

@Luegg
Copy link
Contributor

Luegg commented Nov 9, 2021

For example, the query SELECT station.name FROM \"weather-data-2016\" WHERE LENGTH(station.name) > 10 LIMIT 10 takes more than 60s on the NOAA benchmark dataset. The equivalent search request on the other hand only takes a few ms:

POST http://localhost:39200/weather-data-2016/_search
Content-Type: application/json

{
  "size": 10,
  "query": {
    "script": {
      "script": {
        "source": "InternalQlScriptUtils.nullSafeFilter(InternalQlScriptUtils.gt(InternalSqlScriptUtils.length(InternalQlScriptUtils.docValue(doc,params.v0)),params.v1))",
        "lang": "painless",
        "params": {
          "v0": "station.name",
          "v1": 5
        }
      },
      "boost": 1.0
    }
  },
  "_source": false,
  "fields": [
    {
      "field": "station.name"
    }
  ],
  "sort": [
    {
      "_doc": {
        "order": "asc"
      }
    }
  ]
}

Note, the search request does not open a scroll context but SQL does. Opening the scroll context causes ES to count all matching documents which requires to evaluate the script on every doc.

This issue could be addressed by using PIT instead of scroll contexts (see #61873).

@Luegg Luegg added >bug :Analytics/SQL SQL querying labels Nov 9, 2021
@elasticmachine elasticmachine added the Team:QL (Deprecated) Meta label for query languages team label Nov 9, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-ql (Team:QL)

@bpintea
Copy link
Contributor

bpintea commented Nov 10, 2021

Opening the scroll context causes ES to count all matching documents which requires to evaluate the script on every doc.

Curious how is the result evaluated when not opening a scroll context (or with PIT), if not going through every doc?
Is the size considered in one case (scroll/PIT), but not the other?

@Luegg
Copy link
Contributor Author

Luegg commented Dec 6, 2021

Sorry, only saw your question today @bpintea. The script filter only needs to be applied until you have 10 matches. If the script is not too selective, you might only have to go through a few docs.

It seems to be a peculiarity of the scroll functionality that it includes a match count. Maybe that's not even necessary and a bug. PIT does not have this issue.

elasticsearchmachine pushed a commit that referenced this issue Feb 15, 2022
Resolves #61873

The goal of this PR is to remove the use of the deprecated scroll
cursors in SQL. Functionality and APIs should remain the same with one
notable difference: The last page of a search hit query used to always
include a scroll cursor if it is non-empty. This is no longer the case,
if a result set is exhausted, the PIT will be closed and the last page
does not include a cursor.

Note, PIT can also be used for aggregation and PIVOT queries but this is
not in the scope of this PR and will be implemented in a follow up.

Additionally, this PR resolves #80523 because the total doc count is no
longer required.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/SQL SQL querying >bug Team:QL (Deprecated) Meta label for query languages team
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants