[FEATURE] Top-K enhancement when having plan like `SortOperator` + `LimitOperator` #2857

qianheng-aws · 2024-07-24T04:53:10Z

Is your feature request related to a problem?
Currently, when PPL having a sort and a followed head in its query, pattern like:

source | sort $field | head k

if using SQL, it should be pattern like:

select $field from source order by $field limit k;

It will using SortOperator and LimitOperator to calculate the final result if they are not being pushed down into the OpenSearch DSL. As shown in this issue #2802

POST _plugins/_ppl/_explain
{
  "query": """
    source=opensearch_dashboards_sample_data_flights
    | eval FlightMin = FlightTimeMin 
    | sort -FlightMin | head 5 | fields FlightMin
  """
}

# explain results
{
  "root": {
    "name": "ProjectOperator",
    "description": {
      "fields": "[FlightMin]"
    },
    "children": [
      {
        "name": "LimitOperator",
        "description": {
          "limit": 5,
          "offset": 0
        },
        "children": [
          {
            "name": "SortOperator",
            "description": {
              "sortList": {
                "FlightMin": {
                  "sortOrder": "DESC",
                  "nullOrder": "NULL_LAST"
                }
              }
            },
            "children": [
              {
                "name": "EvalOperator",
                "description": {
                  "expressions": {
                    "FlightMin": "FlightTimeMin"
                  }
                },
                "children": [
                  {
                    "name": "OpenSearchIndexScan",
                    "description": {
                      "request": """OpenSearchScrollRequest(initialSearchRequest=SearchRequest{searchType=QUERY_THEN_FETCH, indices=[opensearch_dashboards_sample_data_flights], indicesOptions=IndicesOptions[ignore_unavailable=false, allow_no_indices=true, expand_wildcards_open=true, expand_wildcards_closed=false, expand_wildcards_hidden=false, allow_aliases_to_multiple_indices=true, forbid_closed_indices=true, ignore_aliases=false, ignore_throttled=true], routing='null', preference='null', requestCache=null, scroll=Scroll{keepAlive=1m}, maxConcurrentShardRequests=0, batchedReduceSize=512, preFilterShardSize=null, allowPartialSearchResults=null, localClusterAlias=null, getOrCreateAbsoluteStartMillis=-1, ccsMinimizeRoundtrips=true, source={"from":0,"size":10000,"timeout":"1m"}, cancelAfterTimeInterval=null, pipeline=null, phaseTook=null}, scrollTimeout=1m, indexName=opensearch_dashboards_sample_data_flights, scrollId=, needClean=true)"""
                    },
                    "children": []
                  }
                ]
              }
            ]
          }
        ]
      }
    ]
  }
}

In some cases like above where sort and limit operator cannot be pushed down, it may need to scan all documents whose size could be very large, or user just wants to set plugins.query.size_limit to be very large, it will be a challenge for the current implementation.

What solution would you like?
SortOperator + LimitOperator is not the most efficient way to calculate the top-k results, and it may have risk of OOM since SortOperator doesn't support spilling in current implementation(just use java.util.PriorityQueue which is all in memory)

sql/core/src/main/java/org/opensearch/sql/planner/physical/SortOperator.java

Line 74 in 2117650

PriorityQueue<ExprValue> sorted = new PriorityQueue<>(1, sorter::compare);

).

As we only care about the top-k elements in output of SortOperator, the better way is to kick off the unnecessary elements and only maintain k elements in this operator. It can both reduce the time complexity to O(nlogk) and memory capacity to O(k), and ensure correct results as well.

What alternatives have you considered?

Do you have any additional context?

The text was updated successfully, but these errors were encountered:

qianheng-aws added enhancement New feature or request untriaged labels Jul 24, 2024

qianheng-aws changed the title ~~[FEATURE] Top-K enhancement when having PPL pattern like source | sort $field | head k~~ [FEATURE] Top-K enhancement when having plan like SortOperator + LimitOperator Jul 24, 2024

LantaoJin removed the untriaged label Jul 25, 2024

LantaoJin mentioned this issue Jul 25, 2024

[RFC] Change the default value of plugins.query.size_limit to 10000 (MAX_RESULT_WINDOW) #2859

Closed

qianheng-aws mentioned this issue Jul 26, 2024

add TakeOrderedOperator #2863

Merged

6 tasks

LantaoJin assigned qianheng-aws Aug 6, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[FEATURE] Top-K enhancement when having plan like `SortOperator` + `LimitOperator` #2857

[FEATURE] Top-K enhancement when having plan like `SortOperator` + `LimitOperator` #2857

qianheng-aws commented Jul 24, 2024 •

edited

Loading

[FEATURE] Top-K enhancement when having plan like SortOperator + LimitOperator #2857

[FEATURE] Top-K enhancement when having plan like SortOperator + LimitOperator #2857

Comments

qianheng-aws commented Jul 24, 2024 • edited Loading

[FEATURE] Top-K enhancement when having plan like `SortOperator` + `LimitOperator` #2857

[FEATURE] Top-K enhancement when having plan like `SortOperator` + `LimitOperator` #2857

qianheng-aws commented Jul 24, 2024 •

edited

Loading