[FEATURE] Top-K enhancement when having plan like SortOperator + LimitOperator
#2857
Labels
enhancement
New feature or request
Is your feature request related to a problem?
Currently, when a PPL query has a sort followed by a head (or, in SQL, the equivalent ORDER BY ... LIMIT pattern), it uses SortOperator and LimitOperator to calculate the final result if they are not pushed down into the OpenSearch DSL, as shown in issue #2802.

In cases like the above, where the sort and limit operators cannot be pushed down, the query may need to scan all documents, whose size could be very large. The same applies when a user just wants to set plugins.query.size_limit to be very large. Either case is a challenge for the current implementation.

What solution would you like?
SortOperator + LimitOperator is not the most efficient way to calculate top-k results, and it risks OOM because SortOperator does not support spilling in the current implementation (it just uses java.util.PriorityQueue, which is entirely in memory):

sql/core/src/main/java/org/opensearch/sql/planner/physical/SortOperator.java
Line 74 in 2117650

As we only care about the top-k elements in the output of SortOperator, a better way is to discard the unnecessary elements and maintain only k elements in this operator. This reduces the time complexity to O(n log k) and the memory usage to O(k), while still ensuring correct results.

What alternatives have you considered?
Do you have any additional context?
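To illustrate the bounded-heap idea, here is a minimal standalone sketch. It is not the plugin's code: the class name TopKSketch and the method topK are hypothetical, and it works on a plain Iterable rather than on the physical operator interfaces. It keeps at most k elements in a java.util.PriorityQueue whose head is the worst of the current best k, so each input element costs O(log k) and memory stays at O(k).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration only; not part of the OpenSearch SQL plugin.
final class TopKSketch {

  /**
   * Returns the first k elements of the input in the order defined by cmp,
   * i.e. the same elements that "sort by cmp, then limit k" would produce
   * (ties may be resolved differently than a stable full sort would).
   */
  static <T> List<T> topK(Iterable<T> input, int k, Comparator<T> cmp) {
    List<T> result = new ArrayList<>();
    if (k <= 0) {
      return result;
    }
    // Reversed comparator: the head of the heap is the largest (worst)
    // element among the best k seen so far.
    PriorityQueue<T> heap = new PriorityQueue<>(cmp.reversed());
    for (T item : input) {
      if (heap.size() < k) {
        heap.offer(item);
      } else if (cmp.compare(item, heap.peek()) < 0) {
        // The new item beats the current worst: evict it and keep the new one.
        heap.poll();
        heap.offer(item);
      }
      // Otherwise the item cannot be in the top k and is dropped immediately.
    }
    // The heap holds at most k elements; sort them for the final output order.
    result.addAll(heap);
    result.sort(cmp);
    return result;
  }

  public static void main(String[] args) {
    List<Integer> rows = List.of(5, 1, 4, 2, 3);
    System.out.println(topK(rows, 3, Comparator.<Integer>naturalOrder())); // prints [1, 2, 3]
  }
}
```

One caveat of this sketch: a heap does not preserve the input order of equal elements, so if the existing sort is expected to be stable, a real implementation would likely need a tie-breaker such as an input sequence number.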