Deduce type from the aggregators when materializing subquery results #16703
RowSignature rowSignature = query.getResultRowSignature(
    query.context().isFinalize(true)
    ? RowSignature.Finalization.YES
    : RowSignature.Finalization.NO
);
I believe this logic is supposed to be inside query.getResultRowSignature; query already knows context(); why should we tell it the value of Finalization from the outside? Doesn't that work?
It should work, but I am scared to make that change given that it will affect everything from native and SQL queries. Lemme try making the change and see if there are any failing tests.
I realised that it shouldn't work. For example, look at GroupByPreShuffleFrameProcessor and GroupByPostShuffleFrameProcessor. The same query requires different finalization modes, since one partially aggregates and needs the intermediate type, while the other completely aggregates and finalizes. This information isn't fully captured by the query, so someone from the outside needs to tell it which finalization mode to use. Therefore we can't reliably determine it from the query context.
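To make the point above concrete, here is a minimal sketch, not Druid code. `StageFinalizationSketch`, its `Stage` enum, and the `COMPLEX<pair>` type name are hypothetical stand-ins; the assumption is simply that a pre-shuffle stage keeps intermediate values and a post-shuffle stage finalizes them.

```java
// Hypothetical stand-in, not a Druid class: models why the caller, rather than
// the query, has to pick the finalization mode.
class StageFinalizationSketch
{
  enum Finalization { YES, NO }

  // Assumed stages, loosely named after the pre/post shuffle processors.
  enum Stage { PRE_SHUFFLE, POST_SHUFFLE }

  // The stage decides the mode; the query object is the same in both cases.
  static Finalization finalizationFor(Stage stage)
  {
    return stage == Stage.POST_SHUFFLE ? Finalization.YES : Finalization.NO;
  }

  // Picks the column type an aggregator contributes for the given mode.
  static String columnType(String intermediateType, String finalType, Finalization finalization)
  {
    return finalization == Finalization.YES ? finalType : intermediateType;
  }

  public static void main(String[] args)
  {
    // "COMPLEX<pair>" is a placeholder for an intermediate type that differs from the final one.
    for (Stage stage : Stage.values()) {
      System.out.println(stage + " -> " + columnType("COMPLEX<pair>", "STRING", finalizationFor(stage)));
    }
  }
}
```

In this sketch the same aggregator yields two different column types depending only on which stage asks, which is why a single answer derived from the query context would not fit both callers.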
@@ -535,4 +532,16 @@ private Function<Result<TimeseriesResultValue>, Result<TimeseriesResultValue>> m
    );
  };
}

private RowSignature resultSignature(final TimeseriesQuery query, final RowSignature.Finalization finalization)
Can this method be moved to TimeseriesQuery#getResultSignature (like for GroupByQuery) or TimeseriesQuery#getRowSignature (like for ScanQuery)?
@@ -558,7 +558,16 @@ public Optional<Sequence<FrameSignaturePair>> resultsAsFrames(
    boolean useNestedForUnknownTypes
)
{
  final RowSignature rowSignature = resultArraySignature(query);
  final RowSignature rowSignature =
Similarly to TS: shouldn't this be TopNQuery#getResultRowSignature? Please also update resultArraySignature to use that method so we are not duplicating logic.
Thanks for the review @kgyrtkirk.
Description
For aggregators like StringFirst/Last, whose intermediate type isn't the same as the final type, using them in GroupBy, TopN or Timeseries subqueries causes a fallback when maxSubqueryBytes is set. This happens because the finalization is assumed to be unknown, so the row signature cannot determine whether to use the intermediate or the final type and sets the column type to null. This PR figures out the finalization from the query context and uses the intermediate or the final type appropriately.
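The behaviour described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's implementation: `SubquerySignatureSketch`, `Agg`, `fromContext`, and `signature` are hypothetical names, and `COMPLEX<pair>` is a placeholder intermediate type. The only detail taken from the thread is that a missing `finalize` context value is treated as true, as in `isFinalize(true)`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in, not a Druid class.
class SubquerySignatureSketch
{
  enum Finalization { YES, NO, UNKNOWN }

  // Hypothetical aggregator description: output name plus its two possible types.
  static final class Agg
  {
    final String name;
    final String intermediateType;
    final String finalType;

    Agg(String name, String intermediateType, String finalType)
    {
      this.name = name;
      this.intermediateType = intermediateType;
      this.finalType = finalType;
    }
  }

  // Reads the "finalize" flag from a context map; a missing key defaults to true.
  static Finalization fromContext(Map<String, Object> context)
  {
    Object value = context.getOrDefault("finalize", true);
    return Boolean.parseBoolean(String.valueOf(value)) ? Finalization.YES : Finalization.NO;
  }

  // Builds an ordered name -> type mapping. With UNKNOWN, an aggregator whose two
  // types differ gets a null type, which models the case that triggers the fallback.
  static Map<String, String> signature(Agg[] aggs, Finalization finalization)
  {
    Map<String, String> result = new LinkedHashMap<>();
    for (Agg agg : aggs) {
      String type;
      if (finalization == Finalization.YES) {
        type = agg.finalType;
      } else if (finalization == Finalization.NO) {
        type = agg.intermediateType;
      } else {
        type = agg.intermediateType.equals(agg.finalType) ? agg.finalType : null;
      }
      result.put(agg.name, type);
    }
    return result;
  }
}
```

With `UNKNOWN`, a count-like aggregator still gets a concrete type because both of its types agree, while a StringFirst-like aggregator ends up with null; resolving the mode from the context removes that gap in the sketch.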
This PR has: