add EARLIEST_BY/LATEST_BY to make EARLIEST/LATEST function signatures less ambiguous #12145
docs/querying/sql.md
Outdated
|`EARLIEST(expr, maxBytesPerString)`|Like `EARLIEST(expr)`, but for strings. The `maxBytesPerString` parameter determines how much aggregation space to allocate per string. Strings longer than this limit will be truncated. This parameter should be set as low as possible, since high values will lead to wasted memory.|`null` if `druid.generic.useDefaultValueForNull=false`, otherwise `''`|
|`EARLIEST(expr, maxBytesPerString, timeColumn)`|Like `EARLIEST(expr, timeColumn)`, but for strings. The `maxBytesPerString` parameter determines how much aggregation space to allocate per string. Strings longer than this limit will be truncated. This parameter should be set as low as possible, since high values will lead to wasted memory.|`null` if `druid.generic.useDefaultValueForNull=false`, otherwise `''`|
|`EARLIEST_BY(expr, timeColumn)`|Returns the earliest value of `expr`, which must be numeric. Earliest value is defined as the value first encountered with the minimum overall value of time column of all values being aggregated.|`null` if `druid.generic.useDefaultValueForNull=false`, otherwise `0`|
|`EARLIEST_BY(expr, maxBytesPerString, timeColumn)`|Like `EARLIEST_BY(expr, timeColumn)`, but for strings. The `maxBytesPerString` parameter determines how much aggregation space to allocate per string. Strings longer than this limit will be truncated. This parameter should be set as low as possible, since high values will lead to wasted memory.|`null` if `druid.generic.useDefaultValueForNull=false`, otherwise `''`|
With `EARLIEST_BY` I'd suggest we swap the location of `timeColumn` and `maxBytesPerString`. Given that `timeColumn` is non-optional on the `_BY` items, we want the optional parameter to be at the end of the signature.
agreed, changed
LGTM other than the one comment.
OperandTypes.sequence(
"'" + aggregatorType.name() + "(expr, timeColumn)'\n",
OperandTypes.ANY,
OperandTypes.NUMERIC
IMO we should allow either NUMERIC or TIMESTAMP here.
Hmm, thinking about it more, IMO we should only allow TIMESTAMP. That's consistent with what our other time functions do.
Changed. This also uncovered a bug: once I had to switch to a column that could actually be converted to TIMESTAMP (a long), it turned out we weren't correctly checking whether the time column selector had null values, so I've fixed up the native aggregators to handle this case.
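To illustrate the kind of null check being discussed, here is a minimal sketch of an "earliest by" aggregation over rows whose time value may be null. Everything in it is hypothetical (the `Row` type, the `earliestBy` method, and the choice to skip rows with a null time); it is not the Druid aggregator code and only shows why the time value has to be checked before it is compared.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not Druid code: picks the value paired with the smallest
// non-null time, keeping the first one encountered when times tie.
final class EarliestByNullTimeSketch {
    // Illustrative row type: both fields are boxed so they can be null.
    static final class Row {
        final Long time;
        final Double value;

        Row(Long time, Double value) {
            this.time = time;
            this.value = value;
        }
    }

    static Double earliestBy(List<Row> rows) {
        boolean found = false;
        long minTime = 0L;
        Double result = null;
        for (Row row : rows) {
            // Assumption: rows with a null time are skipped. Without this check,
            // unboxing row.time below would throw a NullPointerException.
            if (row.time == null) {
                continue;
            }
            // Strict '<' keeps the first value seen for the minimum time.
            if (!found || row.time < minTime) {
                found = true;
                minTime = row.time;
                result = row.value;
            }
        }
        return result; // null when no row had a usable time
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
            new Row(null, 1.0),
            new Row(5L, 2.0),
            new Row(3L, 3.0),
            new Row(3L, 4.0)
        );
        System.out.println(earliestBy(rows));
    }
}
```

Whether null times should be skipped or mapped to a default is a design choice this sketch does not settle; it simply picks one option to make the check visible.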
.idea/inspectionProfiles/Druid.xml
Outdated
@@ -70,7 +70,6 @@
</option>
<option name="IGNORE_FIELDS_USED_IN_MULTIPLE_METHODS" value="true" />
</inspection_tool>
<inspection_tool class="FieldMayBeFinal" enabled="true" level="WARNING" enabled_by_default="true" />
Was this change (& the others in this file) intentional?
no heh, will fix
… less ambiguous (apache#12145) * add EARLIEST_BY/LATEST_BY to make EARLIEST/LATEST function signatures unambiguous * switcheroo * EARLIEST_BY/LATEST_BY use timestamp instead of numeric types, update docs * revert unintended change * fix docs * fix docs better (cherry picked from commit f2ce769) Signed-off-by: ssagare <[email protected]>
Description
Follow-up to #11949, this PR splits explicit time column functions into standalone `EARLIEST_BY` and `LATEST_BY` SQL functions to avoid the method signatures being ambiguous and dependent on the column types of the inputs.

Prior to this PR, something like `latest(x, 10)` could translate to the "latest" value of x with max bytes of 10 if x was a string, but if x was a number it would treat the `10` as a timestamp. So instead of a validation exception, as would happen prior to #11949, it would allow the query but then explode with strange cast exceptions due to unintended behavior on the user's part. Splitting this out into separate functions makes it much less likely for the user to issue the incorrect query.
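The ambiguity described above can be sketched with a small model. The class, enum, and method names below are hypothetical and exist only for illustration; this is not the planner code from the PR, and it does not model validation of argument types. It only shows how the role of the second argument depended on the type of the first argument before the split, and depends on the function name alone after it.

```java
// Hypothetical model of the signature resolution described above; not Druid code.
final class SignatureAmbiguitySketch {
    enum ColumnType { STRING, LONG }

    // Before the split: in LATEST(expr, n), the role of the second argument
    // depended on the type of expr.
    static String legacySecondArgRole(ColumnType exprType) {
        return exprType == ColumnType.STRING ? "maxBytesPerString" : "timeColumn";
    }

    // After the split: the function name alone determines the role.
    // LATEST_BY / EARLIEST_BY take a time column; LATEST / EARLIEST take maxBytesPerString.
    static String splitSecondArgRole(String functionName) {
        return functionName.endsWith("_BY") ? "timeColumn" : "maxBytesPerString";
    }

    public static void main(String[] args) {
        // The same call text, latest(x, 10), meant two different things before the split.
        System.out.println("legacy, x is STRING: " + legacySecondArgRole(ColumnType.STRING));
        System.out.println("legacy, x is LONG:   " + legacySecondArgRole(ColumnType.LONG));
        // After the split, the reading no longer changes with the type of x.
        System.out.println("LATEST:    " + splitSecondArgRole("LATEST"));
        System.out.println("LATEST_BY: " + splitSecondArgRole("LATEST_BY"));
    }
}
```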
This PR has: