A GROUP BY is effectively a PARTITION BY (selectKey in KS parlance?) followed by an aggregation. The two should behave in a consistent manner.
Consider the following example:
CREATE STREAM foo AS SELECT col1*col2 AS new_col1, col3*100 AS new_col2
FROM bar
PARTITION BY new_col1;
We know that the result schema is new_col1, new_col2 and it is partitioned by the first column, new_col1.
@hjafarpour, I think you're still thinking in terms of only the value columns being part of the schema. We're moving away from this with the work on primitive keys (almost there) and structured keys. With this work done, the key columns are as much a part of the schema as the value columns.
If you take into account that the key columns are in a schema, then the example starts to look more like:
CREATE STREAM foo AS SELECT col1*col2 AS new_col1, col3*100 AS new_col2
FROM bar
PARTITION BY col1*col2 AS Id;
And the resulting schema is something like the following (made-up types): ID INT KEY, NEW_COL1 INT, NEW_COL2 INT.
But actually, this is duplicating the key column in the value, which is a waste of space if the downstream system doesn't need it, so your initial query might be better re-written as:
CREATE STREAM foo AS SELECT col3*100 AS new_col2
FROM bar
PARTITION BY col1*col2 AS new_col1;
with a result schema of NEW_COL1 INT KEY, NEW_COL2 INT.
However, I agree we'll need expression support in the PARTITION BY. @agavra, do we get that for free with your change promoting this to a proper logic node, or is this something we'll need to add? If it's the former, we should add some QTT tests covering this.
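To make the relationship concrete, here is a rough Python model of the semantics discussed above. It is not KSQL code. The sample rows, the `partition_by` and `group_by` helpers, and the choice of `sum` as the aggregate are all made up for illustration. It treats PARTITION BY as re-keying each row by an expression, and GROUP BY as that same re-keying followed by a per-key aggregation, with the key kept out of the value.

```python
from collections import defaultdict

# Hypothetical rows of the source stream `bar` (values invented for illustration).
bar = [
    {"col1": 2, "col2": 3, "col3": 1},
    {"col1": 1, "col2": 6, "col3": 2},
    {"col1": 4, "col2": 1, "col3": 5},
]


def partition_by(rows, key_fn, select_fn):
    """Re-key each row: the key expression becomes the key,
    the projection becomes the value (the key is not duplicated in it)."""
    return [(key_fn(row), select_fn(row)) for row in rows]


def group_by(rows, key_fn, select_fn, agg_fn):
    """Model GROUP BY as PARTITION BY followed by a per-key aggregation."""
    groups = defaultdict(list)
    for key, value in partition_by(rows, key_fn, select_fn):
        groups[key].append(value)
    return {key: agg_fn(values) for key, values in groups.items()}


# PARTITION BY col1*col2 AS new_col1, selecting col3*100 AS new_col2.
key_expr = lambda row: row["col1"] * row["col2"]
value_expr = lambda row: {"new_col2": row["col3"] * 100}

foo = partition_by(bar, key_expr, value_expr)

# The same key expression used for grouping, summing new_col2 per key.
totals = group_by(
    bar, key_expr, value_expr,
    lambda values: sum(v["new_col2"] for v in values),
)
```

In this model `foo` is a list of `(key, value)` pairs, mirroring a result schema of NEW_COL1 INT KEY, NEW_COL2 INT, and `group_by` produces exactly the keys that `partition_by` would, which is the consistency argued for above.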
Originally posted by @big-andy-coates in #3982 (comment)