Expression support for PARTITION BY clause #4018

agavra · 2019-12-02T19:34:30Z

Originally posted by @big-andy-coates in #3982 (comment)

A GROUP BY is effectively a PARTITION BY (selectKey in KS parlance?) followed by an aggregation. The two should behave in a consistent manner.
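To make the equivalence concrete, here is a minimal Python sketch (not ksqlDB code, and not how the engine implements it). It models a stream as a list of dicts. The helper names `partition_by` and `group_by_count` and the sample rows are hypothetical, chosen only for illustration.

```python
from collections import Counter


def partition_by(rows, key_fn):
    """Re-key each row: return (key, row) pairs, where key = key_fn(row)."""
    return [(key_fn(row), row) for row in rows]


def group_by_count(rows, key_fn):
    """Model GROUP BY as PARTITION BY followed by a COUNT(*) aggregation."""
    counts = Counter()
    for key, _row in partition_by(rows, key_fn):
        counts[key] += 1
    return dict(counts)


# Hypothetical sample rows standing in for the stream `bar`.
bar = [
    {"col1": 2, "col2": 3, "col3": 1},
    {"col1": 3, "col2": 2, "col3": 5},
    {"col1": 1, "col2": 4, "col3": 7},
]


def key_expr(row):
    """The key expression col1*col2, evaluated per row."""
    return row["col1"] * row["col2"]


keyed = partition_by(bar, key_expr)     # the re-keying step on its own
counts = group_by_count(bar, key_expr)  # the same re-keying, then aggregation
```

Both operations evaluate the same key expression the same way, which is why it would be surprising for one to accept expressions and the other not.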

Consider the following example:

CREATE STREAM foo AS SELECT col1*col2 AS new_col1, col3*100 as new_col2 
FROM bar 
PARTITION BY new_col1;

We know that the result schema is new_col1, new_col2 and it is partitioned by the first column, new_col1.

@hjafarpour, I think you're still thinking in terms of only the value columns being part of the schema. We're moving away from this with the work on primitive keys (almost there) and structured keys. With this work done, the key columns are as much a part of the schema as the value columns.

If you take into account that the key columns are in a schema, then the example starts to look more like:

CREATE STREAM foo AS SELECT col1*col2 AS new_col1, col3*100 as new_col2 
FROM bar 
PARTITION BY col1*col2 AS Id;

And the resulting schema is something like the following (made-up types): ID INT KEY, NEW_COL1 INT, NEW_COL2 INT.

But actually, this is duplicating the key column in the value, which is a waste of space if the downstream system doesn't need it, so your initial query might be better re-written as:

CREATE STREAM foo AS SELECT col3*100 as new_col2 
FROM bar 
PARTITION BY col1*col2 AS new_col1;

with a result schema of NEW_COL1 INT KEY, NEW_COL2 INT.
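As a rough illustration of that last query, the following Python sketch (again a model, not ksqlDB code) builds key/value records where the key comes from the PARTITION BY expression and the value holds only the selected columns. The function `repartition` and the sample rows are hypothetical.

```python
def repartition(rows, key_expr, value_exprs):
    """Return (key, value) records.

    key_expr computes the record key; value_exprs maps each output
    value-column name to the expression that computes it.
    """
    records = []
    for row in rows:
        key = key_expr(row)
        value = {name: expr(row) for name, expr in value_exprs.items()}
        records.append((key, value))
    return records


# Hypothetical sample rows standing in for the stream `bar`.
bar = [
    {"col1": 2, "col2": 3, "col3": 1},
    {"col1": 1, "col2": 4, "col3": 7},
]

foo = repartition(
    bar,
    # Models: PARTITION BY col1*col2 AS new_col1
    key_expr=lambda r: r["col1"] * r["col2"],
    # Models: SELECT col3*100 AS new_col2
    value_exprs={"NEW_COL2": lambda r: r["col3"] * 100},
)
```

In this model the key expression is evaluated once and lives only in the key, so the value carries NEW_COL2 alone and nothing is duplicated.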

However, I agree we'll need expression support in PARTITION BY. @agavra, do we get that for free with your change promoting this to a proper logic node? Or is this something we'll need to add? If it's the former, we should add some QTT tests covering this.


rayzyar · 2020-02-26T10:49:39Z

@agavra Is this fix also available in https://hub.docker.com/r/confluentinc/cp-ksql-server?
My work is based on https://github.com/confluentinc/cp-helm-charts setup

rayzyar · 2020-02-26T11:03:07Z

@agavra Is this fix also available in https://hub.docker.com/r/confluentinc/cp-ksql-server?
My work is based on https://github.com/confluentinc/cp-helm-charts setup

Just found this #1039 (comment)
When will it be released to confluentinc/cp-ksql-server? Any estimate?

agavra added the enhancement label Dec 2, 2019

agavra mentioned this issue Dec 2, 2019

fix: unify behavior for PARTITION BY and GROUP BY #3982

Merged

2 tasks

agavra self-assigned this Dec 2, 2019

agavra mentioned this issue Dec 3, 2019

feat: expression support for PARTITION BY #4032

Merged

2 tasks

agavra closed this as completed in #4032 Dec 6, 2019

agavra mentioned this issue Dec 10, 2019

PARTITION BY should support nested fields #2218

Closed

agavra mentioned this issue Feb 25, 2020

Allow statement to specify the casing (camel case, uppercase, etc) for field names when serialized to output topic #1039

Closed
