KSQL partitions by column in source when there is a name collision with a column in the sink #2525

Closed · apurvam opened this issue Mar 5, 2019 · 0 comments · Fixed by #2735

apurvam commented Mar 5, 2019

Setup:

ksql> create stream TEST_1 (a varchar, b varchar)  \
  WITH (KAFKA_TOPIC='TEST_PART_1', value_format='delimited', \
  key='a');

 Message
----------------
 Stream created
----------------
ksql> create stream TEST_2 WITH (KAFKA_TOPIC='TEST_PART_2', \
  value_format='delimited') as select a + '_NEW' as a, b \
  from TEST_1 partition by a;

 Message
----------------------------
 Stream created and running
----------------------------

Produce to TEST_PART_1:

$ kafkacat -b localhost:9092 -t TEST_PART_1 -P -K:
C:C,D
E:E,F

See that the key in stream TEST_2 is the original value from TEST_1, not the modified value:

ksql> select * from TEST_2;
1551824886814 | C | C_NEW | D
1551824930017 | E | E_NEW | F

As we can see, the TEST_2 query is picking the key column a from the source, rather than the modified column with the same name a from the TEST_2 select statement.
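
The raw key in the sink topic can also be checked directly with kafkacat. A minimal consumer sketch (assuming the same local broker as above; %k prints the record key and %s the value):

$ kafkacat -b localhost:9092 -t TEST_PART_2 -C -o beginning -e -f '%k:%s\n'
C:C_NEW,D
E:E_NEW,F

The keys are still C and E rather than C_NEW and E_NEW, matching the ROWKEY column in the select output above.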

If we instead alias the modified column to a_new and partition by a_new, as in stream TEST_3 below, then it behaves as expected:

ksql> create stream TEST_3 WITH (KAFKA_TOPIC='TEST_PART_3', \
   value_format='delimited') as select a + '_NEW' as a_new, b \
   from TEST_1 partition by a_new;

 Message
----------------------------
 Stream created and running
----------------------------
ksql> select * from TEST_3;
1551825264867 | K_NEW | K_NEW | L

The output above shows the correct key for this input to TEST_PART_1:

$ kafkacat -b localhost:9092 -t TEST_PART_1 -P -K:
K:K,L

This is odd default behavior. When there is a name collision, we should always choose the key from the columns in the sink rather than the source.
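
For completeness, DESCRIBE EXTENDED reports the key field KSQL has registered for each stream, so the collision can also be inspected without consuming the topics (a sketch; output omitted since its exact layout varies by KSQL version):

ksql> describe extended TEST_2;
ksql> describe extended TEST_3;

In the output, the Key field line indicates which column each stream is keyed on.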
