fix: append additional fields to row values in `KsqlMaterialization.java` to fix pull query filtering #7336

cprasad1 · 2021-04-01T03:38:35Z

Description

Fixes #7312 and other similar situations by appending additional fields to row values in KsqlMaterialization.java

Testing done

RQTT
Unit Testing

Reviewer checklist

Ensure docs are updated if necessary. (eg. if a user visible feature is being added or changed).
Ensure relevant issues are linked (description should include text like "Fixes #")

ghost · 2021-04-01T03:38:37Z

@confluentinc It looks like @cprasad1 just signed our Contributor License Agreement. 👍

Always at your service,

clabot

cprasad1 · 2021-04-01T08:42:27Z

...s/src/main/java/io/confluent/ksql/execution/streams/materialization/KsqlMaterialization.java

@@ -174,7 +174,7 @@ public MaterializedWindowedTable windowed() {
      final Builder<WindowedRow> builder = ImmutableList.builder();

      for (final WindowedRow row : result) {
-        filterAndTransform(row.windowedKey(), row.value(), row.rowTime())
+        filterAndTransform(row.windowedKey(), getIntermediateRow(row), row.rowTime())


I'm unsure whether we even need to append the extra columns for windowed rows. Windowed state stores might need more testing.

Could you elaborate a bit more here? Is that because the windowed key may have the needed columns for filtering?

@guozhangwang My understanding is that windowed tables always require an aggregation, so a state store always backs the aggregated table. If that holds, we don't need to append this extra metadata, because KsqlMaterialization already handles those cases (we have test cases for that). I noticed that ProjectOperator and SelectOperator follow a similar pattern for windowed rows: they generate intermediate rows on which filters and transformations can be applied. I added these fields as a hedge against cases we might otherwise miss (at some cost, obviously). That said, I have a couple of questions for you:

Are there any types of windowed tables that are not queryable today that we want to be able to query?

Can windowed tables be derived without any aggregations (specifically, GROUP BY aggregations)?

I haven't tested it, but try a windowed query with a GROUP BY and a HAVING clause. I think that might trigger KsqlMaterialization. In that case, presumably you could say HAVING WINDOWSTART > 20 and would therefore need those columns.

+1 to @AlanConfluent. I think today, if you have WINDOW BY + GROUP BY + HAVING, you get a first windowed table from the windowBy+groupBy aggregation, and then a second windowed table from HAVING as a filtering condition. So not all windowed tables are generated by aggregations; they can also be generated by stateless table operators applied to other existing windowed tables.

It would be good to have this as an RQTT test where the HAVING clause mentions WINDOWSTART or WINDOWEND.
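
To make the idea in this thread concrete, here is a minimal sketch of what "appending additional fields to row values" could look like. It is not the code from this PR: `WindowedRow` here is a hypothetical stand-in, `intermediateRow` is an illustrative name, and the order of the appended columns (rowtime, key, window start, window end) is an assumption made only for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class IntermediateRowSketch {

  // Hypothetical stand-in for a windowed row read from a state store.
  static final class WindowedRow {
    final String key;
    final long windowStart;
    final long windowEnd;
    final List<?> value;   // value columns only, e.g. [COUNT]
    final long rowTime;

    WindowedRow(String key, long windowStart, long windowEnd, List<?> value, long rowTime) {
      this.key = key;
      this.windowStart = windowStart;
      this.windowEnd = windowEnd;
      this.value = value;
      this.rowTime = rowTime;
    }
  }

  // Copies the value columns and appends rowtime, key, and window bounds.
  // The order of the appended columns is an assumption made for this sketch.
  static List<Object> intermediateRow(WindowedRow row) {
    List<Object> out = new ArrayList<>(row.value);
    out.add(row.rowTime);
    out.add(row.key);
    out.add(row.windowStart);
    out.add(row.windowEnd);
    return out;
  }

  public static void main(String[] args) {
    WindowedRow row = new WindowedRow("a", 0L, 1000L, Arrays.asList(3L), 42L);

    // A filter on the window start, which sits at index 3 of the intermediate row.
    Predicate<List<Object>> windowStartFilter = r -> (Long) r.get(3) >= 0L;
    System.out.println(windowStartFilter.test(intermediateRow(row)));

    // The raw value holds only the value columns, so index 3 does not exist there.
    System.out.println(row.value.size());
  }
}
```

The point of the sketch is only that a filter referring to a key or window-bound column has something to read once those columns are part of the row it is evaluated against.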

cprasad1 · 2021-04-01T08:43:27Z

...s/src/test/resources/rest-query-validation-tests/pull-queries-against-ctas-materialized.json

+          {"row":{"columns":["F"]}}
+
+        ]}
+      ]


Currently, there are no additional tests for windowed materialization. Did we intend to make any new type of windowed table queryable with these changes, @AlanConfluent?

You could always test a windowed case like

"CREATE TABLE AGGREGATE AS SELECT ID, COUNT(1) AS COUNT FROM INPUT WINDOW TUMBLING(SIZE 1 SECOND) GROUP BY ID HAVING COUNT(1) > 2;",

We don't have many tests with the HAVING clause for pull queries, and I think it might trigger the KsqlMaterialization transform logic to apply the filtering. Since HAVING clauses can apply to GROUP BYs, I assume they work for windowed tables as well. Normal WHERE clauses presumably don't touch KsqlMaterialization when a GROUP BY is in place, since the materialization happens after the filter has been applied, so that might not need additional testing.

But in general, you're right that a lot of windowed logic has already been tested fairly well in https://github.com/confluentinc/ksql/blob/master/ksqldb-functional-tests/src/test/resources/rest-query-validation-tests/pull-queries-against-materialized-aggregates.json since windows require a group by.

These are good cases. It would be good to do a windowed table + group by + having clause mentioning windowstart. That's one last case I don't see.

We don't allow WINDOWSTART there at the moment, so it's not a testable case. The specific error message is: Window bounds column WINDOWSTART can only be used in the SELECT clause of windowed aggregations and can not be passed to aggregate functions.

Related: #4397

I'm still wondering whether @AlanConfluent's query is possible:

CREATE TABLE AGGREGATE AS SELECT ID, COUNT(1) AS COUNT FROM INPUT WINDOW TUMBLING(SIZE 1 SECOND) GROUP BY ID HAVING COUNT(1) > 2;

Note that it does not try to group by window start/end, but we should still be able to pull-query it with conditions on other columns?
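
For readers less familiar with the statement discussed above, the following is a rough sketch of what it computes semantically: a per-ID count within 1-second tumbling windows, keeping only the windows whose count exceeds 2. It is plain Java and says nothing about how ksqlDB evaluates the query; `Event`, `countPerWindow`, and the `id@windowStart` string key are hypothetical names chosen for this example, and timestamps are assumed to be non-negative milliseconds.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TumblingHavingSketch {

  // Hypothetical input record: an ID and an event timestamp in milliseconds.
  static final class Event {
    final String id;
    final long ts;

    Event(String id, long ts) {
      this.id = id;
      this.ts = ts;
    }
  }

  // Counts events per (id, tumbling window) and then drops every entry whose
  // count is not strictly greater than havingGreaterThan, like HAVING COUNT(1) > n.
  static Map<String, Long> countPerWindow(
      List<Event> events, long windowSizeMs, long havingGreaterThan) {
    Map<String, Long> counts = new LinkedHashMap<>();
    for (Event e : events) {
      // Start of the tumbling window that contains this timestamp.
      long windowStart = e.ts - (e.ts % windowSizeMs);
      counts.merge(e.id + "@" + windowStart, 1L, Long::sum);
    }
    counts.values().removeIf(c -> c <= havingGreaterThan);
    return counts;
  }

  public static void main(String[] args) {
    List<Event> events = Arrays.asList(
        new Event("a", 0L), new Event("a", 100L), new Event("a", 900L),
        new Event("a", 1000L), new Event("b", 10L));
    System.out.println(countPerWindow(events, 1000L, 2L));
  }
}
```

In this sketch the window start is part of the row identity even though the statement groups only by ID, which is why the question of filtering on other columns is separate from the question of filtering on window bounds.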

guozhangwang

Just wondering what's the difference for windowed stores, for my own education. Otherwise the fix lgtm.

AlanConfluent · 2021-04-01T20:30:40Z

...c/test/java/io/confluent/ksql/execution/streams/materialization/KsqlMaterializationTest.java

    materialization = new KsqlMaterialization(
        inner,
        SCHEMA,
-        ImmutableList.of(project, filter)
+        ImmutableList.of(filter, project)


Did you swap these because this is a more realistic ordering?
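
The thread does not record an answer to this question. As one possible reading of why the order of transforms can matter, here is a small sketch in which a filter needs a column that a projection drops. `Transform`, `FILTER`, `PROJECT`, and `apply` are hypothetical names for this example and are not the types used in `KsqlMaterialization`.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

public class TransformOrderSketch {

  // A transform either rewrites a row or drops it (empty Optional).
  interface Transform extends Function<List<Object>, Optional<List<Object>>> {}

  // Keeps rows whose second column (COUNT, index 1) is greater than 2.
  static final Transform FILTER = r -> {
    if ((Long) r.get(1) > 2L) {
      return Optional.of(r);
    }
    return Optional.empty();
  };

  // Keeps only the first column (ID, index 0).
  static final Transform PROJECT = r -> Optional.of(Collections.singletonList(r.get(0)));

  // Applies the transforms in list order, stopping once a row is dropped.
  static Optional<List<Object>> apply(List<Transform> transforms, List<Object> row) {
    Optional<List<Object>> current = Optional.of(row);
    for (Transform t : transforms) {
      current = current.flatMap(t);
    }
    return current;
  }

  public static void main(String[] args) {
    List<Object> row = Arrays.asList((Object) "a", 3L);

    // Filter first: COUNT is still present when the filter runs.
    System.out.println(apply(Arrays.asList(FILTER, PROJECT), row));

    // Project first: the filter can no longer find COUNT at index 1.
    try {
      apply(Arrays.asList(PROJECT, FILTER), row);
    } catch (IndexOutOfBoundsException e) {
      System.out.println("filter failed after projection");
    }
  }
}
```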

AlanConfluent

Nice PR and good test cases

guozhangwang

Thanks for the added testing coverage!

guozhangwang · 2021-04-05T18:38:32Z

...s/src/test/resources/rest-query-validation-tests/pull-queries-against-ctas-materialized.json

+      ]
+    },
+    {
+      "name": "persistent query with KEY filter and projection +++ pull query table scan and single key lookup ***FAILURE***",


nit: should we explicitly state the root cause of the error here? Otherwise the name is exactly the same as the one above, except that it says ***FAILURE***.

…ava` to fix pull query filtering (confluentinc#7336)

* passes unit tests
* semantic
* start adding mores RQTT
* start adding mores RQTT 2
* start adding mores RQTT 3 all greeeeen
* start adding mores RQTT 3 all greeeeen MAX COMPLEX
* FINISH RQTT
* added windowed table tests
* modified tests
* add small comment
* add small comment fix checkstyle

Co-authored-by: Chittaranjan Prasad <>

cprasad1 · 2021-04-05T19:58:59Z

...s/src/test/resources/rest-query-validation-tests/pull-queries-against-ctas-materialized.json

+      "name": "windowed - select star with HAVING filter",
+      "statements": [
+        "CREATE STREAM INPUT (ID STRING KEY, IGNORED INT) WITH (kafka_topic='test_topic', value_format='JSON');",
+        "CREATE TABLE AGGREGATE AS SELECT ID, COUNT(1) AS COUNT FROM INPUT WINDOW TUMBLING(SIZE 1 SECOND) GROUP BY ID HAVING COUNT(1) > 1;",


@guozhangwang is this test similar to what you were interested in?

Ah yes, thanks!

…ava` to fix pull query filtering (#7336) (#7342)

* passes unit tests
* semantic
* start adding mores RQTT
* start adding mores RQTT 2
* start adding mores RQTT 3 all greeeeen
* start adding mores RQTT 3 all greeeeen MAX COMPLEX
* FINISH RQTT
* added windowed table tests
* modified tests
* add small comment
* add small comment fix checkstyle

Co-authored-by: Chittaranjan Prasad <>

passes unit tests

22e75d9

cprasad1 requested a review from a team as a code owner April 1, 2021 03:38

Chittaranjan Prasad added 6 commits March 31, 2021 20:50

semantic

367753d

start adding mores RQTT

1d52f46

start adding mores RQTT 2

4d8e4ab

start adding mores RQTT 3 all greeeeen

3a6bde3

start adding mores RQTT 3 all greeeeen MAX COMPLEX

9205de4

FINISH RQTT

3ab5374

cprasad1 requested a review from AlanConfluent April 1, 2021 08:37

cprasad1 commented Apr 1, 2021

...c/test/java/io/confluent/ksql/execution/streams/materialization/KsqlMaterializationTest.java Outdated

cprasad1 commented Apr 1, 2021

...s/src/main/java/io/confluent/ksql/execution/streams/materialization/KsqlMaterialization.java

cprasad1 commented Apr 1, 2021

guozhangwang reviewed Apr 1, 2021

AlanConfluent reviewed Apr 1, 2021

Chittaranjan Prasad added 2 commits April 1, 2021 19:11

added windowed table tests

6379d88

modified tests

4f082fd

cprasad1 requested review from guozhangwang and AlanConfluent April 2, 2021 15:52

AlanConfluent approved these changes Apr 2, 2021

...s/src/main/java/io/confluent/ksql/execution/streams/materialization/KsqlMaterialization.java

Chittaranjan Prasad added 2 commits April 2, 2021 12:46

add small comment

5fcb84d

add small comment fix checkstyle

c3c3bf0

guozhangwang approved these changes Apr 5, 2021

cprasad1 merged commit f8a4609 into confluentinc:master Apr 5, 2021

cprasad1 deleted the pull_filter_fix branch April 5, 2021 18:40

guozhangwang reviewed Apr 5, 2021

cprasad1 mentioned this pull request Apr 5, 2021

fix: append additional fields to row values in KsqlMaterialization.java to fix pull query filtering #7342

Merged

2 tasks

cprasad1 commented Apr 5, 2021
