Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: expression support for PARTITION BY #4032

Merged
merged 3 commits into from
Dec 6, 2019

Conversation

agavra
Copy link
Contributor

@agavra agavra commented Dec 3, 2019

fixes #4018

Description

This change allows for arbitrary expression support for PARTITION BY expressions:

  • use CodeGenRunner to evaluate the partition by expression instead of the legacy way which had custom handling for column references
  • have partition by go through proper rewriting (to make it consistent with other expressions, e.g. adding alias)

Note that if a partition by expression is used, the rowkey will not correspond to any column in the value.

Testing done

  • new unit tests
  • updated QTT tests

Reviewer checklist

  • Ensure docs are updated if necessary. (eg. if a user visible feature is being added or changed).
  • Ensure relevant issues are linked (description should include text like "Fixes #")

@agavra agavra requested a review from a team as a code owner December 3, 2019 18:12
@agavra agavra force-pushed the partition_exp branch 2 times, most recently from 9c8bf3c to 83aaef9 Compare December 3, 2019 19:24
Copy link
Contributor

@rodesai rodesai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly LGTM. One issue raised inline.

.map(kf -> kf.ref().equals(proposedKey.ref()))
.orElse(false);

return !namesMatch && !isRowKey(columnRef);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think comparing with the ROWKEY is actually safe, e.g.:

CREATE STREAM LEFT (A STRING, B STRING) WITH (KEY='A');
CREATE STREAM RIGHT (C STRING, D STRING) WITH (KEY='C');
CREATE STREAM JOINED AS SELECT LEFT.*, RIGHT.* FROM LEFT JOIN RIGHT ON L.B=R.D PARTITION BY LEFT.ROWKEY;

In this case, when we hit the repartition, the stream will be partitioned on B/D, so we do want to repartition, even though the column ref is ROWKEY.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @big-andy-coates - I think @rodesai brings up a good point here, though I think it's an existing bug. I think we should fix it in the short-term by re-introducing the "unnecessary" repartition step in the case of isRowKey but before I do that I wanted to run it past you since you implemented the original optimization (#2735).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest adding this as a QTT test to ensure we're handling this correctly. Do what you need to do to get this functionally working.

Raise a github issue to track the outstanding piece.

I think we can clean this up once we have cleaner code around the handling of schemas and duplicating ROWKEY and ROWTIME into the value schema.

Specifically, once we clean up / remove the use of source-aliases on schema fields and have arbitrary key column names, then I think this will be easy to solve because the ROWKEY name won't be ambiguous.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

f8bfefa - I will fix this in a future PR

Copy link
Contributor Author

@agavra agavra Dec 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@agavra agavra requested review from rodesai and a team December 5, 2019 17:06
Copy link
Contributor

@rodesai rodesai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@agavra agavra merged commit 0f31f8e into confluentinc:master Dec 6, 2019
@agavra agavra deleted the partition_exp branch December 6, 2019 02:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Expression support for PARTITION BY clause
3 participants