Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Primitive Keys: Support CAST in join criteria #4130

Closed
big-andy-coates opened this issue Dec 13, 2019 · 2 comments · Fixed by #4278
Closed

Primitive Keys: Support CAST in join criteria #4130

big-andy-coates opened this issue Dec 13, 2019 · 2 comments · Fixed by #4278
Assignees
Labels
P0 Denotes must-have for a given milestone
Milestone

Comments

@big-andy-coates
Copy link
Contributor

Previously, as all keys were of type STRING you could join on two columns of different types, e.g. a STRING column that you know to contain a number with an INTEGER column, or an INTEGER column with a BIGINT column.

With primitive key support this is no longer the case. The STRING containing a number won't necessarily be the same as a INTEGER. Hence the join is not allowed.

This can break users existing joins, with no easy way for them to fix because we don't support arbitrary expressions in the join criteria, i,.e. they can't do:

SELECT * FROM A JOIN B ON A.ID = CAST(B.ID AS INTEGER);

We could consider supporting casts, or just any expression, in the join criteria.

@big-andy-coates big-andy-coates added this to the 0.7.0 milestone Dec 13, 2019
big-andy-coates added a commit to big-andy-coates/ksql that referenced this issue Dec 13, 2019
Adds support for using primitive types in joins.

BREAKING CHANGE: Some existing joins may now fail and the type of `ROWKEY` in the result schema of joins may have changed.

When `ROWKEY` was always a `STRING` it was possible to join an `INTEGER` column with a `BIGINT` column.  This is no longer the case. A `JOIN` requires the join columns to be of the same type. (See confluentinc#4130 which tracks adding support for being able to `CAST` join criteria).

Where joining on two `INT` columns would previously have resulted in a schema containing `ROWKEY STRING KEY`, it would not result in `ROWKEY INT KEY`.
big-andy-coates added a commit that referenced this issue Dec 17, 2019
* chore: primitive key support in JOINs

Adds support for using primitive types in joins.

BREAKING CHANGE: Some existing joins may now fail and the type of `ROWKEY` in the result schema of joins may have changed.

When `ROWKEY` was always a `STRING` it was possible to join an `INTEGER` column with a `BIGINT` column.  This is no longer the case. A `JOIN` requires the join columns to be of the same type. (See #4130 which tracks adding support for being able to `CAST` join criteria).

Where joining on two `INT` columns would previously have resulted in a schema containing `ROWKEY STRING KEY`, it would not result in `ROWKEY INT KEY`.
@apurvam apurvam changed the title Support CAST in join criteria Primitive Keys: Support CAST in join criteria Dec 17, 2019
@apurvam apurvam added the P0 Denotes must-have for a given milestone label Dec 17, 2019
@agavra
Copy link
Contributor

agavra commented Jan 8, 2020

Dropping a note here, but I need to explore more in depth - I'm wondering if this was properly implemented in first place. The partitioning strategy for Strings may not be the same as for Integers (for example) so the co-partitioning requirement may have been broken to begin with.

@big-andy-coates
Copy link
Contributor Author

As discussed, I still think its worthwhile

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P0 Denotes must-have for a given milestone
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants