-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(serde): kafka format #3065
feat(serde): kafka format #3065
Conversation
A new ``KAFKA`` format that supports ``INT``, ``BIGINT``, ``DOUBLE`` and ``STRING`` fields that have been serialized using the standard Kafka serializers, e.g. ``org.apache.kafka.common.serialization.LongSerializer``, or equivalent. The format only supports single values, i.e. only single field, being primarily intended for use as a key format.
Will this work with Kafka Connect? |
Conflicting files ksql-engine/src/main/java/io/confluent/ksql/analyzer/Analyzer.java ksql-engine/src/main/java/io/confluent/ksql/ddl/commands/CreateSourceCommand.java ksql-engine/src/main/java/io/confluent/ksql/serde/KsqlSerdeFactories.java ksql-engine/src/test/java/io/confluent/ksql/analyzer/AnalyzerTest.java ksql-serde/src/main/java/io/confluent/ksql/serde/Format.java
I don't think this will help connect, which tends to use Avro/Json keys, right? I'll be looking into Avro/Json keys v. soon. The new |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, with a few suggestions.
Update on Connect friendlyness of new KAFKA format from Connect team.
In other words, yes you can configure Connect to serialize something like a numeric User Id as a |
Co-Authored-By: Jim Galasyn <[email protected]>
Co-Authored-By: Jim Galasyn <[email protected]>
Co-Authored-By: Jim Galasyn <[email protected]>
Co-Authored-By: Jim Galasyn <[email protected]>
Co-Authored-By: Jim Galasyn <[email protected]>
Co-Authored-By: Jim Galasyn <[email protected]>
Co-Authored-By: Jim Galasyn <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @big-andy-coates, this mostly looks good. 1 issue inline.
ksql-serde/src/main/java/io/confluent/ksql/serde/kafka/KafkaSerdeFactory.java
Outdated
Show resolved
Hide resolved
ksql-serde/src/main/java/io/confluent/ksql/serde/kafka/KafkaSerdeFactory.java
Outdated
Show resolved
Hide resolved
ksql-serde/src/main/java/io/confluent/ksql/serde/kafka/KafkaSerdeFactory.java
Show resolved
Hide resolved
ksql-common/src/main/java/io/confluent/ksql/schema/ksql/SchemaConverters.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - I agree with Rohan's comment, so if you feel strongly I think we should justify it. Maybe we can use @vcrfxia's benchmarks?
ksql-functional-tests/src/main/java/io/confluent/ksql/test/serde/kafka/KafkaSerdeSupplier.java
Outdated
Show resolved
Hide resolved
ksql-functional-tests/src/main/java/io/confluent/ksql/test/serde/kafka/KafkaSerdeSupplier.java
Show resolved
Hide resolved
ksql-functional-tests/src/test/resources/query-validation-tests/elements.json
Show resolved
Hide resolved
ksql-serde/src/main/java/io/confluent/ksql/serde/delimited/KsqlDelimitedSerdeFactory.java
Show resolved
Hide resolved
ksql-serde/src/main/java/io/confluent/ksql/serde/delimited/KsqlDelimitedSerdeFactory.java
Outdated
Show resolved
Hide resolved
ksql-serde/src/main/java/io/confluent/ksql/serde/kafka/KafkaSerdeFactory.java
Outdated
Show resolved
Hide resolved
Conflicting files ksql-engine/src/test/java/io/confluent/ksql/analyzer/AnalyzerTest.java ksql-serde/src/main/java/io/confluent/ksql/serde/delimited/KsqlDelimitedSerdeFactory.java
Conflicting files ksql-engine/src/test/java/io/confluent/ksql/analyzer/AnalyzerTest.java
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Description
Note: contains PR #3066
Part of the work to introduce primitive, then structured, keys.
Keys are currently assumed to be Kafka serialized Strings. When we introduce a way to specify a
KEY_FORMAT
there must be a way to declare the key is a Kafka serialized string if we're to maintain backwards compatibility. Also, many companies using Kafka serialized ints or longs as keys.This PR brings a new
KAFKA
format, which will use the appropriate standards Kafka serde classes to deserialize a primitive key or value, e.g.Will handle the case where the keys of the messages are longs that have been serialized using Kafka's
LongSerializer
class. (Or will when the rest of the associated work is also complete).The new
KAFKA
format supportsINT
,BIGINT
,DOUBLE
andSTRING
fields only, as that's the set of Kafka serde classes that match up to our KSQL types.The format only supports single values, i.e. only single field, being primarily intended for use as a key format.
However, users can use it as a value format too. But if they do so then they can't use the source in a statement with a
JOIN
orGROUP BY
clause. This is because such statements generally require repartition and changelog topics and such internal topics currently use the same value format as their source, i.e. they'd useKAFKA
value format, and they also currently copyROWTIME
andROWKEY
into the value schema, i.e. they have multiple fields in the value schema, andKAFKA
format can not support multiple fields...So I have explicitly disabled
JOIN
andGROUP BY
where theVALUE_FORMAT
isKAFKA
so that the user gets a more useful error message.In the future we can fix this by either/both:
This PR also enhanced the validation done on C* statements that use the
DELIMITED
value format. Previously a statement such as the one below would succeed:Even though
DELIMITED
does not support such complex types. Any C*AS statement built offFOO
would then fail with a cryptic error message. I picked this up when testing such statements for the new format, so fixed both.Testing done
Lots of appropriate tests added.
Reviewer checklist