PRINT TOPIC should determine KEY format #4258
Re-opening as there remain some unfinished tasks.
big-andy-coates added a commit to big-andy-coates/ksql that referenced this issue (Feb 13, 2020)
fixes: confluentinc#4258

With this change `PRINT TOPIC` has enhanced detection for the key and value formats of a topic. The command starts with a list of known formats for both the key and the value and refines this list as it sees more data. As the list of possible formats is refined over time, the command outputs the reduced list.

For example, you may see output such as:

```
ksql> PRINT some_topic FROM BEGINNING;
Key format: JSON or SESSION(KAFKA_STRING) or HOPPING(KAFKA_STRING) or TUMBLING(KAFKA_STRING) or KAFKA_STRING
Value format: JSON or KAFKA_STRING
rowtime: 12/21/18 23:58:42 PM PSD, key: stream/CLICKSTREAM/create, value: {"statement":"CREATE STREAM clickstream (_time bigint,time varchar, ip varchar, request varchar, status int, userid int, bytes bigint, agent varchar) with (kafka_topic = 'clickstream', value_format = 'json');","streamsProperties":{}}
rowtime: 12/21/18 23:58:42 PM PSD, key: table/EVENTS_PER_MIN/create, value: {"statement":"create table events_per_min as select userid, count(*) as events from clickstream window TUMBLING (size 10 second) group by userid EMIT CHANGES;","streamsProperties":{}}
Key format: KAFKA_STRING
...
```

In the last line of the above output, the command has narrowed the key format down as it has processed more data.

The command has also been updated to only detect valid UTF-8 encoded text as type `JSON` or `KAFKA_STRING`. This is in line with how KSQL would later deserialize the data.

If no known format can successfully deserialize the data, it is printed as a combination of ASCII characters and hex-encoded bytes.
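For readers who want a concrete picture of the narrowing described in the commit message above, here is a minimal sketch in Java. It is not the actual ksql implementation: the class name `FormatNarrowingSketch`, its methods, the crude JSON check, and the `\xNN` rendering of non-printable bytes are all assumptions made for illustration, and only two of the formats from the example output are modelled.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch: none of these names come from the ksql code base.
class FormatNarrowingSketch {

    // Candidate formats that are still consistent with every record seen so far.
    private final Map<String, Predicate<byte[]>> candidates = new LinkedHashMap<>();

    FormatNarrowingSketch() {
        // Simplified stand-ins for real deserializers (assumption).
        candidates.put("JSON", FormatNarrowingSketch::looksLikeJson);
        candidates.put("KAFKA_STRING", FormatNarrowingSketch::isValidUtf8);
    }

    // Drops every candidate that cannot read this record.
    // Returns true if the candidate list shrank, i.e. the format line should be reprinted.
    boolean observe(byte[] data) {
        return candidates.values().removeIf(canRead -> !canRead.test(data));
    }

    // Renders the remaining candidates in the "A or B" style of the example output.
    String describe() {
        return String.join(" or ", candidates.keySet());
    }

    // Strict UTF-8 validation: malformed input is reported rather than replaced.
    static boolean isValidUtf8(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Crude stand-in for a real JSON parser: valid UTF-8 wrapped in {} or [] (assumption).
    static boolean looksLikeJson(byte[] data) {
        if (!isValidUtf8(data)) {
            return false;
        }
        String text = new String(data, StandardCharsets.UTF_8).trim();
        return (text.startsWith("{") && text.endsWith("}"))
            || (text.startsWith("[") && text.endsWith("]"));
    }

    // Fallback rendering: printable ASCII as-is, every other byte as \xNN (format is an assumption).
    static String asciiAndHex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            if (b >= 0x20 && b < 0x7f) {
                sb.append((char) b);
            } else {
                sb.append(String.format("\\x%02x", b & 0xff));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        FormatNarrowingSketch key = new FormatNarrowingSketch();
        System.out.println("Key format: " + key.describe());
        byte[][] records = {
            "{\"id\":1}".getBytes(StandardCharsets.UTF_8),
            "stream/CLICKSTREAM/create".getBytes(StandardCharsets.UTF_8),
        };
        for (byte[] record : records) {
            if (key.observe(record)) {
                System.out.println("Key format: " + key.describe());
            }
        }
    }
}
```

The point of the design is that candidates are only ever removed, never added back, so the printed list can only get shorter as more records arrive. A real implementation would plug in the actual deserializers (including the windowed key formats) in place of the two predicates used here.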
big-andy-coates added a commit that referenced this issue (Feb 14, 2020)
feat: enhance `PRINT TOPIC`'s format detection

fixes: #4258
big-andy-coates added a commit that referenced this issue (Feb 20, 2020)
feat: enhance `PRINT TOPIC`'s format detection

fixes: #4258

(cherry picked from commit a3fae28)
At the moment, `PRINT TOPIC` will try to determine the format (JSON, AVRO, DELIMITED, etc.) of the value in the records in a Kafka topic. However, the key may also be something other than a KAFKA serialized STRING.

`PRINT TOPIC` is often used as a debugging tool. As such, it would be useful if: