PRINT TOPIC should determine KEY format #4258

Closed
big-andy-coates opened this issue Jan 9, 2020 · 1 comment · Fixed by #4507 or #4551

@big-andy-coates
Contributor

At the moment, PRINT TOPIC tries to determine the format (JSON, AVRO, DELIMITED, etc.) of the values of the records in a Kafka topic. However, the key may also be something other than a KAFKA-serialized STRING.

PRINT TOPIC is often used as a debugging tool. As such, it would be useful if:

  1. It tried to determine the format of the key.
  2. It output the key, not just the value (I think it currently only outputs the value).
  3. It warned the user if the key format or schema is not supported by KSQL.
@big-andy-coates
Contributor Author

Re-opening, as there remain some unfinished tasks.

big-andy-coates modified the milestones: 0.7.0, 0.8.0 Feb 11, 2020
big-andy-coates added a commit to big-andy-coates/ksql that referenced this issue Feb 13, 2020
fixes: confluentinc#4258

With this change, `PRINT TOPIC` has enhanced detection of the key and value formats of a topic.
The command starts with a list of known formats for both the key and the value, and refines this
list as it sees more data.
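The refinement can be pictured as filtering a candidate list record by record. The following is a minimal Python sketch under stated assumptions, not KSQL's own (Java) implementation: each format is reduced to a hypothetical predicate (`is_json`, `is_kafka_int`, `is_kafka_string`) that reports whether the raw bytes could be deserialized in that format, and `refine` drops every candidate that fails on the latest record. The `KAFKA_INT` entry is an assumed example format added only to make the narrowing visible.

```python
from __future__ import annotations

import json


# Hypothetical, simplified stand-ins for real deserializers.
def is_json(data: bytes) -> bool:
    """True if the bytes are strict UTF-8 text that parses as JSON."""
    try:
        json.loads(data.decode("utf-8"))
        return True
    except ValueError:  # covers both UnicodeDecodeError and json.JSONDecodeError
        return False


def is_kafka_int(data: bytes) -> bool:
    """True if the payload has the 4-byte width of a serialized 32-bit int."""
    return len(data) == 4


def is_kafka_string(data: bytes) -> bool:
    """True if the bytes are valid UTF-8 text."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Known formats, in the order they would be listed to the user.
CHECKERS = {
    "JSON": is_json,
    "KAFKA_INT": is_kafka_int,
    "KAFKA_STRING": is_kafka_string,
}


def refine(candidates: list[str], data: bytes) -> list[str]:
    """Keep only the candidate formats that can still handle this record."""
    return [name for name in candidates if CHECKERS[name](data)]


if __name__ == "__main__":
    candidates = list(CHECKERS)  # start with every known format
    for key in [b"42", b"stream/CLICKSTREAM/create"]:
        candidates = refine(candidates, key)
        print(" or ".join(candidates))
```

Running it prints `JSON or KAFKA_STRING` after the first key (`42` is both valid JSON and valid text) and then `KAFKA_STRING` once a key arrives that is not JSON. Because candidates are only ever removed, the list can shrink but never grow back.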

As the list of possible formats is refined over time, the command outputs the reduced list.
For example, you may see output such as:

```
ksql> PRINT some_topic FROM BEGINNING;
Key format: JSON or SESSION(KAFKA_STRING) or HOPPING(KAFKA_STRING) or TUMBLING(KAFKA_STRING) or KAFKA_STRING
Value format: JSON or KAFKA_STRING
rowtime: 12/21/18 23:58:42 PM PSD, key: stream/CLICKSTREAM/create, value: {"statement":"CREATE STREAM clickstream (_time bigint,time varchar, ip varchar, request varchar, status int, userid int, bytes bigint, agent varchar) with (kafka_topic = 'clickstream', value_format = 'json');","streamsProperties":{}}
rowtime: 12/21/18 23:58:42 PM PSD, key: table/EVENTS_PER_MIN/create, value: {"statement":"create table events_per_min as select userid, count(*) as events from clickstream window  TUMBLING (size 10 second) group by userid EMIT CHANGES;","streamsProperties":{}}
Key format: KAFKA_STRING
...
```

In the last line of the above output, the command has narrowed the key format down to `KAFKA_STRING` as it has processed more data.
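Since the candidate list only ever shrinks, a new `Key format:` (or `Value format:`) line is only worth printing when the list actually changes. A minimal sketch of that reporting step follows; `FormatReporter` is a hypothetical helper, and the `A or B` rendering is copied from the sample output rather than taken from KSQL's source.

```python
from __future__ import annotations


class FormatReporter:
    """Hypothetical helper that announces a format list only when it changes."""

    def __init__(self, label: str) -> None:
        self.label = label  # e.g. "Key" or "Value"
        self.last_reported: list[str] | None = None

    def report(self, candidates: list[str]) -> str | None:
        """Return a line to print, or None if nothing has changed."""
        if candidates == self.last_reported:
            return None
        self.last_reported = list(candidates)  # copy so later mutation can't affect it
        return f"{self.label} format: {' or '.join(candidates)}"


if __name__ == "__main__":
    key_reporter = FormatReporter("Key")
    for candidates in (["JSON", "KAFKA_STRING"], ["JSON", "KAFKA_STRING"], ["KAFKA_STRING"]):
        line = key_reporter.report(candidates)
        if line is not None:
            print(line)
```

This prints `Key format: JSON or KAFKA_STRING` once, stays silent for the unchanged second record, and then prints `Key format: KAFKA_STRING`.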

The command has also been updated to detect data as `JSON` or `KAFKA_STRING` only if it is valid UTF-8 encoded text.
This is in line with how KSQL would later deserialize the data.
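The UTF-8 rule matters because a lenient decoder never fails: it substitutes a replacement character for malformed bytes, so binary data would always look like text. The sketch below, which relies only on Python's standard `bytes.decode` and `json` module, gates the text-based candidates on a strict decode. `text_candidates` is a hypothetical name, not a KSQL function.

```python
from __future__ import annotations

import json


def text_candidates(data: bytes) -> list[str]:
    """Return the text-based formats that remain possible for this payload."""
    try:
        text = data.decode("utf-8")  # errors="strict" is the default
    except UnicodeDecodeError:
        return []  # not valid UTF-8, so neither JSON nor KAFKA_STRING
    try:
        json.loads(text)
        return ["JSON", "KAFKA_STRING"]
    except json.JSONDecodeError:
        return ["KAFKA_STRING"]


if __name__ == "__main__":
    malformed = b"\xc3\x28"  # 0xC3 starts a two-byte sequence; 0x28 cannot continue it
    # A lenient decode "succeeds" by inserting U+FFFD, hiding the problem.
    lenient = malformed.decode("utf-8", errors="replace")
    print(lenient == "\ufffd(")  # True
    print(text_candidates(malformed))  # []
    print(text_candidates(b'{"statement": "..."}'))  # ['JSON', 'KAFKA_STRING']
    print(text_candidates(b"stream/CLICKSTREAM/create"))  # ['KAFKA_STRING']
```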

If no known format can successfully deserialize the data, it is printed as a combination of ASCII characters and hex-encoded bytes.
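The exact layout of this fallback is not spelled out here, so the following is only one plausible rendering, assuming that printable ASCII bytes are shown as-is and every other byte as a `\xNN` hex escape. `render_unknown` is a hypothetical name.

```python
def render_unknown(data: bytes) -> str:
    """Hypothetical fallback: printable ASCII as text, all other bytes as hex escapes."""
    parts = []
    for byte in data:  # iterating over bytes yields ints in 0..255
        if byte == 0x5C:
            parts.append("\\\\")  # escape the backslash itself so output stays unambiguous
        elif 0x20 <= byte <= 0x7E:
            parts.append(chr(byte))  # printable ASCII, shown as-is
        else:
            parts.append(f"\\x{byte:02x}")  # control or non-ASCII byte, shown as hex
    return "".join(parts)


if __name__ == "__main__":
    print(render_unknown(b"\x00\x01id=7\xff"))  # \x00\x01id=7\xff
```

Escaping the backslash keeps a literal `\x41` in the data distinguishable from an escaped byte.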
big-andy-coates added a commit that referenced this issue Feb 14, 2020
* feat: enhance `PRINT TOPIC`'s format detection

big-andy-coates added a commit that referenced this issue Feb 20, 2020
* feat: enhance `PRINT TOPIC`'s format detection

(cherry picked from commit a3fae28)
big-andy-coates self-assigned this Mar 30, 2020