Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: report window type and query status better from API #4313

Merged

Conversation

big-andy-coates
Copy link
Contributor

@big-andy-coates big-andy-coates commented Jan 14, 2020

Description

While testing the primitive keys stuff manually I can across a couple of issues, inconsistencies, poor UX and bugs, which this commit looks to fix.

This commit:

  1. exposes the window type of the key of a query/source, i.e. HOPPING, TUMBLING SESSION or none.
  2. makes the status of a query easier to find, so users can quickly know if a query has failed or is running.
  3. fixes a bug that meant the statement text of a query was not displayed in the CLI.

BREAKING CHANGE: The response from the RESTful API has changed for some commands with this commit: the SourceDescription type no longer has a format field. Instead it has keyFormat and valueFormat fields.

SHOW QUERY changes:

Response now includes a state property for each query that indicates the state of the query.

e.g.

{
  "queryString" : "create table OUTPUT as select * from INPUT;",
  "sinks" : [ "OUTPUT" ],
  "id" : "CSAS_OUTPUT_0",
  "state" : "Running"
}

The CLI output was:

 ksql> show queries;

  Query ID                   | Kafka Topic         | Query String

   CSAS_OUTPUT_0              | OUTPUT              | CREATE STREAM OUTPUT WITH (KAFKA_TOPIC='OUTPUT', PARTITIONS=1, REPLICAS=1) AS SELECT *
 FROM INPUT INPUT
 EMIT CHANGES;
  CTAS_CLICK_USER_SESSIONS_5 | CLICK_USER_SESSIONS | CREATE TABLE CLICK_USER_SESSIONS WITH (KAFKA_TOPIC='CLICK_USER_SESSIONS', PARTITIONS=1, REPLICAS=1) AS SELECT
   CLICKSTREAM.USERID USERID,
   COUNT(*) COUNT
 FROM CLICKSTREAM CLICKSTREAM
 WINDOW SESSION ( 300 SECONDS )
 GROUP BY CLICKSTREAM.USERID
 EMIT CHANGES;

 For detailed information on a Query run: EXPLAIN <Query ID>;

and is now:

 Query ID                   | Status      | Kafka Topic         | Query String

 CSAS_OUTPUT_0              | RUNNING     | OUTPUT              | CREATE STREAM OUTPUT WITH (KAFKA_TOPIC='OUTPUT', PARTITIONS=1, REPLICAS=1) AS SELECT *FROM INPUT INPUTEMIT CHANGES;

For detailed information on a Query run: EXPLAIN <Query ID>;

Note the addition of the Status column and the fact that Query String is now longer being written across multiple lines.

DESCRIBE <source>; changes:

old CLI output:

ksql> describe CLICK_USER_SESSIONS;

Name                 : CLICK_USER_SESSIONS
 Field   | Type

 ROWTIME | BIGINT           (system)
 ROWKEY  | INTEGER          (system)
 USERID  | INTEGER
 COUNT   | BIGINT

For runtime statistics and query details run: DESCRIBE EXTENDED <Stream,Table>;

New CLI output:

ksql> describe CLICK_USER_SESSIONS;

Name                 : CLICK_USER_SESSIONS
 Field   | Type

 ROWTIME | BIGINT           (system)
 ROWKEY  | INTEGER          (system) (Window type: SESSION)
 USERID  | INTEGER
 COUNT   | BIGINT

For runtime statistics and query details run: DESCRIBE EXTENDED <Stream,Table>;

Note the addition of the Window Type information.

DESCRIBE EXTENDED <source>; changes:

Old output:

ksql> describe extended CLICK_USER_SESSIONS;

Name                 : CLICK_USER_SESSIONS
Type                 : TABLE
Key field            : USERID
Key format           : STRING
Timestamp field      : Not set - using <ROWTIME>
Value Format                : JSON
Kafka topic          : CLICK_USER_SESSIONS (partitions: 1, replication: 1)
Statement            : CREATE TABLE CLICK_USER_SESSIONS WITH (KAFKA_TOPIC='CLICK_USER_SESSIONS', PARTITIONS=1, REPLICAS=1) AS SELECT
  CLICKSTREAM.USERID USERID,
  COUNT(*) COUNT
FROM CLICKSTREAM CLICKSTREAM
WINDOW SESSION ( 300 SECONDS )
GROUP BY CLICKSTREAM.USERID
EMIT CHANGES;

 Field   | Type

 ROWTIME | BIGINT           (system)
 ROWKEY  | INTEGER          (system)
 USERID  | INTEGER
 COUNT   | BIGINT

Queries that write from this TABLE
-----------------------------------
CTAS_CLICK_USER_SESSIONS_5 (RUNNING) : CREATE TABLE CLICK_USER_SESSIONS WITH (KAFKA_TOPIC='CLICK_USER_SESSIONS', PARTITIONS=1, REPLICAS=1) AS SELECT
 CLICKSTREAM.USERID USERID,
 COUNT(*) COUNT
 FROM CLICKSTREAM CLICKSTREAM
 WINDOW SESSION ( 300 SECONDS )
 GROUP BY CLICKSTREAM.USERID
 EMIT CHANGES;

For query topology and execution plan please run: EXPLAIN <QueryId>

Local runtime statistics
------------------------

(Statistics of the local KSQL server interaction with the Kafka topic CLICK_USER_SESSIONS)

New output:

ksql> describe extended CLICK_USER_SESSIONS;

Name                 : CLICK_USER_SESSIONS
Type                 : TABLE
Key field            : USERID
Timestamp field      : Not set - using <ROWTIME>
Key format           : KAFKA
Value format         : JSON
Kafka topic          : CLICK_USER_SESSIONS (partitions: 1, replication: 1)
Statement            : CREATE TABLE CLICK_USER_SESSIONS WITH (KAFKA_TOPIC='CLICK_USER_SESSIONS', PARTITIONS=1, REPLICAS=1) AS SELECT
  CLICKSTREAM.USERID USERID,
  COUNT(*) COUNT
FROM CLICKSTREAM CLICKSTREAM
WINDOW SESSION ( 300 SECONDS )
GROUP BY CLICKSTREAM.USERID
EMIT CHANGES;

 Field   | Type

 ROWTIME | BIGINT           (system)
 ROWKEY  | INTEGER          (system) (Window type: SESSION)
 USERID  | INTEGER
 COUNT   | BIGINT

Queries that write from this TABLE
-----------------------------------
CTAS_CLICK_USER_SESSIONS_5 (RUNNING) : CREATE TABLE CLICK_USER_SESSIONS WITH (KAFKA_TOPIC='CLICK_USER_SESSIONS', PARTITIONS=1, REPLICAS=1) AS SELECT  CLICKSTREAM.USERID USERID,  COUNT(*) COUNTFROM CLICKSTREAM CLICKSTREAMWINDOW SESSION ( 300 SECONDS ) GROUP BY CLICKSTREAM.USERIDEMIT CHANGES;

For query topology and execution plan please run: EXPLAIN <QueryId>

Local runtime statistics
------------------------

(Statistics of the local KSQL server interaction with the Kafka topic CLICK_USER_SESSIONS)

Note: the change from Key format of STRING to KAFKA. The output of Window Type information for windowed schemas and outputing sql statements on a single line.

Testing done

Describe the testing strategy. Unit and integration tests are expected for any behavior changes.

Reviewer checklist

  • Ensure docs are updated if necessary. (eg. if a user visible feature is being added or changed).
  • Ensure relevant issues are linked (description should include text like "Fixes #")

This commit:
1. exposes the window type of the key of a query/source, i.e. `HOPPING`, `TUMBLING` `SESSION` or none.
2. makes the status of a query easier to find.
3. fixes a bug that meant the statement text of a query was not displayed in the CLI.

BREAKING CHANGE: The response from the RESTful API has changed for some commands with this commit: the `SourceDescription` type no longer has a `format` field. Instead it has `keyFormat` and `valueFormat` fields.

## `SHOW QUERY` changes:

Response now includes a `state` property for each query that indicates the state of the query.

e.g.

```json
{
  "queryString" : "create table OUTPUT as select * from INPUT;",
  "sinks" : [ "OUTPUT" ],
  "id" : "CSAS_OUTPUT_0",
  "state" : "Running"
}
```

The CLI output was:

```
 ksql> show queries;

  Query ID                   | Kafka Topic         | Query String

   CSAS_OUTPUT_0              | OUTPUT              | CREATE STREAM OUTPUT WITH (KAFKA_TOPIC='OUTPUT', PARTITIONS=1, REPLICAS=1) AS SELECT *
 FROM INPUT INPUT
 EMIT CHANGES;
  CTAS_CLICK_USER_SESSIONS_5 | CLICK_USER_SESSIONS | CREATE TABLE CLICK_USER_SESSIONS WITH (KAFKA_TOPIC='CLICK_USER_SESSIONS', PARTITIONS=1, REPLICAS=1) AS SELECT
   CLICKSTREAM.USERID USERID,
   COUNT(*) COUNT
 FROM CLICKSTREAM CLICKSTREAM
 WINDOW SESSION ( 300 SECONDS )
 GROUP BY CLICKSTREAM.USERID
 EMIT CHANGES;

 For detailed information on a Query run: EXPLAIN <Query ID>;
```

and is now:

```
 Query ID                   | Status      | Kafka Topic         | Query String

 CSAS_OUTPUT_0              | RUNNING     | OUTPUT              | CREATE STREAM OUTPUT WITH (KAFKA_TOPIC='OUTPUT', PARTITIONS=1, REPLICAS=1) AS SELECT *FROM INPUT INPUTEMIT CHANGES;

For detailed information on a Query run: EXPLAIN <Query ID>;

```
Note the addition of the `Status` column and the fact that `Query String` is now longer being written across multiple lines.

## `DESCRIBE <source>;` changes:

old CLI output:

```
ksql> describe CLICK_USER_SESSIONS;

Name                 : CLICK_USER_SESSIONS
 Field   | Type

 ROWTIME | BIGINT           (system)
 ROWKEY  | INTEGER          (system)
 USERID  | INTEGER
 COUNT   | BIGINT

For runtime statistics and query details run: DESCRIBE EXTENDED <Stream,Table>;
```

New CLI output:

```
ksql> describe CLICK_USER_SESSIONS;

Name                 : CLICK_USER_SESSIONS
 Field   | Type

 ROWTIME | BIGINT           (system)
 ROWKEY  | INTEGER          (system) (Window type: SESSION)
 USERID  | INTEGER
 COUNT   | BIGINT

For runtime statistics and query details run: DESCRIBE EXTENDED <Stream,Table>;
```

Note the addition of the `Window Type` information.

The extended version of the command has also changed.

Old output:

```
ksql> describe extended CLICK_USER_SESSIONS;

Name                 : CLICK_USER_SESSIONS
Type                 : TABLE
Key field            : USERID
Key format           : STRING
Timestamp field      : Not set - using <ROWTIME>
Value Format                : JSON
Kafka topic          : CLICK_USER_SESSIONS (partitions: 1, replication: 1)
Statement            : CREATE TABLE CLICK_USER_SESSIONS WITH (KAFKA_TOPIC='CLICK_USER_SESSIONS', PARTITIONS=1, REPLICAS=1) AS SELECT
  CLICKSTREAM.USERID USERID,
  COUNT(*) COUNT
FROM CLICKSTREAM CLICKSTREAM
WINDOW SESSION ( 300 SECONDS )
GROUP BY CLICKSTREAM.USERID
EMIT CHANGES;

 Field   | Type

 ROWTIME | BIGINT           (system)
 ROWKEY  | INTEGER          (system)
 USERID  | INTEGER
 COUNT   | BIGINT

Queries that write from this TABLE
-----------------------------------
CTAS_CLICK_USER_SESSIONS_5 (RUNNING) : CREATE TABLE CLICK_USER_SESSIONS WITH (KAFKA_TOPIC='CLICK_USER_SESSIONS', PARTITIONS=1, REPLICAS=1) AS SELECT
 CLICKSTREAM.USERID USERID,
 COUNT(*) COUNT
 FROM CLICKSTREAM CLICKSTREAM
 WINDOW SESSION ( 300 SECONDS )
 GROUP BY CLICKSTREAM.USERID
 EMIT CHANGES;

For query topology and execution plan please run: EXPLAIN <QueryId>

Local runtime statistics
------------------------

(Statistics of the local KSQL server interaction with the Kafka topic CLICK_USER_SESSIONS)
```

New output:

```
ksql> describe extended CLICK_USER_SESSIONS;

Name                 : CLICK_USER_SESSIONS
Type                 : TABLE
Key field            : USERID
Timestamp field      : Not set - using <ROWTIME>
Key format           : KAFKA
Value format         : JSON
Kafka topic          : CLICK_USER_SESSIONS (partitions: 1, replication: 1)
Statement            : CREATE TABLE CLICK_USER_SESSIONS WITH (KAFKA_TOPIC='CLICK_USER_SESSIONS', PARTITIONS=1, REPLICAS=1) AS SELECT
  CLICKSTREAM.USERID USERID,
  COUNT(*) COUNT
FROM CLICKSTREAM CLICKSTREAM
WINDOW SESSION ( 300 SECONDS )
GROUP BY CLICKSTREAM.USERID
EMIT CHANGES;

 Field   | Type

 ROWTIME | BIGINT           (system)
 ROWKEY  | INTEGER          (system) (Window type: SESSION)
 USERID  | INTEGER
 COUNT   | BIGINT

Queries that write from this TABLE
-----------------------------------
CTAS_CLICK_USER_SESSIONS_5 (RUNNING) : CREATE TABLE CLICK_USER_SESSIONS WITH (KAFKA_TOPIC='CLICK_USER_SESSIONS', PARTITIONS=1, REPLICAS=1) AS SELECT  CLICKSTREAM.USERID USERID,  COUNT(*) COUNTFROM CLICKSTREAM CLICKSTREAMWINDOW SESSION ( 300 SECONDS ) GROUP BY CLICKSTREAM.USERIDEMIT CHANGES;

For query topology and execution plan please run: EXPLAIN <QueryId>

Local runtime statistics
------------------------

(Statistics of the local KSQL server interaction with the Kafka topic CLICK_USER_SESSIONS)
```

 Note: the change from `Key format` of `STRING` to `KAFKA`.  The output of `Window Type` information for windowed schemas and outputing sql statements on a single line.
@big-andy-coates big-andy-coates requested a review from a team as a code owner January 14, 2020 21:24
@vpapavas
Copy link
Member

Thanks @big-andy-coates! Couple of comments:

  1. In ksql> describe extended CLICK_USER_SESSIONS; the Statement field still has the ksql query in multi-line format. Do we want to be consistent and have it everywhere as a single line?

  2. In the tests, I see only a test that verifies the Running state. Do we want also tests for other states, like Error?

  3. Regarding the single line statement, wondering how it wraps after it reaches the window length? Is the output still readable (user-friendly?)

@vpapavas vpapavas self-assigned this Jan 14, 2020
@big-andy-coates
Copy link
Contributor Author

big-andy-coates commented Jan 15, 2020

Thanks @big-andy-coates! Couple of comments:

  1. In ksql> describe extended CLICK_USER_SESSIONS; the Statement field still has the ksql query in multi-line format. Do we want to be consistent and have it everywhere as a single line?

Personally, I think it's fine/a-good-thing that the SQL is multi-line when its not in a table with potentially multiple rows of data.

Putting that another way, the statement field is showing the SQL for this table. I think it's ok/useful for the SQL to be nicely formatted. (Well, nicely... ish).

However, for the read and write queries, where there can be multiple rows, I think multi-line SQL makes it hard to read. Have the SQL in a single line means one-line per query, which IMHO is much easier to understand.

  1. In the tests, I see only a test that verifies the Running state. Do we want also tests for other states, like Error?

Sure, though TBH it's just copying whatever text is provided. There are other existing tests that ensure QueryMetadata.getState is doing the right thing.

  1. Regarding the single line statement, wondering how it wraps after it reaches the window length? Is the output still readable (user-friendly?)

Good point, I'll test manually.

I'm not sure why you didn't approve this PR; none of your concerns seem to warrant not approving. Was this intentional?

@big-andy-coates big-andy-coates merged commit ca9368a into confluentinc:master Jan 16, 2020
@big-andy-coates big-andy-coates deleted the cli_improvements branch January 16, 2020 14:07
@big-andy-coates
Copy link
Contributor Author

Ack, merged without that manual testing... sorry. Will do it now...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants