Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update ksqlDB CLI to use the Java client #6269

Open
vcrfxia opened this issue Sep 22, 2020 · 4 comments
Open

Update ksqlDB CLI to use the Java client #6269

vcrfxia opened this issue Sep 22, 2020 · 4 comments
Labels
query-engine Issues owned by the ksqlDB Query Engine team

Comments

@vcrfxia
Copy link
Contributor

vcrfxia commented Sep 22, 2020

We should update the ksqlDB CLI to use the Java client rather than having two codepaths to maintain, and since it'll be a great opportunity to dogfood the Java client. This would also allow us to deprecate the /query endpoint used by the CLI for push queries and remove another codepath there as well.

A prerequisite for doing this would be to enhance the Java client to support all the different types of statements that the REST API and CLI support, including connector management, the extended versions of admin commands, and other admin commands such as explaining queries.

We'd also have to decide how to handle the fact that the new /inserts-stream endpoint used by the Java client does not support expressions in INSERT VALUES statements as the /ksql endpoint currently used by the CLI for INSERT VALUES statements does. If we don't want to break backwards compatibility, we'd have to either enhance the /inserts-stream endpoint, introduce separate Java client methods for the different endpoints, or introduce logic into the CLI / Java client to determine where the requests should be sent (not recommended).

@apurvam
Copy link
Contributor

apurvam commented Sep 23, 2020

@vcrfxia to do a rough scoping with @colinhicks . We can prioritize after that.

@vcrfxia
Copy link
Contributor Author

vcrfxia commented Sep 24, 2020

Recording the remaining open questions after syncing with @colinhicks below.

Should we migrate INSERT VALUES to use the Java client?

The new HTTP/2 endpoint for inserting values differs from the old endpoint in two main ways:

  • the new endpoint only allows insertion into streams, not tables.
  • the new endpoint only supports inserting literals, not expressions.
    The first point is easy to change if we want to, but it's unclear that we do since inserting into a table today only inserts into the underlying Kafka topic and not the corresponding state store (if materialized) which is confusing.

The second point is more work to reconcile. Options include:

  • enhancing the HTTP/2 endpoint to support expressions. Open question about what the interface on the Java client should look like if we do this
  • drop support for inserting expressions from the CLI
  • do not migrate the CLI to the new HTTP/2 endpoint

Should we maintain backwards compatibility between the new CLI (post-migration) and older server versions?

The HTTP/2 endpoints used by the Java client were only introduced in recent ksqlDB releases, and we'll likely need to make additional server-side changes to enable the CLI migration as well. Maintaining compatibility between the new CLI post-migration and older server versions would be significantly more work and likely not worth it.

How should we expose command sequence number or equivalent functionality in the Java client?

The CLI supports requiring the server to have executed all previously issued statements before processing new statements/queries. This is achieved using the command sequence number functionality which is not currently exposed by the client interfaces. Options include:

  • expose command sequence number from client interfaces. Downside is that it clutters the interfaces for very niche use cases
  • expose a different interface that can use command sequence number under the hood. For example, we could expose a method that essentially says "wait until all previously issued statements have executed" to limit cluttering of the interfaces.
    Something like this is likely needed for the SQL file tool @colinhicks and @spena are planning on working on next quarter anyway.

How should the Java client interfaces expose warnings returned from KsqlEntity?

Again to avoid cluttering the client interfaces, the Java client does not currently expose warnings that may be returned with server responses as the use cases are extremely niche, but we'd need this in order to have parity with the CLI, and it's probably good to do at some point anyway. Adding a list of potential warnings to the different return types from client methods doesn't feel right in light of the fact that the warnings will almost always be empty (and will be guaranteed to be empty today for many methods).

@vcrfxia
Copy link
Contributor Author

vcrfxia commented Sep 24, 2020

In terms of estimated LOE for this migration, I think we're looking at roughly 16-18 person-(calendar-)weeks depending on the approaches we take for addressing some of the open questions above. Here's an example breakdown, where numbers next to tasks are in person-days:

additional server functionality:

  • add HTTP/2 endpoint for print topic (5)
  • expose use of command sequence number from /query-stream endpoint (2)
  • enhance /inserts-stream endpoint to support expressions? (5)

additional client functionality:

  • refactor client to use jackson for parsing server responses: Refactor Java client to use Jackson for parsing server responses #6042 (6)
  • connector management: create, show, drop, describe (5)
  • explain query, explain expression, describe function (2)
  • show functions, show properties, show types (2)
  • extended versions of admin commands (list streams extended, etc) (3)
  • expose warnings returned with KsqlEntity (4)
  • expose use of command sequence number (3-5, depending on the approach we decide on)
  • support for inserting expressions? (3)

cli migration:

  • set up java client and framework for mapping different types of statements to different methods (3-4)
  • migrate push queries (6)
  • migrate print topic (2)
  • migrate pull queries (3)
  • migrate DDL/DML statements (3) + test coverage (2)
  • migrate admin statements (6) + test coverage (4)
  • migrate insert values (2)
  • handle run script (3)
  • handle CLI specific commands (e.g., remote server command), not including request pipelining (2)
  • handle/verify CLI options (1)
  • handle request pipelining (2-5, depending on the approach we decide on)

extra test coverage for CLI usage (5)
extra time sprinkled throughout for familiarizing with CLI code and settling on new interfaces and designs (5)

Many of the tasks are parallelizable.

@agavra agavra added query-engine Issues owned by the ksqlDB Query Engine team needs-triage labels Mar 10, 2021
@agavra
Copy link
Contributor

agavra commented Mar 10, 2021

Thanks for the writeup @vcrfxia! Glad we had this record for when we wanted to go ahead and start working on the migration 😄

There are benefits of partially migrating the CLI to use the java client so that we can phase out the old HTTP/1 /query endpoint in favor of the /query-stream endpoint. I'll try to split this ticket up into the components required to make that change (even if we don't support many other items such as INSERT VALUES or admin commands).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
query-engine Issues owned by the ksqlDB Query Engine team
Projects
None yet
Development

No branches or pull requests

3 participants