Unable to create JSON stream with Schema Registry #4789

Closed
vcrfxia opened this issue Mar 17, 2020 · 8 comments · Fixed by #4791

@vcrfxia
Contributor

vcrfxia commented Mar 17, 2020

Describe the bug

On the latest master, when KSQL is started with Schema Registry, CREATE STREAM statements with value format JSON fail with the error message "Could not register schema for topic." This is a bug, since the JSON format should work independently of Schema Registry (no schemas should be registered at all).

To Reproduce

On the current master:

  • Enable Schema Registry integration by uncommenting ksql.schema.registry.url=http://localhost:8081 in the server properties file
  • Start the ksqlDB server
  • Create a topic, e.g., locations
  • Start the CLI and issue a CREATE STREAM statement such as CREATE STREAM riderLocations (profileId VARCHAR, latitude DOUBLE, longitude DOUBLE) WITH (kafka_topic='locations', value_format='json');

Expected behavior

The stream should be created successfully.

Actual behaviour

The CLI shows the following error message: "Could not register schema for topic."

There's nothing obvious in the server logs with the default logging configs.

Additional context

I think this has to do with the recently added support for JSON with Schema Registry. It's as if the JSON format is being interpreted as JSON_SR and something is going wrong when attempting to register a schema. Need to debug further to understand what's going on.

vcrfxia added the bug label Mar 17, 2020
vcrfxia added this to the 0.8.0 milestone Mar 17, 2020
agavra self-assigned this Mar 17, 2020
@vcrfxia
Contributor Author

vcrfxia commented Mar 17, 2020

Looked into this with @agavra and found the issue:

The intended behavior is that both JSON and JSON_SR formats support schema inference when Schema Registry is configured, and both also register schemas to Schema Registry (for inference by other streams/tables down the road). The difference between the two formats is that JSON_SR serializes data with the Schema Registry magic byte prepended, whereas the regular JSON format serializes data without the magic byte (as vanilla JSON).
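To make the difference concrete, here is a minimal Python sketch of the two encodings. It assumes Confluent's standard Schema Registry wire format (a zero magic byte, then a 4-byte big-endian schema ID, then the payload); the function names and the schema ID 42 are illustrative only, and this is not ksqlDB's actual serializer code.

```python
import json
import struct

MAGIC_BYTE = 0  # Confluent Schema Registry wire-format marker


def serialize_json(row: dict) -> bytes:
    """Plain JSON: the record value is just the UTF-8 encoded document."""
    return json.dumps(row).encode("utf-8")


def serialize_json_sr(row: dict, schema_id: int) -> bytes:
    """JSON_SR: magic byte + 4-byte big-endian schema ID, then the JSON payload."""
    header = struct.pack(">BI", MAGIC_BYTE, schema_id)
    return header + serialize_json(row)


def deserialize_json_sr(data: bytes) -> tuple:
    """Split a JSON_SR record back into (schema_id, row)."""
    magic, schema_id = struct.unpack(">BI", data[:5])
    if magic != MAGIC_BYTE:
        raise ValueError(f"unknown magic byte: {magic}")
    return schema_id, json.loads(data[5:].decode("utf-8"))


row = {"ID": "id"}
plain = serialize_json(row)
framed = serialize_json_sr(row, schema_id=42)
print(plain)                         # b'{"ID": "id"}'
print(framed[:5])                    # b'\x00\x00\x00\x00*'
print(deserialize_json_sr(framed))   # (42, {'ID': 'id'})
```

The payload after the 5-byte header is byte-for-byte the plain JSON encoding, which is why a plain JSON consumer cannot read JSON_SR records without first stripping the header.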

However, Schema Registry versions before 5.5 do not support JSON schemas, so the current behavior breaks compatibility between ksqlDB and older Schema Registry versions: when Schema Registry is configured and a JSON stream is created, ksqlDB tries to register the schema, and Schema Registry rejects the request with an exception (Unrecognized field: schemaType; error code: 422) because it does not support JSON schemas.
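The 422 error comes from the extra "schemaType" field in the registration request. The sketch below builds the request body for Schema Registry's POST /subjects/{subject}/versions endpoint, assuming the default TopicNameStrategy subject name; the helper names, the simplified JSON schema, and the error-matching heuristic are hypothetical and are not taken from ksqlDB.

```python
import json


def value_subject(topic: str) -> str:
    # Default TopicNameStrategy: a topic's value schema lives under "<topic>-value".
    return f"{topic}-value"


def json_schema_registration(topic: str, json_schema: dict) -> tuple:
    """Build (method, path, body) for registering a JSON schema.

    The "schemaType" field is what Schema Registry versions before 5.5 reject;
    when it is omitted, the registry assumes an Avro schema.
    """
    body = {"schema": json.dumps(json_schema), "schemaType": "JSON"}
    return "POST", f"/subjects/{value_subject(topic)}/versions", body


def looks_like_pre_json_registry(error_code: int, message: str) -> bool:
    """Hypothetical heuristic for recognizing the 422 error seen in this issue."""
    return error_code == 422 and "Unrecognized field: schemaType" in message


# Simplified, hypothetical schema for the riderLocations stream from the repro steps.
schema = {
    "type": "object",
    "properties": {
        "PROFILEID": {"type": "string"},
        "LATITUDE": {"type": "number"},
        "LONGITUDE": {"type": "number"},
    },
}
method, path, body = json_schema_registration("locations", schema)
print(method, path)  # POST /subjects/locations-value/versions
print(looks_like_pre_json_registry(422, "Unrecognized field: schemaType"))  # True
```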

Options going forward include:

  • A quick fix of removing schema inference for the JSON format, which means the JSON format also won't attempt to register schemas with Schema Registry, and backwards compatibility will be restored.
  • A more involved fix: detect old Schema Registry versions that do not support JSON schemas, and skip registering JSON schemas only in that case. Bonus points for also adding a config that controls whether the JSON format registers schemas with Schema Registry, for users of newer Schema Registry versions who just want vanilla JSON without JSON schemas being registered.
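The two options above differ only in when the plain JSON format registers a schema. Here is a minimal sketch of that decision, assuming a boolean "does this registry support JSON schemas" signal is available; the enum, the function, and the `register_json_schemas` switch are all hypothetical names, not real ksqlDB configs or classes.

```python
from enum import Enum


class Fix(Enum):
    QUICK = "quick"        # option 1: JSON never registers schemas
    INVOLVED = "involved"  # option 2: version detection plus a config switch


def should_register_schema(
    value_format: str,
    sr_configured: bool,
    sr_supports_json_schemas: bool,
    register_json_schemas: bool,
    fix: Fix,
) -> bool:
    """Decide whether creating a stream should register a value schema.

    Only the two JSON variants are modelled; all parameters are hypothetical.
    """
    fmt = value_format.upper()
    if fmt not in ("JSON", "JSON_SR"):
        raise ValueError(f"format not covered by this sketch: {value_format}")
    if not sr_configured:
        # No registry to register with.
        return False
    if fmt == "JSON_SR":
        # JSON_SR always relies on a registered schema (and so on SR 5.5+).
        return True
    if fix is Fix.QUICK:
        # Quick fix: plain JSON behaves as before and never touches SR.
        return False
    # Involved fix: register only if the registry can store JSON schemas
    # and the (hypothetical) config switch has not turned it off.
    return sr_supports_json_schemas and register_json_schemas


# A 5.4.x registry with the quick fix: plain JSON no longer registers anything.
print(should_register_schema("json", True, False, True, Fix.QUICK))  # False
```

With the quick fix, the version signal and the config switch are simply never consulted for plain JSON, which is what makes it safe to ship without version detection.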

Chatting with @agavra, we think it makes sense to pursue the quick fix for the ksqlDB 0.8.0 and CP 5.5.0 releases, and to look into the more involved fix for future releases. WDYT, @MichaelDrogalis @derekjn @apurvam?

UPDATE: This is only a problem on master and not 5.5 (see discussion below).

@apurvam
Contributor

apurvam commented Mar 17, 2020

So with the quick fix, JSON format will behave exactly as in previous versions: no schema inference, no backward compatibility checks, etc.

The JSON_SR format has all those goodies, and only works with CP 5.5.

What is the error message if JSON_SR is used with older CP versions? Seems to me that without version detection, and with lazy registration of schemas, we will not be able to provide good UX, right?

Finally, how did we find this?

@vcrfxia
Copy link
Contributor Author

vcrfxia commented Mar 17, 2020

> So with the quick fix, JSON format will behave exactly as in previous versions: no schema inference, no backward compatibility checks, etc.
>
> The JSON_SR format has all those goodies, and only works with CP 5.5.

Correct.

> What is the error message if JSON_SR is used with older CP versions? Seems to me that without version detection, and with lazy registration of schemas, we will not be able to provide good UX, right?

Schemas are now registered at topic creation time, not lazily (see #4717), which is why the bug reported in this issue causes the CREATE STREAM statement to fail.

The error message if JSON_SR is used with an older CP version is the same as the one in this bug report: Could not register schema for topic.
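The timing difference explains why the same misconfiguration fails at CREATE STREAM on master but only at the first INSERT before #4717. The toy model below is a sketch under that assumption, not ksqlDB code: `PreJsonSchemaRegistry` is a hypothetical stand-in for a pre-5.5 registry that accepts only Avro schemas, and `ToyStream` registers either eagerly (at creation) or lazily (on first write).

```python
class PreJsonSchemaRegistry:
    """Hypothetical stand-in for a Schema Registry older than 5.5."""

    def register(self, subject: str, schema_type: str) -> int:
        if schema_type != "AVRO":
            # Older versions reject the request body's "schemaType" field.
            raise RuntimeError("Unrecognized field: schemaType; error code: 422")
        return 1


class ToyStream:
    """Toy model of when a stream registers its value schema."""

    def __init__(self, registry, topic: str, schema_type: str, eager: bool):
        self.registry = registry
        self.subject = f"{topic}-value"
        self.schema_type = schema_type
        self.schema_id = None
        if eager:
            # Eager registration: any failure surfaces at CREATE STREAM time.
            self.schema_id = self.registry.register(self.subject, self.schema_type)

    def insert(self, row: dict) -> int:
        if self.schema_id is None:
            # Lazy registration: the first write triggers it instead.
            self.schema_id = self.registry.register(self.subject, self.schema_type)
        return self.schema_id


def failing_step(eager: bool) -> str:
    """Report which step raises for a JSON schema against the old registry."""
    registry = PreJsonSchemaRegistry()
    try:
        stream = ToyStream(registry, "json_sr", "JSON", eager)
    except RuntimeError:
        return "CREATE STREAM"
    try:
        stream.insert({"ID": "id"})
    except RuntimeError:
        return "INSERT"
    return "none"


print(failing_step(eager=True))   # CREATE STREAM
print(failing_step(eager=False))  # INSERT
```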

> Finally, how did we find this?

After I cut a candidate release image, I ran through the ksqlDB quickstart as a sanity check. It failed on the first statement because my Docker Compose file used Schema Registry 5.4.1.

@apurvam
Contributor

apurvam commented Mar 17, 2020

Cool, thanks for the details. The quick fix is fine by me, though I'm not sure #4717 is in 5.5.

@agavra
Contributor

agavra commented Mar 17, 2020

Confirmed that this isn't an issue in 5.5 (I was sure I had tested exactly this!); it was introduced by #4717:

```
ksql> CREATE STREAM json (id VARCHAR) WITH (kafka_topic='json', value_format='JSON', partitions=1);

 Message
----------------
 Stream created
----------------
ksql> CREATE STREAM json_sr (id VARCHAR) WITH (kafka_topic='json_sr', value_format='JSON_SR', partitions=1);

 Message
----------------
 Stream created
----------------
ksql> INSERT INTO json (id) VALUES ('id');
ksql> INSERT INTO json_sr (id) VALUES ('id');
Failed to insert values into 'JSON_SR'. Could not serialize row: [ 'id' ]
```

The second insert fails because the deployed Schema Registry is 5.4 (the error message is not great, but there's a separate ticket to fix that).

@vcrfxia
Contributor Author

vcrfxia commented Mar 17, 2020

Good call. I'll close the change I targeted at 5.5 and only merge the one targeted at master. Thanks for the catch!

@apurvam
Contributor

apurvam commented Mar 17, 2020

Awesome! Thanks @vcrfxia and @agavra !

@vcrfxia
Contributor Author

vcrfxia commented Mar 17, 2020

Closing this issue since the quick fix has been implemented. Created another JIRA to track the more involved fix going forward: #4802
