Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix schema_registry_decode processor #1198

Merged

Conversation

mihaitodor
Copy link
Collaborator

@mihaitodor mihaitodor commented Apr 5, 2022

When dealing with AVRO logical types, we can't serialise the native struct emitted by goavro directly to JSON, since that will discard schema information that codec.TextualFromNative uses to produce the expected JSON.

Also, the tip version of goavro is required because the fix for this regression linkedin/goavro#242 was merged, but there isn't a tagged release yet.

Fixes #1161.

For a bit of extra detail, when converting this JSON to binary AVRO using the schema from the test I added (see #1161 for steps on how to do that):

{"int_time_millis":{"int": 35245000},"long_time_micros":{"long":20192000000000},"long_timestamp_micros":{"long":62135596800000000},"pos_0_33333333":{"bytes":"!"}}

and then we skip the codec.TextualFromNative call when passing it through schema_registry_decode, we get the following serialised JSON from Benthos:

{"int_time_millis":{"int.time-millis":35245000000000},"long_time_micros":{"long.time-micros":20192000000000000},"long_timestamp_micros":{"long.timestamp-micros":"3939-01-01T00:00:00Z"},"pos_0_33333333":{"bytes.decimal":"33/100"}}

which doesn't look like something that users will expect, but not sure. I have very limited experience with AVRO.

On the other hand, calling codec.TextualFromNative produces a JSON string which will then need to be unmarshalled back to a native struct if there are any subsequent processors. This is wasteful, but goavro doesn't seem to have any mechanism for crafting a native struct that is compatible with Benthos. I'm not sure right now what bloblang does under the hood if we call Message.SetStructured() with a native struct that contains types such as time.Time, time.Duration, math/big.Rat etc. Is that supported?

@mihaitodor mihaitodor requested a review from Jeffail April 5, 2022 19:53
When dealing with logical types, we can't serialise the native
struct emitted by goavro directly to JSON, since that will discard
schema information that `codec.TextualFromNative` uses to produce
the expected JSON.

Also, the tip version of goavro is required because the fix for
this regression linkedin/goavro#242 was
merged, but there isn't a tagged release yet.

Fixes redpanda-data#1161.
@mihaitodor mihaitodor force-pushed the fix-schema-registry-decode-processor branch from 587a07c to 6e8022b Compare April 5, 2022 20:00
@Jeffail
Copy link
Collaborator

Jeffail commented Apr 6, 2022

Yeah I think this is the best we can do for now without making some custom component to walk the structure for custom types. In bloblang we can technically deal with any types for basic value getting, but really we need proper basic values in order for basic methods and arithmetic to work as expected.

It's not the end of the world if users make plugins that introduce odd types but our own processors definitely shouldn't as the user would have no way to work out what's happening.

@Jeffail Jeffail merged commit a58fac0 into redpanda-data:main Apr 6, 2022
@mihaitodor
Copy link
Collaborator Author

Sounds good, thanks for the quick merge! :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

AVRO schema_registry_decode issue
2 participants