Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: allow adjusting array/map block header format #264

Closed
meandnano opened this issue Jun 8, 2023 · 1 comment
Closed

Proposal: allow adjusting array/map block header format #264

meandnano opened this issue Jun 8, 2023 · 1 comment

Comments

@meandnano
Copy link
Contributor

meandnano commented Jun 8, 2023

Hello,

First of all, thanks for the great library. We've been using it for a number of years now on a large scale project, and it's been working great.

Recently I have ported another service which encodes/decodes avro messages from scala to go using hamba/avro and noticed the difference in OCF files produced by this 2 logically similar services. The noticeable difference was in the fact that even though OCF files are valid and contain the same data, the consumer of OCF (in our case, BigQuery) was not always able to read the files produced by the go service (failing with an error that explains nothing).

After long investigation and debugging it was found that the difference was in the way the 2 services were encoding arrays and maps.
According to Avro specification encoded array blocks are being prefixed with either 1 or 2 long values: block length (mandatory) and block size in bytes (optional). The official java implementation of Avro library uses singe long value (block length), while hamba/avro always writes both (negative block length and block size in bytes). Adjusting hamba/avro to write only block length has fixed the issue.

While this is not a bug report and solely a problem of whatever decoder BigQuery uses (and we are reporting it to them right now), I think it would be nice to have a support for both ways in hamba/avro.

This could be achieved with one more configuration option (which will default to the current behavior). The change is trivial and
could be found in my fork along with docs and tests, if this
proposal is considered legit, I can create a PR.

Cheers!

@nrwiersma
Copy link
Member

Hi,

A PR for this would be welcomed. Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

2 participants