Hello,
First of all, thanks for the great library. We've been using it for a number of years now on a large-scale project, and it's been working great.
Recently I ported another service, which encodes/decodes Avro messages, from Scala to Go using hamba/avro, and noticed a difference in the OCF files produced by these two logically similar services. Even though the OCF files are valid and contain the same data, the consumer of the OCF (in our case, BigQuery) was not always able to read the files produced by the Go service, failing with an error that explains nothing.
After a long investigation and debugging, it turned out that the difference was in the way the two services encode arrays and maps.
According to the Avro specification, encoded array and map blocks are prefixed with either one or two long values: the block count (mandatory) and the block size in bytes (optional). The official Java implementation of the Avro library writes a single long value (the block count), while hamba/avro always writes both (a negative block count followed by the block size in bytes). Adjusting hamba/avro to write only the block count fixed the issue.
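For illustration only, here is a minimal Go sketch of the two block-header layouts the specification allows. The writeLong helper and the literal item bytes are my own for the example, not code from hamba/avro:

```go
package main

import (
	"bytes"
	"fmt"
)

// writeLong encodes an int64 using Avro's zig-zag + variable-length long format.
func writeLong(buf *bytes.Buffer, v int64) {
	u := uint64((v << 1) ^ (v >> 63)) // zig-zag encode
	for u >= 0x80 {
		buf.WriteByte(byte(u) | 0x80)
		u >>= 7
	}
	buf.WriteByte(byte(u))
}

func main() {
	// Three already-encoded long items (1, 2, 300) -- 4 bytes in total.
	items := []byte{0x02, 0x04, 0xd8, 0x04}

	// Form 1: positive block count only (what the official Java encoder emits).
	var countOnly bytes.Buffer
	writeLong(&countOnly, 3) // block count
	countOnly.Write(items)
	writeLong(&countOnly, 0) // end-of-array marker

	// Form 2: negative block count followed by the block size in bytes
	// (what hamba/avro currently emits).
	var countAndSize bytes.Buffer
	writeLong(&countAndSize, -3)                // negative count signals that a byte size follows
	writeLong(&countAndSize, int64(len(items))) // block size in bytes
	countAndSize.Write(items)
	writeLong(&countAndSize, 0)

	fmt.Printf("count only:   % x\n", countOnly.Bytes())
	fmt.Printf("count + size: % x\n", countAndSize.Bytes())
}
```

Both forms decode to the same array; the difference is only whether the optional byte-size long is present after the count, which is apparently what trips up BigQuery's decoder.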
While this is not a bug report, and the problem lies solely with whatever decoder BigQuery uses (we are reporting it to them right now), I think it would be nice for hamba/avro to support both encodings.
This could be achieved with one more configuration option (defaulting to the current behavior); a rough sketch is below. The change is trivial and can be found in my fork along with docs and tests; if this proposal is considered legitimate, I can create a PR.
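To make the shape of the proposal concrete, here is how such an option might be exposed. The field name is purely hypothetical and does not exist in hamba/avro today; the actual naming and implementation are in the fork:

```go
package main

import "github.com/hamba/avro/v2"

func main() {
	// Hypothetical: a config flag that makes the encoder write count-only
	// array/map block headers, matching the official Java implementation.
	// The commented-out field name is illustrative only.
	api := avro.Config{
		// DisableBlockSizeHeader: true,
	}.Freeze()
	_ = api
}
```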
Cheers!