From 30408a9c192c5f4eaaf42f01f0ffbfffd705aa57 Mon Sep 17 00:00:00 2001
From: Ryan Blue
If a block's count is negative, its absolute value is used,
and the count is followed immediately by a long
block size indicating the number of bytes in the
block. This block size permits fast skipping through data,
e.g., when projecting a record to a subset of its fields.
The blocked representation permits one to read and write maps larger than can be buffered in memory, since one can start writing items without knowing the full length of the map.
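The block layout described above can be illustrated with a minimal, hypothetical Python sketch. It assumes a map with string keys and long values, and the helper names (`write_long`, `read_map`, `skip_map`, and so on) are illustrative rather than any Avro library's API. Longs use Avro's zig-zag varint encoding. `skip_map` shows why the block size is useful: a block with a negative count can be jumped over in one step, while a block with a positive count has to be walked entry by entry.

```python
# Hypothetical sketch: Avro's blocked map encoding for a map<string, long>.
# Function names are illustrative; they are not an Avro library API.


def write_long(n: int) -> bytes:
    """Zig-zag encode a signed 64-bit value, then emit a base-128 varint."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        z >>= 7
    out.append(z)
    return bytes(out)


def read_long(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode a varint at pos; return (value, position just after it)."""
    shift = acc = 0
    while True:
        b = buf[pos]
        pos += 1
        acc |= (b & 0x7F) << shift
        if not b & 0x80:
            break
        shift += 7
    return (acc >> 1) ^ -(acc & 1), pos  # undo zig-zag


def skip_long(buf: bytes, pos: int) -> int:
    """Advance past one varint without decoding it."""
    while buf[pos] & 0x80:
        pos += 1
    return pos + 1


def write_map(entries: dict[str, int], block_len: int, sized: bool = True) -> bytes:
    """Write entries in blocks of at most block_len items, then a zero count."""
    out = bytearray()
    items = list(entries.items())
    for i in range(0, len(items), block_len):
        chunk = items[i:i + block_len]
        body = b"".join(
            write_long(len(k.encode("utf-8"))) + k.encode("utf-8") + write_long(v)
            for k, v in chunk
        )
        if sized:
            # Negative count: |count| items, preceded by a long byte size.
            out += write_long(-len(chunk)) + write_long(len(body))
        else:
            out += write_long(len(chunk))
        out += body
    out += write_long(0)  # a zero count ends the map
    return bytes(out)


def read_map(buf: bytes, pos: int) -> tuple[dict[str, int], int]:
    """Decode every entry; return (map, position just after it)."""
    result: dict[str, int] = {}
    while True:
        count, pos = read_long(buf, pos)
        if count == 0:
            return result, pos
        if count < 0:
            count = -count
            _size, pos = read_long(buf, pos)  # not needed when decoding every item
        for _ in range(count):
            key_len, pos = read_long(buf, pos)
            key = buf[pos:pos + key_len].decode("utf-8")
            value, pos = read_long(buf, pos + key_len)
            result[key] = value


def skip_map(buf: bytes, pos: int) -> int:
    """Return the position just past the map, jumping over sized blocks."""
    while True:
        count, pos = read_long(buf, pos)
        if count == 0:
            return pos
        if count < 0:
            size, pos = read_long(buf, pos)
            pos += size  # skip the whole block without decoding it
        else:
            for _ in range(count):  # no size recorded: walk each entry
                key_len, pos = read_long(buf, pos)
                pos = skip_long(buf, pos + key_len)


# Usage: three entries in two sized blocks, followed by an unrelated long.
encoded = write_map({"a": 1, "b": -2, "c": 300}, block_len=2) + write_long(42)
entries, end = read_map(encoded, 0)
after = skip_map(encoded, 0)  # same position as end, without decoding entries
```

With `sized=True`, a reader projecting this field away pays one jump per block. With `sized=False`, the writer saves one long per block, but skipping requires a full scan of the entries.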
+In some situations a single Avro serialized object is to be stored for a
+  longer period of time. One very common example is storing Avro records
+  for several weeks in an Apache Kafka topic.
+In the period after a schema change, this persistence system will contain records
+  that have been written with different schemas. Readers therefore need to know which
+  schema was used to write each record in order to support schema evolution correctly.
+  In most cases the schema itself is too large to include in the message,
+  so this binary wrapper format supports the use case more effectively.
+
+Single Avro objects are encoded as follows:
+  1. A two-byte marker, C3 01, to show that the message is Avro and uses
+     this single-record format (version 1).
+  2. The 8-byte little-endian CRC-64-AVRO fingerprint of the object's schema.
+  3. The Avro object encoded using Avro's binary encoding.
+
+Implementations use the 2-byte marker to determine whether a payload is Avro.
+  This check helps avoid expensive lookups that resolve the schema from a
+  fingerprint when the message is not an encoded Avro payload.
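The single-object framing can be sketched in a few lines of Python. The layout is the two-byte marker C3 01, then an 8-byte little-endian schema fingerprint, then the binary-encoded body. The function names and the in-memory `registry` dict are hypothetical.

The fingerprint routine is a port of the CRC-64-AVRO (Rabin) reference code from the Avro specification's fingerprinting section. The caller is assumed to pass schema text already in Parsing Canonical Form, because this sketch does not canonicalize.

`decode_single_object` checks the marker first and returns `None` for anything else. As a result, the fingerprint lookup only happens for payloads that look like single-object Avro.

```python
import struct
from typing import Optional

SINGLE_OBJECT_MARKER = b"\xc3\x01"  # C3 01: single-object format, version 1
EMPTY = 0xC15D213AA4D7A795  # CRC-64-AVRO initial value


def _make_fp_table() -> list[int]:
    table = []
    for i in range(256):
        fp = i
        for _ in range(8):
            # Shift right; XOR in EMPTY when the bit shifted out was 1.
            fp = (fp >> 1) ^ (EMPTY & -(fp & 1))
        table.append(fp)
    return table


FP_TABLE = _make_fp_table()


def fingerprint64(data: bytes) -> int:
    """CRC-64-AVRO fingerprint of data, as an unsigned 64-bit integer."""
    fp = EMPTY
    for b in data:
        fp = (fp >> 8) ^ FP_TABLE[(fp ^ b) & 0xFF]
    return fp


def encode_single_object(schema_fp: int, body: bytes) -> bytes:
    """Marker, then the 8-byte little-endian fingerprint, then the encoded body."""
    return SINGLE_OBJECT_MARKER + struct.pack("<Q", schema_fp) + body


def decode_single_object(message: bytes) -> Optional[tuple[int, bytes]]:
    """Return (fingerprint, body), or None if the marker check fails."""
    if len(message) < 10 or message[:2] != SINGLE_OBJECT_MARKER:
        return None  # not single-object Avro: no schema lookup needed
    (schema_fp,) = struct.unpack_from("<Q", message, 2)
    return schema_fp, message[10:]


# Usage, with a hypothetical in-memory schema store keyed by fingerprint.
schema = b'"long"'  # assumed to already be in Parsing Canonical Form
registry = {fingerprint64(schema): schema}
message = encode_single_object(fingerprint64(schema), b"\x54")  # long 42, zig-zag encoded
decoded = decode_single_object(message)
if decoded is not None:
    schema_fp, body = decoded
    writer_schema = registry[schema_fp]
```

In a real deployment the `registry` lookup would typically be the expensive step the marker check is meant to avoid. Here it is only a dict.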
+
+"[A] fingerprinting algorithm is a procedure that maps an