diff --git a/CHANGES.txt b/CHANGES.txt
index 19f921b058e..3e329aaac4c 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -8,6 +8,8 @@ Trunk (not yet released)
 AVRO-1704: Java: Add support for single-message encoding. (blue)

+ AVRO-1704: Spec: Add single-message encoding format. (Niels Basjes via blue)
+
 OPTIMIZATIONS

 IMPROVEMENTS
diff --git a/doc/src/content/xdocs/spec.xml b/doc/src/content/xdocs/spec.xml
index ec1f1999726..917d314e7e2 100644
--- a/doc/src/content/xdocs/spec.xml
+++ b/doc/src/content/xdocs/spec.xml
@@ -487,18 +487,18 @@
 value, followed by that many key/value pairs. A block with count zero indicates the end of the map. Each item is encoded per the map's value schema.

- +

If a block's count is negative, its absolute value is used, and the count is followed immediately by a long block size indicating the number of bytes in the block. This block size permits fast skipping through data, e.g., when projecting a record to a subset of its fields.
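The skipping behaviour described above can be sketched in Python. This is a minimal illustration, not code from the patch: the helper names `read_long` and `skip_sized_map` are hypothetical, and the sketch assumes every block was written with a negative count, so that a byte size is always present.

```python
def read_long(buf: bytes, pos: int):
    """Decode one Avro long (zig-zag, variable-length); return (value, next_pos)."""
    raw = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        raw |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: last byte of this long
            break
        shift += 7
    # Undo zig-zag: 0 -> 0, 1 -> -1, 2 -> 1, 3 -> -2, ...
    return (raw >> 1) ^ -(raw & 1), pos


def skip_sized_map(buf: bytes, pos: int) -> int:
    """Skip a map without decoding its items; return the position after it.

    Hypothetical helper: it only handles blocks that carry a byte size
    (negative count). A positive count has no size, so the items would
    have to be decoded one by one instead.
    """
    while True:
        count, pos = read_long(buf, pos)
        if count == 0:  # a zero count ends the map
            return pos
        if count > 0:
            raise ValueError("block has no byte size; decode its items to skip it")
        size, pos = read_long(buf, pos)  # number of bytes in the block
        pos += size  # jump over all items in the block
```

For example, a map holding the single entry `"a" -> 5` (value schema `long`), written as one sized block, occupies the bytes `01 06 02 61 0A 00`: count -1, block size 3, the three item bytes, then the zero count that ends the map.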

- +

The blocked representation permits one to read and write maps larger than can be buffered in memory, since one can start writing items without knowing the full length of the map.

- +
@@ -569,6 +569,34 @@
+
+ Single-object encoding + +

In some situations a single Avro serialized object is to be stored for a + longer period of time. One very common example is storing Avro records + for several weeks in an Apache Kafka topic.

+

In the period after a schema change, this persistence system will contain records + that have been written with different schemas. So the need arises to know which schema + was used to write a record to support schema evolution correctly. + In most cases the schema itself is too large to include in the message, + so this binary wrapper format supports the use case more effectively.

+ +
+ Single-object encoding specification

Single Avro objects are encoded as follows:

+
    +
  1. A two-byte marker, C3 01, to show that the message is Avro and uses this single-record format (version 1).
  2. The 8-byte little-endian CRC-64-AVRO fingerprint of the object's schema.
  3. The Avro object encoded using Avro's binary encoding.
+
+ +
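As a rough illustration of the layout listed above, the following Python sketch assembles a single-object message. It is not part of the patch. The CRC-64-AVRO routine is transcribed from the algorithm in the spec's Schema Fingerprints section. The function names are hypothetical, and the caller is assumed to supply the schema already in Parsing Canonical Form and the object already binary-encoded.

```python
import struct

MARKER = b"\xC3\x01"        # two-byte marker, format version 1
EMPTY = 0xC15D213AA4D7A795  # CRC-64-AVRO seed from the Schema Fingerprints section


def _build_table():
    """Precompute the 256-entry lookup table for CRC-64-AVRO."""
    table = []
    for i in range(256):
        fp = i
        for _ in range(8):
            fp = (fp >> 1) ^ (EMPTY if fp & 1 else 0)
        table.append(fp)
    return table


FP_TABLE = _build_table()


def fingerprint64(data: bytes) -> int:
    """Return the CRC-64-AVRO fingerprint of data as an unsigned 64-bit int."""
    fp = EMPTY
    for byte in data:
        fp = (fp >> 8) ^ FP_TABLE[(fp ^ byte) & 0xFF]
    return fp


def encode_single_object(canonical_schema: bytes, body: bytes) -> bytes:
    """Frame an already binary-encoded object (hypothetical helper).

    canonical_schema: the schema's Parsing Canonical Form as UTF-8 bytes
                      (assumed to be normalized by the caller).
    body:             the object in Avro's binary encoding.
    """
    header = MARKER + struct.pack("<Q", fingerprint64(canonical_schema))
    return header + body


# The "null" schema encodes its only value as zero bytes,
# so the whole message is just the 10-byte header.
message = encode_single_object(b'"null"', b"")
```

The fixed 10-byte header means a reader can locate the fingerprint without parsing anything else in the payload.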

Implementations use the 2-byte marker to determine whether a payload is Avro. + This check helps avoid expensive lookups that resolve the schema from a + fingerprint, when the message is not an encoded Avro payload.
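A reader might apply the marker check along these lines. This is a sketch under stated assumptions rather than the Java implementation: `split_single_object` is a hypothetical helper, and schema resolution is left to the caller.

```python
import struct

MARKER = b"\xC3\x01"  # two-byte marker, format version 1
HEADER_LEN = 10       # 2-byte marker + 8-byte fingerprint


def split_single_object(message: bytes):
    """Return (fingerprint, body), or None if the payload lacks the marker.

    Hypothetical helper: it only inspects the header. Returning None early
    lets the caller skip the schema lookup for non-Avro payloads.
    """
    if len(message) < HEADER_LEN or message[:2] != MARKER:
        return None
    # The fingerprint is stored as an unsigned little-endian 64-bit integer.
    (fingerprint,) = struct.unpack_from("<Q", message, 2)
    return fingerprint, message[HEADER_LEN:]
```

A caller would then use the returned fingerprint as the key into whatever schema store it maintains, and decode the body with the writer schema it finds there.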

+ +
+
@@ -1237,7 +1265,7 @@
-
+
Schema Fingerprints

"[A] fingerprinting algorithm is a procedure that maps an