diff --git a/contrib/seekable_format/zstd_seekable_compression_format.md b/contrib/seekable_format/zstd_seekable_compression_format.md
index 55aebfd2e9d..5b87c7f7016 100644
--- a/contrib/seekable_format/zstd_seekable_compression_format.md
+++ b/contrib/seekable_format/zstd_seekable_compression_format.md
@@ -14,7 +14,7 @@ are clearly marked.
 Distribution of this document is unlimited.
 
 ### Version
-0.1.0 (11/04/17)
+0.2.0 (31/07/21)
 
 ## Introduction
 This document defines a format for compressed data to be stored so that subranges of the data can be efficiently decompressed without requiring the entire document to be decompressed.
@@ -78,26 +78,31 @@ A bitfield describing the format of the seek table.
 
 | Bit number | Field name                |
 | ---------- | ----------                |
-| 7          | `Checksum_Flag`           |
-| 6-2        | `Reserved_Bits`           |
+| 7          | `XXH64_Checksum_Flag`     |
+| 6          | `SHA512-256_Checksum_Flag`|
+| 5-2        | `Reserved_Bits`           |
 | 1-0        | `Unused_Bits`             |
 
-While only `Checksum_Flag` currently exists, there are 7 other bits in this field that can be used for future changes to the format,
+While only `Checksum_Flag` currently exists, there are 6 other bits in this field that can be used for future changes to the format,
 for example the addition of inline dictionaries.
 
-__`Checksum_Flag`__
+__`XXH64_Checksum_Flag`__
 If the checksum flag is set, each of the seek table entries contains a 4 byte checksum of the uncompressed data contained in its frame.
 
+__`SHA512-256_Checksum_Flag`__
+
+If the checksum flag is set, each of the seek table entries contains a 32 byte SHA-512/256 checksum of the uncompressed data contained in its frame.
+
 `Reserved_Bits` are not currently used but may be used in the future for breaking changes, so a compliant decoder should ensure they are set to 0. `Unused_Bits` may be used in the future for non-breaking changes, so a compliant decoder should not interpret these bits.
 
 #### __`Seek_Table_Entries`__
 
 `Seek_Table_Entries` consists of `Number_Of_Frames` (one for each frame in the data, not including the seek table frame) entries of the following form, in sequence:
 
-|`Compressed_Size`|`Decompressed_Size`|`[Checksum]`|
-|-----------------|-------------------|------------|
-| 4 bytes         | 4 bytes           | 4 bytes    |
+|`Compressed_Size`|`Decompressed_Size`|`[XXH64_Checksum]`|`[SHA512-256_Checksum]`|
+|-----------------|-------------------|------------------|-----------------------|
+| 4 bytes         | 4 bytes           | 4 bytes          | 32 bytes              |
 
 __`Compressed_Size`__
 
@@ -108,9 +113,14 @@ __`Decompressed_Size`__
 
 The size of the decompressed data contained in the frame. For skippable or otherwise empty frames, this value is 0.
 
-__`Checksum`__
+__`XXH64_Checksum`__
+
+Only present if `XXH64_Checksum_Flag` is set in the `Seek_Table_Descriptor`. Value : the least significant 32 bits of the XXH64 digest of the uncompressed data, stored in little-endian format.
+
+__`SHA512-256_Checksum`__
 
-Only present if `Checksum_Flag` is set in the `Seek_Table_Descriptor`. Value : the least significant 32 bits of the XXH64 digest of the uncompressed data, stored in little-endian format.
+Only present if `SHA512-256_Checksum_Flag` is set in the `Seek_Table_Descriptor`. Value : the 256 bits of the SHA-512/256 digest of the uncompressed data, stored in little-endian format.
 
 ## Version Changes
 - 0.1.0: initial version
+- 0.2.0: add cryptographic content hash
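
Reviewer note: to make the proposed 0.2.0 entry layout concrete, here is a minimal Python sketch of how a reader might decode `Seek_Table_Entries` under this patch. It is an illustration only, not the reference decoder. The bit positions and field order come from the tables in the diff; the names `SeekTableEntry`, `entry_size`, and `parse_seek_table_entries` are hypothetical, the two size fields are assumed to be little-endian, and the SHA-512/256 field is treated as 32 opaque bytes in stored order, since the patch does not say how "little-endian" applies to a digest.

```python
import struct
from dataclasses import dataclass
from typing import List, Optional

# Bit positions taken from the Seek_Table_Descriptor table in the patch.
XXH64_CHECKSUM_FLAG = 1 << 7
SHA512_256_CHECKSUM_FLAG = 1 << 6
RESERVED_BITS_MASK = 0b0011_1100  # bits 5-2; bits 1-0 are Unused_Bits


@dataclass
class SeekTableEntry:  # hypothetical container, not part of the spec
    compressed_size: int
    decompressed_size: int
    xxh64_checksum: Optional[int] = None         # low 32 bits of XXH64
    sha512_256_checksum: Optional[bytes] = None  # 32 bytes, kept as stored


def entry_size(descriptor: int) -> int:
    """Size in bytes of one entry, given the descriptor bitfield."""
    size = 4 + 4  # Compressed_Size + Decompressed_Size
    if descriptor & XXH64_CHECKSUM_FLAG:
        size += 4
    if descriptor & SHA512_256_CHECKSUM_FLAG:
        size += 32
    return size


def parse_seek_table_entries(
    descriptor: int, data: bytes, number_of_frames: int
) -> List[SeekTableEntry]:
    """Decode number_of_frames entries from data (assumed little-endian)."""
    if descriptor & RESERVED_BITS_MASK:
        raise ValueError("Reserved_Bits must be 0")
    size = entry_size(descriptor)
    if len(data) < size * number_of_frames:
        raise ValueError("seek table data is truncated")

    entries = []
    for i in range(number_of_frames):
        offset = i * size
        compressed, decompressed = struct.unpack_from("<II", data, offset)
        offset += 8
        entry = SeekTableEntry(compressed, decompressed)
        if descriptor & XXH64_CHECKSUM_FLAG:
            (entry.xxh64_checksum,) = struct.unpack_from("<I", data, offset)
            offset += 4
        if descriptor & SHA512_256_CHECKSUM_FLAG:
            entry.sha512_256_checksum = bytes(data[offset:offset + 32])
        entries.append(entry)
    return entries
```

With both flags set, each entry grows from 12 to 44 bytes, so a reader that ignores the new flag would misalign every entry after the first. That suggests bit 6 behaves as a breaking change for decoders written against 0.1.0, which would have rejected it as a set reserved bit.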