-
Notifications
You must be signed in to change notification settings - Fork 2.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RFC] Splittable Format and API #654
Conversation
For clarity of repository, we want to move the seekable format and demo into |
|
||
__`Skippable_Magic_Number`__ | ||
|
||
Value : 0x184D2A5?, which means any value from 0x184D2A50 to 0x184D2A5F. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All 16 values are fine for "skippable frames" in general,
but for this specification in particular,
should we pin down a specific number instead ?
Maybe one which is less likely to be used, such as *E ?
cc @terrelln for reference (done something equivalent for pzstd
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
*E sounds reasonable
#### `Seek_Table_Footer` | ||
The seek table footer format is as follows: | ||
|
||
|`Number_Of_Chunks`|`Seek_Table_Descriptor`|`Seekable_Magic_Number`| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems Seekable_Magic_Number
is not defined afterwards,
maybe it's implied that it is the same as the Skippable_Magic_Number
in previous paragraph ?
Even if that's so, it needs to be described.
Also interesting : explain the reasoning for this construction.
For reference : thejoshwolfe/yauzl#48 (comment)
showing that there is nothing obvious and even commonly distributed formats can have it wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, it does have a value that I forgot to include.
|
||
If the checksum flag is set, each of the seek table entries contains a 4 byte checksum of the uncompressed data contained in its chunk. | ||
|
||
`Reserved_Bits` are not currently used but may be used in the future for breaking changes, so a compliant decoder should ensure they are set to 0. `Unused_Bits` may be used in the future for non-breaking changes, so a compliant decoder should not interpret these bits. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
|
||
`Seek_Table_Entries` consists of `Number_Of_Chunks` (one for each chunk in the data, not including the seek table frame) entries of the following form, in sequence: | ||
|
||
|`Compressed_Size`|`Decompressed_Size`|`[Checksum]`| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a question : what about the initial idea of working with offsets ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I decided on size because the main upside to offset is not having to parse the entire seek table, but I can't think of scenarios where that's too useful (in large files we have to do a lot of work/reading anyway, in small files the seek table is small), and using offsets would either make the table bigger (8 bytes per field), or complicate the format. As is a max chunk size of 4 GB seems reasonable.
We could change this however if it turns out the ability to seek to the middle of the jump table is desirable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree.
The main benefit to using offsets is to keep the table on-disk, to preserve memory usage, only reading the required offsets when useful.
I can think of a few scenarios which would benefit from this construction, but they all involve specific situations (lowmem) we should not worry about at this stage. There's ample time to decide if it's a worthwhile scenario in some distant future.
U32 hi = table->tableLen; | ||
|
||
while (lo + 1 < hi) { | ||
U32 mid = lo + ((hi - lo) >> 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(minor) const
whenever you can
#define ZSTD_SEEKABLE_MAXCHUNKS 0x8000000U | ||
|
||
/* 0xFE03F607 is the largest number x such that ZSTD_compressBound(x) fits in a 32-bit integer */ | ||
#define ZSTD_SEEKABLE_MAX_CHUNK_DECOMPRESSED_SIZE 0xFE03F607 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we get a static assert on this value ?
ZSTD_compressBound()
was recently changed, it may be changed again in the future.
In case of a future change, it's necessary to ensure that ZSTD_compressBound(ZSTD_SEEKABLE_MAX_CHUNK_DECOMPRESSED_SIZE)
doesn't overflow.
Note that it may require to expose a macro version of ZSTD_compressBound()
/*-**************************************************************************** | ||
* Seekable Format | ||
* | ||
* The seekable format splits the compressed data into a series of "chunks", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
naming : "chunks" or "frames" ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the meaning lines up well with regular zstd frames, I can change chunks to frames.
ZSTDLIB_API ZSTD_seekable_DStream* ZSTD_seekable_createDStream(void); | ||
ZSTDLIB_API size_t ZSTD_seekable_freeDStream(ZSTD_seekable_DStream* zds); | ||
|
||
/*===== Seekable decompression functions =====*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The decoding API feels generally slightly awkward.
I suspect it's designed to look like the generic streaming API, which works from buffer to buffer.
But it could be that FILE*
operations are in fact a better fit for the job.
That could also be 2 different set of functions.
These are difficult questions.
We will probably need to discuss this a bit together to converge on a design.
A generic tool for this objective is to develop sample user programs to get a "feel" of how usable the API is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it would be much cleaner with FILE*
objects, however I wasn't sure if that would be too restrictive for layers built on top.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It certainly is restrictive, and that's why we should have a brainstorm session about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you work with FILE*
objects, you could allow advanced usage like:
typedef void(*ZSTD_seekable_read)(void* opaque, void* buffer, size_t n);
typedef void(*ZSTD_seekable_seek)(void* opaque, unsigned long long offset); /* Careful with files > 4 GiB and fseek() */
typedef struct {
void* opaque;
ZSTD_seekable_read read;
ZSTD_seekable_seek seek;
} ZSTD_seekable_CustomFile;
Users will probably also want to be able to read the seek table.
* | ||
* Data streamed to the seekable compressor will automatically be split into | ||
* frames of size `maxFrameSize` (provided in ZSTD_seekable_initCStream()), | ||
* or if none is provided, will be cut off whenver ZSTD_endFrame() is called |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whenever ZSTD_seekable_endFrame()
* small, a size hint for how much data to provide. | ||
* An error code may also be returned, checkable with ZSTD_isError() | ||
* | ||
* Use ZSTD_initDStream to prepare for a new decompression operation using the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ZSTD_seekable_initDStream()
ZSTDLIB_API ZSTD_seekable_DStream* ZSTD_seekable_createDStream(void); | ||
ZSTDLIB_API size_t ZSTD_seekable_freeDStream(ZSTD_seekable_DStream* zds); | ||
|
||
/*===== Seekable decompression functions =====*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you work with FILE*
objects, you could allow advanced usage like:
typedef void(*ZSTD_seekable_read)(void* opaque, void* buffer, size_t n);
typedef void(*ZSTD_seekable_seek)(void* opaque, unsigned long long offset); /* Careful with files > 4 GiB and fseek() */
typedef struct {
void* opaque;
ZSTD_seekable_read read;
ZSTD_seekable_seek seek;
} ZSTD_seekable_CustomFile;
Users will probably also want to be able to read the seek table.
if (zcs->framelog.size == ZSTD_SEEKABLE_MAXFRAMES) | ||
return ERROR(frameIndex_tooLarge); | ||
|
||
zcs->framelog.entries[zcs->framelog.size] = (framelogEntry_t) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to support C versions older than C99?
/* grow the buffer if required */ | ||
if (zcs->framelog.size == zcs->framelog.capacity) { | ||
/* exponential size increase for constant amortized runtime */ | ||
size_t const newCapacity = zcs->framelog.capacity * 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to change this, but see https://github.com/facebook/folly/blob/master/folly/docs/FBVector.md#memory-handling for an interesting read.
/* restrict range to the end of the file, of non-negative size */ | ||
rangeStart = MIN(rangeStart, zds->seekTable.entries[zds->seekTable.tableLen].dOffset); | ||
rangeEnd = MIN(rangeEnd, zds->seekTable.entries[zds->seekTable.tableLen].dOffset); | ||
rangeEnd = MAX(rangeEnd, rangeStart); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rangeEnd = MIN(rangeEnd, zds->seekTable.entries[zds->seekTable.tableLen].dOffset);
rangeStart = MIN(rangeStart, rangeEnd);
|
||
{ /* Allocate an extra entry at the end so that we can do size | ||
* computations on the last element without special case */ | ||
seekEntry_t* entries = malloc(sizeof(seekEntry_t) * (numFrames + 1)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think some compilers will complain that you don't cast to a seekEntry_t*
.
zds->targetStart - zds->decompressedOffset; | ||
size_t const prevInputPos = input->pos; | ||
|
||
ZSTD_outBuffer outTmp = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question here about C99.
zds->stage = zsds_seek; | ||
|
||
/* force a seek first */ | ||
zds->curFrame = (U32) -1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No space between cast and value
jt->entries[zds->curFrame + 1].cOffset) | ||
return ERROR(corruption_detected); | ||
ZSTD_resetDStream(zds->dstream); | ||
zds->stage = zsds_seek; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you need to seek after this? You've already asserted you are in the correct spot, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The next frame to read might not be the frame immediately after, e.g. in the case of interleaved skippable frames. This will seek over them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, since dOffset == 0
for skippable frames. Can you add a comment that explains that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not quite, dSize == 0
for skippable frames, which translates to entries[i].dOffset == entries[i+1].dOffset
if i
is the index of a skippable frame. Using offset in the decoder even though the format uses size is a bit confusing, but necessary for the decoder to be efficient. I'll add a comment explaining why the seek is necessary.
e0cacc9
to
9626cf1
Compare
ZSTDLIB_API size_t ZSTD_seekable_getFrameCompressedSize(ZSTD_seekable* const zs, unsigned frameIndex); | ||
ZSTDLIB_API size_t ZSTD_seekable_getFrameDecompressedSize(ZSTD_seekable* const zs, unsigned frameIndex); | ||
|
||
ZSTDLIB_API unsigned ZSTD_seekable_offsetToFrame(ZSTD_seekable* const zs, unsigned long long offset); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ZSTD_seekable_offsetToFrameIndex
for consistent naming
/* Limit the maximum size to avoid any potential issues storing the compressed size */ | ||
#define ZSTD_SEEKABLE_MAX_FRAME_DECOMPRESSED_SIZE 0x80000000U | ||
|
||
#define ZSTD_SEEKABLE_FRAMEINDEX_TOOLARGE (0ULL-2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This macro value is of different nature.
I would rather #define
it closer to the family of functions which need it.
Demo using
As seen from the timing results, the decompression and processing speed parallelizes very efficiently, gaining a ~2x speed up each time the number of cores is doubled. |
Yes, that's a great example, very clear. Obviously, now that you successfully completed parallel processing, |
Demo using parallel_compression.c, which does compression manually and uses the seekable API to construct a seek table from it. (./parallel_compression FILE BLOCK_SIZE NB_THREADS).
The files can be decompressed using |
To be merged after release |
This is an initial implementation of a zstd splittable format and API, as per #395. Comments and feedback are appreciated.