
Block-Level Sequence Producer API #3333

Merged
60 commits merged on Dec 28, 2022

@embg (Contributor) commented Dec 7, 2022

This PR introduces an API for external block-level sequence producers to plug into zstd. The user provides a function pointer and a state object for the external sequence producer, and zstd calls it to generate sequences for each block. Entropy compression of the sequences remains entirely within the library's internal functions.
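To make the calling convention concrete, here is a self-contained C sketch. It deliberately does not link against libzstd: `Seq` mirrors the four fields of `ZSTD_Sequence`, `SeqProducerFn` mirrors the producer callback's parameter list, and `driveBlocks` is a hypothetical stand-in for the library's per-block loop. In released zstd the registration entry point is `ZSTD_registerSequenceProducer()` (this PR originally called it `ZSTD_registerExternalMatchFinder()`); check zstd.h for the exact signatures in your version.

```c
#include <assert.h>
#include <stddef.h>

/* Local mirror of zstd's ZSTD_Sequence fields, so this sketch builds
 * without libzstd. */
typedef struct {
    unsigned offset;      /* match distance; 0 in a block delimiter */
    unsigned litLength;   /* literal bytes emitted before the match */
    unsigned matchLength; /* 0 together with offset 0 marks the delimiter */
    unsigned rep;         /* not used in this sketch */
} Seq;

/* Hypothetical typedef mirroring the producer callback's parameter list. */
typedef size_t (*SeqProducerFn)(void* state,
                                Seq* outSeqs, size_t outSeqsCapacity,
                                const void* src, size_t srcSize,
                                const void* dict, size_t dictSize,
                                int compressionLevel, size_t windowSize);

#define BLOCK_SIZE_MAX ((size_t)128 * 1024)
#define PRODUCER_ERROR ((size_t)-1) /* stand-in for ZSTD_SEQUENCE_PRODUCER_ERROR */
#define SEQ_CAPACITY 64

/* Trivial producer: finds no matches, so the whole block travels as literals
 * in the single terminating delimiter. Its state object counts calls. */
static size_t literalsOnlyProducer(void* state,
                                   Seq* outSeqs, size_t outSeqsCapacity,
                                   const void* src, size_t srcSize,
                                   const void* dict, size_t dictSize,
                                   int compressionLevel, size_t windowSize) {
    (void)src; (void)dict; (void)dictSize; (void)compressionLevel; (void)windowSize;
    if (outSeqsCapacity < 1) return PRODUCER_ERROR;
    ++*(size_t*)state;
    outSeqs[0].offset = 0;
    outSeqs[0].litLength = (unsigned)srcSize;
    outSeqs[0].matchLength = 0;
    outSeqs[0].rep = 0;
    return 1;
}

/* Hypothetical driver standing in for the library: cuts the input into blocks
 * of at most 128 KB and calls the producer once per block with dict = NULL.
 * Returns the total bytes covered by the sequences, or PRODUCER_ERROR
 * (the real library can instead fall back to its internal match finder). */
static size_t driveBlocks(SeqProducerFn producer, void* state,
                          const unsigned char* src, size_t srcSize,
                          int compressionLevel, size_t windowSize) {
    Seq seqs[SEQ_CAPACITY];
    size_t covered = 0;
    for (size_t pos = 0; pos < srcSize; pos += BLOCK_SIZE_MAX) {
        size_t remaining = srcSize - pos;
        size_t blockSize = remaining < BLOCK_SIZE_MAX ? remaining : BLOCK_SIZE_MAX;
        size_t nbSeqs = producer(state, seqs, SEQ_CAPACITY, src + pos, blockSize,
                                 NULL, 0, compressionLevel, windowSize);
        if (nbSeqs == PRODUCER_ERROR) return PRODUCER_ERROR;
        assert(nbSeqs <= SEQ_CAPACITY);
        for (size_t i = 0; i < nbSeqs; ++i)
            covered += (size_t)seqs[i].litLength + seqs[i].matchLength;
    }
    return covered;
}
```

Driving `literalsOnlyProducer` over a 300,000-byte buffer invokes it three times (131,072 + 131,072 + 37,856 bytes), and the covered total is 300,000. The fixed `SEQ_CAPACITY` is a simplification; a real caller sizes the output array from the block size.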

Potential applications of the API include hardware-accelerated sequence producers and sequence producers specialized to particular types of data.

There are some subtleties around fallback, sequence validation, memory ownership, etc. Users should read all of the documentation added by this PR to zstd.h before using the API. Note: that documentation has been updated in subsequent PRs, so make sure to look at a recent commit.

An example program is provided (see contrib/externalSequenceProducer) which demonstrates how to use the API with a simple LZ parser.
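The contrib program is not reproduced here. As an illustration of the same idea, the following is a toy greedy LZ parser for a single block, written against a local `Seq` mirror of `ZSTD_Sequence`. It hashes 4-byte prefixes, takes a match only after verifying the bytes, and ends with the delimiter sequence that carries the trailing literals. `greedyParse`, `hash4`, `HASH_LOG`, and `MIN_MATCH` are illustrative names, not zstd API, and the hash assumes a 32-bit `unsigned`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Local mirror of ZSTD_Sequence's fields. */
typedef struct { unsigned offset, litLength, matchLength, rep; } Seq;

#define HASH_LOG 12
#define MIN_MATCH 4

/* Multiplicative hash of the 4 bytes at p (assumes 32-bit unsigned). */
static unsigned hash4(const unsigned char* p) {
    unsigned v;
    memcpy(&v, p, 4);
    return (v * 2654435761u) >> (32 - HASH_LOG);
}

/* Greedy single-block parser. Returns the number of sequences written,
 * including the final delimiter, or (size_t)-1 if outSeqs is too small. */
static size_t greedyParse(Seq* outSeqs, size_t cap,
                          const unsigned char* src, size_t srcSize) {
    size_t table[1 << HASH_LOG]; /* last position + 1 per hash; 0 = empty */
    memset(table, 0, sizeof table);
    size_t nbSeqs = 0, pos = 0, litStart = 0;
    while (pos + MIN_MATCH <= srcSize) {
        unsigned h = hash4(src + pos);
        size_t cand = table[h];
        table[h] = pos + 1;
        if (cand != 0 && memcmp(src + cand - 1, src + pos, MIN_MATCH) == 0) {
            size_t ref = cand - 1, len = MIN_MATCH;
            while (pos + len < srcSize && src[ref + len] == src[pos + len]) ++len;
            if (nbSeqs + 1 >= cap) return (size_t)-1; /* keep room for delimiter */
            outSeqs[nbSeqs].offset = (unsigned)(pos - ref);
            outSeqs[nbSeqs].litLength = (unsigned)(pos - litStart);
            outSeqs[nbSeqs].matchLength = (unsigned)len;
            outSeqs[nbSeqs].rep = 0;
            ++nbSeqs;
            pos += len;
            litStart = pos;
        } else {
            ++pos;
        }
    }
    if (nbSeqs >= cap) return (size_t)-1;
    /* Block delimiter: offset = matchLength = 0, carries trailing literals. */
    outSeqs[nbSeqs].offset = 0;
    outSeqs[nbSeqs].litLength = (unsigned)(srcSize - litStart);
    outSeqs[nbSeqs].matchLength = 0;
    outSeqs[nbSeqs].rep = 0;
    return nbSeqs + 1;
}
```

Wrapping `greedyParse` as a producer callback would be a thin adapter that ignores `dict` and returns an error value when the parser reports insufficient capacity.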


Note: the original version of this PR used the term "External Matchfinder API". The above summary has been updated to use the new term "Block-Level Sequence Producer API", but the code in this PR still uses old symbol names. Updated symbol names were introduced to the code in #3484.

@embg embg marked this pull request as ready for review December 21, 2022 23:00
@embg embg requested a review from Cyan4973 December 21, 2022 23:01
@embg (Contributor, Author) commented Dec 28, 2022

Changes since review:

  • Rebase onto upstream/dev
  • Additional docs on API limitations: 8052b10
  • Fix minor @Cyan4973 nits: 1e60543
  • Refactor maxNbSeq calculation into a helper function: 49cd2e8
  • Fix copyright: 241f2a7

CI is passing except for a single test, which seems unrelated to this PR (and is failing on other open PRs).

I'm going to merge! :)

@embg embg merged commit 2a40262 into facebook:dev Dec 28, 2022
@nadavrot commented:

Congrats on landing this @embg. I am excited about the use of hardware-accelerated matchfinders in zstd.

@embg embg changed the title External matchfinder API External Sequence Producer API Feb 8, 2023
@embg embg changed the title External Sequence Producer API Block-Level Sequence Producer API Feb 9, 2023
@Cyan4973 Cyan4973 mentioned this pull request Feb 9, 2023
@abalib commented Oct 18, 2023

The External Sequence Producer idea is brilliant, although processing the source in 128KB chunks is limiting in terms of compression ratio and possibly hardware performance.

As an alternative, I noticed this in contrib/seqBench:

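```c
/* Keep the number of sequences actually generated; seqsSize is the capacity. */
size_t nbSeqs = ZSTD_generateSequences(zc, seqs, seqsSize, inBuf, inBufSize);
ZSTD_CCtx_setParameter(zc, ZSTD_c_blockDelimiters, ZSTD_sf_explicitBlockDelimiters);
/* Assumes outBuf was allocated with ZSTD_compressBound(inBufSize) bytes. */
size_t outBufSize = ZSTD_compressSequences(zc, outBuf, ZSTD_compressBound(inBufSize),
                                           seqs, nbSeqs, inBuf, inBufSize);
```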

https://github.com/facebook/zstd/blob/dev/contrib/seqBench/seqBench.c#L29

Here, one could call the external producer instead of ZSTD_generateSequences(), and the caller is then not limited to the 128KB block size. I verified this experimentally: I built a file from two copies of the first 200,000 bytes of another file and compressed it using seqBench.c. In the output, LL is the literal length, ML the match length, and OFFS the offset; all-zero rows are block delimiters. The last six lines show multiple blocks, with offsets reaching back 200,000 bytes (more than 128K).

```
zstd/contrib/seqBench$ head -c 200000 junk >> junk1
zstd/contrib/seqBench$ head -c 200000 junk >> junk1
zstd/contrib/seqBench$ ./seqBench junk1
LL      ML      OFFS    REP
9       91      1       1
14      13      8       0
...
0       22      19457   0
1       4       19457   1
0       13      572     0
2       62144   200000  0
0       0       0       0
1       131071  200000  1
0       0       0       0
1       6783    200000  1
0       0       0       0
```
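That output can be read mechanically: each sequence with offset 0 and match length 0 is a block delimiter, so summing literal and match lengths up to each delimiter gives the block sizes. The C sketch below does that bookkeeping; `splitBlocks` and the local `Seq` mirror of `ZSTD_Sequence` are illustrative, not zstd API. Applied to the last two complete blocks in the output, it yields blocks of 131,072 and 6,784 bytes, both within the 128 KB block limit, while the largest offset is 200,000.

```c
#include <assert.h>
#include <stddef.h>

/* Local mirror of ZSTD_Sequence's fields. */
typedef struct { unsigned offset, litLength, matchLength, rep; } Seq;

/* Walks a sequence stream in explicit-block-delimiter format. Writes each
 * block's decompressed size to blockSizes, the largest offset seen to
 * *maxOffset, and returns the number of complete blocks. */
static size_t splitBlocks(const Seq* seqs, size_t nbSeqs,
                          size_t* blockSizes, size_t maxBlocks,
                          unsigned* maxOffset) {
    size_t nbBlocks = 0, current = 0;
    *maxOffset = 0;
    for (size_t i = 0; i < nbSeqs; ++i) {
        current += (size_t)seqs[i].litLength + seqs[i].matchLength;
        if (seqs[i].offset > *maxOffset) *maxOffset = seqs[i].offset;
        if (seqs[i].offset == 0 && seqs[i].matchLength == 0) { /* delimiter */
            assert(nbBlocks < maxBlocks);
            blockSizes[nbBlocks++] = current;
            current = 0;
        }
    }
    return nbBlocks;
}
```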

@embg embg deleted the offload branch October 18, 2023 23:54
@embg (Contributor, Author) commented Oct 19, 2023

Hi @abalib, thanks for reaching out! I would love to learn more about why you are interested in this API and the use-cases you are targeting. Feel free to reply here or email me at [my github username]@meta.com.

You are correct that ZSTD_compressSequences() can be used with externally-generated sequences. This is the approach taken by Intel's QATzip.

The motivation for block-level offload is that it integrates with existing compression APIs such as ZSTD_compress2() and ZSTD_compressStream2(). This is particularly important for streaming compression, which is impossible with ZSTD_compressSequences(). Even for small, one-shot compressions that don't require streaming, maintaining compatibility with the common APIs used in production is an important feature. That's why we added the block-level API, which is used by Intel's zstd plugin.

> processing the source in 128KB chunks is limiting in terms of compression ratio

So, the block-level API does support offsets larger than 128KB. External sequence producer functions are passed a windowSize parameter and are allowed to produce any offset which is compatible with that history window. The precise requirements on sequences returned by an external callback are given here.
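As a rough illustration of those requirements, the following C sketch checks one block's sequences against a subset of them: the array must end with a delimiter (offset 0, match length 0), literal and match lengths must add up exactly to the block size, and every match offset must be nonzero and no larger than `windowSize`. `checkBlockSequences` and the local `Seq` mirror are hypothetical; this is a paraphrase for illustration, not zstd's own validation, and the authoritative list lives in zstd.h.

```c
#include <assert.h>
#include <stddef.h>

/* Local mirror of ZSTD_Sequence's fields; rep is not inspected here. */
typedef struct { unsigned offset, litLength, matchLength, rep; } Seq;

/* Returns 1 if seqs looks like a valid answer for one block of srcSize bytes
 * under a history window of windowSize bytes, 0 otherwise. */
static int checkBlockSequences(const Seq* seqs, size_t nbSeqs,
                               size_t srcSize, size_t windowSize) {
    if (nbSeqs == 0) return 0;
    const Seq* last = &seqs[nbSeqs - 1];
    if (last->offset != 0 || last->matchLength != 0) return 0; /* no delimiter */
    size_t covered = 0;
    for (size_t i = 0; i < nbSeqs; ++i) {
        /* A match must point backwards, and no further than the window. */
        if (seqs[i].matchLength > 0 &&
            (seqs[i].offset == 0 || seqs[i].offset > windowSize)) return 0;
        covered += (size_t)seqs[i].litLength + seqs[i].matchLength;
    }
    return covered == srcSize; /* literals + matches must tile the block */
}
```

With this check, a match at offset 200,000 passes under a 256 KB window and fails under a 128 KB one, which is the distinction the `windowSize` parameter encodes.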

The missing component is access to the actual content of the history window. In the future, that will be provided by the dict and dictSize parameters. Currently we pass in NULL as the dictionary buffer, but we could instead provide access to the previous windowSize bytes of history.

This feature is on our long-term roadmap, it simply hasn't been implemented yet. If there is a real-world need for this feature we can add it in the near future.

> ...and possibly hardware performance.

Your concern is absolutely valid. Breaking the input into 128KB chunks may require more back-and-forth communication with the hardware. This is a trade-off we made to maintain compatibility with existing APIs. ZSTD_compressSequences() is an option to potentially use hardware more efficiently, but that API also has significant downsides (as discussed above).

Please let me know if you have any further questions!

7 participants