Implement Abort() in XZStream and relatives #15

Merged (2 commits) on Aug 31, 2023

@ied206 (Owner) commented Aug 31, 2023

Summary

Details

Why does the current implementation not have Abort()?

My compression stream implementation follows the .NET BCL's official DeflateStream interface, which does not define Abort() or a similar method. Many other implementations behave the same way.

My guess is that zlib is a relatively fast compression algorithm, so merely calling Close() was enough even when a user wanted to abort the session.

When and why does Close() take a long time?

When closing the XZStream (and its relatives), liblzma flushes its internal buffer.

  • Compress mode: The remaining buffered input must be compressed so that the xz file is finalized into a valid state.
  • Decompress mode: Nothing to do before closing.

In threaded compress mode, liblzma buffers the input data aggressively, so the actual compression is postponed as long as possible. The compression itself appears to take less time, but the deferred work shifts into the time needed to properly finalize the xz file.
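
The finalization step described above can be observed with any liblzma binding. The following is a minimal sketch that assumes Python's standard `lzma` module (which wraps liblzma) as a stand-in for XZStream; it is not this library's code, and it runs single-threaded, so it shows only that finishing the stream emits extra output and that a stream without that step cannot be decoded.

```python
import lzma

# Deterministic 1 MiB payload; any input works for this illustration.
data = bytes(range(256)) * 4096

compressor = lzma.LZMACompressor(format=lzma.FORMAT_XZ)

# compress() is allowed to keep input in liblzma's internal buffers.
written_early = compressor.compress(data)

# flush() finishes the stream: buffered input is encoded and the xz index
# and footer are written. This is the work a Close() has to wait for.
written_at_finish = compressor.flush()

# Without the finishing step, the output is not a decodable xz stream.
try:
    lzma.decompress(written_early)
    truncated_ok = True
except lzma.LZMAError:
    truncated_ok = False

round_trip_ok = lzma.decompress(written_early + written_at_finish) == data

print(f"before finish: {len(written_early)} bytes")
print(f"at finish:     {len(written_at_finish)} bytes")
print(f"truncated decodes: {truncated_ok}, complete decodes: {round_trip_ok}")
```

How the bytes split between the two calls depends on the liblzma version and settings, so the sketch prints the sizes instead of assuming them.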

What does the new Abort() do?

It just frees internal liblzma resources. To be exact, it frees LzmaStream, the context for liblzma operations. This prevents any further liblzma operations from being performed.

Furthermore, calling Abort() prevents Flush() and FinishWrite() from being called in Dispose()/Close().

This saves time in compress mode, especially in threaded compress mode. It saves nothing in decompress mode, where there is nothing to flush before closing.

liblzma will refuse any operation after calling Abort().

Users must dispose of XZStream right after calling Abort(). The compressed data will be invalid and must be discarded.
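
The behavior described in this section can be modeled in a few lines. This is a hedged sketch, not the XZStream implementation: `SketchXZWriter`, its methods, and the `decodes` helper are hypothetical names, and Python's standard `lzma` module stands in for the liblzma context. It shows only the control flow: abort drops the encoder without flushing, close skips finalization after an abort, and later writes are refused.

```python
import io
import lzma


class SketchXZWriter:
    """Hypothetical stand-in for a compress-mode XZStream (not library code)."""

    def __init__(self, sink):
        self._sink = sink
        self._encoder = lzma.LZMACompressor(format=lzma.FORMAT_XZ)
        self._aborted = False
        self._closed = False

    def write(self, chunk):
        if self._encoder is None:
            raise ValueError("stream was aborted or closed")
        self._sink.write(self._encoder.compress(chunk))

    def abort(self):
        # Drop the encoder context without flushing (models freeing LzmaStream).
        self._encoder = None
        self._aborted = True

    def close(self):
        if self._closed:
            return
        self._closed = True
        if not self._aborted:
            # Normal path: finish the stream so the output is a valid xz file.
            self._sink.write(self._encoder.flush())
        self._encoder = None


def decodes(blob):
    """Return True if blob is a complete, valid xz stream."""
    try:
        lzma.decompress(blob)
        return True
    except lzma.LZMAError:
        return False


payload = b"hello xz " * 10_000

finished = io.BytesIO()
writer = SketchXZWriter(finished)
writer.write(payload)
writer.close()

aborted = io.BytesIO()
writer = SketchXZWriter(aborted)
writer.write(payload)
writer.abort()
writer.close()  # skips the flush because abort() was called

print("closed output decodes: ", decodes(finished.getvalue()))
print("aborted output decodes:", decodes(aborted.getvalue()))
```

The aborted output is incomplete by design, which is why the data written before an abort has to be discarded.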

Benchmark: How much time does Abort() save compared to Close()?

Taken from the output of the AbortCompress test:

  • Threaded compression benefits the most: for C.bin, Close() took 222.162ms while Abort() took 8.001ms.
# Singlethreaded Compression
A.pdf, -1 = Close() took 3.000ms
A.pdf, -1 = Abort() took 1.000ms
B.txt, -1 = Close() took 1.999ms
B.txt, -1 = Abort() took 0.999ms
C.bin, -1 = Close() took 6.999ms
C.bin, -1 = Abort() took 2.000ms
# Multithreaded Compression
A.pdf, 1 = Close() took 20.002ms
A.pdf, 1 = Abort() took 1.000ms
B.txt, 2 = Close() took 8.000ms
B.txt, 2 = Abort() took 0.000ms
C.bin, 2 = Close() took 222.162ms
C.bin, 2 = Abort() took 8.001ms
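
To make the comparison easier to read, the time saved per file can be computed from the figures above. This is a small arithmetic sketch: the numbers are copied verbatim from the test output, the second field of each line is carried along unchanged without interpretation, and the variable names are illustrative only.

```python
# (section, file, second field as printed, Close() ms, Abort() ms)
results = [
    ("single", "A.pdf", -1, 3.000, 1.000),
    ("single", "B.txt", -1, 1.999, 0.999),
    ("single", "C.bin", -1, 6.999, 2.000),
    ("multi", "A.pdf", 1, 20.002, 1.000),
    ("multi", "B.txt", 2, 8.000, 0.000),
    ("multi", "C.bin", 2, 222.162, 8.001),
]

saved_ms = {}
for section, name, field, close_ms, abort_ms in results:
    saved = close_ms - abort_ms
    saved_ms[(section, name)] = saved
    # Share of the Close() time that Abort() avoids.
    share = saved / close_ms
    print(f"{section:6} {name:6} saved {saved:8.3f} ms ({share:.0%} of Close())")
```

Differences are used instead of speedup ratios because one Abort() measurement is 0.000ms, which would make a ratio undefined.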

Abort() is useful when a user wants to shut down XZStream immediately.
The originally intended way to abort is good old Close(), but there
is a report that Dispose()'s internal flushing takes quite a long
time.

To mitigate this, add an Abort() method. Abort() simply frees the
LzmaStream structs. This prevents any further operations WITHOUT ANY
FLUSHING. Users must dispose of XZStream right after calling Abort().
@ied206 merged commit 31200ad into develop Aug 31, 2023
@ied206 deleted the xz-abort branch August 31, 2023 18:00