Unzipping is too slow #23
Comments
Thank you for reporting. This is expected, and I have the following language in README.md:
I haven't found the time so far to work on code optimization. On the plus side, there is a lot of potential for improving the situation. Unfortunately, I cannot promise when I will work on it. |
There is work ahead. I left the issue open. |
I just ran into slow decompression, and the (partial) solution is to wrap your reader in a buffered reader. I say "partial" because, unfortunately, this fails on some inputs with
Very weird that it fails when buffered but works when unbuffered. |
Yes, the library doesn't implement its own buffering, and because it uses ReadByte it benefits from buffered readers. I should have documented it. The rationale at the time was that I wanted to use a buffered reader only if there is a need for it. For instance, I didn't want to use a buffered reader for a bytes.Buffer. A buffered reader shouldn't make a difference for the reading process. The gxz tool uses a buffered reader, and I have run extensive tests for it. Can you provide the file that you want to decompress? |
Sure, I was decompressing the Zig tarballs from here. |
Fixed! |
I have now downloaded all 0.8.0 files and decompressed them with the gxz tool, which uses bufio.Reader, and all of them decompressed without problems. Please provide:
|
Oh you're asking for the failing one, sorry, that wasn't clear - I thought you were asking for one of the slow ones. |
This is the one that fails. Interestingly it also fails with github.com/xi2/xz |
Hi, this is a deb file, which is an ar file. You must do the following:
The two xz files can easily be uncompressed and generate no issues for me. The debian-binary is a plain-text file. Information about the deb format can be found in the manual page for deb. |
There are 2 xz golang libraries:
* https://github.com/xi2/xz is fast but provides Reader functionality only; currently used to unpack modules
* https://github.com/ulikunitz/xz has Writer(), but its Reader path is slower (ulikunitz/xz#23); use it for image compression

Closes #42
The performance of `github.com/ulikunitz/xz` when decompressing xz data is a known limitation; see ulikunitz/xz#23. `github.com/xi2/xz` is significantly faster for xz decompression; use it in place of `github.com/ulikunitz/xz` in the `unzip` package. `github.com/xi2/xz` doesn't implement xz compression, so the `tar` package must continue to use `github.com/ulikunitz/xz`.

Performance evaluation on a sample 576MB xz-compressed tarball (the binary distribution of Clang for Ubuntu 18.04) with a dictionary size of 64MB (which corresponds to compression preset level 9) and a resulting ~13% compression ratio:

```
bash-4.4$ ls -lh $SRCS
-rw-r--r-- 2 csn users 576M Nov 19 17:54 clang+llvm-14.0.0-x86_64-linux-gnu-ubuntu-18.04.tar.xz
bash-4.4$ xz -lvv $SRCS
clang+llvm-14.0.0-x86_64-linux-gnu-ubuntu-18.04.tar.xz (1/1)
  Streams:            1
  Blocks:             1
  Compressed size:    575.8 MiB (603,776,352 B)
  Uncompressed size:  4,408.2 MiB (4,622,376,960 B)
  Ratio:              0.131
  Check:              CRC64
  Stream padding:     0 B
  Streams:
    Stream  Blocks  CompOffset  UncompOffset  CompSize     UncompSize     Ratio  Check  Padding
    1       1       0           0             603,776,352  4,622,376,960  0.131  CRC64  0
  Blocks:
    Stream  Block  CompOffset  UncompOffset  TotalSize    UncompSize     Ratio  Check  CheckVal          Header  Flags  CompSize     MemUsage  Filters
    1       1      12          0             603,776,312  4,622,376,960  0.131  CRC64  b4d869416c7f940f  12      --     603,776,291  65 MiB    --lzma2=dict=64MiB
  Memory needed:      65 MiB
  Sizes in headers:   No
  Minimum XZ Utils version: 5.0.0
```

With GNU tar 1.29 and liblzma 5.2.2 (a useful baseline):

```
bash-4.4$ time tar xf $SRCS
real 0m40.250s
user 0m36.544s
sys 0m6.847s
```

arcat with `github.com/ulikunitz/xz` handling xz decompression:

```
bash-4.4$ time $TOOLS_ARCAT x $SRCS
real 12m6.254s
user 4m6.769s
sys 8m4.628s
```

arcat with `github.com/xi2/xz` handling xz decompression:

```
bash-4.4$ time $TOOLS_ARCAT x $SRCS
real 0m55.643s
user 0m50.877s
sys 0m2.275s
```

Co-authored-by: jpoole <[email protected]>
I used xz to unpack Python-3.11.4.xz. Using Python 3.10 it took 4sec; using Go it took 1m55sec. So I do think Go xz has a speed issue. I just tried github.com/therootcompany/xz and it took 5sec. |
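I posted this two years ago, but it got deleted. Here it is again; it should help with the speed:
package test

import (
	"archive/tar"
	"bufio"
	"io"
	"os"
	"path"
	"testing"

	"github.com/ulikunitz/xz"
)

const cargo = "cargo-1.54.0-x86_64-pc-windows-gnu.tar.xz"

// readFrom extracts every regular file of the tar stream r, relative to
// the current directory.
func readFrom(r io.Reader) error {
	tr := tar.NewReader(r)
	for {
		n, err := tr.Next()
		if err == io.EOF {
			break
		} else if err != nil {
			return err
		} else if n.Typeflag != tar.TypeReg {
			continue
		}
		// Use real permission bits; os.ModeDir alone grants no access.
		if err := os.MkdirAll(path.Dir(n.Name), 0o755); err != nil {
			return err
		}
		f, err := os.Create(n.Name)
		if err != nil {
			return err
		}
		// Close each file right away instead of deferring inside the loop,
		// and report copy and close errors.
		_, err = io.Copy(f, tr)
		if cerr := f.Close(); err == nil {
			err = cerr
		}
		if err != nil {
			return err
		}
	}
	return nil
}

func TestUlikunitz(t *testing.T) {
	f, err := os.Open(cargo)
	if err != nil {
		t.Fatal(err)
	}
	defer f.Close()
	// The library does not buffer its input, so wrap the file in a bufio.Reader.
	r, err := xz.NewReader(bufio.NewReader(f))
	if err != nil {
		t.Fatal(err)
	}
	if err := readFrom(r); err != nil {
		t.Fatal(err)
	}
} |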
When I tried to unzip a big file (about 3 GiB as xz and about 18 GiB unpacked), the process was too slow: only 3 GiB of the 18 GiB were unpacked after about 40 minutes on my machine. The same file was unpacked in about 5 minutes using the 7-Zip tool.