Optimization: Faster extraction #33

Closed
pothos opened this issue Nov 27, 2023 · 2 comments
Labels
kind/feature A feature request.

Comments

@pothos
Member

pothos commented Nov 27, 2023

Current situation

Extraction is done sequentially: each chunk is read from the file in turn, and a single file handle is shared using seek+read.
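For illustration, the sequential pattern looks roughly like the sketch below. The `Chunk { offset, len }` type and `read_chunks_sequential` are hypothetical stand-ins (the issue doesn't describe the real chunk format), and the decompression/verification step is omitted. The point is that `seek` moves the one shared cursor, so the function needs `&mut File` and the loop can't simply be split across threads.

```rust
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};

/// Hypothetical chunk descriptor: `len` bytes starting at `offset` in the input file.
#[derive(Clone, Copy, Debug)]
pub struct Chunk {
    pub offset: u64,
    pub len: usize,
}

/// Sequential baseline: one shared `File`; for every chunk we seek to its
/// offset and then read it. `seek` mutates the file's single cursor, which
/// is why this needs `&mut File` and cannot run chunks concurrently as is.
pub fn read_chunks_sequential(file: &mut File, chunks: &[Chunk]) -> io::Result<Vec<Vec<u8>>> {
    let mut out = Vec::with_capacity(chunks.len());
    for c in chunks {
        file.seek(SeekFrom::Start(c.offset))?;
        let mut buf = vec![0u8; c.len];
        file.read_exact(&mut buf)?; // fails with UnexpectedEof on a short file
        // Decompression/verification of `buf` would happen here in the real tool.
        out.push(buf);
    }
    Ok(out)
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("seq_extract_demo.bin");
    std::fs::write(&path, b"headerPAYLOAD1PAYLOAD2")?;
    let mut file = File::open(&path)?;
    // Read the second payload first to show that order is driven by the chunk list.
    let chunks = [Chunk { offset: 14, len: 8 }, Chunk { offset: 6, len: 8 }];
    let data = read_chunks_sequential(&mut file, &chunks)?;
    assert_eq!(data[0], b"PAYLOAD2");
    assert_eq!(data[1], b"PAYLOAD1");
    Ok(())
}
```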

Impact

Slow extraction

Ideal future situation

Maybe: mmap the file once and have the verification/extraction functions read from the resulting byte "array", so that the for-loop could do the bzip decompression in parallel.
But since we need pwrite anyway for parallel writing, we can also use its counterpart pread for parallel reading (or open multiple FDs).
Here are the corresponding Rust methods of the `std::os::unix::fs::FileExt` trait:
https://doc.rust-lang.org/std/os/unix/fs/trait.FileExt.html#tymethod.read_at
https://doc.rust-lang.org/std/os/unix/fs/trait.FileExt.html#method.read_exact_at
https://doc.rust-lang.org/std/os/unix/fs/trait.FileExt.html#method.write_all_at
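A minimal sketch of the positional-I/O idea, assuming Unix and a hypothetical `ChunkMap` that records where a chunk sits in the input and where it goes in the output. Because `read_exact_at` and `write_all_at` take `&self` and don't use the file cursor, both files can be shared by reference between threads. Bytes are copied verbatim here; the real tool would decompress and verify in between.

```rust
use std::fs::File;
use std::io;
use std::os::unix::fs::FileExt;
use std::thread;

/// Hypothetical mapping: `len` bytes at `src_offset` in the input are
/// written to `dst_offset` in the output (copied verbatim in this sketch).
#[derive(Clone, Copy, Debug)]
pub struct ChunkMap {
    pub src_offset: u64,
    pub dst_offset: u64,
    pub len: usize,
}

/// Processes all chunks concurrently with positional reads and writes.
/// No seeking happens, so `src` and `dst` are shared as `&File`.
pub fn copy_chunks_parallel(src: &File, dst: &File, chunks: &[ChunkMap]) -> io::Result<()> {
    thread::scope(|s| -> io::Result<()> {
        let handles: Vec<_> = chunks
            .iter()
            .map(|c| {
                s.spawn(move || -> io::Result<()> {
                    let mut buf = vec![0u8; c.len];
                    src.read_exact_at(&mut buf, c.src_offset)?;
                    dst.write_all_at(&buf, c.dst_offset)?;
                    Ok(())
                })
            })
            .collect();
        // Join in chunk order and return the first error found;
        // a panicking worker is turned into a panic here.
        for h in handles {
            h.join().expect("worker thread panicked")?;
        }
        Ok(())
    })
}

fn main() -> io::Result<()> {
    let dir = std::env::temp_dir();
    let src_path = dir.join("faster_extraction_src.bin");
    let dst_path = dir.join("faster_extraction_dst.bin");
    std::fs::write(&src_path, b"aaaabbbbcccc")?;
    let src = File::open(&src_path)?;
    let dst = File::create(&dst_path)?;
    // Reverse the order of three 4-byte chunks.
    let chunks = [
        ChunkMap { src_offset: 0, dst_offset: 8, len: 4 },
        ChunkMap { src_offset: 4, dst_offset: 4, len: 4 },
        ChunkMap { src_offset: 8, dst_offset: 0, len: 4 },
    ];
    copy_chunks_parallel(&src, &dst, &chunks)?;
    assert_eq!(std::fs::read(&dst_path)?, b"ccccbbbbaaaa");
    Ok(())
}
```

Spawning one thread per chunk keeps the sketch short; with many chunks, a bounded pool (or rayon) would be the better fit.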

Implementation options

Use rayon to run the extraction for-loop in parallel

@pothos
Member Author

pothos commented Dec 22, 2023

The preparations were done in #46, but it turns out that we can't directly use rayon's par_iter because the loop relies on error handling, and the errors would somehow have to be passed back to the main thread.
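One way around the error-passing problem, sketched with only the standard library: each worker thread returns a `Result`, and `join` hands it back to the main thread. `try_for_each_parallel` is a hypothetical helper, not part of the project or of rayon. Workers claim item indices from a shared atomic counter and stop claiming new ones after the first failure. rayon's `ParallelIterator::try_for_each` may also cover this case.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::thread;

/// Hypothetical helper: calls `f` on every item using `workers` scoped
/// threads (at least one). Each worker's `Result` reaches the caller via
/// `join`, so no channel is needed. After a failure, workers stop claiming
/// new items; items already in progress still finish. If several items
/// fail, the error from the lowest-numbered failing worker is returned.
pub fn try_for_each_parallel<T, E, F>(items: &[T], workers: usize, f: F) -> Result<(), E>
where
    T: Sync,
    E: Send,
    F: Fn(&T) -> Result<(), E> + Sync,
{
    let next = AtomicUsize::new(0); // index of the next unclaimed item
    let failed = AtomicBool::new(false); // set once any item has failed
    let (next, failed, f) = (&next, &failed, &f);

    thread::scope(|s| {
        let mut handles = Vec::new();
        for _ in 0..workers.max(1) {
            handles.push(s.spawn(move || -> Result<(), E> {
                loop {
                    if failed.load(Ordering::Relaxed) {
                        return Ok(()); // another worker failed: stop early
                    }
                    let i = next.fetch_add(1, Ordering::Relaxed);
                    let Some(item) = items.get(i) else {
                        return Ok(()); // no items left
                    };
                    if let Err(e) = f(item) {
                        failed.store(true, Ordering::Relaxed);
                        return Err(e);
                    }
                }
            }));
        }
        // Join every worker and keep the first error in worker order.
        let mut result = Ok(());
        for h in handles {
            let r = h.join().expect("worker thread panicked");
            if result.is_ok() {
                result = r;
            }
        }
        result
    })
}

fn main() {
    let items: Vec<u32> = (1..=100).collect();
    let ok: Result<(), String> = try_for_each_parallel(&items, 4, |_| Ok(()));
    assert!(ok.is_ok());
    // Item 42 fails; its error arrives on the main thread.
    let err = try_for_each_parallel(&items, 4, |&x| {
        if x == 42 {
            Err(format!("chunk {x} failed verification"))
        } else {
            Ok(())
        }
    });
    assert_eq!(err, Err("chunk 42 failed verification".to_string()));
}
```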

@pothos pothos closed this as completed Dec 22, 2023
@pothos
Member Author

pothos commented Dec 22, 2023

We can reopen this if necessary, but for now I'll close it since it's low priority.
