Plan for rewrite branch #50

Open
gingerwizard opened this issue Sep 27, 2022 · 6 comments

Comments

@gingerwizard

Is https://github.com/ulikunitz/xz/tree/rewrite production ready? When do you anticipate this being promoted to main?

Thanks

@ulikunitz
Owner

ulikunitz commented Sep 28, 2022

No, it is not production ready. Here is a list of actions that still need to be done:

  • write parallel LZMA2 reader
  • make xz use the new lzma package
  • run tests; fix bugs
  • run benchmarks
  • write sequencer that uses a tree based match finder
  • run benchmarks again
  • run fuzzers; fix bugs
  • publish lz module (new code is dependent on this module)
  • publish release candidate; fix bugs
  • publish release

Please note that the new release will not be backward compatible, but it should be faster and will support parallel encoding and decoding. Since I work full time, I cannot provide a timeline, but I will post updates under this issue.

@ulikunitz
Owner

Update: The rewrite branch is now working. Using multiple threads I have achieved write rates of over 150 MByte/s, but the compression ratio is worse (39% vs. 33%). I have not done any work on the defaults yet. Streams encoded in parallel can also be read with multiple threads, and there I achieve read rates of over 190 MByte/s.

Some bug fixes are still required. I need to make the xz Reader a ReadCloser so that the threads can be stopped if the stream is not read to the end, but so far it looks promising.

@ulikunitz
Owner

Just an update.

I have done optimization work and found that I have very fast compressors, but they cannot bring the compression ratio below 29%, measured on the Silesia corpus. The bt4 match finder mode in xz achieves a compression ratio of 23% on the same corpus. So I am currently writing a tree-based match finder to achieve the same results. I have updated the task list above to reflect this activity.

@ulikunitz
Owner

ulikunitz commented Jun 12, 2023

I now have a very slow parser (ca. 1 MiB/s) that reaches 26% on the Silesia corpus, but the code now supports multithreaded compression and decompression. I have published an alpha release, v0.6.0-alpha.3. The new lz module with the Lempel-Ziv parsers is published as well, so you can actually test it.

@wagoodman

wagoodman commented Sep 16, 2024

It looks like your list is a little outdated; it appears that you're further along than the list suggests (🎉 ). Which tasks are actually still left? Would you like help with any of them?

@ulikunitz
Owner

Sorry, there has been a lot of work in my day job. There is a v0.6.0-alpha.3 you can experiment with; it supports the parallel modes. I would be interested in feedback on it. Compression rates are still 2% behind the original xz, but encoding is much faster, especially when using the parallel modes.
