Allow upstream python-zlib-ng and python-isal to support threads directly #126

Closed
2 tasks done
rhpvorderman opened this issue Feb 21, 2023 · 2 comments

Comments

@rhpvorderman
Collaborator

rhpvorderman commented Feb 21, 2023

Since the zlib module releases the GIL while it (de)compresses, the zlib_ng and isal_zlib modules do the same. It should therefore be possible to use the threading module to make efficient use of multiple cores while keeping all the benefits of shared memory.
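To illustrate the idea, here is a minimal sketch using only the standard library's zlib, which releases the GIL in the same way. It compresses fixed-size blocks in worker threads and concatenates the resulting gzip members. The names `compress_block` and `threaded_gzip` and the block size are made up for this example, and this is not necessarily the scheme python-isal or python-zlib-ng would use.

```python
import gzip
import zlib
from concurrent.futures import ThreadPoolExecutor


def compress_block(block: bytes, level: int = 6) -> bytes:
    """Compress one block into a complete, standalone gzip member."""
    # wbits=31 selects the gzip container. The compression call releases
    # the GIL, so several blocks can be compressed at the same time.
    compressor = zlib.compressobj(level, zlib.DEFLATED, 31)
    return compressor.compress(block) + compressor.flush()


def threaded_gzip(data: bytes, threads: int = 4,
                  block_size: int = 1024 * 1024) -> bytes:
    """Compress data with a thread pool; the result is a multi-member gzip."""
    # An empty input still needs one (empty) member to be a valid gzip file.
    blocks = [data[i:i + block_size]
              for i in range(0, len(data), block_size)] or [b""]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # map() yields results in input order, so members stay in sequence.
        members = list(pool.map(compress_block, blocks))
    # Concatenated gzip members form a valid gzip stream.
    return b"".join(members)
```

All blocks live in the same process, so nothing has to be copied through a pipe. The cost of this particular scheme is a slightly worse compression ratio, because each block starts without the history of the previous one.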

In theory this should be more efficient than the current approach, which uses pipes for interprocess communication.

In practice this means that we will no longer have to bother with igzip, and possibly crabz. We can also make the non-threaded and threaded opening options much simpler, as this will be handled by igzip.open and gzip_ng.open.

Less reliance on external applications plus faster code seems like a win, except that it will take some engineering effort to get it done ;-). A downstream effect will be that dnaio becomes unbeatable on any metric by any other library, no matter which compression level or number of threads is chosen.

  • python-isal
  • python-zlib-ng
@rhpvorderman
Collaborator Author

I also asked whether this would be a good idea for CPython, but the idea was shot down as an "extreme corner case": https://discuss.python.org/t/multithreaded-gzip-reading-and-writing/24086 .
I guess (de)compressing 100 GB gzip files is not a common workload for most Python users. So for standard zlib, we will be stuck with pigz. On the upside, this gives me more freedom in how I implement this. I don't have to worry about calls like 'seek()' etc. I only have to implement a streaming interface with no seeking.
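As a rough sketch of what such a streaming-only interface could look like, the hypothetical class below accepts `write()` calls, compresses blocks in worker threads, and emits them in order. It reports itself as non-seekable and inherits the default `seek()`, which raises `io.UnsupportedOperation`. The class name, parameters and block scheme are assumptions for illustration, built on the standard zlib module rather than on zlib_ng or isal_zlib.

```python
import gzip
import io
import zlib
from concurrent.futures import ThreadPoolExecutor


class StreamingThreadedGzipWriter(io.RawIOBase):
    """Hypothetical write-only gzip stream without seek() support."""

    def __init__(self, raw, threads=4, block_size=1024 * 1024, level=6):
        super().__init__()
        self._raw = raw
        self._pool = ThreadPoolExecutor(max_workers=threads)
        self._pending = []            # futures, kept in submission order
        self._max_pending = 2 * threads
        self._buffer = bytearray()
        self._block_size = block_size
        self._level = level
        self._submitted = False

    def writable(self):
        return True

    def seekable(self):
        return False

    @staticmethod
    def _compress(block, level):
        # Each block becomes a standalone gzip member (wbits=31).
        compressor = zlib.compressobj(level, zlib.DEFLATED, 31)
        return compressor.compress(block) + compressor.flush()

    def _submit(self, block):
        self._submitted = True
        self._pending.append(
            self._pool.submit(self._compress, block, self._level))
        # Bound memory use: write out the oldest blocks once too many queue up.
        while len(self._pending) > self._max_pending:
            self._raw.write(self._pending.pop(0).result())

    def write(self, data):
        self._buffer += data
        while len(self._buffer) >= self._block_size:
            block = bytes(self._buffer[:self._block_size])
            del self._buffer[:self._block_size]
            self._submit(block)
        return len(data)

    def close(self):
        if self.closed:
            return
        try:
            # Flush the remainder; an empty stream still gets one member.
            if self._buffer or not self._submitted:
                self._submit(bytes(self._buffer))
                self._buffer.clear()
            for future in self._pending:
                self._raw.write(future.result())
            self._pending.clear()
        finally:
            self._pool.shutdown()
            super().close()


# Usage: stream data in pieces, then check that it round-trips.
sink = io.BytesIO()
with StreamingThreadedGzipWriter(sink, threads=2, block_size=64 * 1024) as writer:
    for _ in range(10):
        writer.write(b"ACGT" * 10_000)
restored = gzip.decompress(sink.getvalue())
```

Dropping `seek()` is what keeps this simple: the writer only ever appends, so blocks can be handed to threads as soon as they fill up.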

@rhpvorderman
Collaborator Author

python-zlib-ng 0.4.0 adds threading. Implemented in xopen in #135.
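For readers who want to try it, a hedged usage sketch follows. It assumes the threaded interface is exposed as `zlib_ng.gzip_ng_threaded.open` with a `threads` keyword, as I recall it from the python-zlib-ng documentation, so check the installed version before relying on it. The helpers `open_gzip` and `roundtrip` are made up for this example and fall back to the standard gzip module when zlib_ng is not installed.

```python
import gzip
import os
import tempfile

try:
    # Third-party package. Module name and `threads` keyword are recalled
    # from the python-zlib-ng docs; treat them as assumptions to verify.
    from zlib_ng import gzip_ng_threaded
except ImportError:
    gzip_ng_threaded = None


def open_gzip(path, mode="rb", threads=1):
    """Open a gzip file, using threaded zlib-ng when it is available."""
    if gzip_ng_threaded is not None:
        return gzip_ng_threaded.open(path, mode, threads=threads)
    # Fallback: single-threaded standard library gzip.
    return gzip.open(path, mode)


def roundtrip(payload: bytes, threads: int = 2) -> bytes:
    """Write payload to a temporary gzip file and read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "reads.fastq.gz")
        with open_gzip(path, "wb", threads=threads) as out:
            out.write(payload)
        with open_gzip(path, "rb") as src:
            return src.read()
```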
