Since the zlib module releases the GIL while it (de)compresses, the zlib_ng and isal_zlib modules do so as well. It should therefore be possible to use the threading module to efficiently address multiple cores while having all the benefits of shared memory.
In theory this should be more efficient than the current approach, which uses pipes for interprocess communication.
In practice this means that we will no longer have to bother with igzip and possibly crabz, and that the non-threaded and threaded opening options can be made much simpler, as this will be handled by igzip.open and gzip_ng.open.
Less reliance on external applications + faster code seems like a win. Except that it will take some engineering effort to get it done ;-). A downstream effect of this will be that dnaio will become unbeatable in any metric by any other library, no matter what compression level or number of threads is chosen.
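As a rough illustration of the threading idea, here is a minimal sketch that uses only the standard library zlib module, not the zlib_ng or isal_zlib APIs. It assumes the input can be cut into independent blocks, each compressed as its own gzip member, and relies on the fact that concatenated gzip members form a valid gzip stream. The names `compress_block` and `threaded_gzip_compress`, the block size and the thread count are all hypothetical choices made for this example.

```python
import gzip
import zlib
from concurrent.futures import ThreadPoolExecutor


def compress_block(block: bytes, level: int = 6) -> bytes:
    """Compress one block into a complete, standalone gzip member."""
    # wbits=31 asks zlib for a gzip container (header + CRC32 trailer).
    compressor = zlib.compressobj(level, zlib.DEFLATED, 31)
    return compressor.compress(block) + compressor.flush()


def threaded_gzip_compress(
    data: bytes,
    block_size: int = 1024 * 1024,  # hypothetical block size
    threads: int = 4,               # hypothetical thread count
    level: int = 6,
) -> bytes:
    """Compress blocks in worker threads and concatenate the gzip members."""
    # Fall back to a single empty block so the output is always valid gzip.
    blocks = [
        data[start:start + block_size]
        for start in range(0, len(data), block_size)
    ] or [b""]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # map() yields results in input order, so the members stay in order.
        members = pool.map(lambda block: compress_block(block, level), blocks)
        return b"".join(members)


if __name__ == "__main__":
    payload = b"ACGT" * 1_000_000
    compressed = threaded_gzip_compress(payload)
    # Any gzip reader that supports multi-member files can read the result.
    assert gzip.decompress(compressed) == payload
```

Because the blocks live in the same process, they are handed to the workers by reference rather than copied through a pipe. Compressing blocks independently costs some compression ratio compared with a single stream, which is a trade-off of this sketch rather than a statement about how the final implementation would work.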
python-isal
python-zlib-ng
I also asked whether this would be a good idea for CPython, but the idea was shot down as an "extreme corner case": https://discuss.python.org/t/multithreaded-gzip-reading-and-writing/24086
I guess (de)compressing 100GB gzip files is not a common workload for most Python users. So for standard zlib, we will be stuck with pigz. On the upside, this allows me more freedom in how I implement this. I don't have to worry about calls like 'seek()' etc. I will only have to implement a streaming interface with no seeking.
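To make the "streaming interface with no seeking" concrete, here is a hedged sketch of what a write-only, non-seekable threaded writer could look like. It is not the planned implementation: the class name `StreamingGzipWriter` and its parameters are hypothetical, it uses the standard zlib module, and it again assumes one gzip member per block.

```python
import gzip
import io
import zlib
from collections import deque
from concurrent.futures import ThreadPoolExecutor


class StreamingGzipWriter(io.RawIOBase):
    """Hypothetical write-only gzip stream; seek() is deliberately unsupported."""

    def __init__(self, raw, block_size=1024 * 1024, threads=2, level=6):
        self._raw = raw
        self._block_size = block_size
        self._level = level
        self._pool = ThreadPoolExecutor(max_workers=threads)
        self._pending = deque()  # futures, kept in submission order
        self._buffer = bytearray()

    def writable(self):
        return True

    def seekable(self):
        return False  # the inherited seek() raises io.UnsupportedOperation

    def _compress(self, block):
        # wbits=31 produces a complete gzip member for this block.
        compressor = zlib.compressobj(self._level, zlib.DEFLATED, 31)
        return compressor.compress(block) + compressor.flush()

    def _drain(self, wait):
        # Write finished members in order; block on them only when asked to.
        while self._pending and (wait or self._pending[0].done()):
            self._raw.write(self._pending.popleft().result())

    def write(self, data):
        self._buffer += data
        while len(self._buffer) >= self._block_size:
            block = bytes(self._buffer[:self._block_size])
            del self._buffer[:self._block_size]
            self._pending.append(self._pool.submit(self._compress, block))
        self._drain(wait=False)
        return len(data)

    def close(self):
        if self.closed:
            return
        if self._buffer:
            block = bytes(self._buffer)
            self._buffer.clear()
            self._pending.append(self._pool.submit(self._compress, block))
        self._drain(wait=True)
        self._pool.shutdown()
        super().close()


if __name__ == "__main__":
    sink = io.BytesIO()
    with StreamingGzipWriter(sink, block_size=4096) as writer:
        writer.write(b"ACGT" * 10_000)
    assert gzip.decompress(sink.getvalue()) == b"ACGT" * 10_000
```

Dropping seek support is what keeps this small: the writer only needs an input buffer and an ordered queue of in-flight blocks. A real implementation would also want to bound the number of pending blocks so that a fast producer cannot fill memory.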