ENH: Handle .mtx.gz
#28
Comments
This is a significant feature request, and can be done without much difficulty outside of |
It's possible that this could be easy. A big reason FMM uses iostreams is to enable use of existing libraries for specialized uses like this one. Here are two ideas.

Idea one: Use a GZip iostream wrapper. There are a bunch of lightweight-ish ones on GitHub; just note that some depend on zlib. If your users are largely using precompiled binaries, then Boost has a good one: https://www.boost.org/doc/libs/1_83_0/libs/iostreams/doc/classes/gzip.html . This would be a 4-line solution, like the example at the bottom of that page.

Idea two: Do what the Python bindings do: provide an adapter between the stream types of Python and iostreams, then use Python's GZip decompressor. This adapter may or may not exist for R; I was lucky to find one for Python. One upside is that you can also use it to adapt all streams for that language (Python users often use StringIO/BytesIO objects), though I'm not sure if that's a common usage pattern in R. Another upside is not having to maintain gzip/bz2 or whatever dependencies. The downside is that the adapter is likely slower than native C++ file IO, so you'll want two code paths: the native one for plain files and the adapter for everything else. Gzip decompression is slow anyway.
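To make idea two a bit more concrete, here is a rough Python-only sketch of the stream-adapter pattern: the parser only ever sees a text stream, so a gzip-decompressing stream and a plain in-memory stream go through the same code. The `read_coordinate_mtx` helper and the sample data are hypothetical and written just for this illustration; this is not FMM's API, and it only handles the simplest coordinate/real/general case.

```python
import gzip
import io


def read_coordinate_mtx(stream):
    """Parse a Matrix Market coordinate file from any text stream.

    Hypothetical helper for illustration only; not fast_matrix_market's API.
    Returns (nrows, ncols, entries), where entries are (row, col, value)
    tuples using the 1-based indices found in the file.
    """
    header = stream.readline()
    if not header.startswith("%%MatrixMarket"):
        raise ValueError("missing Matrix Market banner")
    # Skip comment and blank lines until the size line is reached.
    line = stream.readline()
    while line and (line.startswith("%") or not line.strip()):
        line = stream.readline()
    nrows, ncols, nnz = (int(tok) for tok in line.split())
    entries = []
    for line in stream:
        if not line.strip():
            continue
        row, col, value = line.split()
        entries.append((int(row), int(col), float(value)))
    if len(entries) != nnz:
        raise ValueError("entry count does not match the size line")
    return nrows, ncols, entries


# Made-up sample matrix, assumed for this sketch.
MTX_TEXT = """%%MatrixMarket matrix coordinate real general
% a 2x3 matrix with two nonzeros
2 3 2
1 1 1.5
2 3 -4.0
"""

# Build an in-memory .mtx.gz so the sketch needs no files on disk.
compressed = gzip.compress(MTX_TEXT.encode("ascii"))

# Path 1: a gzip-decompressing text stream wrapped around the compressed bytes.
with gzip.open(io.BytesIO(compressed), mode="rt", encoding="ascii") as gz_stream:
    from_gzip = read_coordinate_mtx(gz_stream)

# Path 2: a plain in-memory text stream, fed to the same parser.
from_text = read_coordinate_mtx(io.StringIO(MTX_TEXT))
```

The point is only that the decompression lives entirely in the stream object, so the parsing code needs no gzip-specific branch. A real adapter between a host language's streams and C++ iostreams would involve more work than this.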
Thanks a ton for looking into this.
I think idea two is feasible for R as well (due to its transparent support of Gzip files). However, from a performance perspective idea one is way nicer, and if the dependencies are not too heavy (i.e. the resulting library with dependencies is small enough for CRAN and builds quickly enough, within 30 min), then that would probably be the best option. I will investigate both :)
Curious what you decide on! I bet the performance will be comparable, since it'll likely be zlib doing the work either way. Just who wraps it better :P
First noted here: ropensci/software-review#606 (comment).