Implement smart_open instead of .open() to allow efficient streaming (saving/loading) of large files to and from cloud buckets #264
Thanks @hugolytics for your thoughts here. It's definitely interesting to think about ways to leverage smart_open. To me, this issue is similar to the discussion in #96 and #109. There are certainly backend packages like this that handle the operational side of things, and we should consider them, since our primary purpose is supporting the pathlib API. That said, we won't merge the PR (#265) as is, for a number of reasons, so I'm going to close it.
Note that it is also worth bumping #92, which lists these alternatives.
I'd love to see smart_open added here as well, as an alternative to the cache concept. We have to open large zip files (tens to hundreds of GB) just to read some content in place, and this all works well with cloudpathlib and smart_open (in my fork).
@msmitherdc Thanks for the comment. To better understand your use case, what are the specific things that you want …
We are opening 3D Tiles and I3S (SLPK) mesh files. These are large zip files, and we read JSON files out of them to get information about the mesh. To serve them to Cesium, we read byte ranges and send them to the client. So what we need is streaming and partial reads.
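A minimal sketch of the two access patterns just described (reading one JSON member out of a zip, and serving a raw byte range), assuming the archive is available as a seekable binary file object. A stream returned by `smart_open.open(uri, "rb")` on S3 should support `seek`; here an in-memory `io.BytesIO` stands in for it so the sketch runs offline. The function names are hypothetical.

```python
import io
import json
import zipfile
from typing import Any, BinaryIO


def read_json_member(archive: BinaryIO, member: str) -> Any:
    """Parse one JSON file stored inside a zip archive.

    zipfile only needs a seekable file object: it reads the central directory
    near the end of the archive, then seeks to the requested member, so the
    rest of the archive is never read.
    """
    # ZipFile does not close a file object that was passed in to it.
    with zipfile.ZipFile(archive) as zf:
        with zf.open(member) as f:
            return json.load(f)


def read_byte_range(fileobj: BinaryIO, start: int, length: int) -> bytes:
    """Return `length` bytes starting at offset `start` (a partial read)."""
    fileobj.seek(start)
    return fileobj.read(length)


if __name__ == "__main__":
    # Build a small zip in memory as a stand-in for a large remote archive.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("tileset.json", json.dumps({"asset": {"version": "1.0"}}))
    buf.seek(0)
    print(read_json_member(buf, "tileset.json"))
    print(read_byte_range(buf, 0, 4))
```

Whether each seek translates into a cheap ranged request depends on the backing stream; with a remote reader, many small seeks may each cost a round trip.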
The current implementation of the .open() methods relies on a local cache, which is then synchronized with the cloud.
This mechanism could be replaced with smart_open, which streams data to and from the bucket directly and would be more efficient for large files.
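A rough sketch of the idea, not cloudpathlib's actual API: a hypothetical `StreamingPath` whose `open()` hands the URI straight to an injected opener instead of materializing a local cached copy. In real use the opener would be `smart_open.open`, which takes a URI, a mode and a `transport_params` dict; here a toy in-memory opener (also hypothetical) stands in so the sketch runs without cloud credentials.

```python
import io
from typing import IO, Any, Callable, Dict, Optional

# Modeled on smart_open.open(uri, mode, ..., transport_params=...).
Opener = Callable[..., IO[Any]]


class StreamingPath:
    """Hypothetical path wrapper: open() streams via `opener`, no local cache."""

    def __init__(self, uri: str, opener: Opener,
                 transport_params: Optional[Dict[str, Any]] = None) -> None:
        self.uri = uri
        self._opener = opener
        self._transport_params = transport_params or {}

    def open(self, mode: str = "r", **kwargs: Any) -> IO[Any]:
        # Delegate directly; the opener decides how to stream to/from the bucket.
        return self._opener(self.uri, mode,
                            transport_params=self._transport_params, **kwargs)

    def read_text(self, encoding: str = "utf-8") -> str:
        with self.open("r", encoding=encoding) as f:
            return f.read()

    def write_text(self, data: str, encoding: str = "utf-8") -> None:
        with self.open("w", encoding=encoding) as f:
            f.write(data)


class _CommitOnClose(io.BytesIO):
    """Write buffer that stores its contents in `blobs` when it is closed."""

    def __init__(self, blobs: Dict[str, bytes], uri: str) -> None:
        super().__init__()
        self._blobs = blobs
        self._uri = uri

    def close(self) -> None:
        if not self.closed:
            self._blobs[self._uri] = self.getvalue()
        super().close()


class InMemoryOpener:
    """Toy stand-in for smart_open.open, backed by a dict of bytes."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}

    def __call__(self, uri: str, mode: str = "r", transport_params: Any = None,
                 encoding: Optional[str] = None, **kwargs: Any) -> IO[Any]:
        if "r" in mode:
            raw: io.BytesIO = io.BytesIO(self.blobs[uri])
        else:
            raw = _CommitOnClose(self.blobs, uri)
        if "b" in mode:
            return raw
        # Text modes wrap the binary stream, as the built-in open() does.
        return io.TextIOWrapper(raw, encoding=encoding or "utf-8")


if __name__ == "__main__":
    path = StreamingPath("s3://bucket/key.txt", InMemoryOpener())
    path.write_text("hello")
    print(path.read_text())
```

With smart_open installed, `StreamingPath("s3://bucket/key", smart_open.open)` would exercise the real library; in recent smart_open versions an S3 client can be passed through `transport_params` (for example under a `client` key), which is where per-thread sessions could be plugged in.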
One could take inspiration from AWS's S3PathLib (however, that library handles the boto session in a way that is not thread-safe, which is what made me switch to this library).
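One common way to sidestep the thread-safety problem mentioned above is to give each thread its own session, since the boto3 documentation advises against sharing a `Session` across threads. The sketch below is a generic per-thread lazy cache built on `threading.local`; `PerThread` is a hypothetical helper, and the factory is assumed to be whatever creates a session (for example `boto3.session.Session`, not imported here so the sketch runs without boto3).

```python
import threading
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


class PerThread(Generic[T]):
    """Lazily create and cache one object per thread.

    Hypothetical helper: pass a factory such as boto3.session.Session so that
    no session object is ever shared between threads.
    """

    def __init__(self, factory: Callable[[], T]) -> None:
        self._factory = factory
        self._local = threading.local()  # attributes are private to each thread

    def get(self) -> T:
        if not hasattr(self._local, "value"):
            # First call from this thread: build a fresh object for it.
            self._local.value = self._factory()
        return self._local.value


if __name__ == "__main__":
    sessions = PerThread(object)  # `object` stands in for a session factory
    main_session = sessions.get()
    seen = []
    worker = threading.Thread(target=lambda: seen.append(sessions.get()))
    worker.start()
    worker.join()
    print(main_session is sessions.get(), seen[0] is main_session)
```

The trade-off is one session (and its connection pool) per thread instead of one shared object, which costs some memory but removes the need for locking.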
Currently, I have subclassed S3Path and adapted the aforementioned S3PathLib implementation as follows:
However, I could also open a pull request to merge this into the CloudPath definition, since smart_open is cloud-agnostic.