Introduce StorageConnector for GCS #14611
Closed
Description
This PR adds a storage connector that interacts with GCS through the API functions exposed by google-api-services-storage. It allows durable storage and MSQ's interactive APIs to work with GCS.
This also refactors the existing S3 connector so that its chunked-download logic can be reused by other connectors.
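As a rough illustration of the kind of chunked-download logic that can be shared across connectors, the sketch below splits an object of known size into inclusive byte ranges suitable for an HTTP Range header. The names ChunkRanges, Range, and split are hypothetical and are not the classes in this PR.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of this PR: splits an object into byte ranges.
final class ChunkRanges {
    // A byte range with an inclusive end, matching HTTP Range header semantics.
    static final class Range {
        final long start;
        final long end;

        Range(long start, long end) {
            this.start = start;
            this.end = end;
        }

        // Formats the range as a Range header value, e.g. "bytes=0-3".
        String toHeaderValue() {
            return "bytes=" + start + "-" + end;
        }
    }

    // Returns consecutive ranges of at most chunkSize bytes covering [0, objectSize).
    static List<Range> split(long objectSize, long chunkSize) {
        if (objectSize < 0 || chunkSize <= 0) {
            throw new IllegalArgumentException("objectSize must be >= 0 and chunkSize > 0");
        }
        List<Range> ranges = new ArrayList<>();
        for (long start = 0; start < objectSize; start += chunkSize) {
            long end = Math.min(start + chunkSize, objectSize) - 1;
            ranges.add(new Range(start, end));
        }
        return ranges;
    }
}
```

A connector-specific subclass would then only need to supply how a single range is fetched, while the splitting and stitching stays in shared code.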
Due to the current versions of libraries used, the connector has the following 3 improvement areas:
1. Because of limitations in google-api-services-storage and the version it uses, multipart and streaming uploads are not available. The GCS connector therefore writes the intermediate contents to a file and uploads it in a single request. Composite objects exist, but the functionality seems incorrect. This can be improved once we upgrade the libraries.
2. For fetching a file, an isChunkedDownloads flag controls whether to download in chunks using the range header (https://cloud.google.com/storage/docs/xml-api/reference-headers#range). Since the header can be ignored, the functionality is kept behind a flag for now. Fetching by range isn't supported in the library currently.
3. All delete requests are issued individually rather than in batches.
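The write-to-file-then-upload approach for writes can be sketched as an OutputStream that buffers everything to a temporary file and hands it to an uploader exactly once on close. This is a minimal sketch under stated assumptions: BufferedUploadStream and SingleShotUploader are hypothetical names, and the uploader stands in for whatever single-request insert call the connector actually makes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch, not the PR's implementation.
final class BufferedUploadStream extends OutputStream {
    // Stand-in for a single-request upload call (assumption, not a real library API).
    interface SingleShotUploader {
        void upload(String objectPath, InputStream content, long length) throws IOException;
    }

    private final String objectPath;
    private final SingleShotUploader uploader;
    private final Path tempFile;
    private final OutputStream fileOut;
    private boolean closed;

    BufferedUploadStream(String objectPath, SingleShotUploader uploader) throws IOException {
        this.objectPath = objectPath;
        this.uploader = uploader;
        this.tempFile = Files.createTempFile("gcs-upload-", ".tmp");
        this.fileOut = Files.newOutputStream(tempFile);
    }

    @Override
    public void write(int b) throws IOException {
        fileOut.write(b);
    }

    @Override
    public void write(byte[] buf, int off, int len) throws IOException {
        fileOut.write(buf, off, len);
    }

    // Flushes the local file, uploads it in one request, then removes the temp file.
    @Override
    public void close() throws IOException {
        if (closed) {
            return; // a second close must not trigger a second upload
        }
        closed = true;
        try {
            fileOut.close();
            try (InputStream in = Files.newInputStream(tempFile)) {
                uploader.upload(objectPath, in, Files.size(tempFile));
            }
        } finally {
            Files.deleteIfExists(tempFile);
        }
    }
}
```

The trade-off is local disk usage proportional to the object size, which is why multipart or streaming uploads would be preferable once the library supports them.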
This implementation can be improved if we use the google-cloud-storage library instead of the google-api-services-storage library, though that would require an overhaul of the existing Google functions.
Release note
To be added
Key changed/added classes in this PR
GoogleStorageConnector
OurBar
TheirBaz
This PR has: