StorageClient.UploadObject is difficult to use effectively #9250
@anshuldavid13 - Can you look into this?
First off, @masonwheeler, thanks for raising this - and I do completely agree that the library API surface is clunky here. Likewise, it would be nice if we could offer the kind of "write to a stream" API you describe. Unfortunately, implementing that API presents significant challenges. The existing implementation is based on ResumableUpload from the Google.Apis package - which only supports the current mode, and is a hairy piece of code to work with. (It's worth understanding that Google.Apis is effectively support for HTTP/1.1+JSON API calls. It's in maintenance mode - we're actively maintaining it, but not putting effort into adding features unless we absolutely need to. The newer generation of gRPC-based libraries is where most of our effort goes at the moment.) There are three options we could use to move forward:
None of these is appealing, as you can imagine. We'll need to discuss internally how important it is to fix this, estimate timelines, and see how it fits in with any other future plans. I don't want to make any promises that we'll ever address this - but I definitely acknowledge that the issue is a real one.
Fair enough. The other alternative is to use an
Huh. I'm actually a bit surprised to hear that, well over a decade after they were introduced, Google hasn't provided a WebSocket-based upload API. You'd think that would be the obvious implementation for this particular use case.
True. Well, of those three, I have to ask: does the buffer you describe as being necessary for making the resumable upload resumable already exist, seeing as how it's necessary? (Or does your implementation depend on …)
On a quick investigation, it looks like we always buffer - I had thought we might only buffer in certain cases, but apparently not. So option 1 would be no worse than the current code in terms of overhead - it just wouldn't reduce the overhead either. (In terms of the API, there are various options here; we'll think about that after deciding whether or not to actually go ahead at all.)
Well congrats on that; that puts you a step ahead of your AWS S3 counterpart!
I'm looking at this now, with a tentative design that's basically a stream to perform the buffering - so you might have something like:

```csharp
var mediaStream = new MediaStream(/* buffer size maybe? */);

// Important - we *don't* await this task yet...
var uploadTask = client.UploadObjectAsync("my-bucket", "my-object", "text/plain", mediaStream);

// We'd probably *only* support async code for this. It makes things simpler.
await mediaStream.WriteAsync(...);
mediaStream.Close(); // Or Dispose, or a different method entirely...

// This would only finish when the buffered data had all been pulled by the uploader
await uploadTask;
```

We might be able to use the same class (or at least bits of it) for a cleaner download as well (where you read from the download stream). Or we may be able to use an entirely different design for that. (That gets tricky in terms of hash checking.) It's still a bit clunky - more so now that I'm focused on the user code rather than the implementation. Hmm.
Honestly that setup looks like a footgun, if a minor one. What happens if you do await the call to `UploadObjectAsync` before writing to and closing the stream?
@masonwheeler: Yup, that's my big concern. And yes, it would just never complete because it would be waiting for the stream to be completed. I'm starting to think about different options (which may basically use that underneath)...
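To make the trap concrete - assuming the hypothetical `MediaStream` design sketched earlier in the thread, the naive call pattern deadlocks (a sketch, not real library code):

```csharp
var mediaStream = new MediaStream();

// Deadlock: UploadObjectAsync's task won't complete until mediaStream is
// closed, but the writes below are never reached because we awaited first.
await client.UploadObjectAsync("my-bucket", "my-object", "text/plain", mediaStream);
await mediaStream.WriteAsync(...); // never executed
mediaStream.Close();               // never executed
```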
@jskeet Aside from the ValueTuple, which could probably be worked around in a few different ways, is there any reason why the API I suggested in the first post wouldn't be viable? |
@masonwheeler: I think I'd probably want to wrap it into a separate type. There's still the concern that someone will see a task and just await it though. I don't want to leap to one suggested API rather than trying to think of multiple options - and ones that will work for other media upload/download in Google.Apis as well.
Apologies for the silence on this for a couple of months - and apologies that you're not going to like the result, which is that I'm going to close this issue with a new PR that adds this to our backlog file. I've looked into it, and I just don't think there's anything nice we can do in the current library. If we end up with a new Storage library based on gRPC, we can look at it from scratch at that point, as we'll have a clean slate in terms of API surface - we won't need to worry about how the existing REST-based support library works, and we won't need to make changes to the existing API of Google.Cloud.Storage.V1. You have my word that we'll at least take another close look - and hopefully with that added freedom, we'll be able to come up with something appropriate. I think it's unlikely that we'll ever end up back-porting to Google.Cloud.Storage.V1 though. I know this isn't the news you want to hear, but I'm afraid we have limited resources in the team, and implementing a robust and "clean" API (avoiding traps) here would just be too expensive in terms of time.
Adds googleapis#9250 to the backlog, and removes a couple of items that have been addressed. Closes googleapis#9250
Wow. "We have limited resources" is kind of the last thing you expect to ever hear from Google... 😮 |
A possible way to avoid traps is to add another method that accepts a `System.IO.Pipelines.PipeReader` and continuously uploads chunks from the pipe until it is completed by another thread via the `.Complete()` method.
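A rough sketch of what such an overload might look like. This method does not exist in the library, and `UploadChunkAsync`/`FinalizeUploadAsync` are hypothetical helpers standing in for the resumable-upload plumbing:

```csharp
// Hypothetical overload - not part of the current library surface.
public async Task<Google.Apis.Storage.v1.Data.Object> UploadObjectAsync(
    string bucket, string objectName, string contentType,
    System.IO.Pipelines.PipeReader reader,
    CancellationToken cancellationToken = default)
{
    // Continuously pull chunks from the pipe and push them to the
    // upload session until the writer side calls Complete().
    while (true)
    {
        ReadResult result = await reader.ReadAsync(cancellationToken);
        foreach (ReadOnlyMemory<byte> segment in result.Buffer)
        {
            await UploadChunkAsync(segment, cancellationToken); // hypothetical helper
        }
        reader.AdvanceTo(result.Buffer.End);
        if (result.IsCompleted)
        {
            break;
        }
    }
    return await FinalizeUploadAsync(cancellationToken); // hypothetical helper
}
```

The nice property of a `PipeReader` parameter is that there's no task to mistakenly await early: the caller owns the writer end, and completion is an explicit, separate signal.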
Describe the solution you'd like
The documentation claims that `StorageClient.UploadObject[Async]` is useful for situations where "You want to upload data but don't know the final size at the start of the upload, such as when generating the upload data from a process, or when compressing an object on-the-fly." But the API it provides is not suited to either scenario. Its working is the polar opposite of the standard .NET idiom for generating and storing data: there, the API gives you a writeable `Stream`, and you write the data to it. Instead, `StorageClient.UploadObject[Async]` asks you for a readable `Stream`, and then uploads the data from it. The example given in the linked doc page even involves using `FileStream` to upload an existing file from the file system, which is the precise opposite of generating data in-process where you don't know the final size at the start of the upload. The scenario of "compressing an object on-the-fly" that the documentation claims to support does not work at all here; compressors such as `GZipStream` or `BrotliStream` require a writeable stream for their output.

While this API can certainly be useful for pre-existing data, it is clearly not fit for its advertised purpose. A better API would look like this:
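(The code block from the original post did not survive this capture. Based on the surrounding description - a writeable stream plus an async task producing the `Object` metadata, returned together as the `ValueTuple` mentioned later in the thread - the suggested shape might have looked roughly like this sketch; the member name `StartUploadObject` is hypothetical.)

```csharp
// Hypothetical API shape - names are illustrative, not the actual proposal.
public (Stream WriteStream, Task<Google.Apis.Storage.v1.Data.Object> Completion)
    StartUploadObject(
        string bucket,
        string objectName,
        string contentType,
        UploadObjectOptions options = null);
```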
This would return a writeable `Stream` to write data to. The API will know that the data production is complete when `Close` or `Dispose` is called on the stream. Generation of the `Object` metadata that serves as a receipt for your object storage would be the responsibility of an async `Task`, even for the synchronous API.

Describe alternatives you've considered
Right now, the only way to upload generated data without buffering it either in memory or in a temp file is to come up with some sort of hack to adapt a writeable `Stream` to a readable `Stream`. `System.IO.Pipelines.Pipe` seems to work, but that imposes additional overhead for moving the uploading code onto a thread pool thread. It would be nice to instead have an API that's actually designed to fit the advertised workflow.