storage/cloud: replace WriteFile(Reader) with Writer #65057

Merged
merged 2 commits into cockroachdb:master from cloud-writer on May 13, 2021

Conversation


@dt dt commented May 12, 2021

This changes the ExternalStorage API's writing method from WriteFile, which takes an io.Reader
and writes its content to the requested destination (or returns an error encountered while doing
so), to Writer(), which returns an io.Writer pointing to the destination that can be
written to later and then closed to finish the upload (or CloseWithError'ed to cancel it).

All existing callers use the shim and are unaffected, but can later choose to change to a
push-based model and use the Writer directly. This is left to a follow-up change.

(note: first commit is just adding a shim for existing callers and switching them)

Release note: none.
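For readers skimming the diff, here is a minimal sketch of the shape of the change; the names are paraphrased from this PR's description rather than copied from the code:

```go
package cloud

import (
	"context"
	"io"
)

// Before: pull-based. The implementation drains the caller's Reader and
// performs the whole upload inside the call.
type externalStorageBefore interface {
	WriteFile(ctx context.Context, basename string, content io.ReadSeeker) error
}

// WriteCloserWithError is io.WriteCloser plus cancellation, mirroring
// io.PipeWriter's CloseWithError.
type WriteCloserWithError interface {
	io.WriteCloser
	CloseWithError(err error) error
}

// After: push-based. The caller receives a writer for the destination,
// writes at its own pace, and Closes to finish (or CloseWithError to abort).
type externalStorageAfter interface {
	Writer(ctx context.Context, basename string) (WriteCloserWithError, error)
}
```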

@dt dt requested review from adityamaru, miretskiy and a team May 12, 2021 14:22
@dt dt requested a review from a team as a code owner May 12, 2021 14:22
@cockroach-teamcity

This change is Reviewable


dt commented May 12, 2021

This will need a rebase over #65033 but I wanted to start letting CI chew on it.

@dt dt force-pushed the cloud-writer branch from 7a2fb86 to 3f6625d Compare May 12, 2021 18:29

@miretskiy miretskiy left a comment


Reviewed 15 of 17 files at r1, 10 of 15 files at r2, 7 of 8 files at r3.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru and @dt)


pkg/storage/cloud/cloud_io.go, line 256 at r2 (raw file):

}

func BackgroundPipe(ctx context.Context, fn func(ctx context.Context, pr io.Reader) error) WriteCloserWithError {

nit: comment needed for exported function (and all exported methods).
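For context, a helper with this signature is typically built on io.Pipe; a minimal sketch of the pattern (not necessarily this PR's exact implementation):

```go
// BackgroundPipe runs fn in a goroutine, feeding it the read side of a
// pipe, and returns the write side. A sketch of the usual io.Pipe pattern;
// not necessarily this PR's implementation.
func BackgroundPipe(
	ctx context.Context, fn func(ctx context.Context, pr io.Reader) error,
) WriteCloserWithError {
	pr, pw := io.Pipe()
	w := &backgroundPipe{PipeWriter: pw, done: make(chan struct{})}
	go func() {
		defer close(w.done)
		err := fn(ctx, pr)
		// Propagate fn's result to any blocked writer and to Close.
		// CloseWithError(nil) behaves like Close.
		_ = pr.CloseWithError(err)
		w.err = err
	}()
	return w
}

type backgroundPipe struct {
	*io.PipeWriter // provides Write and CloseWithError
	done           chan struct{}
	err            error
}

// Close finishes the stream and waits for the background fn to return,
// surfacing fn's error if the pipe itself closed cleanly.
func (w *backgroundPipe) Close() error {
	err := w.PipeWriter.Close()
	<-w.done
	if err == nil {
		err = w.err
	}
	return err
}
```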


pkg/storage/cloud/cloud_io.go, line 300 at r3 (raw file):

// WriteFile is a helper for writing the content of a Reader to the given path
// of an ExternalStorage.
func WriteFile(ctx context.Context, basename string, src io.Reader, dest ExternalStorage) error {

should the signature be changed (ctx, src, dest, basename) to be more of a "src -> dest" pattern?


pkg/storage/cloud/cloud_io.go, line 307 at r3 (raw file):

	if _, err := io.Copy(w, src); err != nil {
		_ = w.CloseWithError(err)
		return err

you could keep both errors if you so choose: return errors.CombineErrors(err, w.CloseWithError(err))
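For illustration, the helper body with that suggestion applied might read as follows (a sketch assuming github.com/cockroachdb/errors and that w comes from the new Writer method this PR describes):

```go
// Sketch of WriteFile keeping both errors, per the suggestion above.
// Assumes errors = github.com/cockroachdb/errors.
func WriteFile(ctx context.Context, basename string, src io.Reader, dest ExternalStorage) error {
	w, err := dest.Writer(ctx, basename)
	if err != nil {
		return err
	}
	if _, err := io.Copy(w, src); err != nil {
		// Abort the upload and report the copy error alongside any error
		// from the abort itself.
		return errors.CombineErrors(err, w.CloseWithError(err))
	}
	return w.Close()
}
```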


pkg/storage/cloud/amazon/s3_storage.go, line 293 at r2 (raw file):

		return nil, err
	}
	uploader := s3manager.NewUploader(sess)

not saying that this is bad... but we're changing the implementation to use the upload manager vs. a plain put-object call. We should probably note this in the PR description.
s3manager seems to have settings around concurrency (default upload concurrency is 5). What does that mean for us (if anything)? Should that be configurable? Perhaps a TODO is in order to evaluate this.
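For reference, those settings are exposed as options at construction time (a sketch against aws-sdk-go's s3manager; the tunable values here are placeholders, not settings this PR adds):

```go
// Sketch: making the uploader's concurrency and part size explicit instead
// of relying on SDK defaults. uploadConcurrency and partSize are
// hypothetical knobs, not cluster settings added by this PR.
uploader := s3manager.NewUploader(sess, func(u *s3manager.Uploader) {
	u.Concurrency = uploadConcurrency // SDK default is 5 concurrent parts
	u.PartSize = partSize             // SDK minimum is 5 MB per part
})
```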


pkg/storage/cloud/azure/azure_storage.go, line 135 at r2 (raw file):

		_, err := azblob.UploadStreamToBlockBlob(
			ctx, r, blob, azblob.UploadStreamToBlockBlobOptions{
				BufferSize: 4 << 20,

This seems rather arbitrary?
I'm also curious/worried about the implications of this call, particularly in the face of slowness/unavailability of Azure. You could imagine having a bunch of these uploads starting, all buffering 4 MB?


pkg/storage/cloud/gcp/gcs_storage.go, line 162 at r2 (raw file):

func (g *gcsStorage) WriteFile(ctx context.Context, basename string, content io.ReadSeeker) error {
	const maxAttempts = 3
	err := retry.WithMaxAttempts(ctx, base.DefaultRetryOptions(), maxAttempts, func() error {

Do we know if gcs retries? Is it safe to drop this?


pkg/storage/cloud/httpsink/http_storage.go, line 179 at r3 (raw file):

func (h *httpStorage) WriteFile(ctx context.Context, basename string, content io.ReadSeeker) error {
	return contextutil.RunWithTimeout(ctx, fmt.Sprintf("PUT %s", basename),

I assume we're dropping this because, presumably, the caller could set deadlines themselves...
Do we know if anything is affected by this? Is this a safe change to make?


pkg/storage/cloud/userfile/file_table_storage.go, line 269 at r3 (raw file):

	// retry we are not able to seek to the start of `content` and try again,
	// resulting in bytes being missed across txn retry attempts.
	// See chunkWriter.WriteFile for more information about writing semantics.

Can you explain why it's safe to remove this?

@dt dt force-pushed the cloud-writer branch from 3f6625d to cd4ab2a Compare May 12, 2021 21:44

@dt dt left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru and @miretskiy)


pkg/storage/cloud/cloud_io.go, line 256 at r2 (raw file):

Previously, miretskiy (Yevgeniy Miretskiy) wrote…

nit: comment needed for exported function (and all exported methods).

Done.


pkg/storage/cloud/cloud_io.go, line 300 at r3 (raw file):

Previously, miretskiy (Yevgeniy Miretskiy) wrote…

should the signature be changed (ctx, src, dest, basename) to be more of a "src -> dest" pattern?

Heh, I thought of this after I'd already changed all the calls, but decided I didn't quite care enough about a temporary shim to go change them all again.


pkg/storage/cloud/cloud_io.go, line 307 at r3 (raw file):

Previously, miretskiy (Yevgeniy Miretskiy) wrote…

you could keep both errors if you so choose: return errors.CombineErrors(err, w.CloseWithError(err))

nifty.


pkg/storage/cloud/amazon/s3_storage.go, line 293 at r2 (raw file):

Previously, miretskiy (Yevgeniy Miretskiy) wrote…

not saying that this is bad... but we're changing the implementation to use the upload manager vs. a plain put-object call. We should probably note this in the PR description.
s3manager seems to have settings around concurrency (default upload concurrency is 5). What does that mean for us (if anything)? Should that be configurable? Perhaps a TODO is in order to evaluate this.

Oh, yeah, there's a huge implicit TODO around this whole change: we will need to test and tune each cloud afterwards, as well as revisit the callers to make them use a streaming write pattern.

I just wanted to get a commit out as soon as I got the initial sweeping interface change compiling / passing tests, then do targeted changes for those.


pkg/storage/cloud/azure/azure_storage.go, line 135 at r2 (raw file):

Previously, miretskiy (Yevgeniy Miretskiy) wrote…

This seems rather arbitrary?
I'm also curious/worried about the implications of this call, particularly in the face of slowness/unavailability of Azure. You could imagine having a bunch of these uploads starting, all buffering 4 MB?

We'll need to account for the "a bunch of uploads starting" case anyway and reserve off a monitor for them. Indeed, we already needed to do this before, but for the whole file size, vs. some fixed buffer like 4 MB.

The other SDKs have similar chunk buffers, and AWS has a minimum of 5 MB, so there is little point in going much smaller if we're going to have to pick at least that as the "cloud upload overhead" reservation size to grab off a monitor. Indeed, we may want to go higher: each chunk is an API request, and these clouds charge per request as well as per GB, so a too-small chunk size could run up expensive request counts.
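To make the reservation idea concrete, a sketch of grabbing a fixed per-upload buffer off a memory monitor before opening the writer (assuming pkg/util/mon's BoundAccount; this is not code from the PR):

```go
// Sketch: reserve the per-upload buffer from a memory monitor up front so
// a burst of uploads fails fast rather than overcommitting memory.
// Assumes util/mon's BoundAccount; not code from this PR.
const cloudWriteReservation = 4 << 20 // matches the 4 MB Azure buffer above

func openReservedWriter(
	ctx context.Context, dest ExternalStorage, basename string, acc *mon.BoundAccount,
) (WriteCloserWithError, error) {
	if err := acc.Grow(ctx, cloudWriteReservation); err != nil {
		return nil, err // over budget: fail before starting the upload
	}
	w, err := dest.Writer(ctx, basename)
	if err != nil {
		acc.Shrink(ctx, cloudWriteReservation)
		return nil, err
	}
	// The caller shrinks the account again once the writer is closed.
	return w, nil
}
```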


pkg/storage/cloud/gcp/gcs_storage.go, line 162 at r2 (raw file):

Previously, miretskiy (Yevgeniy Miretskiy) wrote…

Do we know if gcs retries? Is it safe to drop this?

I'm not really sure -- we need to experiment with it. I guess I could leave some retries here, but they'd now just be around creating the Writer, not actually doing the writes -- that'd be up to the caller, in how it uses that Writer. I'm assuming it does do internal retries, because if it doesn't, we'll want to do our own chunking, retry chunks, and then stitch them together afterwards, but I have to imagine that's what the SDK is already doing and we'll just want to configure it.
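If some retrying did stay here, it would now wrap only the writer's creation (a sketch reusing the retry helper from the removed code above; openDestWriter is a hypothetical stand-in for whatever the GCS implementation opens under the hood):

```go
// Sketch: any remaining retry loop would cover only opening the
// destination writer, not the writes themselves -- those are now the
// caller's business. openDestWriter is hypothetical.
var w WriteCloserWithError
const maxAttempts = 3
if err := retry.WithMaxAttempts(ctx, base.DefaultRetryOptions(), maxAttempts,
	func() error {
		var err error
		w, err = openDestWriter(ctx, basename)
		return err
	},
); err != nil {
	return nil, err
}
```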


pkg/storage/cloud/httpsink/http_storage.go, line 179 at r3 (raw file):

Previously, miretskiy (Yevgeniy Miretskiy) wrote…

I assume we're dropping this because, presumably, the caller could set deadlines themselves...
Do we know if anything is affected by this? Is this a safe change to make?

Right, we now just return a writer, and the caller can decide when to write to it, how long to wait for that write, etc., or e.g. put a timeout like this around an io.Copy call; this method should just provide a writer that the caller decides how to use.

The http client itself has an internal timeout.
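For example, a caller that wants the old behavior back could wrap the copy itself (a sketch assuming contextutil's RunWithTimeout as used in the removed code; dest, basename, content, and timeout are the caller's own values):

```go
// Sketch: a caller reimposing a deadline around its own use of the writer,
// using contextutil.RunWithTimeout from the removed code above.
w, err := dest.Writer(ctx, basename)
if err != nil {
	return err
}
if err := contextutil.RunWithTimeout(ctx, fmt.Sprintf("PUT %s", basename), timeout,
	func(ctx context.Context) error {
		_, err := io.Copy(w, content)
		return err
	},
); err != nil {
	_ = w.CloseWithError(err) // abort the upload on timeout or copy error
	return err
}
return w.Close()
```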


pkg/storage/cloud/userfile/file_table_storage.go, line 269 at r3 (raw file):

Previously, miretskiy (Yevgeniy Miretskiy) wrote…

Can you explain why it's safe to remove this?

Hilariously, it is apparently doing nothing.

The internal executor isn't stateful, so the BEGIN begins a transaction... which then implicitly just ends. The COMMIT above it errors ("no transaction in progress"), but we were throwing away the error. I only realized this when I started doing COMMIT (or ROLLBACK) in Close() (and CloseWithError, respectively).

I think if we want the semantics it says it has, we'll need to thread a kvclient.Txn around, but that can be its own change.

@miretskiy miretskiy self-requested a review May 12, 2021 23:40

@miretskiy miretskiy left a comment


All of the "scary" changes aside, I think this PR is great. So, :lgtm_strong:, and we'll shake out whatever problems may arise pretty soon...

Reviewed 1 of 8 files at r3, 1 of 2 files at r4.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @adityamaru, @dt, and @miretskiy)


pkg/storage/cloud/gcp/gcs_storage.go, line 162 at r2 (raw file):

Previously, dt (David Taylor) wrote…

I'm not really sure -- we need to experiment with it. I guess I could leave some retries here, but they'd now just be around creating the Writer, not actually doing the writes -- that'd be up to the caller, in how it uses that Writer. I'm assuming it does do internal retries, because if it doesn't, we'll want to do our own chunking, retry chunks, and then stitch them together afterwards, but I have to imagine that's what the SDK is already doing and we'll just want to configure it.

I'm fine with a TODO. I really like the direction of this PR.


dt commented May 13, 2021

I wonder if we should keep the WriteFile API side-by-side with Writer, so that smaller files can be put in a single API call? The multipart upload APIs often involve a create/init call, then 1 or more add-block calls, and then a finish call, so that could be 2 extra calls per file for small files, compared to the direct PUT method.


dt commented May 13, 2021

OTOH, making all puts use the Writer API actually tests the Writer API.

Also, at least in the Google case, there's just no difference: we were already using the Writer API and just running io.Copy to completion inside the put call. Maybe it should be up to an individual implementation of ExternalStorage.Writer to decide whether it wants to buffer and use a different API to put vs. stream. Hmm.

@dt dt force-pushed the cloud-writer branch from cd4ab2a to f20fb9d Compare May 13, 2021 02:19

@adityamaru adityamaru left a comment


Reviewed 22 of 22 files at r5.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @dt and @miretskiy)


pkg/storage/cloud/cloud_io.go, line 276 at r5 (raw file):

}

type backroundPipe struct {

nit: backgroundPipe


miretskiy commented May 13, 2021 via email


dt commented May 13, 2021

To be clear, I was suggesting not a utility function, but rather that the ExternalStorage API would have two methods for writing files: one that puts a file in its entirety (WriteFile, which was here before) and one that opens a file for incremental writing (aka Writer, added here), with potentially wholly separate implementations backed by different endpoints for the respective clouds.

I was musing that this might allow callers who are writing a small file / are already holding the whole thing in memory to use the put method, which could do the operation in a single API call for some clouds, vs. opening for writing, which is almost always 2+n calls (1 to open, 1 to finish, and n to move the chunks of payload), so 3 calls where n is small. If we were writing hundreds of thousands of small files, 3 calls each vs. 1 call each could be a big deal for request-count billing. (Separately, we should avoid making 100k small files, but that's another issue.)

But I think I've convinced myself that maybe the Writer implementation should just deal with this internally (roughly the sketch below): if this is an issue for a given cloud, we could service the initial Write calls by writing to a buffer, and only once it exceeds a certain size open the underlying cloud writer, drain the buffer into it, and send subsequent writes straight through; if the writer is closed before the buffer fills, we instead put it via the other cloud API. I guess there's a slight downside in that it forces us to buffer a second copy of these small puts. Anyway, I think I might be getting lost in premature optimization here.
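A hedged sketch of that internal-buffering idea (all names are hypothetical; putSmall and openStream stand in for a cloud's single-call and multipart upload paths, and this is not code from the PR):

```go
// Sketch: buffer small writes and, on Close, use a single put call if the
// data never exceeded the threshold; otherwise spill to a streaming writer.
// Assumes imports "bytes" and "context".
type bufferingWriter struct {
	ctx        context.Context
	buf        bytes.Buffer
	threshold  int
	stream     WriteCloserWithError // nil until buf exceeds threshold
	putSmall   func(ctx context.Context, data []byte) error
	openStream func(ctx context.Context) (WriteCloserWithError, error)
}

func (b *bufferingWriter) Write(p []byte) (int, error) {
	if b.stream != nil {
		return b.stream.Write(p)
	}
	if b.buf.Len()+len(p) <= b.threshold {
		return b.buf.Write(p)
	}
	// Exceeded the threshold: open the real writer, drain the buffer into
	// it, and send this and all subsequent writes straight through.
	w, err := b.openStream(b.ctx)
	if err != nil {
		return 0, err
	}
	b.stream = w
	if _, err := b.stream.Write(b.buf.Bytes()); err != nil {
		return 0, err
	}
	b.buf.Reset()
	return b.stream.Write(p)
}

func (b *bufferingWriter) Close() error {
	if b.stream != nil {
		return b.stream.Close()
	}
	// Never exceeded the threshold: a single put call suffices.
	return b.putSmall(b.ctx, b.buf.Bytes())
}
```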

This changes all callers of ExternalStorage.WriteFile() to cloud.WriteFile(ExternalStorage).
This shim just calls the old method under the hood, but will allow swapping ExternalStorage's
API to a Writer that this shim can then use.

Release note: none.
@dt dt force-pushed the cloud-writer branch from f20fb9d to ccb55d7 Compare May 13, 2021 13:05
This changes the ExternalStorage API's writing method from WriteFile, which takes an io.Reader
and writes its content to the requested destination (or returns an error encountered while doing
so), to Writer(), which returns an io.Writer pointing to the destination that can be
written to later and then closed to finish the upload (or CloseWithError'ed to cancel it).

All existing callers use the shim and are unaffected, but can later choose to change to a
push-based model and use the Writer directly. This is left to a follow-up change.

Release note: none.
@dt dt force-pushed the cloud-writer branch from ccb55d7 to ac52981 Compare May 13, 2021 13:35

@dt dt left a comment


Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @adityamaru, @dt, and @miretskiy)


pkg/storage/cloud/cloud_io.go, line 300 at r3 (raw file):

Previously, dt (David Taylor) wrote…

Heh, I thought of this after I'd already changed all the calls, but decided I didn't quite care enough about a temporary shim to go change them all again.

turned out I did care enough

@miretskiy miretskiy requested review from adityamaru and miretskiy May 13, 2021 13:50

@miretskiy miretskiy left a comment


Reviewed 1 of 8 files at r3, 1 of 22 files at r5, 1 of 2 files at r6.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @adityamaru, @dt, and @miretskiy)


dt commented May 13, 2021

TFTRs!

bors r+


craig bot commented May 13, 2021

Build succeeded:

@craig craig bot merged commit 964a2ad into cockroachdb:master May 13, 2021
@dt dt deleted the cloud-writer branch May 13, 2021 17:41