Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: NewReader() returns outdated objects #9763

Closed
wjkoh opened this issue Apr 13, 2024 · 6 comments · Fixed by #10331
Closed

storage: NewReader() returns outdated objects #9763

wjkoh opened this issue Apr 13, 2024 · 6 comments · Fixed by #10331
Assignees
Labels
api: storage Issues related to the Cloud Storage API. type: docs Improvement to the documentation for an API.

Comments

@wjkoh
Copy link

wjkoh commented Apr 13, 2024

Client

Storage

Environment

Local

Go Environment

$ go version
$ go env

go version go1.22.1 darwin/arm64

GO111MODULE=''
GOARCH='arm64'
GOBIN=''
GOCACHE='/Users/wjkoh/Library/Caches/go-build'
GOENV='/Users/wjkoh/Library/Application Support/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='arm64'
GOHOSTOS='darwin'
GOINSECURE=''
GOMODCACHE='/Users/wjkoh/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='darwin'
GOPATH='/Users/wjkoh/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/opt/homebrew/Cellar/go/1.22.1/libexec'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/opt/homebrew/Cellar/go/1.22.1/libexec/pkg/tool/darwin_arm64'
GOVCS=''
GOVERSION='go1.22.1'
GCCGO='gccgo'
AR='ar'
CC='cc'
CXX='c++'
CGO_ENABLED='1'
GOMOD='/Users/wjkoh/bake/go.mod'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -ffile-prefix-map=/var/folders/y1/jsmqrm1s6y55qh9s2rmw39lh0000gn/T/go-build2463423068=/tmp/go-build -gno-record-gcc-switches -fno-common'

Code

e.g.

package main

import (
	"context"
	"log"
	"os"
	"time"

	"cloud.google.com/go/storage"
	"github.com/google/uuid"
)

func main() {
	const bucket = "some-public-bucket"
	object := uuid.New().String()

	ctx := context.Background()
	storageClient, err := storage.NewClient(ctx)
	if err != nil {
		log.Fatal(err)
	}

	// Create a test object.
	w := storageClient.Bucket(bucket).Object(object).NewWriter(ctx)
	_, err = w.Write([]byte("foobar 1"))
	if err != nil {
		log.Fatal(err)
	}
	err = w.Close()
	if err != nil {
		log.Fatal(err)
	}

	// First read.
	reader1, err := storageClient.Bucket(bucket).Object(object).NewReader(ctx)
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("Generation from reader1.NewReader(): %d", reader1.Attrs.Generation)
	err = reader1.Close()
	if err != nil {
		log.Fatal(err)
	}
	attrs, err := storageClient.Bucket(bucket).Object(object).Attrs(ctx)
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("Generation from Attrs(): %d", attrs.Generation)

	// Overwrite the test object to change its generation.
	w = storageClient.Bucket(bucket).Object(object).NewWriter(ctx)
	_, err = w.Write([]byte("foobar 2"))
	if err != nil {
		log.Fatal(err)
	}
	err = w.Close()
	if err != nil {
		log.Fatal(err)
	}

        // This delay didn't change the behavior.
	//time.Sleep(10 * time.Second)

	// Second read.
	reader2, err := storageClient.Bucket(bucket).Object(object).NewReader(ctx)
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("Generation from reader2.NewReader(): %d", reader2.Attrs.Generation)
	err = reader2.Close()
	if err != nil {
		log.Fatal(err)
	}
	attrs, err = storageClient.Bucket(bucket).Object(object).Attrs(ctx)
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("Generation from Attrs(): %d", attrs.Generation)
}

Expected behavior

I anticipated that the generation numbers from NewReader().Attrs.Generation and object.Attrs().Generation would match.

Actual behavior

However, after overwriting the test object, the generation number obtained from ReaderObjectAttrs is outdated, while the number from object.Attrs() is current. For instance, in the following output, the generation number from the second Reader is 1712996651719769, despite it being expected to be 1712996653117313.

2024/04/13 17:24:12 Generation from reader1.NewReader(): 1712996651719769
2024/04/13 17:24:12 Generation from Attrs(): 1712996651719769
2024/04/13 17:24:13 Generation from reader2.NewReader(): 1712996651719769
2024/04/13 17:24:13 Generation from Attrs(): 1712996653117313

Additional context

cloud.google.com/go/storage v1.40.0

@wjkoh wjkoh added the triage me I really want to be triaged. label Apr 13, 2024
@product-auto-label product-auto-label bot added the api: storage Issues related to the Cloud Storage API. label Apr 13, 2024
@wjkoh wjkoh changed the title storage: ReaderObjectAttrs.Generation rerurns an incorrect generation number storage: ReaderObjectAttrs.Generation returns an incorrect generation number Apr 13, 2024
@wjkoh wjkoh changed the title storage: ReaderObjectAttrs.Generation returns an incorrect generation number storage: ReaderObjectAttrs.Generation has an incorrect generation number Apr 13, 2024
@wjkoh
Copy link
Author

wjkoh commented Apr 13, 2024

I have identified the cause of the issue. My test bucket was public, so its Cache-Control was set to public, max-age=3600 by default as stated in https://cloud.google.com/storage/docs/caching#built-in_caching_for. It appears that NewReader() adheres to this cache setting, which was unexpected. I would have assumed that the cache-control setting only applies to HTTP requests, such as when retrieving an object's public or signed URL through a GET request, as it is an HTTP header. It is unclear why the Go client library is also affected by this HTTP caching, as users of the library may not be aware of the underlying network protocols being used (e.g. HTTP, gRPC, or other). Furthermore, the documentation does not mention that the HTTP Cache-Control setting can impact an object reader created through NewReader().

Is there a way to completely disable caching for the Go client library? Alternatively, could we at least add a note to NewReader() informing users that caching is applied and they may encounter outdated objects?

@wjkoh wjkoh changed the title storage: ReaderObjectAttrs.Generation has an incorrect generation number storage: NewReader() returns outdated objects Apr 13, 2024
@tritone
Copy link
Contributor

tritone commented Apr 14, 2024

Thanks for the issue @wjkoh .

The Go library does use HTTP requests (using the XML API by default for downloads) so it is equivalent to a GET request via the browser. We also offer support for the JSON API (also HTTP based) as well as gRPC (this is new and still under an allowlist). We can add a note about this to the docs to clarify potentially, but I don't think it is possible to disable cache-control for HTTP requests on public objects (unless you disable cache-control on that object entirely).

Can you confirm if doing the download with https://pkg.go.dev/cloud.google.com/go/storage#WithJSONReads set yields the same result?

@tritone tritone added type: docs Improvement to the documentation for an API. and removed triage me I really want to be triaged. labels Apr 14, 2024
@wjkoh
Copy link
Author

wjkoh commented Apr 15, 2024

Thank you for your response, @tritone . I now understand the behavior, but could you please explain why client.Bucket(bucket).Object(object).Attrs(ctx) returns uncached information? This discrepancy was one of the factors that caused confusion and led to a bug in our code. I will test out JSONRead today and provide an update.

@tritone
Copy link
Contributor

tritone commented Apr 19, 2024

@wjkoh there are a couple possible reasons:

  1. ObjectHandle.Attrs() uses the JSON API (as do all other library methods aside from downloads using XML). So it's a different endpoint.
  2. The Attrs call is a metadata request rather than a data request (no alt=media). I'm not sure how that behaves w.r.t. cache control and would need to test it out to understand better.

Overall if you're concerned about consistency between those two methods I think it would be a good idea to switch to using JSON downloads. We are planning on switching that to be the library default in some future release.

@hkawa608
Copy link

@tritone @wjkoh
I have confirmed the WithJSONReads option solves this issue.

@wjkoh
Copy link
Author

wjkoh commented Apr 30, 2024

Thank you for your input, @tritone and @hkawa608 ! I have decided to abandon the idea of using Cloud Storage for real-time data sharing among our nodes after coming across this issue. Unfortunately, I did not follow up on it. At this point, I do not have a vested interest in this matter. What would be the most suitable solution until the "switch" occurs? I believe simply including a warning in the documentation would suffice.

tritone added a commit to tritone/google-cloud-go that referenced this issue Jun 5, 2024
Update docs to recommend using the JSON API for downloads
over XML, and note that we will switch the default to JSON
at some point.

Fixes googleapis#9763
gcf-merge-on-green bot pushed a commit that referenced this issue Jun 7, 2024
Update docs to recommend using the JSON API for downloads over XML, and note that we will switch the default to JSON at some point.

Fixes #9763
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: storage Issues related to the Cloud Storage API. type: docs Improvement to the documentation for an API.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants