Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bigquery: storage API fails silently when user does not have correct privileges #9102

Closed
stbenjam opened this issue Dec 7, 2023 · 8 comments
Assignees
Labels
api: bigquery Issues related to the BigQuery API. priority: p3 Desirable enhancement or fix. May not be included in next release. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.

Comments

@stbenjam
Copy link

stbenjam commented Dec 7, 2023

Client

cloud.google.com/go/bigquery v1.57.1

Environment

Mac OS

Go Environment

$ go version
go version go1.21.4 darwin/arm64
$ go env
GO111MODULE=''
GOARCH='arm64'
GOBIN=''
GOCACHE='/Users/stbenjam/Library/Caches/go-build'
GOENV='/Users/stbenjam/Library/Application Support/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='arm64'
GOHOSTOS='darwin'
GOINSECURE=''
GOMODCACHE='/Users/stbenjam/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='darwin'
GOPATH='/Users/stbenjam/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/opt/homebrew/Cellar/go/1.21.4/libexec'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/opt/homebrew/Cellar/go/1.21.4/libexec/pkg/tool/darwin_arm64'
GOVCS=''
GOVERSION='go1.21.4'
GCCGO='gccgo'
AR='ar'
CC='cc'
CXX='c++'
CGO_ENABLED='1'
GOMOD='/Users/stbenjam/go/src/github.com/openshift/sippy/go.mod'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -ffile-prefix-map=/var/folders/yd/cl9xxt2j0vs9714bc4k22rf00000gn/T/go-build3292277515=/tmp/go-build -gno-record-gcc-switches -fno-common'

Code

package main

import (
	"context"
	"errors"
	"flag"
	"fmt"
	"log"
	"time"

	"cloud.google.com/go/bigquery"
	"google.golang.org/api/iterator"
	"google.golang.org/api/option"
)

func main() {
	ctx := context.Background()

	serviceAccountPath := flag.String("service-account", "", "Path to service account JSON file")
	project := flag.String("project", "", "Project")
	dataset := flag.String("dataset", "", "Dataset")
	table := flag.String("table", "", "Table")
	flag.Parse()

	client, err := bigquery.NewClient(ctx, *project, option.WithCredentialsFile(*serviceAccountPath))
	if err != nil {
		log.Fatalf("Failed to create client: %v", err)
	}

	// Enable Storage API usage for fetching data
	if err := client.EnableStorageReadClient(context.Background(), option.WithCredentialsFile(*serviceAccountPath)); err != nil {
		panic(err)
	}

	query := client.Query(fmt.Sprintf(`
		SELECT * FROM `+"`%s.%s.%s`", *project, *dataset, *table))

	start := time.Now()
	it, err := query.Read(ctx)
	if err != nil {
		log.Fatalf("Failed to read query: %v", err)
	}

	log.Printf("processing records...")
	var rows []bigquery.Value
	for {
		var row []bigquery.Value
		err := it.Next(&row)
		if errors.Is(err, iterator.Done) {
			break
		}
		if err != nil {
			log.Fatalf("Failed during iteration: %v", err)
		}
		rows = append(rows, row)
	}

	elapsed := time.Since(start)
	log.Printf("Time taken: %s\n", elapsed)
}

Expected behavior

client.EnableStorageReadClient should return an error if the storage API cannot be used. For example, if the user does not have the bigquery.readsessions.create permission. I only got an error when I tried to use the storage API directly, which hinted my user didn't have the right privileges.

Actual behavior

It does not fail, but instead falls back to the very slow REST API. I used 2 service accounts, one without bigquery.readsessions.create and one with bigquery.readsessions.create.

$ go run bigquery.go -service-account $HOME/tokens/no-read-session.json -project XXXX -dataset XXXX -table junit_precomputed_20231004_20231031_414
Time taken: 1m50.318955958s
$ go run bigquery.go -service-account $HOME/tokens/with-read-session.json -project XXXX -dataset XXXX -table junit_precomputed_20231004_20231031_414
Time taken: 17.922398s
@stbenjam stbenjam added the triage me I really want to be triaged. label Dec 7, 2023
@product-auto-label product-auto-label bot added the api: bigquery Issues related to the BigQuery API. label Dec 7, 2023
@stbenjam stbenjam changed the title bigquery: unable to tell if it's using the bigquery storage API bigquery: storage API fails silently when user does not have correct privileges Dec 8, 2023
@shollyman
Copy link
Contributor

The isAccelerated method on the RowIterator is intended to make it clearer which API is being used to scan rows. The API provides no ability to signal missing permissions prior to the read session being created, so we have no mechanism by which to surface such information at client instantiation time.

Reviewing the storage acceleration implementation, it looks like we're not exposing trace spans sufficiently there, so that's another mechanism by which we could provide more insight, such as attaching the session error to the span info.

@shollyman shollyman added type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. priority: p3 Desirable enhancement or fix. May not be included in next release. and removed triage me I really want to be triaged. labels Dec 11, 2023
@markvai
Copy link

markvai commented Mar 5, 2024

@shollyman
Sorry to bump an old issue but came across this and found out that if bigquery.readsessions.create exists for the role used but not bigquery.readsessions.getData the response from isAccelerated will be true but trying to iterate over the response will hang and no rows will be returned.

@alvarowolfx
Copy link
Contributor

hey @markvai, can you provide the version that you tested that ? This PR #8666 was suppose to exactly that scenario and return the error to users. So anything >= v1.57.0 of the Go BQ package should have that fix. Or we might have to revisit that.

@markvai
Copy link

markvai commented Mar 6, 2024

Thanks @alvarowolfx, I was running a version without the fix, thanks for the quick answer!

@Pascal-Delange
Copy link

I think I encountered the same issue.

Context

go 1.22.0
cloud.google.com/go/storage v1.41.0

Snippet

gcsClient, err := storage.NewClient(ctx)
if err != nil {
	return nil, err
}
bucket := gcsClient.Bucket(bucketName)
_, err = bucket.Attrs(ctx)
if err != nil {
	return nil, fmt.Errorf("failed to get bucket attributes: %v", err)
}

file, err := fs.Open(filename)
if err!=nil {
     panic(err)
}
blob := bucket.Object(FILENAME).NewWriter(ctx)
defer blob.Close()
writer := blob.NewWriter(ctx)
_, err = io.Copy(writer, file)
if err!=nil {
    panic(err)
}

(The same happens if I'm using an encoding/csv Writer and write row by row)


Expected

The writing fails if permissions are missing

Observed

No error is raised, but writing fails silently if the caller has insufficient permissions (in my case, Storage Legacy Bucket Reader and Storage Object Viewer)

@shollyman
Copy link
Contributor

Following up here. I'm going to close this out as we're surfacing information about the bigquerystorage API acceleration via the row iterator property, and there's not a better way to surface the error directly in the call chain.

Regarding the GCS report, that's actually related to a different package (Cloud Storage vs BigQuery Storage) though the silent failure is concerning. If it's still occurring, could you please open a new issue?

@Pascal-Delange
Copy link

Hi,
Unfortunately I'm quite busy now and won't have time to test again or do another ticket. Can't tell if it's still occurring since I avoided the error in my code.

@stbenjam
Copy link
Author

Following up here. I'm going to close this out as we're surfacing information about the bigquerystorage API acceleration via the row iterator property, and there's not a better way to surface the error directly in the call chain.

No problem thanks for looking at this! I can always do a small limit 1 query first to check that it's accelerated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigquery Issues related to the BigQuery API. priority: p3 Desirable enhancement or fix. May not be included in next release. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.
Projects
None yet
Development

No branches or pull requests

5 participants