
if access token is expired / invalid, but refresh token is still valid, requests fail with google.api_core.exceptions.Unauthenticated: 401 Request had invalid authentication credentials #223

Open
tswast opened this issue Jun 29, 2021 · 7 comments
Labels
type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.

Comments

@tswast
Contributor

tswast commented Jun 29, 2021

As stated in the issue title, if the access token is expired or invalid but the refresh token is still valid, requests fail with google.api_core.exceptions.Unauthenticated: 401 Request had invalid authentication credentials. This could happen, for example, if the customer's clock is out of sync.

This issue is in response to a customer issue (internal tracker 191460918).

Environment details

  • OS: macOS
  • Python version: Python 3.9.5
  • pip version: pip 21.1.3
  • google-auth version:
$ conda list | grep google
google-api-core           1.30.0             pyhd8ed1ab_0    conda-forge
google-api-core-grpc      1.30.0               hd8ed1ab_0    conda-forge
google-auth               1.32.0             pyh6c4a22f_0    conda-forge
google-cloud-bigquery     2.20.0             pyhd3deb0d_0    conda-forge
google-cloud-bigquery-core 2.20.0             pyhd3deb0d_0    conda-forge
google-cloud-bigquery-storage-core 2.2.1              pyh44b312d_0    conda-forge
google-cloud-core         1.7.1              pyh6c4a22f_0    conda-forge
google-crc32c             1.1.2            py39he650545_0    conda-forge
google-resumable-media    1.3.1              pyh6c4a22f_0    conda-forge
googleapis-common-protos  1.53.0           py39h6e9494a_0    conda-forge

Steps to reproduce

Works fine (the credentials are refreshed after the token is invalidated):

from google.cloud import bigquery_storage
from google.cloud.bigquery_storage import types
import google.auth
import google.auth.transport.requests

creds, _ = google.auth.default()
creds.refresh(google.auth.transport.requests.Request())
creds.token = "DEFINITELY_EXPIRED"
creds.refresh(google.auth.transport.requests.Request())

session = google.auth.transport.requests.AuthorizedSession(creds)
bqstorageclient = bigquery_storage.BigQueryReadClient(credentials=creds)

project_id = "bigquery-public-data"
dataset_id = "new_york_trees"
table_id = "tree_species"
table = f"projects/{project_id}/datasets/{dataset_id}/tables/{table_id}"
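# Placeholder: your own project, which is billed for the read session.
PROJECT_ID = "your-billing-project"
parent = "projects/{}".format(PROJECT_ID)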

requested_session = types.ReadSession(
    table=table,
    data_format=types.DataFormat.ARROW,
)
read_session = bqstorageclient.create_read_session(
    parent=parent, read_session=requested_session, max_stream_count=1,
)
print(read_session.streams[0])

Fails (no refresh after the token is invalidated):

from google.cloud import bigquery_storage
from google.cloud.bigquery_storage import types
import google.auth
import google.auth.transport.requests

creds, _ = google.auth.default()
creds.refresh(google.auth.transport.requests.Request())
creds.token = "DEFINITELY_EXPIRED"
# creds.refresh(google.auth.transport.requests.Request()) <<< No forced refresh after token expiration.

session = google.auth.transport.requests.AuthorizedSession(creds)
bqstorageclient = bigquery_storage.BigQueryReadClient(credentials=creds)

project_id = "bigquery-public-data"
dataset_id = "new_york_trees"
table_id = "tree_species"
table = f"projects/{project_id}/datasets/{dataset_id}/tables/{table_id}"
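# Placeholder: your own project, which is billed for the read session.
PROJECT_ID = "your-billing-project"
parent = "projects/{}".format(PROJECT_ID)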

requested_session = types.ReadSession(
    table=table,
    data_format=types.DataFormat.ARROW,
)
read_session = bqstorageclient.create_read_session(
    parent=parent, read_session=requested_session, max_stream_count=1,
)
print(read_session.streams[0])

Stack trace:

Traceback (most recent call last):
  File "/usr/local/Caskroom/miniconda/base/envs/dev-3.9/lib/python3.9/site-packages/google/api_core/grpc_helpers.py", line 67, in error_remapped_callable
    return callable_(*args, **kwargs)
  File "/usr/local/Caskroom/miniconda/base/envs/dev-3.9/lib/python3.9/site-packages/grpc/_channel.py", line 946, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/usr/local/Caskroom/miniconda/base/envs/dev-3.9/lib/python3.9/site-packages/grpc/_channel.py", line 849, in _end_unary_response_blocking
    raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
        status = StatusCode.UNAUTHENTICATED
        details = "Request had invalid authentication credentials. Expected OAuth 2 access token, login cookie or other valid authentication credential. See https://developers.google.com/identity/sign-in/web/devconsole-project."
        debug_error_string = "{"created":"@1625004024.706701000","description":"Error received from peer ipv6:[2607:f8b0:4009:818::200a]:443","file":"src/core/lib/surface/call.cc","file_line":1067,"grpc_message":"Request had invalid authentication credentials. Expected OAuth 2 access token, login cookie or other valid authentication credential. See https://developers.google.com/identity/sign-in/web/devconsole-project.","grpc_status":16}"
>

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/swast/src/scratch/2021/06-b191460918-read-api/access_token_experiment.py", line 37, in <module>
    read_session = bqstorageclient.create_read_session(
  File "/usr/local/Caskroom/miniconda/base/envs/dev-3.9/lib/python3.9/site-packages/google/cloud/bigquery_storage_v1/services/big_query_read/client.py", line 508, in create_read_session
    response = rpc(request, retry=retry, timeout=timeout, metadata=metadata,)
  File "/usr/local/Caskroom/miniconda/base/envs/dev-3.9/lib/python3.9/site-packages/google/api_core/gapic_v1/method.py", line 145, in __call__
    return wrapped_func(*args, **kwargs)
  File "/usr/local/Caskroom/miniconda/base/envs/dev-3.9/lib/python3.9/site-packages/google/api_core/retry.py", line 285, in retry_wrapped_func
    return retry_target(
  File "/usr/local/Caskroom/miniconda/base/envs/dev-3.9/lib/python3.9/site-packages/google/api_core/retry.py", line 188, in retry_target
    return target()
  File "/usr/local/Caskroom/miniconda/base/envs/dev-3.9/lib/python3.9/site-packages/google/api_core/grpc_helpers.py", line 69, in error_remapped_callable
    six.raise_from(exceptions.from_grpc_error(exc), exc)
  File "<string>", line 3, in raise_from
google.api_core.exceptions.Unauthenticated: 401 Request had invalid authentication credentials. Expected OAuth 2 access token, login cookie or other valid authentication credential. See https://developers.google.com/identity/sign-in/web/devconsole-project.

Desired fix

Refresh and retry after 401 errors. This is subtly different from adding Unauthenticated to the default list of retryable errors in that a refresh should be attempted first.

Perhaps this issue is best addressed in google-api-core?
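The desired behavior can be sketched as a thin wrapper: on an authentication error, force a credential refresh first and only then retry, with a small cap on refreshes. This is a minimal sketch, not an existing google-api-core or google-auth API; `call_with_refresh` and its parameters are hypothetical, and the refresh action and error check are passed in so the sketch has no library dependency.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_refresh(
    call: Callable[[], T],
    refresh: Callable[[], None],
    is_auth_error: Callable[[BaseException], bool],
    max_refreshes: int = 1,
) -> T:
    """Run `call`; on an auth error, refresh credentials, then retry.

    Hypothetical helper: `refresh` should force new credentials
    (e.g. creds.refresh(Request())), and `is_auth_error` should
    recognize a 401 / UNAUTHENTICATED failure.
    """
    refreshes = 0
    while True:
        try:
            return call()
        except Exception as exc:
            # Only auth errors trigger a refresh; anything else propagates
            # unchanged so that outer retry policies still see it.
            if not is_auth_error(exc) or refreshes >= max_refreshes:
                raise
            refreshes += 1
            # Refresh *before* retrying; a plain retry would resend the
            # same rejected token.
            refresh()
```

With the repro above, this might be wired up as `call_with_refresh(lambda: bqstorageclient.create_read_session(...), lambda: creds.refresh(google.auth.transport.requests.Request()), lambda e: isinstance(e, google.api_core.exceptions.Unauthenticated))`. That assumes the client's gRPC channel uses the same `creds` object; if the client stores a copy (for example after adding scopes), refreshing `creds` would not affect the channel.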

@busunkim96
Contributor

Thank you for the detailed repro!

I think you're correct that this likely needs to be tweaked in google-api-core. I'm not quite sure what the best way to do that is yet. I'd like to check on the auth lib implementations in other languages to see how they handle the 401 case.

Description of what's happening:

Since google-cloud-bigquery-storage uses the gRPC transport, the credential refresh is part of the AuthMetadataPlugin and is called by gRPC before making the request.

https://github.com/googleapis/google-auth-library-python/blob/f1fee1f4c3d511d9e6ecbc1c0397e743bf2583db/google/auth/transport/grpc.py#L69-L94

before_request checks if a credential is "valid" prior to sending the request.

https://github.com/googleapis/google-auth-library-python/blob/f1fee1f4c3d511d9e6ecbc1c0397e743bf2583db/google/auth/credentials.py#L115-L134
Valid means that a token is present and that the token is not expired. This check breaks down if the local clock is wrong: the token has actually expired, but self.expired is False, so the token isn't refreshed before the gRPC request is made.

https://github.com/googleapis/google-auth-library-python/blob/f1fee1f4c3d511d9e6ecbc1c0397e743bf2583db/google/auth/credentials.py#L72-L78

https://github.com/googleapis/google-auth-library-python/blob/f1fee1f4c3d511d9e6ecbc1c0397e743bf2583db/google/auth/credentials.py#L56-L69
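The sequence described above can be modeled in a few lines. This is a simplified sketch, not google-auth's actual code: `ToyCredentials` and `auth_headers_for_call` are hypothetical stand-ins for the credentials' valid/expired/before_request logic and for what the AuthMetadataPlugin does per call, the current time is passed in explicitly instead of being read from the system clock, and the 10-second skew is assumed from the value mentioned later in this thread. It shows that a token the server rejects is still sent whenever the client believes it is unexpired.

```python
import datetime
from typing import Dict, Optional

CLOCK_SKEW = datetime.timedelta(seconds=10)  # assumed eager-refresh margin


class ToyCredentials:
    """Hypothetical stand-in for a google.auth credentials object."""

    def __init__(self, token: Optional[str], expiry: Optional[datetime.datetime]):
        self.token = token
        self.expiry = expiry
        self.refresh_count = 0

    def expired(self, now: datetime.datetime) -> bool:
        # Judged purely from the locally stored expiry and the local clock.
        if self.expiry is None:
            return False
        return now >= self.expiry - CLOCK_SKEW

    def valid(self, now: datetime.datetime) -> bool:
        return self.token is not None and not self.expired(now)

    def refresh(self, now: datetime.datetime) -> None:
        self.refresh_count += 1
        self.token = f"fresh-token-{self.refresh_count}"
        self.expiry = now + datetime.timedelta(hours=1)

    def before_request(self, now: datetime.datetime, headers: Dict[str, str]) -> None:
        # Refresh only if the client *thinks* the token is invalid.
        if not self.valid(now):
            self.refresh(now)
        headers["authorization"] = f"Bearer {self.token}"


def auth_headers_for_call(creds: ToyCredentials, now: datetime.datetime) -> Dict[str, str]:
    """Roughly what a metadata plugin does before each gRPC call."""
    headers: Dict[str, str] = {}
    creds.before_request(now, headers)
    return headers


if __name__ == "__main__":
    now = datetime.datetime(2021, 6, 29, 12, 0, 0)
    # The server would reject this token, but locally it looks valid,
    # so no refresh happens and the bad token is sent.
    creds = ToyCredentials("DEFINITELY_EXPIRED", now + datetime.timedelta(minutes=30))
    print(auth_headers_for_call(creds, now))
    # → {'authorization': 'Bearer DEFINITELY_EXPIRED'}
```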

After the initial gRPC request fails, the retry behavior is governed by the retryable error codes and backoff/timeout settings from the gRPC service config that sits next to the protos; the corresponding retry objects are attached to the wrapped methods. Unauthenticated (HTTP 401) is probably never in that list, since per https://google.aip.dev/194 it usually isn't helpful to retry such an error.

It would be nice to handle this like the Requests transport (AuthorizedSession), which refreshes the credentials and retries a 401 up to two times.

https://github.com/googleapis/google-auth-library-python/blob/f1fee1f4c3d511d9e6ecbc1c0397e743bf2583db/google/auth/transport/requests.py#L492-L501
https://github.com/googleapis/google-auth-library-python/blob/f1fee1f4c3d511d9e6ecbc1c0397e743bf2583db/google/auth/transport/__init__.py#L32-L35

@busunkim96 busunkim96 self-assigned this Jun 29, 2021
@busunkim96 busunkim96 transferred this issue from googleapis/google-auth-library-python Jun 29, 2021
@parthea parthea added type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. priority: p2 Moderately-important priority. Fix may not be included in next release. labels Jun 30, 2021
@tswast
Contributor Author

tswast commented Jun 30, 2021

Retry could help, but really it needs to be a retry after refreshing the credentials. I'm thinking about the case when the clock skew is so large that client-side we don't think an access token is expired but it actually is.
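The difference between a plain retry and a retry after refreshing can be shown with a toy server that only accepts its current token. This is an illustrative sketch with hypothetical names (`ToyServer`, `ToyClient`, `plain_retry`, `refresh_then_retry`); it assumes, as in the clock-skew case, that the client never refreshes on its own, so a plain retry keeps resending the same rejected token.

```python
from typing import List


class Unauthenticated(Exception):
    """Hypothetical stand-in for a 401 / UNAUTHENTICATED error."""


class ToyServer:
    def __init__(self) -> None:
        self.current_token = "token-2"  # The only token the server accepts.
        self.seen: List[str] = []

    def handle(self, token: str) -> str:
        self.seen.append(token)
        if token != self.current_token:
            raise Unauthenticated("401 Request had invalid authentication credentials")
        return "ok"


class ToyClient:
    def __init__(self, server: ToyServer) -> None:
        self.server = server
        self.token = "token-1"  # Client still believes this token is valid.

    def refresh(self) -> None:
        # Stand-in for fetching a new access token.
        self.token = self.server.current_token

    def call(self) -> str:
        return self.server.handle(self.token)


def plain_retry(client: ToyClient, attempts: int) -> str:
    # Resends the same request with the same (stale) token each time.
    for attempt in range(attempts):
        try:
            return client.call()
        except Unauthenticated:
            if attempt == attempts - 1:
                raise
    raise ValueError("attempts must be >= 1")


def refresh_then_retry(client: ToyClient) -> str:
    # One forced refresh on a 401, then a single retry.
    try:
        return client.call()
    except Unauthenticated:
        client.refresh()
        return client.call()
```

However many times `plain_retry` runs, the server sees only `token-1`; `refresh_then_retry` succeeds on its second request.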

@tswast
Contributor Author

tswast commented Jun 30, 2021

That _refresh_status_codes is a good find! That does seem to be the right place.

@busunkim96
Contributor

I checked with other folks and this is what I heard back:

  • Most languages use a larger clock skew to make this less likely. It looks like we decreased clock_skew from 5 minutes to 10 seconds because of how the metadata server behaves. A larger skew also won't help if the clock is off by more than a few minutes.
  • Node.js additionally offers users the following options:
    • adjust the eager refresh threshold time
    • opt into retrying a 4xx once
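Why a larger skew only helps up to a point can be worked out directly under a simplifying assumption (not from the thread). Measure everything in real (server) time: T_s is when the server stops accepting the token, Δ is how far past T_s the client believes the token remains valid (for example because its clock was adjusted after the token was issued), so the client's notion of expiry is T_c, and s is the eager-refresh threshold (clock skew). The client refreshes s before T_c:

```latex
\begin{aligned}
T_c &= T_s + \Delta \\
t_{\text{refresh}} &= T_c - s = T_s + (\Delta - s) \\
\text{stale-token window} &= [\,T_s,\ T_s + \Delta - s\,), \qquad \text{non-empty} \iff \Delta > s
\end{aligned}
```

For example, with s = 10 s and Δ = 2 min, the client sends a rejected token for 120 s - 10 s = 110 s. Raising s to 5 minutes closes that window, but any Δ larger than 5 minutes reopens it, which is why a refresh-and-retry path is still needed as a backstop.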

@tswast
Copy link
Contributor Author

tswast commented Jul 8, 2021

higher clock skew

Like the 5-minute clock skew? Going by "since metadata server token endpoint doesn't generate a new token until 30s before the expiration", perhaps a 30s skew might help and also mostly avoid the [metadata server] issue?

opt into retrying a 4xx once

I still think this is necessary. I'm not sure why it'd be opt-in, though. It's currently on by default in our REST transport, right? A single retry is probably enough. If that fails, hopefully it fails with a retryable error so that the api-core retries one layer up can kick in.

@parthea
Collaborator

parthea commented Oct 8, 2021

@busunkim96 @tswast Is this issue resolved with googleapis/google-auth-library-python#863?

@tswast
Contributor Author

tswast commented Oct 8, 2021

Maybe? I'd be much more comfortable saying yes if there were also the "opt into retrying a 4xx once" feature. Given all the issues I've seen, especially with Cloud Functions, I don't really trust that we can refresh the token proactively in all cases.

@parthea parthea added type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. and removed type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. priority: p2 Moderately-important priority. Fix may not be included in next release. labels Nov 13, 2021