Intermittent DefaultCredentialsError on GCE #211

Closed
dhermes opened this issue Nov 9, 2017 · 23 comments · Fixed by #398
Labels
priority: p1, 🚨, type: bug

Comments

@dhermes
Contributor

dhermes commented Nov 9, 2017

Original issue: googleapis/google-cloud-python#4358

After credentials have been obtained successfully with credentials, _ = google.auth.default(), an application later crashes when the same call can no longer detect credentials:

...
  File "/usr/local/lib/python2.7/dist-packages/google/cloud/client.py", line 212, in __init__
    Client.__init__(self, credentials=credentials, _http=_http)
  File "/usr/local/lib/python2.7/dist-packages/google/cloud/client.py", line 125, in __init__
    credentials, _ = google.auth.default()
  File "/usr/local/lib/python2.7/dist-packages/google/auth/_default.py", line 286, in default
    raise exceptions.DefaultCredentialsError(_HELP_MESSAGE)
DefaultCredentialsError: Could not automatically determine credentials. Please set GOOGLE_APPLICATION_CREDENTIALS or
explicitly create credential and re-run the application. For more
information, please see
https://developers.google.com/accounts/docs/application-default-credentials.

/cc @dmho418

@dhermes
Contributor Author

dhermes commented Nov 9, 2017

Also from @dmho418 (this may be a separate issue, but it is probably the root cause):

...
  File "/usr/local/lib/python2.7/dist-packages/google/resumable_media/requests/upload.py", line 97, in transmit
    retry_strategy=self._retry_strategy)
  File "/usr/local/lib/python2.7/dist-packages/google/resumable_media/requests/_helpers.py", line 101, in http_request
    func, RequestsMixin._get_status_code, retry_strategy)
  File "/usr/local/lib/python2.7/dist-packages/google/resumable_media/_helpers.py", line 146, in wait_and_retry
    response = func()
  File "/usr/local/lib/python2.7/dist-packages/google/auth/transport/requests.py", line 176, in request
    self._auth_request, method, url, request_headers)
  File "/usr/local/lib/python2.7/dist-packages/google/auth/credentials.py", line 121, in before_request
    self.refresh(request)
  File "/usr/local/lib/python2.7/dist-packages/google/auth/compute_engine/credentials.py", line 93, in refresh
    raise exceptions.RefreshError(exc)
RuntimeError: RefreshError: HTTPConnectionPool(host='metadata.google.internal', port=80): Max retries exceeded with url: /computeMetadata/v1/instance/service-accounts/default/?recursive=true

@dhermes
Contributor Author

dhermes commented Nov 9, 2017

@jonparrott Would there be a way to access the number of retries that were exceeded?

@theacodes
Contributor

It'll be the requests / urllib3 default

@dhermes
Contributor Author

dhermes commented Nov 9, 2017

Default retry strategy for the requests transport:

>>> import requests
>>> session = requests.Session()
>>> adapter1, adapter2 = session.adapters.values()
>>> adapter1.max_retries
Retry(total=0, connect=None, read=False, redirect=None, status=None)
>>> adapter2.max_retries
Retry(total=0, connect=None, read=False, redirect=None, status=None)

From the docstring:

By default, Requests does not retry failed connections.

@dhermes
Contributor Author

dhermes commented Nov 9, 2017

I'm trying to reproduce on GCE with:

$ sudo apt-get update
$ sudo apt-get upgrade
$ sudo apt-get install -y python-dev python-pip
$ sudo -H pip install --upgrade google-auth requests
$ cat << EOF > repro.py
> import pdb
> import sys
> import time
>
> import google.auth
>
>
> def main():
>     if len(sys.argv) < 2:
>         num_attempts = 100
>     else:
>         num_attempts = int(sys.argv[1])
>     for n in range(num_attempts):
>         print(n)
>         try:
>             credentials, project = google.auth.default()
>         except:
>             pdb.set_trace()
>             raise
>         time.sleep(0.5)
>
>
> if __name__ == '__main__':
>     main()
> EOF
$
$ python repro.py 10
$ python repro.py 100
$ python repro.py 1000

and have not been able to reproduce it.

If I had to guess, I'd say the only way to reproduce it would be to stress the instance (e.g. high CPU usage) so that the process behind the metadata server fails.
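One way to put more load on the instance than the sequential loop in repro.py is to run the lookups concurrently. This is a hypothetical sketch for Python 3: `stress` is an illustrative helper, not part of google-auth, and it takes the lookup as a zero-argument callable so the harness can be exercised without GCE. Whether concurrency alone is enough to trigger the failure is an assumption.

```python
import concurrent.futures


def stress(lookup, num_attempts=1000, workers=32):
    """Call `lookup` num_attempts times from a thread pool.

    Returns a list of (attempt_number, exception) pairs for the calls that
    raised; an empty list means every call succeeded.
    """

    def attempt(n):
        try:
            lookup()
        except Exception as exc:  # keep going so every failure is counted
            return n, exc
        return n, None

    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with attempts.
        results = list(pool.map(attempt, range(num_attempts)))
    return [(n, exc) for n, exc in results if exc is not None]
```

On an instance this would be called as `stress(google.auth.default)`, printing or inspecting the returned failures afterwards.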

@antoineazar

antoineazar commented Nov 16, 2017

We see this issue in our setup as well: a Python script that connects to BQ and runs hundreds of queries in rapid-fire sequence. We occasionally see:
A INFO:google.auth.compute_engine._metadata:Compute Engine Metadata server unavailable.

followed by a crash:
google.auth.exceptions.DefaultCredentialsError: Could not automatically determine credentials. Please set GOOGLE_APPLICATION_CREDENTIALS or explicitly create credential and re-run the application.

@dhermes
Contributor Author

dhermes commented Nov 16, 2017

Awesome data @antoineazar! Thanks for the confirmation.

Is there any code you could share so we can try to reproduce this and plan a fix?

@antoineazar

antoineazar commented Dec 1, 2017

@dhermes Any code that runs many queries (a few dozen usually suffice) on BQ in rapid sequence should do it. Here is some simple sample code (written for the pre-0.28 library; I didn't test it with 0.28) that you can wrap in a loop:

import uuid

from google.cloud import bigquery

# `query`, `dest_dataset_id` and `dest_table_id` are defined elsewhere.
# Each new Client() without explicit credentials calls google.auth.default().
client = bigquery.Client()
query_job = client.run_async_query(str(uuid.uuid4()), query)

# Use standard SQL syntax.
query_job.use_legacy_sql = False

# Set a destination table.
dest_dataset = client.dataset(dest_dataset_id)
dest_table = dest_dataset.table(dest_table_id)
query_job.destination = dest_table

# Allow the results table to be overwritten.
query_job.write_disposition = 'WRITE_TRUNCATE'

query_job.begin()
query_job.result()  # Wait for query to finish.

@GEverding

I'm seeing this issue in production. Is there a workaround/fix?

@theacodes
Contributor

@GEverding the recommendation right now is to use a service account keyfile instead of relying on the GCE metadata service.

It's possible that we could retry failed connections to the metadata service, but I'm not sure about that at the moment.
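Until the library retries on its own, an application could wrap the lookup itself. A minimal sketch, assuming simple exponential backoff is acceptable; `call_with_retries` is a hypothetical helper, not a google-auth API, and not how the library's eventual fix was implemented.

```python
import time


def call_with_retries(func, retry_on=(Exception,), attempts=5,
                      base_delay=0.5, sleep=time.sleep):
    """Call func(), retrying with exponential backoff on retry_on errors.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... between attempts
    and re-raises the last error once all attempts are used up.
    `sleep` is injectable so the backoff can be tested without waiting.
    """
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            sleep(base_delay * 2 ** attempt)
```

Usage would look like `credentials, project = call_with_retries(google.auth.default, retry_on=(google.auth.exceptions.DefaultCredentialsError,))`. This only covers credential detection; a `RefreshError` raised later during token refresh, as in the second traceback in this thread, happens inside the request and would need similar handling at that call site.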

theacodes self-assigned this Mar 1, 2018
theacodes added the bug label Mar 1, 2018
@GEverding

GEverding commented Mar 1, 2018 via email

@theacodes
Contributor

@vanpelt

vanpelt commented Apr 26, 2018

I'm fairly regularly getting "[Errno 111] Connection refused" on my App Engine flex deployment, which triggers the same exceptions @dhermes mentioned. Is the best solution to get a keyfile into my flex deployment? I'm going to try passing in a requests adapter with retry logic configured, but it's frustrating that I've already spent this much time on such a fundamental feature of this library.

@theacodes
Contributor

theacodes commented Apr 26, 2018

@vanpelt yep, that is a completely acceptable approach.
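A sketch of that adapter approach, assuming requests with urllib3's `Retry` class. It replaces the default `Retry(total=0, read=False)` shown earlier in the thread with one that retries connection errors such as `[Errno 111] Connection refused`, backing off between attempts. `make_retrying_session` and the retry counts are illustrative choices, not library defaults.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_retrying_session(total=5, backoff_factor=0.5):
    """Return a requests.Session whose adapters retry failed connections."""
    retry = Retry(
        total=total,
        connect=total,  # retry errors while establishing the connection
        read=total,     # retry read errors on idempotent requests
        backoff_factor=backoff_factor,  # sleep grows exponentially per retry
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    # The metadata server is plain HTTP, but mount both schemes anyway.
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session
```

With google-auth, such a session can be wrapped as `google.auth.transport.requests.Request(session=make_retrying_session())` and passed as the `request` argument of `google.auth.default()`, which uses it for the Compute Engine check. Whether a given client library lets you supply the same request object for token refresh depends on its version, so treat that part as an assumption to verify.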

JustinBeckwith added the triage me and 🚨 labels Jun 8, 2018
tseaver added the priority: p2 and type: bug labels and removed the bug and triage me labels Jun 11, 2018
@KevinTydlacka

I'm also seeing these errors. Stack trace below:

DefaultCredentialsError: Could not automatically determine credentials. Please set GOOGLE_APPLICATION_CREDENTIALS or explicitly create credentials and re-run the application. For more information, please see https://developers.google.com/accounts/docs/application-default-credentials.
at default (/env/local/lib/python2.7/site-packages/google/auth/_default.py:306)
at create_channel (/env/local/lib/python2.7/site-packages/google/api_core/grpc_helpers.py:170)
at __init__ (/env/local/lib/python2.7/site-packages/google/cloud/pubsub_v1/publisher/client.py:81)
at handle_asc_sub_notif (/home/vmagent/app/main.py:101)
at dispatch_request (/env/local/lib/python2.7/site-packages/flask/app.py:1598)
at full_dispatch_request (/env/local/lib/python2.7/site-packages/flask/app.py:1612)
at handle_user_exception (/env/local/lib/python2.7/site-packages/flask/app.py:1517)
at full_dispatch_request (/env/local/lib/python2.7/site-packages/flask/app.py:1614)
at wsgi_app (/env/local/lib/python2.7/site-packages/flask/app.py:1982)

I'm running a very, very simple App Engine app in the flexible Python 2.7 environment. It does little more than the sample found here, using the exact same code: https://github.com/GoogleCloudPlatform/python-docs-samples/tree/master/appengine/flexible/pubsub

I ingest POST data in a Flask app, then publish it to a topic using the pubsub_v1 client. There are no other errors in the logs before or after, and the same app/method successfully processes other requests within seconds before the failed attempt and within a minute after.

This is a little frustrating and concerning, since I'm seeing the issue with what is essentially the provided GAE sample code. I'm updating my app to ship a service account JSON key with the code and will see what happens if I create the credentials for the client manually instead, but I would love to see this resolved.

@ocervell

ocervell commented Sep 3, 2018

Still seeing this today. Is there any workaround if storing the credentials file on the server is not an option (it's riskier than using the default service account credentials)?

@joshnewlinatclearobject

Will I need to push a keyfile with my flex deployment for this (as @vanpelt mentioned), or will it be fixed by the PR that was just linked?

@dhendry

dhendry commented Oct 1, 2018

I would love to see a fix for this issue

@cpavon
Copy link

cpavon commented Feb 8, 2019

Is this still open 1 year later?

JustinBeckwith added the priority: p1 label and removed the priority: p2 label Feb 11, 2019
@mike-seekwell

mike-seekwell commented Jun 30, 2019

@JustinBeckwith @theacodes I see #323 was merged; should this issue be fixed now? A lot of people will likely land here when they run into this issue, so you should confirm the fix here and, if it really is a fix, tell people to upgrade (google-auth==1.6.3).

@busunkim96
Contributor

@mike-seekwell Thanks for the call out! #323 merged a fix to retry the ping to the metadata server.

If you're seeing this error, please upgrade to version 1.6.3 or greater.
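To confirm that an environment actually has that release, a small check like the following can help. Comparing tuples of integers rather than raw strings avoids treating "1.10.0" as older than "1.6.3". This is a sketch that assumes plain numeric release strings (no pre-release suffixes); `parse_release` and `meets_minimum` are hypothetical helpers.

```python
def parse_release(text):
    """Convert a plain release string such as '1.6.3' into (1, 6, 3)."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(installed, minimum="1.6.3"):
    """Return True if `installed` is the same as or newer than `minimum`.

    Pre-release tags such as '1.6.3rc1' are not handled by this sketch.
    """
    return parse_release(installed) >= parse_release(minimum)
```

On Python 3.8+, the installed version can be read with `importlib.metadata.version("google-auth")` and passed to `meets_minimum`.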

@ghost

ghost commented Oct 26, 2019

I'm still seeing this error fairly regularly when running on GAE flex with Python 3.6 and google-auth==1.6.3. Here's the full stack trace:

  ...
  File "/env/lib/python3.6/site-packages/google/auth/transport/requests.py", line 205, in request
    self._auth_request, method, url, request_headers)
  File "/env/lib/python3.6/site-packages/google/auth/credentials.py", line 122, in before_request
    self.refresh(request)
  File "/env/lib/python3.6/site-packages/google/auth/compute_engine/credentials.py", line 102, in refresh
    six.raise_from(new_exc, caught_exc)
  File "<string>", line 3, in raise_from
google.auth.exceptions.RefreshError: HTTPConnectionPool(host='metadata.google.internal', port=80): Max retries exceeded with url: /computeMetadata/v1/instance/service-accounts/[email protected]/?recursive=true (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f9964d74080>: Failed to establish a new connection: [Errno 111] Connection refused',))

@JustinBeckwith
Contributor

Greetings! Would you mind opening a new issue? It makes tracking these discussions much easier.

DaniilAnichin pushed a commit to DaniilAnichin/google-auth-library-python that referenced this issue Dec 5, 2019
Initial fix of issue googleapis#211 was done in CL googleapis#323, but only for .ping(). This one is adding same behaviour & tests for .get() method, as the problem still occurres
busunkim96 pushed a commit that referenced this issue Jan 9, 2020
…mpute_engine._metadata.get() (#398)

Initial fix of issue #211 was done in CL #323, but only for .ping()
This one is adding same behaviour & tests for .get() method, as the problem still occurres
See the issue for details

Refs: #323
Resolves: #211