Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

1.17.0 breaking pubsub #25

Closed
bill-within opened this issue Apr 15, 2020 · 12 comments · Fixed by #30
Closed

1.17.0 breaking pubsub #25

bill-within opened this issue Apr 15, 2020 · 12 comments · Fixed by #30
Assignees
Labels
api: pubsub Issues related to the Pub/Sub API. priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments

@bill-within
Copy link

Hi all,
Just a heads-up - 1.17.0 appears to be breaking the current pubsub client. Lots of time spent debugging this, hopefully it will help some other poor bastards out there:

DEBUG:google.api_core.retry:Retrying due to 503 The service was unable to fulfill your request. Please try again. [code=8a75], sleeping 0.2s ...
ERROR:grpc._channel:Exception iterating requests!
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/grpc/_channel.py", line 195, in consume_request_iterator
    request = next(request_iterator)
ValueError: generator already executing
DEBUG:google.api_core.bidi:Thread-ConsumeBidirectionalStream caught error None Exception iterating requests! and will exit. Generally this is due to the RPC itself being cancelled and the error will be surfaced to the calling code.
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/google/api_core/grpc_helpers.py", line 144, in error_remapped_callable
    return _StreamingResponseIterator(result)
  File "/usr/local/lib/python3.7/site-packages/google/api_core/grpc_helpers.py", line 72, in __init__
    self._stored_first_result = six.next(self._wrapped)
  File "/usr/local/lib/python3.7/site-packages/grpc/_channel.py", line 416, in __next__
    return self._next()
  File "/usr/local/lib/python3.7/site-packages/grpc/_channel.py", line 706, in _next
    raise self
grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:
    status = StatusCode.UNKNOWN
    details = "Exception iterating requests!"
    debug_error_string = "None"
>
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/google/api_core/bidi.py", line 637, in _thread_main
    self._bidi_rpc.open()
  File "/usr/local/lib/python3.7/site-packages/google/api_core/bidi.py", line 280, in open
    call = self._start_rpc(iter(request_generator), metadata=self._rpc_metadata)
  File "/usr/local/lib/python3.7/site-packages/google/cloud/pubsub_v1/gapic/subscriber_client.py", line 1076, in streaming_pull
    requests, retry=retry, timeout=timeout, metadata=metadata
  File "/usr/local/lib/python3.7/site-packages/google/api_core/gapic_v1/method.py", line 143, in __call__
    return wrapped_func(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/google/api_core/retry.py", line 286, in retry_wrapped_func
    on_error=on_error,
  File "/usr/local/lib/python3.7/site-packages/google/api_core/retry.py", line 184, in retry_target
    return target()
  File "/usr/local/lib/python3.7/site-packages/google/api_core/timeout.py", line 214, in func_with_timeout
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/google/api_core/grpc_helpers.py", line 146, in error_remapped_callable
    six.raise_from(exceptions.from_grpc_error(exc), exc)
  File "<string>", line 3, in raise_from
google.api_core.exceptions.Unknown: None Exception iterating requests!
INFO:google.api_core.bidi:Thread-ConsumeBidirectionalStream exiting
INFO:google.cloud.pubsub_v1.subscriber._protocol.heartbeater:Thread-Heartbeater exiting.
INFO:google.cloud.pubsub_v1.subscriber._protocol.leaser:Thread-LeaseMaintainer exiting.

Pinning to 1.16.0 in requirements.txt fixes the issue.

@sweatybridge
Copy link

Another related issue: googleapis/python-pubsub#74

@IlyaFaer IlyaFaer added api: pubsub Issues related to the Pub/Sub API. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. labels Apr 16, 2020
@yoshi-automation yoshi-automation added the triage me I really want to be triaged. label Apr 16, 2020
@tameem92
Copy link

I was seeing the same issue in all our production apps, and had to revert it to v1.16.0 for it to start working again.

@busunkim96
Copy link
Contributor

busunkim96 commented Apr 16, 2020

Most likely caused by 2b103b6, but possibly 14f1f34.

@plamut Could you try reproducing the issue at each of these commits to track down which one is the problem?

1.17.0 (2020-04-14)

Features

  • api_core: add retry param into PollingFuture() and it's inheritors (#9923) (14f1f34), closes #6197
  • api-core: add client_cert_source to ClientOptions (#17) (748c935)

Bug Fixes

  • consume part of StreamingResponseIterator to support failure while under a retry context (#10206) (2b103b6)

@plamut
Copy link
Contributor

plamut commented Apr 17, 2020

@busunkim96 I checked and 2b103b6 indeed seems to be the culprit. Installing that version of google-api-core results in a bug (the message stream is not resumed), while using its predecessor commit 14f1f34 works fine after re-enabling the network (the stream is re-established as expected).

@busunkim96 busunkim96 added priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. and removed triage me I really want to be triaged. labels Apr 17, 2020
@yoshi-automation yoshi-automation added 🚨 This issue needs some love. and removed 🚨 This issue needs some love. labels Apr 22, 2020
@busunkim96 busunkim96 added priority: p2 Moderately-important priority. Fix may not be included in next release. and removed priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. labels Apr 28, 2020
@busunkim96
Copy link
Contributor

Lowering priority as discussed in the weekly meeting. Pub/Sub is pinning the api-core version as a workaround.

@kunduanqb
Copy link

Wondering if this is addressed in 1.19?

@busunkim96
Copy link
Contributor

@kunduanqb Not yet, see #30

gcf-merge-on-green bot pushed a commit that referenced this issue Jun 9, 2020
Closes #25.

This PR adds the ability to disable automatically pre-fetching the first item of a stream returned by `*-Stream` gRPC callables. This hook will be used in PubSub to fix the [stalled stream issue](googleapis/python-pubsub#93), while also not affecting Firestore, since the default behavior is preserved.

I realize the fix is far from ideal, but it's the least ugly among the approaches I tried, e.g. somehow passing the flag through `ResumableBidiRpc` (it's a messy rabbit hole). On the PubSub side monkeypatching the generated SubscriberClient will be needed, but it's a (relatively) clean one-liner:
```patch
diff --git google/cloud/pubsub_v1/gapic/subscriber_client.py google/cloud/pubsub_v1/gapic/subscriber_client.py
index e98a686..1d6c058 100644
--- google/cloud/pubsub_v1/gapic/subscriber_client.py
+++ google/cloud/pubsub_v1/gapic/subscriber_client.py
@@ -1169,6 +1169,8 @@ class SubscriberClient(object):
                 default_timeout=self._method_configs["StreamingPull"].timeout,
                 client_info=self._client_info,
             )
+            # TODO: explain this monkeypatch!
+            self.transport.streaming_pull._prefetch_first_result_ = False
 
         return self._inner_api_calls["streaming_pull"](
             requests, retry=retry, timeout=timeout, metadata=metadata
```

If/when we merge this, we should also release it, and then we can add `!= 1.17.0` to the `google-api-core` version pin in PubSub.

### PR checklist
- [x] Make sure to open an issue as a [bug/issue](https://github.com/googleapis/python-api-core/issues/new/choose) before writing your code!  That way we can discuss the change, evaluate designs, and agree on the general idea
- [x] Ensure the tests and linter pass
- [x] Code coverage does not decrease (if any source code was changed)
- [x] Appropriate docs were updated (if necessary)
@pradn
Copy link

pradn commented Aug 27, 2020

I'm seeing a customer report this for 1.22.1. Does like look familiar or is it an unrelated thing? cc @plamut

Their dependencies:

google-api-core==1.22.1
google-api-python-client==1.10.0
google-auth==1.20.1
google-auth-httplib2==0.0.4
google-cloud==0.34.0
google-cloud-core==1.4.1
google-cloud-firestore==1.8.1
google-cloud-logging==1.15.1
google-cloud-monitoring==0.36.0
google-cloud-pubsub==1.7.0
google-cloud-secret-manager==1.0.0
google-cloud-storage==1.30.0
google-cloud-trace==0.24.0
google-crc32c==0.1.0
google-resumable-media==0.7.1
googleapis-common-protos==1.52.0
grpc-google-iam-v1==0.12.3
grpcio==1.31.0

Error:

xtPayload: "ERROR:root:{'msg': 'Exiting with an exception', 'error': "Unknown('Stream removed')", 'stack_tracke': 'Traceback (most recent call last):\n File "/usr/local/lib/python3.8/site-packages/google/api_core/grpc_helpers.py", line 57, in error_remapped_callable\n return callable_(*args, **kwargs)\n File "/usr/local/lib/python3.8/site-packages/grpc/_channel.py", line 826, in call\n return _end_unary_response_blocking(state, call, False, None)\n File "/usr/local/lib/python3.8/site-packages/grpc/_channel.py", line 729, in _end_unary_response_blocking\n raise _InactiveRpcError(state)\ngrpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:\n\tstatus = StatusCode.UNKNOWN\n\tdetails = "Stream removed"\n\tdebug_error_string = "{"created":"@1598295328.810963156","description":"Error received from peer ipv4:74.125.69.95:443","file":"src/core/lib/surface/call.cc","file_line":1061,"grpc_message":"Stream removed","grpc_status":2}"\n>\n\nThe above exception was the direct cause of the following exception:\n\nTraceback (most recent call last):\n File "/app/main.py", line 37, in main\n consume_messages(subscriber, subscription_path, ingest_recording)\n File "/app/app/consumer.py", line 41, in consume_messages\n subscriber.modify_ack_deadline(\n File "/usr/local/lib/python3.8/site-packages/google/cloud/pubsub_v1/_gapic.py", line 40, in \n fx = lambda self, *a, **kw: wrapped_fx(self.api, *a, **kw) # noqa\n File "/usr/local/lib/python3.8/site-packages/google/cloud/pubsub_v1/gapic/subscriber_client.py", line 939, in modify_ack_deadline\n self._inner_api_calls["modify_ack_deadline"](\n File "/usr/local/lib/python3.8/site-packages/google/api_core/gapic_v1/method.py", line 145, in call\n return wrapped_func(*args, **kwargs)\n File "/usr/local/lib/python3.8/site-packages/google/api_core/retry.py", line 281, in retry_wrapped_func\n return retry_target(\n File "/usr/local/lib/python3.8/site-packages/google/api_core/retry.py", line 184, in retry_target\n return target()\n File "/usr/local/lib/python3.8/site-packages/google/api_core/timeout.py", line 214, in func_with_timeout\n return func(*args, **kwargs)\n File "/usr/local/lib/python3.8/site-packages/google/api_core/grpc_helpers.py", line 59, in error_remapped_callable\n six.raise_from(exceptions.from_grpc_error(exc), exc)\n File "", line 3, in raise_from\ngoogle.api_core.exceptions.Unknown: None Stream removed\n'}

@plamut
Copy link
Contributor

plamut commented Aug 28, 2020

@pradn While the status code is the same (UNKNOWN), the reason seems different here:

<_InactiveRpcError of RPC that terminated with:\n\tstatus = StatusCode.UNKNOWN\n\tdetails = "Stream removed"

(the original issue's reason was "Exception iterating requests!")

Can you check if "stream removed" is something that can be triggered by the server when it decides to terminate a stream for some reason? I don't recall seeing this before, nor any part of the hand-written code that would "remove" the stream.

@pradn
Copy link

pradn commented Sep 8, 2020

I don't think "stream removed" errors come from the server side. I also see this issue in other libraries, like in node. There seems to be no resolution, yet.

@MarcusTheSmith
Copy link

So, has there been any resolution to the stream removed error? I'm getting it on a long running win10 Pub/Sub subscriber in a while(true) loop. I'm using the latest core 1.26.1 do I need to go back to 1.17?

@brian-liegl-sada
Copy link

I am running into the same issue as Marcus. The core is at 1.26.2

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: pubsub Issues related to the Pub/Sub API. priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.
Projects
None yet
Development

Successfully merging a pull request may close this issue.