Observed non-terminating stream error 500 Received RST_STREAM with error code 2 #504
@saitej09 Stream errors like this are expected, either due to random network glitches or when the server terminates a stream. In fact, streams always terminate with an error. From the documentation:
Terminating a stream on the server side after half an hour or so is actually expected; this prevents streams from staying open indefinitely should a client crash. Upon termination, clients are expected to re-establish the stream, which is exactly what happens:
Even in the event that re-establishing the stream fails, the client will keep retrying (with an exponential backoff), thus client applications do not have to worry about this. |
@plamut Thanks for your reply. Also, you mentioned a backoff factor. Is it configurable, and does it affect the half-an-hour-or-so blackout period? |
@saitej09 The stream is re-established (or at least should be) almost immediately after the server terminates it, and you should still be receiving messages. Or did you observe that you had not been receiving new messages for the full 30 minutes? (The latter would be an actual issue that should be investigated.)
I cannot think of anything off the top of my head. Did you perhaps update or change any dependencies, change the message load, or make any other changes to your system? If not, or if those changes are minor, I can check with the backend people whether they have noticed anything unusual in the metrics recently.
I don't believe the exact call that initiates/reopens a streaming pull channel is exposed to end users, but it should not, in principle, affect this. A blackout period this long should not happen, but if I understood correctly, you actually observe it often? In other words, do you see the following messages being repeated in the logs over and over again for 30 minutes before a stream is finally re-established?
It would also help if you could paste the output of pip freeze. |
Yes, I have observed that there are no incoming messages (new or repeated) during this 30 minutes or so. I see the errors in this manner:
INFO 2021-09-24 08:08:01,737 google.cloud.pubsub_v1.subscriber._protocol.streaming_pull_manager _should_terminate 685 : Observed non-terminating stream error 503 The service was unable to fulfill your request. Please try again. [code=8a75] |
@saitej09 Thanks for confirming. Assuming that publishing the messages works without issues, a 30-minute-long "blackout" on the subscriber side is not expected behavior. Periodically disconnecting and re-connecting is expected (every 35 minutes in this case), but it's not clear why the client stops receiving messages. Could you please post the library versions that your application uses? In the meantime I also checked with the people on the backend, and they suggested filing a support ticket at https://console.cloud.google.com/support/cases. Somebody can then examine the server logs of your project and see exactly what happens when the client tries to re-connect after the stream gets terminated. |
@plamut Thanks a lot! I will create a support ticket and try to debug there. |
@saitej09 Sounds good, fingers crossed. And if there's anything we could do on the client side to assist with debugging, please let me know. |
@Tonow To date we haven't heard back from the OP, unfortunately, but would you mind sharing the requested information? The OS and Python version, and the output of pip freeze. In addition, a support ticket should be created at https://console.cloud.google.com/support/cases so that the backend people can examine the server logs to see if they can spot anything unusual. All this can help with narrowing down the possible causes of this behavior, thanks! |
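For anyone else hitting the same problem, a minimal sketch for collecting that information from Python (the package names below are just the ones most relevant to this thread, not a full pip freeze):

```python
import platform
from importlib.metadata import PackageNotFoundError, version

print("OS:", platform.platform())
print("Python:", platform.python_version())

# Packages most relevant to this thread; extend or replace with `pip freeze`.
for pkg in ("google-cloud-pubsub", "google-api-core", "grpcio"):
    try:
        print(pkg, version(pkg))
    except PackageNotFoundError:
        print(pkg, "not installed")
```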
@plamut
astroid==2.8.6
backports.entry-points-selectable==1.1.1
bcrypt==3.2.0
cachetools==4.2.4
certifi==2021.10.8
cffi==1.15.0
cfgv==3.3.1
cftime==1.5.1.1
charset-normalizer==2.0.7
coverage==6.1.2
cryptography==36.0.0
decorator==5.1.0
distlib==0.3.3
filelock==3.4.0
google-api-core==2.2.2
google-auth==2.3.3
google-cloud-core==2.2.1
google-cloud-pubsub==2.9.0
google-cloud-storage==1.43.0
google-crc32c==1.3.0
google-resumable-media==2.1.0
googleapis-common-protos==1.53.0
grpc-google-iam-v1==0.12.3
grpcio==1.42.0
grpcio-status==1.42.0
h5py==3.6.0
identify==2.4.0
idna==3.3
influxdb==5.3.1
influxdb-client==1.23.0
invoke==1.6.0
isort==5.10.1
lazy-object-proxy==1.6.0
libcst==0.3.22
mccabe==0.6.1
msgpack==1.0.2
mypy-extensions==0.4.3
netCDF4==1.5.8
nodeenv==1.6.0
numpy==1.21.4
pandas==1.3.4
paramiko==2.8.0
platformdirs==2.4.0
pre-commit==2.15.0
proto-plus==1.19.8
protobuf==3.19.1
pvlib==0.9.0
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycparser==2.21
pylint==2.11.1
PyNaCl==1.4.0
pysftp==0.2.9
python-dateutil==2.8.2
pytz==2021.3
PyYAML==6.0
requests==2.26.0
rsa==4.7.2
Rx==3.2.0
scipy==1.7.2
six==1.16.0
slack-sdk==3.11.2
steadysun-libbaseconnection==1.6.0
steadysun-libsolar==0.0.3
toml==0.10.2
typing-extensions==4.0.0
typing-inspect==0.7.1
urllib3==1.26.7
validators==0.18.2
virtualenv==20.10.0
wrapt==1.13.3
xarray==0.20.1

The code runs on GKE on a cluster version:
Linux 91dd13b704e0 5.4.0-90-generic #101~18.04.1-Ubuntu SMP Fri Oct 22 09:25:04 UTC 2021 x86_64 GNU/Linux

The Pub/Sub stream follows this bucket: https://console.cloud.google.com/marketplace/product/noaa-public/goes-16

Without any apparent incident, the pod disconnects from the stream, then seems to reconnect:

However, it does not process any messages afterwards; it just sends back this same error every half hour (normally there are messages every 10 minutes at most). And these three logs loop every 30 minutes for hours, until my pod is stopped because my node is preempted. Yes, I will create a ticket at https://console.cloud.google.com/support/cases and link it to this issue :-) |
Internal support case was closed, closing this as well. |
Hi, is it possible to ignore these logging messages? Any params like |
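These lines come from standard Python loggers, so a minimal sketch (assuming the goal is simply to silence them rather than act on them) is to raise the level of the loggers that emit them; the logger names below are copied from the log lines quoted in this thread:

```python
import logging

# Silence the INFO-level stream reconnect messages seen in this thread.
# Logger names are taken from the log lines above; adjust as needed.
logging.getLogger(
    "google.cloud.pubsub_v1.subscriber._protocol.streaming_pull_manager"
).setLevel(logging.WARNING)
logging.getLogger("google.api_core.bidi").setLevel(logging.WARNING)
```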
Are there any hints as to what causes this? Having an application go into an infinite error scroll like this is not helpful. I would expect a certain number of reconnect attempts and then a failure, instead of an infinite loop of reconnect attempts. This has been attempting reconnects for a week now.
I see it started with this, which appears to have some issues in the SDK:
|
@sls-cat I'll reopen this issue and see if someone can address it. |
@sls-cat I believe your issue was caused by a bug, which was fixed in version 2.12. Please update, and let me know if that fixes your issue. Thanks! |
@acocuzzo In release 2.12, which PR should fix this issue? |
It was PR #626. |
I'm seeing the same thing as @sls-cat. Here are my dependencies:
I was previously on v2.8 on python3.9 and it worked fine. |
@gzamb Are you able to attach the errors you are seeing? |
Creating new issue for reference, as these are different errors: #785 |
Hi
I am using the Google Pub/Sub service with the flow control mechanism. I get this error very often after processing a few messages.
INFO 2021-09-22 12:56:13,932 google.cloud.pubsub_v1.subscriber._protocol.streaming_pull_manager _should_terminate 685 : Observed non-terminating stream error 503 The service was unable to fulfill your request. Please try again. [code=8a75]
INFO 2021-09-22 12:56:13,933 google.cloud.pubsub_v1.subscriber._protocol.streaming_pull_manager _should_recover 663 : Observed recoverable stream error 503 The service was unable to fulfill your request. Please try again. [code=8a75]
INFO 2021-09-22 12:56:13,934 google.api_core.bidi _reopen 487 : Re-established stream
INFO 2021-09-22 12:56:13,934 google.cloud.pubsub_v1.subscriber._protocol.streaming_pull_manager _should_terminate 685 : Observed non-terminating stream error 503 The service was unable to fulfill your request. Please try again. [code=8a75]
INFO 2021-09-22 12:56:13,934 google.cloud.pubsub_v1.subscriber._protocol.streaming_pull_manager _should_recover 663 : Observed recoverable stream error 503 The service was unable to fulfill your request. Please try again. [code=8a75]
I am using the following code for flow control. The service restarts randomly after, say, half an hour to one hour. Is there a solution to handle this error so that the service is not down for that half-hour-to-one-hour period?
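The original snippet was not captured in this thread, but a minimal sketch of a streaming pull subscriber with flow control (the subscription path, callback body, and max_messages limit below are illustrative assumptions, not the actual code) looks roughly like this:

```python
from google.cloud import pubsub_v1

# Illustrative placeholder; substitute your own project and subscription.
subscription_path = "projects/my-project/subscriptions/my-subscription"


def callback(message: pubsub_v1.subscriber.message.Message) -> None:
    # Process the message, then ack it so it is not redelivered.
    print(f"Received {message.data!r}")
    message.ack()


# Cap how many messages may be outstanding (leased but not yet acked) at once.
flow_control = pubsub_v1.types.FlowControl(max_messages=100)

subscriber = pubsub_v1.SubscriberClient()
with subscriber:
    streaming_pull_future = subscriber.subscribe(
        subscription_path, callback=callback, flow_control=flow_control
    )
    try:
        # Blocks until cancelled or a fatal error; the library re-establishes
        # the stream internally (with exponential backoff) when the server
        # terminates it, which is what the INFO logs above reflect.
        streaming_pull_future.result()
    except KeyboardInterrupt:
        streaming_pull_future.cancel()  # Trigger the shutdown.
        streaming_pull_future.result()  # Block until the shutdown completes.
```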