fix: recover watch stream on more error types #9995

crwilcox · 2019-12-18T18:06:39Z

Watch Retry is more permissive in Go. This PR replicates that in Python.

Fixes #9890 and b/144734355

frankyn

LGTM. Reviewed error codes map back to the Go list.

tritone

Just noting that several of these errors are listed as non-retryable under https://aip.dev/194. However, I don't have enough context specific to Firestore and this client to know whether that is a problem.

BenWhitehead · 2019-12-18T20:57:13Z

@schmidt-sebastian Can you take a look at the list of error codes here and weigh in if they are okay to retry or not?

crwilcox · 2019-12-18T21:11:31Z

I should note that, in theory, retrying UNAVAILABLE should be sufficient. But Go is functioning and Python isn't, and the difference in retried codes seems like a likely suspect. I am currently running a long test on my machine to see the results of this change over a few hours.

schmidt-sebastian · 2019-12-19T23:49:47Z

Can we match Node? The list is even more permissive: https://github.com/googleapis/nodejs-firestore/blob/25472e11a0e1a4a5e1931b1652d125f9c8cabf11/dev/src/watch.ts#L817

crwilcox · 2019-12-20T16:57:45Z

@jadekler voiced concerns about many of the retry codes in Go that I copied. I also found that this didn't fully stop the issue. It is possible I could match Node. I have a debug session running and am waiting for a failure so I can dig into what went on.

crwilcox · 2019-12-23T00:37:45Z

I left this run for a few days. I think if we just retry INTERNAL that will be sufficient. @jadekler @schmidt-sebastian thoughts?

2019-12-23 00:20:52,887 - google.api_core.bidi - DEBUG - waiting for recv.
DEBUG:google.api_core.bidi:Re-opening stream from gRPC callback.
2019-12-23 00:21:05,749 - google.api_core.bidi - DEBUG - Re-opening stream from gRPC callback.
DEBUG:google.auth.transport.requests:Making request: POST https://accounts.google.com/o/oauth2/token
2019-12-23 00:21:05,787 - google.auth.transport.requests - DEBUG - Making request: POST https://accounts.google.com/o/oauth2/token
DEBUG:urllib3.connectionpool:Resetting dropped connection: accounts.google.com
INFO:google.api_core.bidi:Re-established stream
2019-12-23 00:21:05,801 - urllib3.connectionpool - DEBUG - Resetting dropped connection: accounts.google.com
2019-12-23 00:21:05,802 - google.api_core.bidi - INFO - Re-established stream
  File "/usr/local/google/home/crwilcox/scratch/firestore_rst_stream_unavailable_retry/venv/lib/python3.7/site-packages/grpc/_channel.py", line 519, in traceback
    raise self
  File "/usr/local/google/home/crwilcox/scratch/firestore_rst_stream_unavailable_retry/venv/lib/python3.7/site-packages/google/api_core/bidi.py", line 505, in _recoverable
    return method(*args, **kwargs)
  File "/usr/local/google/home/crwilcox/scratch/firestore_rst_stream_unavailable_retry/venv/lib/python3.7/site-packages/google/api_core/bidi.py", line 561, in _recv
    return next(call)
  File "/usr/local/google/home/crwilcox/scratch/firestore_rst_stream_unavailable_retry/venv/lib/python3.7/site-packages/grpc/_channel.py", line 392, in __next__
    return self._next()
  File "/usr/local/google/home/crwilcox/scratch/firestore_rst_stream_unavailable_retry/venv/lib/python3.7/site-packages/grpc/_channel.py", line 561, in _next
    raise self
DEBUG:google.api_core.bidi:Call to retryable <bound method ResumableBidiRpc._recv of <google.api_core.bidi.ResumableBidiRpc object at 0x7f8bd4199910>> caused <_Rendezvous of RPC that terminated with:
        status = StatusCode.INTERNAL
        details = "Received RST_STREAM with error code 0"
        debug_error_string = "{"created":"@1577060465.749246373","description":"Error received from peer ipv4:74.125.142.95:443","file":"src/core/lib/surface/call.cc","file_line":1055,"grpc_message":"Received RST_STREAM with error code 0","grpc_status":13}"
>.
2019-12-23 00:21:05,810 - google.api_core.bidi - DEBUG - Call to retryable <bound method ResumableBidiRpc._recv of <google.api_core.bidi.ResumableBidiRpc object at 0x7f8bd4199910>> caused <_Rendezvous of RPC that terminated with:
        status = StatusCode.INTERNAL
        details = "Received RST_STREAM with error code 0"
        debug_error_string = "{"created":"@1577060465.749246373","description":"Error received from peer ipv4:74.125.142.95:443","file":"src/core/lib/surface/call.cc","file_line":1055,"grpc_message":"Received RST_STREAM with error code 0","grpc_status":13}"
>.
DEBUG:google.api_core.bidi:Re-opening stream from retryable <bound method ResumableBidiRpc._recv of <google.api_core.bidi.ResumableBidiRpc object at 0x7f8bd4199910>>.
2019-12-23 00:21:05,811 - google.api_core.bidi - DEBUG - Re-opening stream from retryable <bound method ResumableBidiRpc._recv of <google.api_core.bidi.ResumableBidiRpc object at 0x7f8bd4199910>>.
DEBUG:google.api_core.bidi:Stream was already re-established.
2019-12-23 00:21:05,811 - google.api_core.bidi - DEBUG - Stream was already re-established.
DEBUG:urllib3.connectionpool:https://accounts.google.com:443 "POST /o/oauth2/token HTTP/1.1" 200 None
2019-12-23 00:21:05,847 - urllib3.connectionpool - DEBUG - https://accounts.google.com:443 "POST /o/oauth2/token HTTP/1.1" 200 None
DEBUG:google.api_core.bidi:Re-opening stream from gRPC callback.
2019-12-23 00:21:05,852 - google.api_core.bidi - DEBUG - Re-opening stream from gRPC callback.
INFO:google.api_core.bidi:Re-established stream
2019-12-23 00:21:05,853 - google.api_core.bidi - INFO - Re-established stream
  File "/usr/local/google/home/crwilcox/scratch/firestore_rst_stream_unavailable_retry/venv/lib/python3.7/site-packages/grpc/_channel.py", line 519, in traceback
    raise self
  File "/usr/local/google/home/crwilcox/scratch/firestore_rst_stream_unavailable_retry/venv/lib/python3.7/site-packages/google/api_core/bidi.py", line 505, in _recoverable
    return method(*args, **kwargs)
  File "/usr/local/google/home/crwilcox/scratch/firestore_rst_stream_unavailable_retry/venv/lib/python3.7/site-packages/google/api_core/bidi.py", line 561, in _recv
    return next(call)
  File "/usr/local/google/home/crwilcox/scratch/firestore_rst_stream_unavailable_retry/venv/lib/python3.7/site-packages/grpc/_channel.py", line 392, in __next__
    return self._next()
  File "/usr/local/google/home/crwilcox/scratch/firestore_rst_stream_unavailable_retry/venv/lib/python3.7/site-packages/grpc/_channel.py", line 561, in _next
    raise self
DEBUG:google.api_core.bidi:Call to retryable <bound method ResumableBidiRpc._recv of <google.api_core.bidi.ResumableBidiRpc object at 0x7f8bd4199910>> caused <_Rendezvous of RPC that terminated with:
        status = StatusCode.UNAVAILABLE
        details = "Transport closed"
        debug_error_string = "{"created":"@1577060465.850641483","description":"Error received from peer ipv4:74.125.142.95:443","file":"src/core/lib/surface/call.cc","file_line":1055,"grpc_message":"Transport closed","grpc_status":14}"
>.
2019-12-23 00:21:05,854 - google.api_core.bidi - DEBUG - Call to retryable <bound method ResumableBidiRpc._recv of <google.api_core.bidi.ResumableBidiRpc object at 0x7f8bd4199910>> caused <_Rendezvous of RPC that terminated with:
        status = StatusCode.UNAVAILABLE
        details = "Transport closed"
        debug_error_string = "{"created":"@1577060465.850641483","description":"Error received from peer ipv4:74.125.142.95:443","file":"src/core/lib/surface/call.cc","file_line":1055,"grpc_message":"Transport closed","grpc_status":14}"
>.
DEBUG:google.api_core.bidi:Re-opening stream from retryable <bound method ResumableBidiRpc._recv of <google.api_core.bidi.ResumableBidiRpc object at 0x7f8bd4199910>>.
2019-12-23 00:21:05,855 - google.api_core.bidi - DEBUG - Re-opening stream from retryable <bound method ResumableBidiRpc._recv of <google.api_core.bidi.ResumableBidiRpc object at 0x7f8bd4199910>>.
DEBUG:google.api_core.bidi:Stream was already re-established.
2019-12-23 00:21:05,855 - google.api_core.bidi - DEBUG - Stream was already re-established.
DEBUG:google.api_core.bidi:recved response.
2019-12-23 00:21:05,964 - google.api_core.bidi - DEBUG - recved response.
DEBUG:google.cloud.firestore_v1.watch:on_snapshot: target change: 1
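
For reading the log above, the `debug_error_string` payloads can be decoded to confirm which status each failure carried. This is a minimal sketch that assumes the payload is valid JSON, as it is in this log; the format is a gRPC debugging detail, not a stable interface, and `summarize` and `GRPC_STATUS_NAMES` are hypothetical names used only for illustration.

```python
import json

# Only the two numeric codes that appear in the log above; other codes map to "OTHER".
GRPC_STATUS_NAMES = {13: "INTERNAL", 14: "UNAVAILABLE"}


def summarize(debug_error_string: str) -> str:
    """Return '<STATUS NAME>: <grpc message>' for one debug_error_string payload."""
    payload = json.loads(debug_error_string)
    name = GRPC_STATUS_NAMES.get(payload["grpc_status"], "OTHER")
    return f'{name}: {payload["grpc_message"]}'


# Payload copied from the first failure in the log.
rst_stream = (
    '{"created":"@1577060465.749246373",'
    '"description":"Error received from peer ipv4:74.125.142.95:443",'
    '"file":"src/core/lib/surface/call.cc","file_line":1055,'
    '"grpc_message":"Received RST_STREAM with error code 0","grpc_status":13}'
)
print(summarize(rst_stream))  # INTERNAL: Received RST_STREAM with error code 0
```

In this log the RST_STREAM failure carries status 13 (INTERNAL) and the "Transport closed" failure that follows carries status 14 (UNAVAILABLE).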

jeanbza · 2019-12-23T18:57:47Z

Did you figure out whether RST_STREAM was being returned as an INTERNAL error or not? If so, the change you propose makes sense to me.

> I think if we just retry INTERNAL that will be sufficient

INTERNAL in addition to UNAVAILABLE, or just by itself? (UNAVAILABLE should always be retried)

schmidt-sebastian · 2019-12-26T02:15:51Z

I would prefer if we used the same retry configuration everywhere, and the Node SDK has the configuration that is most battle-tested. We should try to retry every Watch request unless we know beforehand that a retry will not help (e.g. "PERMISSION_DENIED"). If we don't do this, our users will, and they will do so without backoff.
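
The "retry unless we know it will not help" policy can be sketched as a denylist rather than an allowlist. This is an illustration only: the `StatusCode` enum mirrors the standard gRPC numeric codes, but the `NON_RECOVERABLE` set is an assumption chosen for the example (only PERMISSION_DENIED is named in this thread), and it is not the list used by the Node, Go, or Python clients.

```python
from enum import IntEnum


class StatusCode(IntEnum):
    """Standard gRPC status codes with their numeric values."""
    OK = 0
    CANCELLED = 1
    UNKNOWN = 2
    INVALID_ARGUMENT = 3
    DEADLINE_EXCEEDED = 4
    NOT_FOUND = 5
    ALREADY_EXISTS = 6
    PERMISSION_DENIED = 7
    RESOURCE_EXHAUSTED = 8
    FAILED_PRECONDITION = 9
    ABORTED = 10
    OUT_OF_RANGE = 11
    UNIMPLEMENTED = 12
    INTERNAL = 13
    UNAVAILABLE = 14
    DATA_LOSS = 15
    UNAUTHENTICATED = 16


# Hypothetical denylist: codes where reopening the stream is assumed not to help.
NON_RECOVERABLE = frozenset({
    StatusCode.INVALID_ARGUMENT,
    StatusCode.NOT_FOUND,
    StatusCode.PERMISSION_DENIED,
    StatusCode.UNIMPLEMENTED,
})


def should_recover(code: StatusCode) -> bool:
    """Recover on every error code that is not explicitly denylisted."""
    return code is not StatusCode.OK and code not in NON_RECOVERABLE
```

With a denylist, a code nobody anticipated is retried by default, which is the behavior argued for here. An allowlist would drop the stream on any such code.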

crwilcox · 2019-12-26T18:50:07Z

@jadekler I haven't gotten an answer on that yet, but discussion is ongoing at b/144734355.

I think for now @schmidt-sebastian has a reasonable point. Customers of watch are determined to keep it running, likely to the point of retrying any of these codes themselves. If we retry almost all of them, but with sensible timeouts, that is likely better than leaving it to chance.

I will modify this PR to match Node.js
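
Retrying "with sensible timeouts" usually means bounded exponential backoff with jitter. The following is a minimal sketch of that idea, not the mechanism used by `google.api_core.bidi`; `backoff_delays`, `reopen_with_backoff`, and every default value are hypothetical.

```python
import random
import time


def backoff_delays(initial=1.0, multiplier=2.0, maximum=60.0):
    """Yield jittered delays whose upper bound grows geometrically up to `maximum`."""
    ceiling = initial
    while True:
        yield random.uniform(0.0, ceiling)  # jitter spreads out reconnecting clients
        ceiling = min(ceiling * multiplier, maximum)


def reopen_with_backoff(open_stream, is_recoverable, max_attempts=5, sleep=time.sleep):
    """Call `open_stream` until it succeeds, sleeping between recoverable failures.

    Re-raises immediately on a non-recoverable error, and re-raises the last
    error once `max_attempts` is exhausted.
    """
    delays = backoff_delays()
    for attempt in range(1, max_attempts + 1):
        try:
            return open_stream()
        except Exception as exc:
            if attempt == max_attempts or not is_recoverable(exc):
                raise
            sleep(next(delays))
```

Passing `sleep` as a parameter keeps the helper testable without real waiting. The cap on attempts and on the delay ceiling is what prevents a tight reconnect loop when the backend stays unhealthy.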

crwilcox · 2020-01-02T18:25:51Z

Merging this. We are still discussing internally whether RST_STREAM should surface with the error type we are seeing (INTERNAL), but this ought to resolve the issue for users for now. We can always soften the retry list later if this becomes unnecessary.

fix: Recover watch stream on more error types

d8b90fb

crwilcox requested review from frankyn and tseaver as code owners December 18, 2019 18:06

googlebot added the cla: yes label Dec 18, 2019

crwilcox changed the title ~~fix: Recover watch stream on more error types~~ fix: recover watch stream on more error types Dec 18, 2019

frankyn approved these changes Dec 18, 2019

crwilcox requested a review from BenWhitehead December 18, 2019 18:29

crwilcox self-assigned this Dec 18, 2019

fix: don't retry deadline exceeded

88d87b6

tritone reviewed Dec 18, 2019

crwilcox added the do not merge label Dec 20, 2019

crwilcox mentioned this pull request Dec 26, 2019

RST_STREAM error from grpc via Bidi in Firestore Client Library #9890

Closed

fix: match recovered stream exceptions to node.js implementation, https://github.com/googleapis/nodejs-firestore/blob/25472e11a0e1a4a5e1931b1652d125f9c8cabf11/dev/src/watch.ts#L817

e2ff234

schmidt-sebastian approved these changes Dec 27, 2019

BenWhitehead mentioned this pull request Dec 30, 2019

Firestore: what possible cause of this exception: InternalServerError: 500 Received RST_STREAM with error code 0 firebase/firebase-admin-python#282

Closed

crwilcox merged commit 13a870c into master Jan 2, 2020

crwilcox deleted the retry-more-error-types branch January 2, 2020 20:56
