You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
// For now, all errors are retryable except authentication/authorization.
return!grpcutil.IsAuthError(err)
}
However, coverage of this is often spotty, which can cause operations to hang indefinitely. This can often be seen when decommissioning a node and then trying to run operations via that node -- these will often error internally with codes.PermissionDenied, but since the error isn't always considered permanent and returned early, it is retried internally and the operation hangs.
This problem is exacerbated by the RPC layer often wrapping errors in roachpb.Error internally:
These errors do not propagate gRPC error codes because of cockroachdb/errors#63. As mentioned in #56208, we also often rely on checking the error message because of the lack of structured errors, which is brittle and should also be avoided.
The text was updated successfully, but these errors were encountered:
erikgrinaker
added
the
C-bug
Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
label
Mar 4, 2021
In cockroachdb#61356, `TestDecommissionedNodeCannotConnect` was extended to check
error propagation for a `Scan` operation after a node was decommissioned.
However, because of cockroachdb#61470 not all failure modes propagate errors
properly, causing the test to sometimes hang. This patch disables this
check until these problems are addresses.
Release justification: non-production code changes
Release note: None
59288: rfc: declarative, state-based schema changer r=lucy-zhang a=lucy-zhang
https://github.com/lucy-zhang/cockroach/blob/schema-changer-rfc/docs/RFCS/20210115_new_schema_changer.md
Release note: none
61459: tracing: cap registry size at 5k r=irfansharif a=tbg
This caps the size of the span registry at 5k, evicting "random" (map
order) existing entries when at the limit.
The purpose of this is to serve as a guardrail against leaked spans,
which would otherwise lead to unbounded memory growth.
Touches #59188.
Release justification: low risk, high benefit changes to existing
functionality
Release note: None
61471: server: disable flaky check in TestDecommissionedNodeCannotConnect r=tbg a=erikgrinaker
In #61356, `TestDecommissionedNodeCannotConnect` was extended to check
error propagation for a `Scan` operation after a node was decommissioned.
However, because of #61470 not all failure modes propagate errors
properly, causing the test to sometimes hang. This patch disables this
check until these problems are addressed.
Resolves#61452.
Release justification: non-production code changes
Release note: None
Co-authored-by: Lucy Zhang <[email protected]>
Co-authored-by: Tobias Grieger <[email protected]>
Co-authored-by: Erik Grinaker <[email protected]>
The KV layer often retries RPC errors internally, but tries to error out early on permanent errors, e.g.:
cockroach/pkg/kv/kvclient/rangecache/range_cache.go
Lines 1326 to 1329 in d5f73c6
However, coverage of this is often spotty, which can cause operations to hang indefinitely. This can often be seen when decommissioning a node and then trying to run operations via that node -- these will often error internally with
codes.PermissionDenied
, but since the error isn't always considered permanent and returned early, it is retried internally and the operation hangs.This problem is exacerbated by the RPC layer often wrapping errors in
roachpb.Error
internally:cockroach/pkg/kv/kvclient/kvcoord/dist_sender.go
Lines 1446 to 1448 in d5f73c6
These errors do not propagate gRPC error codes because of cockroachdb/errors#63. As mentioned in #56208, we also often rely on checking the error message because of the lack of structured errors, which is brittle and should also be avoided.
The text was updated successfully, but these errors were encountered: