ui, server: hot ranges page hits context deadline exceeded
#104269
Labels
A-check-on-console
Issues that need to be checked on CC Console
C-bug
Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
T-observability
Comments
zachlite added the C-bug and T-cluster-observability labels on Jun 2, 2023
craig bot pushed a commit that referenced this issue on Jul 24, 2023

107457: ui: increase hot ranges page timeout r=zachlite a=zachlite

This commit increases the hot ranges request timeout to 30 minutes for both the initial fetch and the refresh.

Informs #104269
Epic: none
Release note (bug fix): The timeout duration when loading the Hot Ranges page has been increased to 30 minutes.

107468: kv: implement errors.Wrapper on sendError, deflake test r=knz a=nvanbenschoten

Fixes #107353.

This commit makes `sendError` implement the `errors.Wrapper` interface. This deflakes `TestDefaultConnectionDisruptionDoesNotInterfereWithSystemTraffic`, which was expecting a call to `require.ErrorIs` to find a `context.DeadlineExceeded` in an error chain that included a `sendError`.

Release note: None

Co-authored-by: zachlite <[email protected]>
Co-authored-by: Nathan VanBenschoten <[email protected]>
This was referenced Jul 26, 2023
zachlite added a commit to zachlite/cockroach that referenced this issue on Jul 28, 2023

Requests for hot ranges are serviced by a cluster-wide fan-out, where non-trivial work is done on each node to provide a response. For each store, and for each hot range, we start a transaction with KV to look up descriptor info. Previously, there was no upper bound set on the time a node could take to provide a response.

This commit introduces a per-node timeout in the pagination logic, configurable with the new cluster setting server.hot_ranges.node.timeout. A value of 0 will disable the timeout.

Error behavior and semantics are preserved. If a particular node times out, the fan-out continues as before, as if a node failed to provide a response.

Informs cockroachdb#104269
Resolves cockroachdb#107627
Epic: none
Release note (ops change): Added a new cluster setting named server.hot_ranges.node.timeout, with a default value of 5 minutes. The setting controls the maximum amount of time that a hot ranges request will spend waiting for a node to provide a response. Set to 0 to disable timeouts.
zachlite added a commit to zachlite/cockroach that referenced this issue on Jul 31, 2023

Requests for hot ranges are serviced by a cluster-wide fan-out, where non-trivial work is done on each node to provide a response. For each store, and for each hot range, we start a transaction with KV to look up descriptor info. Previously, there was no upper bound set on the time a node could take to provide a response.

This commit introduces a per-node timeout in the pagination logic, configurable with the new cluster setting server.hot_ranges_request.node.timeout. A value of 0 will disable the timeout.

Error behavior and semantics are preserved. If a particular node times out, the fan-out continues as before, as if a node failed to provide a response.

Informs cockroachdb#104269
Resolves cockroachdb#107627
Epic: none
Release note (ops change): Added a new cluster setting named server.hot_ranges_request.node.timeout, with a default value of 5 minutes. The setting controls the maximum amount of time that a hot ranges request will spend waiting for a node to provide a response. Set to 0 to disable timeouts.
craig bot pushed a commit that referenced this issue on Aug 1, 2023

107796: ui, server: add a timeout per node while collecting hot ranges r=zachlite a=zachlite

Requests for hot ranges are serviced by a cluster-wide fan-out, where non-trivial work is done on each node to provide a response. For each store, and for each hot range, we start a transaction with KV to look up descriptor info. Previously, there was no upper bound set on the time a node could take to provide a response.

This commit introduces a per-node timeout in the pagination logic, configurable with the new cluster setting server.hot_ranges.node.timeout. A value of 0 will disable the timeout.

Error behavior and semantics are preserved. If a particular node times out, the fan-out continues as before, as if a node failed to provide a response.

Informs #104269
Resolves #107627
Epic: none
Release note (ops change): Added a new cluster setting named server.hot_ranges.node.timeout, with a default value of 5 minutes. The setting controls the maximum amount of time that a hot ranges request will spend waiting for a node to provide a response. Set to 0 to disable timeouts.

Co-authored-by: zachlite <[email protected]>
zachlite added a commit to zachlite/cockroach that referenced this issue on Aug 18, 2023

Requests for hot ranges are serviced by a cluster-wide fan-out, where non-trivial work is done on each node to provide a response. For each store, and for each hot range, we start a transaction with KV to look up descriptor info. Previously, there was no upper bound set on the time a node could take to provide a response.

This commit introduces a per-node timeout in the pagination logic, configurable with the new cluster setting server.hot_ranges_request.node.timeout. A value of 0 will disable the timeout.

Error behavior and semantics are preserved. If a particular node times out, the fan-out continues as before, as if a node failed to provide a response.

Informs cockroachdb#104269
Resolves cockroachdb#107627
Epic: none
Release note (ops change): Added a new cluster setting named server.hot_ranges_request.node.timeout, with a default value of 5 minutes. The setting controls the maximum amount of time that a hot ranges request will spend waiting for a node to provide a response. Set to 0 to disable timeouts.
maryliag added the A-check-on-console label on Nov 3, 2023
exalate-issue-sync bot added the T-observability label and removed the T-cluster-observability label on Mar 21, 2024
The Hot Ranges page can time out if the target cluster has a large number of nodes (45 nodes in the case of this reported error).
Pagination was added in af62e80, as part of #74377, to improve the performance of Hot Ranges requests.
We might consider a pagination scheme that visits one node at a time and streams each node's data back to the client. Hot Ranges data doesn't require any final aggregation or filtering, so we don't need to wait for a cluster-wide fan-out to complete.
Jira issue: CRDB-28439