admission: revamp db console overload page to have useful metrics #121572
Labels
A-admission-control
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
T-admission-control
Admission Control
Comments
aadityasondhi
added
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
A-admission-control
T-admission-control
Admission Control
labels
Apr 2, 2024
This was referenced Apr 2, 2024
aadityasondhi
added a commit
to aadityasondhi/cockroach
that referenced
this issue
May 2, 2024
In investigations, we have found that the following charts are not useful and frequently cause confusion:
- Admission work rate
- Admission Delay rate
- Requests Waiting For Flow Tokens

Informs cockroachdb#121572

Release note (ui change): This patch removes "Admission Delay Rate", "Admission Work Rate", and "Requests Waiting For Flow Tokens". These charts often cause confusion and are not useful for general overload investigations.
aadityasondhi
added a commit
to aadityasondhi/cockroach
that referenced
this issue
May 2, 2024
This patch reorders the existing metrics in a more usable order:
1. Metrics to help determine which resource is constrained (IO, CPU)
2. Metrics to narrow down which AC queues are seeing requests waiting
3. More advanced metrics about the system health (goroutine scheduler, L0 sublevels, etc.)

Informs cockroachdb#121572.

Release note (ui change): Reordered the metrics on the overload page to categorize them better. They are roughly in the following order:
1. Metrics to help determine which resource is constrained (IO, CPU)
2. Metrics to narrow down which AC queues are seeing requests waiting
3. More advanced metrics about the system health (goroutine scheduler, L0 sublevels, etc.)
aadityasondhi
added a commit
to aadityasondhi/cockroach
that referenced
this issue
May 2, 2024
This patch adds additional metrics to the overload page that allow for a more granular look at the system:
- cr.store.storage.l0-sublevels
- cr.node.go.scheduler_latency-p99.9

Informs cockroachdb#121572.

Release note (ui change): Two additional metrics on the overload page for better visibility into overloaded resources:
- cr.store.storage.l0-sublevels
- cr.node.go.scheduler_latency-p99.9
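The two metric names in this commit carry different scope prefixes: `cr.store.` for a per-store metric and `cr.node.` for a per-node one. As a minimal sketch of how such names break down, the helper below splits a name into its scope and the remainder. `split_metric` is a hypothetical function written for illustration; it is not part of CockroachDB or the DB console, and it assumes only the `cr.<scope>.<metric>` shape seen in the names above.

```python
from typing import Tuple


def split_metric(name: str) -> Tuple[str, str]:
    """Split 'cr.<scope>.<metric>' into (scope, metric).

    Hypothetical helper: only the two scopes that appear in the
    commit message ('node' and 'store') are accepted.
    """
    # Split on the first two dots only, because the metric part itself
    # may contain dots (for example 'go.scheduler_latency-p99.9').
    prefix, scope, metric = name.split(".", 2)
    if prefix != "cr" or scope not in ("node", "store"):
        raise ValueError(f"unexpected metric name: {name!r}")
    return scope, metric


for full_name in ("cr.store.storage.l0-sublevels",
                  "cr.node.go.scheduler_latency-p99.9"):
    scope, metric = split_metric(full_name)
    print(f"{scope:5s} {metric}")
```

The split is limited to two dots so that the trailing `p99.9` in the scheduler latency name stays attached to the metric rather than being treated as another path segment.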
aadityasondhi
added a commit
to aadityasondhi/cockroach
that referenced
this issue
May 6, 2024
This patch adds additional metrics to the overload page that allow for a more granular look at the system:
- cr.store.storage.l0-sublevels
- cr.node.go.scheduler_latency-p99.9

Informs cockroachdb#121572.

Release note (ui change): Two additional metrics on the overload page for better visibility into overloaded resources:
- cr.store.storage.l0-sublevels
- cr.node.go.scheduler_latency-p99.9
aadityasondhi
added a commit
to aadityasondhi/cockroach
that referenced
this issue
May 7, 2024
This patch adds additional metrics to the overload page that allow for a more granular look at the system:
- cr.store.storage.l0-sublevels
- cr.node.go.scheduler_latency-p99.9

Informs cockroachdb#121572.

Release note (ui change): Two additional metrics on the overload page for better visibility into overloaded resources:
- cr.store.storage.l0-sublevels
- cr.node.go.scheduler_latency-p99.9
aadityasondhi
added a commit
to aadityasondhi/cockroach
that referenced
this issue
May 7, 2024
Informs cockroachdb#121572.

Release note (ui change): There are now 4 graphs for Admission Queue Delay:
1. Foreground (regular) CPU work
2. Store (IO) work
3. Background (elastic) CPU work
4. Replication Admission Control, store overload on replicas
aadityasondhi
added a commit
to aadityasondhi/cockroach
that referenced
this issue
May 8, 2024
This patch adds additional metrics to the overload page that allow for a more granular look at the system:
- cr.store.storage.l0-sublevels
- cr.node.go.scheduler_latency-p99.9

Informs cockroachdb#121572.

Release note (ui change): Two additional metrics on the overload page for better visibility into overloaded resources:
- cr.store.storage.l0-sublevels
- cr.node.go.scheduler_latency-p99.9
aadityasondhi
added a commit
to aadityasondhi/cockroach
that referenced
this issue
May 8, 2024
Informs cockroachdb#121572.

Release note (ui change): There are now 4 graphs for Admission Queue Delay:
1. Foreground (regular) CPU work
2. Store (IO) work
3. Background (elastic) CPU work
4. Replication Admission Control, store overload on replicas
aadityasondhi
added a commit
to aadityasondhi/cockroach
that referenced
this issue
May 17, 2024
In investigations, we have found that the following charts are not useful and frequently cause confusion:
- Admission work rate
- Admission Delay rate
- Requests Waiting For Flow Tokens

Informs cockroachdb#121572

Release note (ui change): This patch removes "Admission Delay Rate", "Admission Work Rate", and "Requests Waiting For Flow Tokens". These charts often cause confusion and are not useful for general overload investigations.
aadityasondhi
added a commit
to aadityasondhi/cockroach
that referenced
this issue
May 17, 2024
This patch reorders the existing metrics in a more usable order:
1. Metrics to help determine which resource is constrained (IO, CPU)
2. Metrics to narrow down which AC queues are seeing requests waiting
3. More advanced metrics about the system health (goroutine scheduler, L0 sublevels, etc.)

Informs cockroachdb#121572.

Release note (ui change): Reordered the metrics on the overload page to categorize them better. They are roughly in the following order:
1. Metrics to help determine which resource is constrained (IO, CPU)
2. Metrics to narrow down which AC queues are seeing requests waiting
3. More advanced metrics about the system health (goroutine scheduler, L0 sublevels, etc.)
aadityasondhi
added a commit
to aadityasondhi/cockroach
that referenced
this issue
May 17, 2024
This patch adds additional metrics to the overload page that allow for a more granular look at the system:
- cr.store.storage.l0-sublevels
- cr.node.go.scheduler_latency-p99.9

Informs cockroachdb#121572.

Release note (ui change): Two additional metrics on the overload page for better visibility into overloaded resources:
- cr.store.storage.l0-sublevels
- cr.node.go.scheduler_latency-p99.9
aadityasondhi
added a commit
to aadityasondhi/cockroach
that referenced
this issue
May 17, 2024
Informs cockroachdb#121572.

Release note (ui change): There are now 4 graphs for Admission Queue Delay:
1. Foreground (regular) CPU work
2. Store (IO) work
3. Background (elastic) CPU work
4. Replication Admission Control, store overload on replicas
aadityasondhi
added a commit
to aadityasondhi/cockroach
that referenced
this issue
May 17, 2024
This patch uses the new separated `elastic-stores` metrics for queueing delay from cockroachdb#123890.

Informs cockroachdb#121572.

Release note (ui change): The `Admission Queueing Delay – Store` chart now separates elastic (background) work from the regular foreground work.
aadityasondhi
added a commit
to aadityasondhi/cockroach
that referenced
this issue
May 17, 2024
This patch adds the metric `elastic_io_tokens_exhausted_duration.kv` introduced in cockroachdb#124078.

Informs cockroachdb#121572.

Release note (ui change): The `Admission IO Tokens Exhausted` chart now separates elastic and regular IO work.
aadityasondhi
added a commit
to aadityasondhi/cockroach
that referenced
this issue
May 21, 2024
In investigations, we have found that the following charts are not useful and frequently cause confusion:
- Admission work rate
- Admission Delay rate
- Requests Waiting For Flow Tokens

Informs cockroachdb#121572

Release note (ui change): This patch removes "Admission Delay Rate", "Admission Work Rate", and "Requests Waiting For Flow Tokens". These charts often cause confusion and are not useful for general overload investigations.
aadityasondhi
added a commit
to aadityasondhi/cockroach
that referenced
this issue
May 21, 2024
This patch reorders the existing metrics in a more usable order:
1. Metrics to help determine which resource is constrained (IO, CPU)
2. Metrics to narrow down which AC queues are seeing requests waiting
3. More advanced metrics about the system health (goroutine scheduler, L0 sublevels, etc.)

Informs cockroachdb#121572.

Release note (ui change): Reordered the metrics on the overload page to categorize them better. They are roughly in the following order:
1. Metrics to help determine which resource is constrained (IO, CPU)
2. Metrics to narrow down which AC queues are seeing requests waiting
3. More advanced metrics about the system health (goroutine scheduler, L0 sublevels, etc.)
aadityasondhi
added a commit
to aadityasondhi/cockroach
that referenced
this issue
May 21, 2024
This patch adds additional metrics to the overload page that allow for a more granular look at the system:
- cr.store.storage.l0-sublevels
- cr.node.go.scheduler_latency-p99.9

Informs cockroachdb#121572.

Release note (ui change): Two additional metrics on the overload page for better visibility into overloaded resources:
- cr.store.storage.l0-sublevels
- cr.node.go.scheduler_latency-p99.9
aadityasondhi
added a commit
to aadityasondhi/cockroach
that referenced
this issue
May 21, 2024
Informs cockroachdb#121572.

Release note (ui change): There are now 4 graphs for Admission Queue Delay:
1. Foreground (regular) CPU work
2. Store (IO) work
3. Background (elastic) CPU work
4. Replication Admission Control, store overload on replicas
aadityasondhi
added a commit
to aadityasondhi/cockroach
that referenced
this issue
May 21, 2024
This patch uses the new separated `elastic-stores` metrics for queueing delay from cockroachdb#123890.

Informs cockroachdb#121572.

Release note (ui change): The `Admission Queueing Delay – Store` chart now separates elastic (background) work from the regular foreground work.
aadityasondhi
added a commit
to aadityasondhi/cockroach
that referenced
this issue
May 21, 2024
This patch adds the metric `elastic_io_tokens_exhausted_duration.kv` introduced in cockroachdb#124078.

Informs cockroachdb#121572.

Release note (ui change): The `Admission IO Tokens Exhausted` chart now separates elastic and regular IO work.
craig bot
pushed a commit
that referenced
this issue
May 21, 2024
123522: dbconsole: overload page improvements r=sumeerbhola a=aadityasondhi

This PR contains a series of improvements to the overload page of the DB console as part of #121574. It is separated into multiple commits for ease of review.

___

dbconsole: remove non useful charts on the overload page

In investigations, we have found that the following charts are not useful and frequently cause confusion:
- Admission work rate
- Admission Delay rate
- Requests Waiting For Flow Tokens

Informs #121572

Release note (ui change): This patch removes "Admission Delay Rate", "Admission Work Rate", and "Requests Waiting For Flow Tokens". These charts often cause confusion and are not useful for general overload investigations.

___

dbconsole: reorder overload page metrics for better readability

This patch reorders the existing metrics in a more usable order:
1. Metrics to help determine which resource is constrained (IO, CPU)
2. Metrics to narrow down which AC queues are seeing requests waiting
3. More advanced metrics about the system health (goroutine scheduler, L0 sublevels, etc.)

Informs #121572.

Release note (ui change): Reordered the metrics on the overload page to categorize them better. They are roughly in the following order:
1. Metrics to help determine which resource is constrained (IO, CPU)
2. Metrics to narrow down which AC queues are seeing requests waiting
3. More advanced metrics about the system health (goroutine scheduler, L0 sublevels, etc.)

___

dbconsole: include better names and descriptions for overload page

This patch improves the metric descriptions for the metrics on the overload page.

Fixes #120853.

Release note (ui change): The overload page now includes descriptions for all metrics.

___

dbconsole: additional higher granularity metrics for overload

This patch adds additional metrics to the overload page that allow for a more granular look at the system:
- cr.store.storage.l0-sublevels
- cr.node.go.scheduler_latency-p99.9

Informs #121572.

Release note (ui change): Two additional metrics on the overload page for better visibility into overloaded resources:
- cr.store.storage.l0-sublevels
- cr.node.go.scheduler_latency-p99.9

___

dbconsole: split Admission Queue graphs to avoid overcrowding

Informs #121572.

Release note (ui change): There are now 4 graphs for Admission Queue Delay:
1. Foreground (regular) CPU work
2. Store (IO) work
3. Background (elastic) CPU work
4. Replication Admission Control, store overload on replicas

___

dbconsole: add elastic store metric to the overload page

This patch uses the new separated `elastic-stores` metrics for queueing delay from #123890.

Informs #121572.

Release note (ui change): The `Admission Queueing Delay – Store` chart now separates elastic (background) work from the regular foreground work.

___

dbconsole: add elastic io token exhausted duration to overload page

This patch adds the metric `elastic_io_tokens_exhausted_duration.kv` introduced in #124078.

Informs #121572.

Release note (ui change): The `Admission IO Tokens Exhausted` chart now separates elastic and regular IO work.

124493: packer: only try emulating via Docker on x86 r=rail a=rickystewart

Epic: none
Release note: None

Co-authored-by: Aaditya Sondhi <[email protected]>
Co-authored-by: Ricky Stewart <[email protected]>
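The three-stage ordering described in the reorder commit (constrained resource first, then waiting AC queues, then advanced system health) can be sketched as a stable sort over chart descriptors. This is only an illustration, not the DB console implementation: `Stage`, `Chart`, `CHARTS`, and `page_order` are hypothetical names, the chart titles are borrowed from the release notes, and the stage assigned to each chart is an assumption.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List


class Stage(IntEnum):
    """Triage stages, in the order given by the release note."""
    CONSTRAINED_RESOURCE = 1  # which resource is constrained (IO, CPU)
    WAITING_QUEUES = 2        # which AC queues have requests waiting
    ADVANCED_HEALTH = 3       # goroutine scheduler, L0 sublevels, etc.


@dataclass(frozen=True)
class Chart:
    title: str
    stage: Stage


# Hypothetical, deliberately unordered chart list. The stage chosen for
# each chart is an assumption made for this sketch.
CHARTS = [
    Chart("cr.store.storage.l0-sublevels", Stage.ADVANCED_HEALTH),
    Chart("Admission Queueing Delay – Store", Stage.WAITING_QUEUES),
    Chart("Admission IO Tokens Exhausted", Stage.CONSTRAINED_RESOURCE),
    Chart("cr.node.go.scheduler_latency-p99.9", Stage.ADVANCED_HEALTH),
]


def page_order(charts: List[Chart]) -> List[Chart]:
    """Sort charts by triage stage; charts in the same stage keep
    their relative input order because sorted() is stable."""
    return sorted(charts, key=lambda chart: chart.stage)


for chart in page_order(CHARTS):
    print(f"{chart.stage.value}. {chart.title}")
```

Keeping the stage as an explicit field, rather than relying on list position, means a newly added chart only needs a stage assignment to land in the right part of the page.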
blathers-crl bot
pushed a commit
that referenced
this issue
May 21, 2024
In investigations, we have found that the following charts are not useful and frequently cause confusion:
- Admission work rate
- Admission Delay rate
- Requests Waiting For Flow Tokens

Informs #121572

Release note (ui change): This patch removes "Admission Delay Rate", "Admission Work Rate", and "Requests Waiting For Flow Tokens". These charts often cause confusion and are not useful for general overload investigations.
blathers-crl bot
pushed a commit
that referenced
this issue
May 21, 2024
This patch reorders the existing metrics in a more usable order:
1. Metrics to help determine which resource is constrained (IO, CPU)
2. Metrics to narrow down which AC queues are seeing requests waiting
3. More advanced metrics about the system health (goroutine scheduler, L0 sublevels, etc.)

Informs #121572.

Release note (ui change): Reordered the metrics on the overload page to categorize them better. They are roughly in the following order:
1. Metrics to help determine which resource is constrained (IO, CPU)
2. Metrics to narrow down which AC queues are seeing requests waiting
3. More advanced metrics about the system health (goroutine scheduler, L0 sublevels, etc.)
blathers-crl bot
pushed a commit
that referenced
this issue
May 21, 2024
This patch adds additional metrics to the overload page that allow for a more granular look at the system:
- cr.store.storage.l0-sublevels
- cr.node.go.scheduler_latency-p99.9

Informs #121572.

Release note (ui change): Two additional metrics on the overload page for better visibility into overloaded resources:
- cr.store.storage.l0-sublevels
- cr.node.go.scheduler_latency-p99.9
blathers-crl bot
pushed a commit
that referenced
this issue
May 21, 2024
Informs #121572.

Release note (ui change): There are now 4 graphs for Admission Queue Delay:
1. Foreground (regular) CPU work
2. Store (IO) work
3. Background (elastic) CPU work
4. Replication Admission Control, store overload on replicas
Merged #123522.
Through escalations, we have found that some metrics on this page are not useful, while other useful metrics are missing. We should be deliberate about each chart on this page.
Examples to remove:
Jira issue: CRDB-37347
Epic CRDB-36319