release-24.1: dbconsole: overload page improvements #124509
Merged: aadityasondhi merged 7 commits into release-24.1 from blathers/backport-release-24.1-123522 on May 22, 2024.
In investigations, we have found that the following charts are not useful and frequently cause confusion:
- Admission work rate
- Admission Delay rate
- Requests Waiting For Flow Tokens
Informs #121572
Release note (ui change): This patch removes "Admission Delay Rate", "Admission Work Rate", and "Requests Waiting For Flow Tokens". These charts often cause confusion and are not useful for general overload investigations.
This patch reorders the existing metrics in a more usable order:
1. Metrics to help determine which resource is constrained (IO, CPU)
2. Metrics to narrow down which AC queues are seeing requests waiting
3. More advanced metrics about the system health (goroutine scheduler, L0 sublevels, etc.)
Informs #121572.
Release note (ui change): Reordering of metrics on the overload page to help categorize them better. They are roughly in the following order:
1. Metrics to help determine which resource is constrained (IO, CPU)
2. Metrics to narrow down which AC queues are seeing requests waiting
3. More advanced metrics about the system health (goroutine scheduler, L0 sublevels, etc.)
This patch improves the metric descriptions for the metrics on the overload page.
Fixes #120853.
Release note (ui change): The overload page now includes descriptions for all metrics.
This patch adds additional metrics to the overload page that allow for a more granular look at the system:
- cr.store.storage.l0-sublevels
- cr.node.go.scheduler_latency-p99.9
Informs #121572.
Release note (ui change): Two additional metrics on the overload page for better visibility into overloaded resources:
- cr.store.storage.l0-sublevels
- cr.node.go.scheduler_latency-p99.9
Informs #121572.
Release note (ui change): There are now 4 graphs for Admission Queue Delay:
1. Foreground (regular) CPU work
2. Store (IO) work
3. Background (elastic) CPU work
4. Replication Admission Control, store overload on replicas
blathers-crl bot force-pushed the blathers/backport-release-24.1-123522 branch from d1f8f90 to 43e1a10 on May 21, 2024 18:51.
blathers-crl bot requested review from kyle-a-wong and removed request for a team on May 21, 2024 18:51.
blathers-crl bot added labels on May 21, 2024:
- blathers-backport: This is a backport that Blathers created automatically.
- O-robot: Originated from a bot.
Thanks for opening a backport. Please check the backport criteria before merging:
If your backport adds new functionality, please ensure that the following additional criteria are satisfied:
Also, please add a brief release justification to the body of your PR to justify this |
blathers-crl bot added the backport label (Label PR's that are backports to older release branches) on May 21, 2024.
sumeerbhola approved these changes on May 21, 2024.
dhartunian approved these changes on May 21, 2024.
Backport 7/7 commits from #123522 on behalf of @aadityasondhi.
/cc @cockroachdb/release
This PR contains a series of improvements to the overload page of the DB console as part of #121574. It is separated into multiple commits for ease of review.
dbconsole: remove non useful charts on the overload page
In investigations, we have found that the following charts are not
useful and frequently cause confusion:
- Admission work rate
- Admission Delay rate
- Requests Waiting For Flow Tokens
Informs #121572
Release note (ui change): This patch removes "Admission Delay Rate",
"Admission Work Rate", and "Requests Waiting For Flow Tokens". These
charts often cause confusion and are not useful for general overload
investigations.
dbconsole: reorder overload page metrics for better readability
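This patch reorders the existing metrics in a more usable order:
1. Metrics to help determine which resource is constrained (IO, CPU)
2. Metrics to narrow down which AC queues are seeing requests waiting
3. More advanced metrics about the system health (goroutine scheduler,
L0 sublevels, etc.)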
Informs #121572.
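Release note (ui change): Reordering of metrics on the overload page to
help categorize them better. They are roughly in the following order:
1. Metrics to help determine which resource is constrained (IO, CPU)
2. Metrics to narrow down which AC queues are seeing requests waiting
3. More advanced metrics about the system health (goroutine scheduler,
L0 sublevels, etc.)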
dbconsole: include better names and descriptions for overload page
This patch improves the metric descriptions for the metrics on the
overload page.
Fixes #120853.
Release note (ui change): The overload page now includes descriptions for all
metrics.
dbconsole: additional higher granularity metrics for overload
This patch adds additional metrics to the overload page that allow for
a more granular look at the system:
- cr.store.storage.l0-sublevels
- cr.node.go.scheduler_latency-p99.9
Informs #121572.
Release note (ui change): Two additional metrics on the overload page
for better visibility into overloaded resources:
- cr.store.storage.l0-sublevels
- cr.node.go.scheduler_latency-p99.9
dbconsole: split Admission Queue graphs to avoid overcrowding
Informs #121572.
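Release note (ui change): There are now 4 graphs for Admission Queue
Delay:
1. Foreground (regular) CPU work
2. Store (IO) work
3. Background (elastic) CPU work
4. Replication Admission Control, store overload on replicas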
dbconsole: add elastic store metric to the overload page
This patch uses the new separated elastic-stores metrics for queueing delay from #123890.
Informs #121572.
Release note (ui change): The "Admission Queueing Delay – Store" chart now separates elastic (background) work from the regular foreground work.
dbconsole: add elastic io token exhausted duration to overload page
This patch adds the metric elastic_io_tokens_exhausted_duration.kv introduced in #124078.
Informs #121572.
Release note (ui change): The "Admission IO Tokens Exhausted" chart now separates elastic and regular IO work.
Release justification: Metrics-only change that will significantly help Admission Control escalations.