-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
per-worker listener and watchdog stats #8263
Conversation
This PR does a few things: 1) Adds per-worker listener stats, useful for viewing worker connection imbalance. 2) Adds per-worker watchdog miss stats, useful for viewing per worker event loop latency. 3) Misc connection handling cleanups. Part of #4602 Signed-off-by: Matt Klein <[email protected]>
Signed-off-by: Matt Klein <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just did a quick skim of this and I have some high level questions mostly to educate me...
What is the rough cardinality (order of magnitude) of:
- watched dogs (one per thread?)
- active listeners of various flavors
- connection handlers
I see some potential opportunities to factor out or front-load some name lookups but it may be immaterial of the cardinalities are low or they are all created at server-start as opposed to during requests.
I will dive in more today.
1 per worker
1 per listener (so per-worker stats is listeners X workers)
1 per worker and 1 on the main thread
Yeah I agree we could do better here, but given the low cardinality I'm not sure it matters. I think we could make these stats configurable if we think this is going to be an issue though, but I found these stats to be incredibly useful in debugging potential worker imbalance issues so they seem generally useful to me (but I wanted to get @mergeconflict @oschaaf opinion on that) |
Signed-off-by: Matt Klein <[email protected]>
Signed-off-by: Matt Klein <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks pretty good to me; my only nits are about stat names.
/azp run envoy-linux |
Azure Pipelines successfully started running 1 pipeline(s). |
Signed-off-by: Matt Klein <[email protected]>
Signed-off-by: Matt Klein <[email protected]>
Signed-off-by: Matt Klein <[email protected]>
Signed-off-by: Matt Klein <[email protected]>
Signed-off-by: Matt Klein <[email protected]>
@danzh2010 @jmarantz @mergeconflict master merged and comments addressed. PTAL. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm generally modulo the test nit.
Signed-off-by: Matt Klein <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM other than a few nits!
Signed-off-by: Matt Klein <[email protected]>
e1a0c99
@danzh2010 updated per comments. |
This PR does a few things: 1) Adds per-worker listener stats, useful for viewing worker connection imbalance. 2) Adds per-worker watchdog miss stats, useful for viewing per worker event loop latency. 3) Misc connection handling cleanups. Part of envoyproxy#4602 Signed-off-by: Matt Klein <[email protected]>
This PR does a few things: 1) Adds per-worker listener stats, useful for viewing worker connection imbalance. 2) Adds per-worker watchdog miss stats, useful for viewing per worker event loop latency. 3) Misc connection handling cleanups. Part of envoyproxy#4602 Signed-off-by: Matt Klein <[email protected]>
This PR does a few things: 1) Adds per-worker listener stats, useful for viewing worker connection imbalance. 2) Adds per-worker watchdog miss stats, useful for viewing per worker event loop latency. 3) Misc connection handling cleanups. Part of envoyproxy#4602 Signed-off-by: Matt Klein <[email protected]>
This PR does a few things:
connection imbalance.
worker event loop latency.
Part of #4602
Risk Level: Medium
Testing: Modified UTs and new integration test
Docs Changes: Added
Release Notes: Added