Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

per-worker listener and watchdog stats #8263

Merged
merged 11 commits into from
Sep 20, 2019
Merged

per-worker listener and watchdog stats #8263

merged 11 commits into from
Sep 20, 2019

Conversation

mattklein123
Copy link
Member

This PR does a few things:

  1. Adds per-worker listener stats, useful for viewing worker
    connection imbalance.
  2. Adds per-worker watchdog miss stats, useful for viewing per
    worker event loop latency.
  3. Misc connection handling cleanups.

Part of #4602

Risk Level: Medium
Testing: Modified UTs and new integration test
Docs Changes: Added
Release Notes: Added

This PR does a few things:
1) Adds per-worker listener stats, useful for viewing worker
   connection imbalance.
2) Adds per-worker watchdog miss stats, useful for viewing per
   worker event loop latency.
3) Misc connection handling cleanups.

Part of #4602

Signed-off-by: Matt Klein <[email protected]>
@mattklein123
Copy link
Member Author

@danzh2010 @mergeconflict @jmarantz PTAL

Signed-off-by: Matt Klein <[email protected]>
@repokitteh-read-only
Copy link

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to api/.

🐱

Caused by: #8263 was synchronize by mattklein123.

see: more, trace.

Copy link
Contributor

@jmarantz jmarantz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just did a quick skim of this and I have some high level questions mostly to educate me...

What is the rough cardinality (order of magnitude) of:

  • watched dogs (one per thread?)
  • active listeners of various flavors
  • connection handlers

I see some potential opportunities to factor out or front-load some name lookups but it may be immaterial of the cardinalities are low or they are all created at server-start as opposed to during requests.

I will dive in more today.

@mattklein123
Copy link
Member Author

watched dogs (one per thread?)

1 per worker

active listeners of various flavors

1 per listener (so per-worker stats is listeners X workers)

connection handlers

1 per worker and 1 on the main thread

I see some potential opportunities to factor out or front-load some name lookups but it may be immaterial of the cardinalities are low or they are all created at server-start as opposed to during requests.

Yeah I agree we could do better here, but given the low cardinality I'm not sure it matters. I think we could make these stats configurable if we think this is going to be an issue though, but I found these stats to be incredibly useful in debugging potential worker imbalance issues so they seem generally useful to me (but I wanted to get @mergeconflict @oschaaf opinion on that)

Copy link
Member

@mergeconflict mergeconflict left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty good to me; my only nits are about stat names.

docs/root/configuration/listeners/stats.rst Outdated Show resolved Hide resolved
docs/root/operations/performance.rst Show resolved Hide resolved
docs/root/configuration/listeners/stats.rst Outdated Show resolved Hide resolved
@mattklein123
Copy link
Member Author

/azp run envoy-linux

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mattklein123
Copy link
Member Author

@danzh2010 @jmarantz @mergeconflict master merged and comments addressed. PTAL.

mergeconflict
mergeconflict previously approved these changes Sep 19, 2019
Copy link
Member

@mergeconflict mergeconflict left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

jmarantz
jmarantz previously approved these changes Sep 20, 2019
Copy link
Contributor

@jmarantz jmarantz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm generally modulo the test nit.

docs/root/operations/performance.rst Show resolved Hide resolved
source/server/connection_handler_impl.cc Show resolved Hide resolved
test/server/connection_handler_test.cc Show resolved Hide resolved
test/server/guarddog_impl_test.cc Outdated Show resolved Hide resolved
danzh2010
danzh2010 previously approved these changes Sep 20, 2019
Copy link
Contributor

@danzh2010 danzh2010 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM other than a few nits!

Signed-off-by: Matt Klein <[email protected]>
@mattklein123
Copy link
Member Author

@danzh2010 updated per comments.

@mattklein123 mattklein123 merged commit 483aa09 into master Sep 20, 2019
@mattklein123 mattklein123 deleted the per_worker_stats branch September 20, 2019 20:05
danzh2010 pushed a commit to danzh2010/envoy that referenced this pull request Sep 24, 2019
This PR does a few things:
1) Adds per-worker listener stats, useful for viewing worker
   connection imbalance.
2) Adds per-worker watchdog miss stats, useful for viewing per
   worker event loop latency.
3) Misc connection handling cleanups.

Part of envoyproxy#4602

Signed-off-by: Matt Klein <[email protected]>
danzh2010 pushed a commit to danzh2010/envoy that referenced this pull request Oct 4, 2019
This PR does a few things:
1) Adds per-worker listener stats, useful for viewing worker
   connection imbalance.
2) Adds per-worker watchdog miss stats, useful for viewing per
   worker event loop latency.
3) Misc connection handling cleanups.

Part of envoyproxy#4602

Signed-off-by: Matt Klein <[email protected]>
danzh2010 pushed a commit to danzh2010/envoy that referenced this pull request Oct 4, 2019
This PR does a few things:
1) Adds per-worker listener stats, useful for viewing worker
   connection imbalance.
2) Adds per-worker watchdog miss stats, useful for viewing per
   worker event loop latency.
3) Misc connection handling cleanups.

Part of envoyproxy#4602

Signed-off-by: Matt Klein <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants