Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

http: dumping session state on the decode path #7390

Merged
merged 14 commits into from
Jul 3, 2019

Conversation

alyssawilk
Copy link
Contributor

@alyssawilk alyssawilk commented Jun 24, 2019

Adding per-stream pretty printing, and tracking the active stream when decoding data and dumping it on crash

If we like how this looks, I'll do the same for the encode path (track scope when entering the router) and add an optional scope tracker to alarms and plumb existing resumption alarms.

Risk Level: Medium
Testing: new unit tests, manual testing.
Docs Changes: n/a
Release Notes: n/a
Part of #7300

@htuch htuch requested a review from mattklein123 June 24, 2019 22:45
@alyssawilk alyssawilk changed the title WIP: http: dumping session state on the decode path http: dumping session state on the decode path Jun 25, 2019
Signed-off-by: Alyssa Wilk <[email protected]>
@alyssawilk alyssawilk changed the title http: dumping session state on the decode path WIP: http: dumping session state on the decode path Jun 25, 2019
Signed-off-by: Alyssa Wilk <[email protected]>
@alyssawilk alyssawilk changed the title WIP: http: dumping session state on the decode path http: dumping session state on the decode path Jun 26, 2019
htuch
htuch previously requested changes Jun 26, 2019
Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good structurally, I have a bunch of specific areas on the leafs we could tackle to make this a bit cleaner. Thanks.
/wait

include/envoy/common/scope_tracker.h Outdated Show resolved Hide resolved
include/envoy/event/dispatcher.h Outdated Show resolved Hide resolved
include/envoy/http/header_map.h Outdated Show resolved Hide resolved
os << spaces << #member ": "; \
if ((member) != nullptr) { \
os << "\n"; \
(member)->logState(os, indent_level + 1); \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a huge fan of this style of printing and indenting TBH. I get that you want to make this efficient and have no allocations, but it's leaking through above into a duck typing of a pure interface (the header map), which seems like a pattern best avoided across if it's going to appear in more places.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, to your point I think there is value in having a structured data print type, and I can think of several use cases where it'd be valuable internally. From (possibly outdated?) priors I believe there is also value in a minimal allocation state drop, which is what I planned on implementing first. I think for out-process communication and machine-consumption we'll eventually want to tackle the latter, but I don't think we'd want to use that version in the fatal error handler and I'd like the "core dumps are hard" session state to be human readable where possible.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a fair point. I think it would be useful to document (i.e. provide the citations) on how allocations during fatal handling are a problem. The main one I can imagine is that we're in a soft OOM situation, and allocation would lead to a hard OOM, would be good to have some more intuition on this one.


ABSL_CONST_INIT static absl::Mutex failure_mutex(absl::kConstInit);
using FailureFunctionList = std::list<const CrashHandlerInterface*>;
ABSL_CONST_INIT std::atomic<FailureFunctionList*> crash_handlers{nullptr};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, all the code here to me seems pretty problematic, but maybe there isn't a better way. We have mutexes, atomics with relaxed memory ordering, manual memory allocations, a singleton, etc. Oh my :)

One way we could be cleaner about this is to have a per-thread local optional signal handler for any signal. On a signal, the global handler will iterate over the list of threads and try and invoke the signal handler if registered.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'd started by trying to implement this as a single handler in the server, and doing the plumbing through to snag dispatcher state, and it was messy enough I thought this was cleaner. I think if we get to a point where more than one class wants to register signal handlers it'd might be cleaner to do a single entry point per thread but I think for now it's just moving the mess to a different location.

source/exe/signal_action.cc Outdated Show resolved Hide resolved
@htuch
Copy link
Member

htuch commented Jun 27, 2019

I'm generally good with this modulo the wish to have better clarity on how significant the "no allocation during fatal handling" policy is, since it is the main constraint on the solution space here. Will leave it to @mattklein123 for review/merge.

Signed-off-by: Alyssa Wilk <[email protected]>
@alyssawilk
Copy link
Contributor Author

Yeah, checked in with other proxy devs and one of our kernel guys, and new/malloc in signal handlers is pretty bad form. Added more detail to the comments. FWIW cerr is not without risk either - I got a suggestion to switch to ABSL_RAW_LOG which looks like it'd be safer but I'm not sure how to pass that through the Envoy stack in a plausible way.

Both folks I checked with independently suggested proto arena as a viable compromise. I think it's a bit riskier than what we have mainly because I'm unfamiliar with how it works under the hood, but I think it's a pretty viable compromise where you get your structured data and I get my (trusting jeremy) minimal-to-no allocations.

I suspect (still trying to avoid memory allocations) that I'd have to write my own proto printer, but I don't think that's hard to do with the existing reflection, and it'd keep all the pretty printing ascii logic to a single utility function rather than a widely included set of macros.

WDYT?

Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for checking with folks. I do like the idea of protbuf arenas for future consideration.

@alyssawilk
Copy link
Contributor Author

@mattklein123 I think this is ready for your pass then

Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, cool stuff and will definitely be useful. Flushing out some comments.

General question: I wonder if this behavior should be opt-in, opt-out, or have some type of compile control? The reason I ask is that there is some risk of doubt fault and I'm not sure if core dumps will be useful if that happens?

/wait


/*
* A class for tracking the scope of work.
* Currently this is only used for best-effort tracking the any L7 stream doing
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/the any/any ?

include/envoy/event/dispatcher.h Show resolved Hide resolved

namespace Envoy {

class ScopeTrackerImpl {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I might call this RAIIScopeTracker or something like that? Since this isn't really an implementation of anything?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this name is better future-proofed for when we use the scope object for customer memory/cpu accounting (the htuch@ feature) - WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't feel strongly about it, other than to say that Impl usually is an implementation of an interface which this is not, so I would probably try to find a more descriptive name which IMO is some type of context or something along those lines. e.g. ScopeTrackerScopeState or something. Anyway, not a big deal either way.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I guess a small class comment on what it is for would be nice either way)

@@ -25,6 +25,10 @@

#include "event2/event.h"

#ifdef ENVOY_HANDLE_SIGNALS
#include "exe/signal_action.h"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reaching into exe here seems wrong from a layering perspective. Would it be better to have some other type of injected interface for this purpose?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for layering, honestly I'd think all of the signal handling code could be moved out of exe since we use it in test, so it's somewhat core to the server handling. maybe source/server/signal/?

I'm also not sure if that addresses your concerns, or if you still think there's value in the dispatcher not reaching directly to signal handling code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sgtm regarding moving. I might even go as far as to move some of it into source/common/signal but up to you.

source/common/event/dispatcher_impl.h Show resolved Hide resolved
@@ -416,6 +417,8 @@ ConnectionManagerImpl::ActiveStream::ActiveStream(ConnectionManagerImpl& connect
connection_manager_.stats_.named_.downstream_rq_time_, connection_manager_.timeSource())),
stream_info_(connection_manager_.codec_->protocol(), connection_manager_.timeSource()),
upstream_options_(std::make_shared<Network::Socket::Options>()) {
ScopeTrackerImpl scope(this, connection_manager_.read_callbacks_->connection().dispatcher());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How did you pick the places where you setup the tracked scope? Can we document in comments somewhere? It seems like we should be potentially tracking the encode side as well? Filter continuations? Etc.? I wonder if there is some more generic way we could do this? Nothing immediately comes to mind but I figured I would throw it out there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's in PR description - I'm going to do encode side (from the router filter) as well as alarms for L7 in follow-ups :-)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK sounds good.

@@ -20,6 +46,14 @@ void SignalAction::sigHandler(int sig, siginfo_t* info, void* context) {
}
tracer.logTrace();

FailureFunctionList* list = fatal_error_handlers.exchange(nullptr, std::memory_order_relaxed);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add some comments on why we are dealing with thread safety in this way?

"backtrace.*Segmentation fault");
SignalAction::removeFatalErrorHandler(handler);

/* SignalAction actions;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

source/common/common/dump_state_utils.h Show resolved Hide resolved
Copy link
Contributor Author

@alyssawilk alyssawilk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I think I got all of your comments. Added a compile option since signals already had one, but do you think we should have DS style config as well?

Signed-off-by: Alyssa Wilk <[email protected]>
@mattklein123
Copy link
Member

Added a compile option since signals already had one, but do you think we should have DS style config as well?

IMO compile time is probably fine and easier, but I don't feel strongly about it if you do.

Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks looks great. 1 comment about build options, docs, etc.

/wait

@@ -35,6 +35,7 @@ Version history
* http: added support for :ref:`preserve_external_request_id<envoy_api_field_config.filter.network.http_connection_manager.v2.HttpConnectionManager.preserve_external_request_id>` that represents whether the x-request-id should not be reset on edge entry inside mesh
* http: changed `sendLocalReply` to send percent-encoded `GrpcMessage`.
* http: added :ref:`dynamic forward proxy <arch_overview_http_dynamic_forward_proxy>` support.
* http: tracking the active stream and dumping state in Envoy crash handlers. This can be disabled by building with -DENVOY_SKIP_OBJECT_TRACE_ON_DUMP
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a bazel config setting similar to https://github.com/envoyproxy/envoy/blob/master/bazel/BUILD#L98? And then also document here: https://github.com/envoyproxy/envoy/tree/master/bazel#disabling-optional-features? And maybe also define this in the compile time options CI build? Also, does disabling signal traces implicitly disable this? I think so but maybe I'm not reading the code correctly? If that's the case can we document that?

Signed-off-by: Alyssa Wilk <[email protected]>
Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice!

@mattklein123 mattklein123 dismissed htuch’s stale review July 3, 2019 23:15

Harvey said to merge once reviewed

@mattklein123 mattklein123 merged commit 733052a into envoyproxy:master Jul 3, 2019
@alyssawilk alyssawilk deleted the dumpState branch July 31, 2019 20:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants