add context_propagation_only config option, spec existing disable_send config option #461

trentm · 2021-06-29T00:40:07Z

This is part of my adding disableSend support to the Node.js APM agent.

Currently the Java, Python, and Ruby agents implement disable_send.

I'm not sure if they meet the current MUST / SHOULD semantics I've proposed.

@felixbarny I'd appreciate a sanity check from you on this:

I didn't see an obvious existing spec file for this (didn't feel right in transport.md), so I've started a general section at the bottom of configuration.md.
Do the proposed MUSTs and SHOULDs raise any red flags for my being Node.js-centric?

apmmachine · 2021-06-29T00:43:36Z

💚 Build Succeeded

Build stats

Duration: 3 min 9 sec

felixbarny

> I didn't see an obvious existing spec file for this (didn't feel right in transport.md), so I've started a general section at the bottom of configuration.md.

I think transport.md would be quite fitting. If you disagree, prefer creating a new file over adding to configuration.md.

> Do the proposed MUSTs and SHOULDs raise any red flags for my being Node.js-centric?

LGTM 👍

specs/agents/README.md

specs/agents/configuration.md

mikker · 2021-06-29T07:44:58Z

specs/agents/configuration.md

+- SHOULD attempt to reduce runtime overhead where possible. For example,
+  because events will be dropped there is no need to collect stack traces,
+  collect metrics, or to calculate breakdown metrics.


I don't agree with this one. I like to think of DISABLE_SEND as a way to run all of the agent except the parts that communicate with APM Server. That means doing everything as usual up until the very last point before opening a connection. If perf is an issue, users can use the RECORDING or ENABLED options instead.

But a big use case for this option is to allow for distributed context propagation and log correlation. To reduce overhead for that use case, it does make sense to turn off features that don't have any noticeable side-effects but add overhead. If recording or enabled is set to false, log correlation doesn't work either.

> I like to think of DISABLE_SEND as a way to run all of the agent except the parts that communicate with APM Server. That means doing everything as usual up until the very last point before opening a connection.

What's the advantage of not disabling side-effect-free features such as breakdown metrics when disable_send is true? Is it something you need for testing? Is it because the option name doesn't reflect that? Any suggestions for a better option name while we're standardizing on the behavior?

This is maybe only my own use case, but for my own apps I like to run the agent as integrated as possible, doing as many things as possible, to test that nothing breaks. However, I don't need the resulting data as it's not really relevant when in development/test mode.

It may be because I am the only person who's both an agent AND application developer, but I can see other folks wanting something like this too. Like, staging or whatever, where you want to mirror prod as closely as possible but don't want the burden of setting up a dedicated APM Server for it or don't want the data clutter.

idk, ready to be voted down. This definition was just not how I, personally, used it 😊

I can understand the dev/test use case. For that I wrote a small mock APM server: https://gist.github.com/trentm/4b8c54a8bdb1c1eba2ade871253860f6 It shows all requests nicely, which can be helpful for dev as well:

[2021-06-29T23:18:13.255Z] INFO: mockapmserver/47289 on pink.local: request (req.remoteAddress=::ffff:127.0.0.1, req.remotePort=58727, req.bodyLength=2067, res.bodyLength=2)
    POST /intake/v2/events HTTP/1.1
    accept: application/json
    user-agent: elasticapm-node/3.16.0 elastic-apm-http-client/9.8.1 node/12.22.1
    content-type: application/x-ndjson
    content-encoding: gzip
    host: localhost:8200
    connection: keep-alive
    transfer-encoding: chunked

    {
      "metadata": {
        "service": {
          "name": "play-s3",
          "environment": "development",
    ...

However, I think the user use case (the motivator for this is usage by Kibana core for elastic/kibana#101711) either trumps the dev case, or we look into two separate config vars here.

@elastic/apm-agent-python What's your understanding/usage of disable_send?

Sorry for the late reply. The reason we added disable_send back in the day was indeed to facilitate the use of the agent for local development and CI (not our CI, the user's CI). The other option is to disable the agent completely, which bears the risk that issues of the agent inter-playing with the app only show up on deployment.

That's two votes for everything but sending 😅

as all the mentioned ways to limit overhead can already be achieved with existing settings AFAICT.

Not entirely, at least in the Node.js agent. There isn't another way to avoid:

- encoding of the span/transaction objects to data objects for sending
- capturing an error (especially its stack trace) for captured errors.

Wouldn't recording or enabled do the trick? Maybe source_lines_error_app_frames for stacktraces?

> Wouldn't recording or enabled do the trick?

The Node.js agent doesn't currently implement recording :) -- and I'm not entirely clear from the description of recording in the Python and Java agent docs that it matches. The Java docs imply recording: false means not tracking incoming HTTP requests, which means no trace-context propagation.

> Maybe source_lines_error_app_frames for stacktraces?

Mostly, yes. Stacktrace collection is the large part of error (and span) capture, but not all of it.

tl;dr: Would anyone object to the spec here saying something like "Agents MAY reduce runtime overhead where possible, for example skipping stack trace, metrics, error collection and breakdown metric calculation."? Effectively this would allow agent implementations to differ slightly without yet having to get into specifying exactly how to support comparatively rare use cases. I imagine this particular Kibana use case -- using APM, but not for traces and metrics -- is rare. This is "option 1" in the details below.

Details (you can skip if your eyes are glazing over):

We discussed this at the apm-agents-nodejs call this morning and Alan and I chatted a bit about it. Here are options that I see:

1. Change the disable_send spec to say "agents MAY limit work ..." and basically have divergent use cases for the various agents.

2. Add a separate config option to more clearly state the use case for "do all your work, but be useful for CI where there isn't an APM server" (as Python and Ruby are using disable_send=true now). Seems unnecessary to have Python and Ruby teams prioritize this.

3. Add a separate config option to more clearly state the use case for "do the minimal work to support trace-context propagation and log correlation" (as Node.js and Java, sort of, are using disable_send=true now). I'm not sure what I would call this option.

4. Add separate config options to disable particular functionality (like span serialization and error capture) such that the "do the minimal work to support trace-context propagation and log correlation" use case is: disable_send=true, metrics_interval=0s, breakdown_metrics=false, serialize_objects_for_sending=false(?), capture_errors=false(?).

This feels like under-specifying the option to a point where users have to rely on the implementation details of a specific agent.

I think the name disable_send conveys that quite nicely already.

To me, this sounds like a good option. Finding a better name that better suits the context propagation use case (context_propagation_only?) makes it easier to understand the intent of the option. Agents that have already implemented disable_send can just use it as an alias for the new config in the first phase. In a later stage, they can implement the listed optimizations.
The spec should be more concrete on the things agents SHOULD update (such as putting events in the queue, capturing errors and stack traces)

Seems like adding even more options and making it harder for the users to set the right set of options for the context propagation/log correlation use case.
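
To make the trade-off between a single context_propagation_only flag and the per-feature switches listed above more concrete, here is a minimal Python sketch. It is not taken from any agent: `AgentConfig` and `effective_config` are hypothetical names, the field names simply mirror the settings floated in this thread (two of which were themselves marked "(?)"), and the "30s" default is an assumption made for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class AgentConfig:
    """Hypothetical agent settings; field names mirror the options floated in this thread."""
    disable_send: bool = False
    metrics_interval: str = "30s"                # assumed default, for illustration only
    breakdown_metrics: bool = True
    serialize_objects_for_sending: bool = True   # hypothetical, marked "(?)" in the thread
    capture_errors: bool = True                  # hypothetical, marked "(?)" in the thread
    context_propagation_only: bool = False


def effective_config(cfg: AgentConfig) -> AgentConfig:
    """Expand the single intent-level flag into the per-feature switches."""
    if not cfg.context_propagation_only:
        return cfg
    return replace(
        cfg,
        disable_send=True,
        metrics_interval="0s",
        breakdown_metrics=False,
        serialize_objects_for_sending=False,
        capture_errors=False,
    )
```

With the single flag, a user sets one option and the agent decides which work to skip; with per-feature switches, the user has to know and maintain the whole combination.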

trentm · 2021-06-29T18:13:31Z

> I think transport.md would be quite fitting.

Okay, done.

specs/agents/transport.md

trentm · 2021-10-06T16:32:34Z

I dropped the ball on this one. I'd like to get back to this so I can settle on the new name used in the Node.js APM agent because Kibana is going to be using it.

Going on @felixbarny's "To me, this sounds like a good option." the plan is:

> Add a separate config option to more clearly state the use case for "do the minimal work to support trace-context propagation and log correlation" (as Node.js and Java, sort of, are using disable_send=true now). I'm not sure what I would call this option.

I like Felix's suggestion of a boolean config var named context_propagation_only. Any objections?
I'll update this PR and re-request review.

trentm · 2021-10-06T20:54:25Z

specs/agents/transport.md

+- MUST NOT log warnings/errors related to failures to communicate with APM server.
+- SHOULD attempt to reduce runtime overhead where possible. For example,
+  because events will be dropped there is no need to collect stack traces,
+  collect metrics, or to calculate breakdown metrics.


REVIEW NOTE: I'm considering adding that there is no need to create spans (other than the top-level transaction) in this mode, pending some discussion on slack.
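
As a rough illustration of how the two modes discussed in this PR might differ inside an agent, here is a small Python sketch. Everything in it is an assumption made for illustration: `SketchAgent`, its methods, and the in-memory `queue` are hypothetical and do not come from any agent, and the header layout assumes the W3C `traceparent` format (`version-traceid-parentid-flags`).

```python
import secrets
import traceback
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SketchAgent:
    """Hypothetical agent showing disable_send vs. context_propagation_only."""
    disable_send: bool = False
    context_propagation_only: bool = False
    queue: list = field(default_factory=list)  # stands in for the transport queue

    def outgoing_traceparent(self, incoming: Optional[str] = None) -> str:
        # Trace context is continued (or started) in every mode, so outgoing
        # requests and log correlation keep working.
        if incoming:
            version, trace_id, _parent_id, flags = incoming.split("-")
        else:
            version, trace_id, flags = "00", secrets.token_hex(16), "01"
        return f"{version}-{trace_id}-{secrets.token_hex(8)}-{flags}"

    def capture_span(self, name: str) -> Optional[dict]:
        if self.context_propagation_only:
            # Reduce overhead: no span object and no stack trace collection.
            return None
        span = {"name": name, "stacktrace": traceback.format_stack()}
        self._enqueue(span)
        return span

    def _enqueue(self, event: dict) -> None:
        if self.disable_send or self.context_propagation_only:
            # Drop silently: nothing is sent and nothing is logged about APM server.
            return
        self.queue.append(event)
```

In this sketch, `disable_send=True` still does all the work and drops the event at the last step, while `context_propagation_only=True` skips the work itself but keeps propagating the trace id.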

felixbarny

LGTM

specs/agents/transport.md

basepi

Should we explicitly define these as optional configuration options that agents can choose not to implement if they're not relevant? (My understanding is that they're optional, and if that's correct I think we should define that in the spec.)

trentm · 2021-10-12T16:42:22Z

> Should we explicitly define these as optional configuration options that agents can choose not to implement if they're not relevant?

Perhaps something like starting each of those sections with "Agents MAY implement this configuration option."?

basepi · 2021-10-12T17:00:41Z

> Perhaps something like starting each of those sections with "Agents MAY implement this configuration option."?

Seems reasonable. Maybe I'm nit-picking and it's not necessary. I haven't checked if we're explicit on other optionals.

trentm · 2021-10-12T17:10:25Z

> Maybe I'm nit-picking

I like the added clarity. The existing language -- "Agents that implement this configuration option" -- is vague.

> I haven't checked if we're explicit on other optionals.

It varies. Some say "should" in their opening sentence, implying that it is optional, but there is room for interpretation.

trentm requested a review from felixbarny June 29, 2021 00:40

trentm self-assigned this Jun 29, 2021

felixbarny reviewed Jun 29, 2021

specs/agents/README.md

specs/agents/configuration.md

mikker reviewed Jun 29, 2021

trentm changed the title ~~spec DISABLE_SEND config option~~ spec disable_send config option Jul 6, 2021

trentm requested a review from felixbarny July 6, 2021 00:15

felixbarny approved these changes Jul 6, 2021

specs/agents/transport.md

trentm mentioned this pull request Sep 7, 2021

disableSend env var also disables automatic traces and transactions creation elastic/apm-agent-nodejs#2318

Closed

3 tasks

trentm changed the title ~~spec disable_send config option~~ add context_propagation_only config options, spec existing disable_send config option Oct 6, 2021

trentm changed the title ~~add context_propagation_only config options, spec existing disable_send config option~~ add context_propagation_only config option, spec existing disable_send config option Oct 6, 2021

trentm and others added 5 commits October 6, 2021 13:46

spec DISABLE_SEND config option

e98756f

per feedback: move to transport.md and use suggested case/heading for config options

18fbd68

Update specs/agents/transport.md

4b8594d

Co-authored-by: Felix Barnsteiner <[email protected]>

update: add context_propagation_only for the new thing; document existing usage of disable_send in Python and Ruby agents

84824c9

reducing this diff (recovering from failed merges)

3114b9d

trentm force-pushed the disable_send branch from e4a11c2 to 3114b9d Compare October 6, 2021 20:50

trentm requested review from felixbarny, beniwohli and mikker October 6, 2021 20:51

trentm commented Oct 6, 2021

felixbarny approved these changes Oct 7, 2021

note that span creation can be skipped to reduce processing for context_propagation_only (from separate discussion)

94c4260

trentm marked this pull request as ready for review October 7, 2021 17:38

trentm requested review from a team as code owners October 7, 2021 17:38

eyalkoren approved these changes Oct 10, 2021

specs/agents/transport.md

basepi approved these changes Oct 12, 2021

clarifying text from Eyal

92a6573

Co-authored-by: eyalkoren <[email protected]>

be explicit that these are optional config vars (i.e. agents need not implement these)

d054284

trentm merged commit 6f0566d into elastic:master Oct 26, 2021

trentm deleted the disable_send branch October 26, 2021 22:30

This was referenced Oct 26, 2021

implement contextPropagationOnly config var, tweak disableSend behaviour elastic/apm-agent-nodejs#2393

Closed

feat: impl contextPropagationOnly config var; change disableSend behaviour elastic/apm-agent-nodejs#2396

Merged

jportner mentioned this pull request Jan 20, 2022

X-Opaque-ID contains UUID causing ES deduplication to fail elastic/kibana#120124

Closed


mikker Jun 29, 2021

felixbarny Jun 29, 2021

mikker Jun 29, 2021

trentm Jun 29, 2021

beniwohli Jul 6, 2021

mikker Jul 6, 2021

trentm Jul 6, 2021

mikker Jul 6, 2021

trentm Jul 6, 2021 •

edited

Loading

felixbarny Jul 7, 2021
