Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: disableSend boolean config option to work, but not communicate with APM server #2127

Merged
merged 8 commits into from
Jun 29, 2021

Conversation

trentm
Copy link
Member

@trentm trentm commented Jun 24, 2021

Env var is ELASTIC_APM_DISABLE_SEND. Default is false. This feature is
useful to use the APM agent for async context tracking, distributed
tracing context propagation, and/or log enrichment with trace.id. When
disableSend=true, attempts are made to limit unnecessary processing
(like calculating metrics, encoding stacktraces, etc.).

This is the same as the "disable_send" option in some of the other APM agents.

Fixes: #2101

Checklist

  • Implement code
  • Add tests
  • Update TypeScript typings
  • Update documentation
  • Add CHANGELOG.asciidoc entry
  • Commit message follows commit guidelines
  • Updated Agent config comparison sheet (Also added it for Java which had already implemented disable_send.)

@trentm trentm self-assigned this Jun 24, 2021
@github-actions github-actions bot added the agent-nodejs Make available for APM Agents project planning. label Jun 24, 2021
@apmmachine
Copy link
Contributor

apmmachine commented Jun 24, 2021

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview

Expand to view the summary

Build stats

  • Build Cause: Pull request #2127 updated

  • Start Time: 2021-06-29T16:26:46.824+0000

  • Duration: 20 min 14 sec

  • Commit: 549392b

Test stats 🧪

Test Results
Failed 0
Passed 19040
Skipped 0
Total 19040

Trends 🧪

Image of Build Times

Image of Tests

trentm added 5 commits June 28, 2021 15:16
…with APM server

Env var is ELASTIC_APM_DISABLE_SEND. Default is false. This feature is
useful to use the APM agent for async context tracking, distributed
tracing context propagation, and/or log enrichment with trace.id. When
disableSend=true, attempts are made to limit unnecessary processing
(like calculating metrics, encoding stacktraces, etc.).

This is the same as the "disable_send" option in some of the other APM agents.
That should suffice to give debugging hint why data isn't being sent and
limit log.trace noise sufficiently.
@trentm trentm force-pushed the trentm/disable_send branch from eb266f8 to 426f3f9 Compare June 28, 2021 22:17
@trentm trentm marked this pull request as ready for review June 28, 2021 22:17
Copy link
Contributor

@astorm astorm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does what it says on the tin and a no-op transport object seems like a great way to accomplish this without being too disruptive. Approving -- I'd be fine landing this as is, but it wasn't clear if the NICE_REGEXPS_FIELDS was intended here or not.

@@ -399,9 +407,21 @@ class Config {
logger: true,
transport: true
}
const NICE_REGEXPS_FIELDS = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't see this change mentioned -- this looks like some code that will serialize our "arrays of regular expressions" values as strings so that instead of seeing

    {
        ignoreUrlRegExp: [{},{}]
    }

users will see

    {
        ignoreUrlRegExp: ['/foo','/bar']
    }    

when serializing the configuration object as JSON. This isn't a particularly hot path so I don't see any issue with the extra work being done here, and (as the comment mentions) it's way more useful than a {}

Question: Since we do use this method when we're logging the agent's configuration on startup, do we want to mention this behavior change/fix in the CHANGELOG? Or just keep it as a quiet quality of life improvement?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot to add a "REVIEW NOTE" for this. This part and the slight change to the 'agent configured correctly' log.trace are an unrelated change. You got it right, however, that this is a "quality of life improvement". When filtering trace output through ecslog, the before and after are:

before:

[2021-06-24T22:12:17.017Z] TRACE (elastic-apm-node): agent configured correctly: {"pid":99484,"ppid":77218,"arch":"x64","platform":"darwin","node":"v12.22.1","agent":"3.16.0","startTrace":["at Agent.start (/Users/trentm/el/apm-agent-nodejs11/lib/agent.js:190:11)","at Object.<anonymous> (/Users/trentm/el/apm-agent-nodejs11/esfoo.js:3:27)","at Module._compile (internal/modules/cjs/loader.js:999:30)","at Object.Module._extensions..js (internal/modules/cjs/loader.js:1027:10)","at Module.load (internal/modules/cjs/loader.js:863:32)","at Function.Module._load (internal/modules/cjs/loader.js:708:14)","at Function.executeUserEntryPoint [as runMain] (internal/modules/run_main.js:60:12)","at internal/main/run_main_module.js:17:47"],"main":"index.js","dependencies":{"@elastic/ecs-pino-format":"^1.1.2","after-all-results":"^2.0.0","async-cache":"^1.1.0","async-value-promise":"^1.1.1","basic-auth":"^2.0.1","cookie":"^0.4.0","core-util-is":"^1.0.2","elastic-apm-http-client":"^9.8.1","end-of-stream":"^1.4.4","error-callsites":"^2.0.4","error-stack-parser":"^2.0.6","escape-string-regexp":"^4.0.0","fast-safe-stringify":"^2.0.7","http-headers":"^3.0.2","is-native":"^1.0.1","load-source-map":"^2.0.0","lru-cache":"^6.0.0","measured-reporting":"^1.51.1","monitor-event-loop-delay":"^1.0.0","object-filter-sequence":"^1.0.0","object-identity-map":"^1.0.2","original-url":"^1.2.3","pino":"^6.11.2","read-pkg-up":"^7.0.1","relative-microtime":"^2.0.0","require-in-the-middle":"^5.0.3","semver":"^6.3.0","set-cookie-serde":"^1.0.0","shallow-clone-shim":"^2.0.0","sql-summary":"^1.0.1","traceparent":"^1.0.0","traverse":"^0.6.6","unicode-byte-truncate":"^1.0.0"},"conf":{"ignoreUrlStr":[],"ignoreUrlRegExp":[],"ignoreUserAgentStr":[],"ignoreUserAgentRegExp":[],"transactionIgnoreUrlRegExp":[],"sanitizeFieldNamesRegExp":[{},{},{},{},{},{},{},{},{},{},{},{},{},{}],"ignoreMessageQueuesRegExp":[],"abortedErrorThreshold":25,"active":true,"apiRequestSize":786432,"apiRequestTime":10,"asyncHooks":true,"breakdownMetrics":true,"captureBody":"off","captureErrorLogStackTraces":"messages","captureExceptions":false,"captureHeaders":true,"captureSpanStackTraces":false,"centralConfig":true,"cloudProvider":"none","disableInstrumentations":[],"disableSend":true,"environment":"development","errorMessageMaxLength":2048,"errorOnAbortedRequests":false,"filterHttpHeaders":true,"ignoreMessageQueues":[],"instrument":true,"instrumentIncomingHTTPRequests":true,"logLevel":"trace","logUncaughtExceptions":false,"maxQueueSize":1024,"metricsInterval":0,"metricsLimit":1000,"sanitizeFieldNames":["password","passwd","pwd","secret","*key","*token*","*session*","*credit*","*card*","authorization","set-cookie","pw","pass","connect.sid"],"serverTimeout":30,"sourceLinesErrorAppFrames":5,"sourceLinesErrorLibraryFrames":5,"sourceLinesSpanAppFrames":0,"sourceLinesSpanLibraryFrames":0,"spanFramesMinDuration":0.01,"stackTraceLimit":50,"transactionIgnoreUrls":[],"transactionMaxSpans":500,"transactionSampleRate":1,"useElasticTraceparentHeader":true,"usePathAsTransactionName":false,"verifyServerCert":true,"serviceName":"elastic-apm-node","serviceVersion":"3.16.0","serverHost":"localhost:8200","serverPort":8200}}

after:

[2021-06-24T22:27:38.034Z] TRACE (elastic-apm-node): agent configured correctly
    pid: 99836
    ppid: 77218
    arch: "x64"
    platform: "darwin"
    node: "v12.22.1"
    agent: "3.16.0"
    startTrace: [
        "at Agent.start (/Users/trentm/el/apm-agent-nodejs11/lib/agent.js:190:11)",
        "at Object.<anonymous> (/Users/trentm/el/apm-agent-nodejs11/esfoo.js:3:27)",
        "at Module._compile (internal/modules/cjs/loader.js:999:30)",
        "at Object.Module._extensions..js (internal/modules/cjs/loader.js:1027:10)",
        "at Module.load (internal/modules/cjs/loader.js:863:32)",
        "at Function.Module._load (internal/modules/cjs/loader.js:708:14)",
        "at Function.executeUserEntryPoint [as runMain] (internal/modules/run_main.js:60:12)",
        "at internal/main/run_main_module.js:17:47"
    ]
    main: "index.js"
    dependencies: {
        "@elastic/ecs-pino-format": "^1.1.2",
        "after-all-results": "^2.0.0",
        "async-cache": "^1.1.0",
        "async-value-promise": "^1.1.1",
        "basic-auth": "^2.0.1",
        "cookie": "^0.4.0",
        "core-util-is": "^1.0.2",
        "elastic-apm-http-client": "^9.8.1",
        "end-of-stream": "^1.4.4",
        "error-callsites": "^2.0.4",
        "error-stack-parser": "^2.0.6",
        "escape-string-regexp": "^4.0.0",
        "fast-safe-stringify": "^2.0.7",
        "http-headers": "^3.0.2",
        "is-native": "^1.0.1",
        "load-source-map": "^2.0.0",
        "lru-cache": "^6.0.0",
        "measured-reporting": "^1.51.1",
        "monitor-event-loop-delay": "^1.0.0",
        "object-filter-sequence": "^1.0.0",
        "object-identity-map": "^1.0.2",
        "original-url": "^1.2.3",
        "pino": "^6.11.2",
        "read-pkg-up": "^7.0.1",
        "relative-microtime": "^2.0.0",
        "require-in-the-middle": "^5.0.3",
        "semver": "^6.3.0",
        "set-cookie-serde": "^1.0.0",
        "shallow-clone-shim": "^2.0.0",
        "sql-summary": "^1.0.1",
        "traceparent": "^1.0.0",
        "traverse": "^0.6.6",
        "unicode-byte-truncate": "^1.0.0"
    }
    conf: {
        "ignoreUrlStr": [],
        "ignoreUrlRegExp": [],
        "ignoreUserAgentStr": [],
        "ignoreUserAgentRegExp": [],
        "transactionIgnoreUrlRegExp": [],
        "sanitizeFieldNamesRegExp": [
            "/^password$/i",
            "/^passwd$/i",
            "/^pwd$/i",
            "/^secret$/i",
            "/^.*key$/i",
            "/^.*token.*$/i",
            "/^.*session.*$/i",
            "/^.*credit.*$/i",
            "/^.*card.*$/i",
            "/^authorization$/i",
            "/^set\\x2dcookie$/i",
            "/^pw$/i",
            "/^pass$/i",
            "/^connect\\.sid$/i"
        ],
        "ignoreMessageQueuesRegExp": [],
        "abortedErrorThreshold": 25,
        "active": true,
        "apiRequestSize": 786432,
        "apiRequestTime": 10,
        "asyncHooks": true,
        "breakdownMetrics": true,
        "captureBody": "off",
        "captureErrorLogStackTraces": "messages",
        "captureExceptions": false,
        "captureHeaders": true,
        "captureSpanStackTraces": false,
        "centralConfig": true,
        "cloudProvider": "none",
        "disableInstrumentations": [],
        "disableSend": true,
        "environment": "development",
        "errorMessageMaxLength": 2048,
        "errorOnAbortedRequests": false,
        "filterHttpHeaders": true,
        "ignoreMessageQueues": [],
        "instrument": true,
        "instrumentIncomingHTTPRequests": true,
        "logLevel": "trace",
        "logUncaughtExceptions": false,
        "maxQueueSize": 1024,
        "metricsInterval": 0,
        "metricsLimit": 1000,
        "sanitizeFieldNames": [
            "password",
            "passwd",
            "pwd",
            "secret",
            "*key",
            "*token*",
            "*session*",
            "*credit*",
            "*card*",
            "authorization",
            "set-cookie",
            "pw",
            "pass",
            "connect.sid"
        ],
        "serverTimeout": 30,
        "sourceLinesErrorAppFrames": 5,
        "sourceLinesErrorLibraryFrames": 5,
        "sourceLinesSpanAppFrames": 0,
        "sourceLinesSpanLibraryFrames": 0,
        "spanFramesMinDuration": 0.01,
        "stackTraceLimit": 50,
        "transactionIgnoreUrls": [],
        "transactionMaxSpans": 500,
        "transactionSampleRate": 1,
        "useElasticTraceparentHeader": true,
        "usePathAsTransactionName": false,
        "verifyServerCert": true,
        "serviceName": "elastic-apm-node",
        "serviceVersion": "3.16.0",
        "serverHost": "localhost:8200",
        "serverPort": 8200
    }

do we want to mention this behavior change/fix in the CHANGELOG?

I don't think it is significant enough to merit a changelog entry. I don't feel strongly, however.

@@ -79,7 +79,7 @@
},
"homepage": "https://github.com/elastic/apm-agent-nodejs",
"dependencies": {
"@elastic/ecs-pino-format": "^1.1.2",
"@elastic/ecs-pino-format": "^1.2.0",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just noting the bumped dependency in cast it wasn't intentional.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was intentional to get this change to support testing log correlation, as commented here:

// `_elasticApm` is an internal/test-only option added in
// @elastic/[email protected] to allow passing in the import agent
// package that isn't `require`able at "elastic-apm-node".
const log = pino(ecsFormat({ _elasticApm: apm }), logStream)

@trentm
Copy link
Member Author

trentm commented Jun 29, 2021

Notes on perf when using disableSend=true

When disableSend=true is set, the Node.js agent is attempting to avoid some work. The following shows some perf data around this. I ran "node pathosapp.js", a node.js Express simple app, doing this for each request:

function sim_80_17_3(req, res, next) {
    var prob = rng()
    if (prob < 0.8) {
      barSync()
      res.send('syncspans2')
    } else if (prob < 0.97) {
      res.send('ok')
    } else {
      next(new Error('err: ' + prob))
    }
  }

i.e. simulating 80% just a transaction, 17% a transaction + 2 spans, 3% errors. I loaded that to 500 req/s (via hey -c 50 -q 10 -z 60s http://localhost:3000/) on my macos laptop, with profiling enabled to get a flamegraph.

with disableSend=false (the default)

  • CPU usage of the node process was ~50%. This value is only useful as a relative comparison to other runs.
 Latency distribution:
  10% in 0.0050 secs
  25% in 0.0098 secs
  50% in 0.0189 secs
  75% in 0.0268 secs
  90% in 0.0319 secs
  95% in 0.0353 secs
  99% in 0.0479 secs

flamegraph (the shown trim_prefix stack frame is not significant):

Screen Shot 2021-06-29 at 10 01 25 AM

with disableSend=true

  • CPU usage: ~30% (a significant drop)

  • Latency percentiles: a significant improvement

Latency distribution:
  10% in 0.0038 secs
  25% in 0.0073 secs
  50% in 0.0128 secs
  75% in 0.0183 secs
  90% in 0.0223 secs
  95% in 0.0257 secs
  99% in 0.0385 secs

Flamegraph:

Screen Shot 2021-06-29 at 10 01 32 AM

This shows a number of Agent-related stacks are gone (it is hard to see this without the interactive flamegraph and some experience with the agent). If future work were done to reduce more overhead here, a starting point would be this stack frame:

startOrResume /Users/trentm/el/apm-agent-nodejs11/node_modules/traceparent/index.js:121:24
Top of Stack: 4.6% (621 of 13528 samples)
On Stack: 5.3% (720 of 13528 samples)

Conclusion: Agent overhead is significantly reduced when using disableSend, which is a good thing for those using the disableSend use case.

@trentm
Copy link
Member Author

trentm commented Jun 29, 2021

The failed "Integration Tests" is a common flake I've seen of late. I'm not stopping this PR for it.

@trentm trentm merged commit 28e77af into master Jun 29, 2021
@trentm trentm deleted the trentm/disable_send branch June 29, 2021 17:09
dgieselaar pushed a commit to dgieselaar/apm-agent-nodejs that referenced this pull request Sep 10, 2021
…with APM server (elastic#2127)

Env var is ELASTIC_APM_DISABLE_SEND. Default is false. This feature is
useful to use the APM agent for async context tracking, distributed
tracing context propagation, and/or log enrichment with trace.id. When
disableSend=true, attempts are made to limit unnecessary processing
(like calculating metrics, encoding stacktraces, etc.).

This also includes an improvement to the `agent configured correctly`
log.trace that dumps the agent config.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
agent-nodejs Make available for APM Agents project planning.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

add support for explicitly disabling sending to APM server
3 participants