Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: sanitize field names #1898

Merged
merged 58 commits into from
Dec 22, 2020
Merged

feat: sanitize field names #1898

merged 58 commits into from
Dec 22, 2020

Conversation

astorm
Copy link
Contributor

@astorm astorm commented Nov 30, 2020

Update

Subtitles abound here.

The end goal of all this is to get version of this feature that is suitable for a minor version bump while setting us up for 100% spec and agent compliance in our next major version.

Our current spec discussion say that the SANITIZE_FIELD_NAMES should fully READACT the value of a header, post field, or cookie field if they're listed in the SANITIZE_FIELD_NAMES array. These might be captured by agents in the following fields (see also the intake API/Schema)

  • transaction.context.request.cookies
  • transaction.context.request.headers
  • transaction.context.request.body (if form/url encoded)
  • transaction.context.response.headers

The node agent currently captures transaction.context.request.headers, transaction.context.request.body, and transaction.context.response.headers, but does not capture transaction.context.request.cookies.

Also worth noting -- the Java agent also has a behavior where it blanks out the values of transaction.context.request.headers.cookies when it parses values into transaction.context.request.cookies.

The Node.js agent also has a preexisting http-filter module that works slightly differently from the behavior of SANITIZE_FIELD_NAMES. Prior to this PR the http-filter module was hard-coded to redact the authorization HTTP header. The filter also looked for both set-cookie and cookie HTTP headers, and would parse them into name/value pairs in order to redact individual cookie fields. Once redacted, the name/value pairs would be re-serialized as a cookie string and captured in the .headers properties. Which individual cookie names got redacted was/is controlled by a list of fields in redact-secrets.

The redact-secrets module also looks for values to redact. Specifically, it looks for things that match a credit card regular expression.

There's a few important subtleties here.

  1. The new SANITIZE_FIELD_NAMES feature, when combined with the default configuration values, indicates that we should completely redact the set-cookie header.

  2. The feature, as spec-ed, does not speak to parsing the values in the HTTP headers and redacting them. It only speaks to parsing values collected in transaction.context.request.cookies

  3. The Node.js agent does not appear to have used redact-secrets to control which HTTP headers or post-body fields/values are redacted. The redact-secrets list only controls which "cookie field names encoded in the headers" were redacted.

This PR

In order to avoid a major version bump this PR

  1. Removes the hard coded authorization redacting, and instead relies on authorization being one of the default values of SANITIZE_FIELD_NAMES

  2. Maintains the "cookie in header" parsing via redact-secrets, but also includes the SANITIZE_FIELD_NAMES wildcard patterns as part of redact-secrets

  3. If either cookie or set-cookie is listed in the SANITIZE_FIELD_NAMES configuration array, per the spec this value will be completely redacted.

  4. The default value of SANITIZE_FIELD_NAMES will include the extra configuration values ['pw','pass','connect.sid'] (which we have currently proposed in the spec as MAY include values)

The preserves as much of the previous Node.js agent behavior as possible in order to prevent a major version bump. The one "controversial" behavior I see here is with the new feature behavior, set-cookie will be completely redacted by default -- whereas in the previous versions of the agent only specific fields would be redacted. I'm choosing to interpret this as a new enhanced behavior vs. a breaking change.

Future Next Major Version PR

In order to be fully spec compliant I also propose that we revisit this when 4.0 rolls around. At this time we'll

  1. Remove ['pw','pass','connect.sid'] from the SANITIZE_FIELD_NAMES defaults
  2. Conform to the java agent behavior of capturing cookies in transaction.context.request.cookies and blanking out transaction.context.request.headers.cookies
  3. Redact/sanitize transaction.context.request.cookies per spec
  4. Remove the individual redacting of the set-cookie http header

Original Description

This PR implements the Data sanitization spec.

This PR

  1. Adds configuration handling for the new sanitizeFeatureName value
  2. Implements filtering of payload request/response headers and the request body for application/x-www-form-urlencoded based on sanitizeFeatureName

Fields are "removed" from the payload object by setting their values to undefined in the javascript payload object. This will result in them being skipped in their JSON serialization

> JSON.stringify({foo:"bar",zip:undefined})
'{"foo":"bar"}'

and allows us to avoid any perf. issues with delete.

During testing we discovered that the agent does not capture application/x-www-form-urlencoded request body payloads for fastify, hapi, and koa. This is why those tests skip those fixtures.

Tests:

Additional packages (middleware body parsers for web frameworks) were added to our dev dependencies to allow integration testing of the application/x-www-form-urlencoded handling.

The test/sanitize-field-names/main.js file contains unit-ish tests for the new functions.

The rest of the tests are integration-ish tests where we make HTTP POSTs with each of our supported frameworks and different configured values to determine if the right items are removed from the payloads. Test fixtures are shared between all the different framework integration tests. Fixtures contain input values and expected output values, as well as which style middleware should be used for parsing.

To Do

  • Integration tests -- with full agent config loading lifecycle and with various web frameworks handling of req.body
  • Document new configuration value
  • tests hanging in startTransaction

Checklist

  • Implement code
  • Add tests
  • Update TypeScript typings
  • Update documentation
  • Add CHANGELOG.asciidoc entry
  • Commit message follows commit guidelines

@astorm astorm marked this pull request as draft November 30, 2020 05:41
@github-actions github-actions bot added the agent-nodejs Make available for APM Agents project planning. label Nov 30, 2020
@apmmachine
Copy link
Contributor

apmmachine commented Nov 30, 2020

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview

Expand to view the summary

Build stats

  • Build Cause: Pull request #1898 updated

  • Start Time: 2020-12-22T19:44:15.622+0000

  • Duration: 15 min 59 sec

Test stats 🧪

Test Results
Failed 0
Passed 16398
Skipped 0
Total 16398

@astorm astorm force-pushed the astorm/sanitize_field_names branch from 593dddc to 391fd75 Compare December 3, 2020 04:24
@@ -339,3 +339,8 @@ finalhandler:
memcached:
versions: '>=2.2.0'
commands: node test/instrumentation/modules/memcached.js

body-parser:
Copy link
Contributor Author

@astorm astorm Dec 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Express has an external body parser middleware module for capturing application/x-www-form-urlencoded form bodies, so we run its tests through TAV.

Restify and HAPI do not use external middleware for capturing application/x-www-form-urlencoded form bodies, so there's no module to run through TAV.

Fastify and koa do use external middleware for capturing application/x-www-form-urlencoded bodies, but the agent doesn't currently capture their request bodies, so we don't test those module (yet!)

docs/configuration.asciidoc Outdated Show resolved Hide resolved
@@ -66,6 +66,9 @@ var DEFAULTS = {
logUncaughtExceptions: false, // TODO: Change to `true` in the v4.0.0
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything in this file is just configuration handling (including converting the wildcards into regular expressions)

@@ -20,6 +20,11 @@ function httpHeaders (obj) {

Copy link
Contributor Author

@astorm astorm Dec 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We changed the handling of undefined in this module to be consistent across our all redacted header fields. The was prompted by the fact that

  1. We filter headers before they're redacted
  2. We filter headers by setting their value to undefined
  3. This meant an authorization header we had marked for removal via setting it to undefined would set the REDACTED before the agent had a chance to serialize it as JSON

* Express provides multiple body parser middlewares with x-www-form-urlencoded
* handling. See http://expressjs.com/en/resources/middleware/body-parser.html
*/
function removeKeysFromPostedFormVariables (body, requestHeaders, regexes) {
Copy link
Contributor Author

@astorm astorm Dec 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is probably more complicated than it needs to be right now. I originally thought we would need to be able to handle the parsed request body in all its possible formats. The way feature development ending up going the agent had already normalized these values by the time we wanted to handle them. This function could probably be simplified to remove the raw/buffer handling, but I decided to leave this code in place as it may be useful when/if we handle the request body parsing for fastify, hapi, and koa. I was 52%/48% on this -- not a strongly held decision.

lib/parsers.js Outdated Show resolved Hide resolved
package.json Outdated
@@ -117,8 +117,10 @@
"@babel/cli": "^7.8.4",
Copy link
Contributor Author

@astorm astorm Dec 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The body parers are here for the test suites. Even though we discovered that out koa and fastify instrumentation does not capture the body correly (i.e doesn't need to be tested), the test harness I came up with becomes awkward if the middleware isn't here. The was a waffling decision -- happy to reconsider it.

@astorm astorm marked this pull request as ready for review December 4, 2020 17:34
@astorm astorm requested review from trentm and bmorelli25 December 4, 2020 17:34
Copy link
Member

@trentm trentm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not done reviewing yet, but I wanted to get this in to discuss my main Q: whether the spec requires removing fields rather than just censoring field values.

docs/configuration.asciidoc Outdated Show resolved Hide resolved
index.d.ts Outdated Show resolved Hide resolved
docs/configuration.asciidoc Outdated Show resolved Hide resolved
docs/configuration.asciidoc Outdated Show resolved Hide resolved
lib/config.js Outdated Show resolved Hide resolved
lib/config.js Outdated Show resolved Hide resolved
lib/filters/http-headers.js Outdated Show resolved Hide resolved
lib/filters/http-headers.js Show resolved Hide resolved
CHANGELOG.asciidoc Outdated Show resolved Hide resolved
docs/configuration.asciidoc Outdated Show resolved Hide resolved
@astorm
Copy link
Contributor Author

astorm commented Dec 19, 2020

PR feedback's addressed, look for the big old DONE stamp unless there's more work to be done here -- putting out for re-review

@astorm astorm requested a review from trentm December 19, 2020 01:42
Copy link
Member

@trentm trentm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just the one nit, and the follow up from the Buffer change we discussed and this all LGTM.
Oh, and the test failure to figure out.

test/sanitize-field-names/express.js Outdated Show resolved Hide resolved
@astorm
Copy link
Contributor Author

astorm commented Dec 22, 2020

Adjustments made (removed handling of and related tests) for new handling of raw body parsering

@astorm astorm requested a review from trentm December 22, 2020 07:11
lib/parsers.js Outdated Show resolved Hide resolved
@astorm astorm merged commit 4cbe2f9 into master Dec 22, 2020
@astorm astorm deleted the astorm/sanitize_field_names branch December 22, 2020 20:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
agent-nodejs Make available for APM Agents project planning.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants