feat: sanitize field names #1898
@@ -339,3 +339,8 @@ finalhandler:
memcached:
versions: '>=2.2.0'
commands: node test/instrumentation/modules/memcached.js

body-parser:
Express has an external body parser middleware module for capturing `application/x-www-form-urlencoded` form bodies, so we run its tests through TAV.
Restify and HAPI do not use external middleware for capturing `application/x-www-form-urlencoded` form bodies, so there's no module to run through TAV.
Fastify and koa do use external middleware for capturing `application/x-www-form-urlencoded` bodies, but the agent doesn't currently capture their request bodies, so we don't test those modules (yet!)
@@ -66,6 +66,9 @@ var DEFAULTS = {
logUncaughtExceptions: false, // TODO: Change to `true` in the v4.0.0
Everything in this file is just configuration handling (including converting the wildcards into regular expressions)
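As a rough illustration of the wildcard-to-regular-expression conversion mentioned above, here is a minimal sketch. The helper name `wildcardToRegExp`, the example patterns, and the choice to treat only `*` as a wildcard with case-insensitive matching are all assumptions made for illustration; this is not the agent's actual configuration code.

```javascript
'use strict'

// Hypothetical helper: turn a wildcard pattern such as 'pass*' into an
// anchored, case-insensitive RegExp. Only `*` is treated as a wildcard;
// every other character is escaped and matched literally.
function wildcardToRegExp (pattern) {
  const escaped = pattern
    .split('*')
    .map(part => part.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'))
    .join('.*')
  return new RegExp('^' + escaped + '$', 'i')
}

// Example patterns, chosen for illustration only.
const regexes = ['pass*', '*token*', 'connect.sid'].map(wildcardToRegExp)

// True when a field name matches at least one configured pattern.
function isSanitized (fieldName) {
  return regexes.some(re => re.test(fieldName))
}

console.log(isSanitized('Password')) // true
console.log(isSanitized('x-auth-token')) // true
console.log(isSanitized('connectXsid')) // false: the dot is matched literally
console.log(isSanitized('username')) // false
```

Converting the patterns once at configuration time means the per-request work is only a handful of `RegExp.test` calls.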
@@ -20,6 +20,11 @@ function httpHeaders (obj) {
We changed the handling of `undefined` in this module to be consistent across all of our redacted header fields. This was prompted by the fact that
- We filter headers before they're redacted
- We filter headers by setting their value to `undefined`
- This meant an `authorization` header we had marked for removal by setting it to `undefined` would be set to REDACTED before the agent had a chance to serialize it as JSON
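The interaction described in the list above can be sketched as follows. The function name `redactHeaders`, the `'[REDACTED]'` placeholder string, and the sample headers are assumptions for illustration, not the module's real code; the point is only that a value already set to `undefined` by an earlier filter should stay `undefined` so that JSON serialization drops it.

```javascript
'use strict'

// Assumed placeholder string; the real agent's value may differ.
const REDACTED = '[REDACTED]'

// Hypothetical redaction step: replace the value of any header whose name
// matches one of the regexes, but leave headers that an earlier filter
// already "removed" (set to undefined) untouched.
function redactHeaders (headers, regexes) {
  const result = {}
  for (const name of Object.keys(headers)) {
    const value = headers[name]
    if (value === undefined) {
      result[name] = undefined // stays removed; not resurrected as REDACTED
    } else if (regexes.some(re => re.test(name))) {
      result[name] = REDACTED
    } else {
      result[name] = value
    }
  }
  return result
}

// Sample input: an earlier filter has already removed `authorization`.
const filtered = {
  authorization: undefined,
  'x-api-key': 'abc123',
  host: 'example.com'
}
const redacted = redactHeaders(filtered, [/^authorization$/i, /^x-api-key$/i])

// JSON.stringify omits properties whose value is undefined.
console.log(JSON.stringify(redacted))
// {"x-api-key":"[REDACTED]","host":"example.com"}
```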
lib/filters/sanitize-field-names.js
Outdated
* Express provides multiple body parser middlewares with x-www-form-urlencoded
* handling. See http://expressjs.com/en/resources/middleware/body-parser.html
*/
function removeKeysFromPostedFormVariables (body, requestHeaders, regexes) {
This function is probably more complicated than it needs to be right now. I originally thought we would need to be able to handle the parsed request body in all its possible formats. The way feature development ended up going, the agent had already normalized these values by the time we wanted to handle them. This function could probably be simplified to remove the raw/buffer handling, but I decided to leave this code in place as it may be useful when/if we handle the request body parsing for fastify, hapi, and koa. I was 52%/48% on this -- not a strongly held decision.
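For discussion, a simplified version along the lines suggested above might look like the sketch below. It is not the function under review: the name `removeFormFields` is hypothetical, Buffer bodies are deliberately not handled, and it omits matching keys from a copy instead of assigning `undefined` to them, because `querystring.stringify` would otherwise emit an empty `key=` pair for the string case.

```javascript
'use strict'

const querystring = require('querystring')

// Hypothetical, simplified variant: only acts on form-encoded requests and
// accepts either an already-parsed object or a raw string body.
function removeFormFields (body, requestHeaders, regexes) {
  const contentType = String(requestHeaders['content-type'] || '').toLowerCase()
  if (!contentType.startsWith('application/x-www-form-urlencoded')) {
    return body // not a form post; leave the body untouched
  }

  const isString = typeof body === 'string'
  const fields = isString ? querystring.parse(body) : body

  // Copy over only the keys that do not match any sanitize pattern.
  const kept = {}
  for (const key of Object.keys(fields)) {
    if (!regexes.some(re => re.test(key))) {
      kept[key] = fields[key]
    }
  }

  // Hand back the same kind of value we were given.
  return isString ? querystring.stringify(kept) : kept
}

const formHeaders = { 'content-type': 'application/x-www-form-urlencoded' }
const patterns = [/^pass.*$/i]

console.log(removeFormFields('user=bob&password=hunter2', formHeaders, patterns))
console.log(removeFormFields({ user: 'bob', password: 'hunter2' }, formHeaders, patterns))
```

If the agent always normalizes the body before this step, the string branch could be dropped entirely, which is the simplification the comment above is weighing.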
package.json
Outdated
@@ -117,8 +117,10 @@
"@babel/cli": "^7.8.4",
The body parsers are here for the test suites. Even though we discovered that our koa and fastify instrumentation does not capture the body correctly (i.e. doesn't need to be tested), the test harness I came up with becomes awkward if the middleware isn't here. This was a waffling decision -- happy to reconsider it.
I'm not done reviewing yet, but I wanted to get this in to discuss my main Q: whether the spec requires removing fields rather than just censoring field values.
PR feedback is addressed; look for the big old DONE stamp unless there's more work to be done here -- putting this out for re-review
Just the one nit and the follow-up from the Buffer change we discussed, and then this all LGTM.
Oh, and the test failure to figure out.
Adjustments made (removed handling of and related tests) for new handling of raw body parsing
Update
Subtleties abound here.
The end goal of all this is to get a version of this feature that is suitable for a minor version bump while setting us up for 100% spec and agent compliance in our next major version.
Our current spec discussions say that `SANITIZE_FIELD_NAMES` should fully REDACT the value of a header, post field, or cookie field if it is listed in the `SANITIZE_FIELD_NAMES` array. These might be captured by agents in the following fields (see also the intake API/Schema):
- `transaction.context.request.cookies`
- `transaction.context.request.headers`
- `transaction.context.request.body` (if form/url encoded)
- `transaction.context.response.headers`
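To make the rule concrete, here is a minimal sketch of redacting matching field values across the four locations listed above. The context object shape follows those field paths, but the function names, the `'[REDACTED]'` placeholder, and the sample patterns are assumptions for illustration and not any agent's actual implementation.

```javascript
'use strict'

// Assumed placeholder string; the real value is defined by the agent/spec.
const REDACTED = '[REDACTED]'

// Return a copy of `fields` with the value of every matching key replaced.
// Non-object input (undefined, strings) is returned unchanged.
function redactMatching (fields, regexes) {
  if (!fields || typeof fields !== 'object') return fields
  const out = {}
  for (const [name, value] of Object.entries(fields)) {
    out[name] = regexes.some(re => re.test(name)) ? REDACTED : value
  }
  return out
}

// Hypothetical helper applying the rule to a transaction context object.
function sanitizeContext (context, regexes) {
  const request = context.request || {}
  const response = context.response || {}
  const contentType = String((request.headers || {})['content-type'] || '')
  // The body is only inspected when it is form/url encoded.
  const isForm = contentType.startsWith('application/x-www-form-urlencoded')
  return {
    request: Object.assign({}, request, {
      headers: redactMatching(request.headers, regexes),
      cookies: redactMatching(request.cookies, regexes),
      body: isForm ? redactMatching(request.body, regexes) : request.body
    }),
    response: Object.assign({}, response, {
      headers: redactMatching(response.headers, regexes)
    })
  }
}

const sanitized = sanitizeContext({
  request: {
    headers: {
      'content-type': 'application/x-www-form-urlencoded',
      authorization: 'Basic abc'
    },
    body: { user: 'bob', password: 'hunter2' }
  },
  response: { headers: { 'set-cookie': 'sid=123', 'content-length': '42' } }
}, [/^authorization$/i, /^pass.*$/i, /^set-cookie$/i])

console.log(JSON.stringify(sanitized, null, 2))
```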
The Node.js agent currently captures `transaction.context.request.headers`, `transaction.context.request.body`, and `transaction.context.response.headers`, but does not capture `transaction.context.request.cookies`.
Also worth noting -- the Java agent also has a behavior where it blanks out the values of `transaction.context.request.headers.cookies` when it parses values into `transaction.context.request.cookies`.
The Node.js agent also has a preexisting `http-filter` module that works slightly differently from the behavior of `SANITIZE_FIELD_NAMES`. Prior to this PR the `http-filter` module was hard-coded to redact the `authorization` HTTP header. The filter also looked for both `set-cookie` and `cookie` HTTP headers, and would parse them into name/value pairs in order to redact individual cookie fields. Once redacted, the name/value pairs would be re-serialized as a cookie string and captured in the `.headers` properties. Which individual cookie names got redacted was/is controlled by a list of fields in `redact-secrets`.
The `redact-secrets` module also looks for values to redact. Specifically, it looks for things that match a credit card regular expression.
There are a few important subtleties here.
- The new `SANITIZE_FIELD_NAMES` feature, when combined with the default configuration values, indicates that we should completely redact the `set-cookie` header.
- The feature, as spec-ed, does not speak to parsing the values in the HTTP headers and redacting them. It only speaks to parsing values collected in `transaction.context.request.cookies`.
- The Node.js agent does not appear to have used `redact-secrets` to control which HTTP headers or post-body fields/values are redacted. The `redact-secrets` list only controls which "cookie field names encoded in the headers" were redacted.
This PR
In order to avoid a major version bump this PR
- Removes the hard-coded `authorization` redacting, and instead relies on `authorization` being one of the default values of `SANITIZE_FIELD_NAMES`
- Maintains the "cookie in header" parsing via `redact-secrets`, but also includes the `SANITIZE_FIELD_NAMES` wildcard patterns as part of `redact-secrets`
- If either `cookie` or `set-cookie` is listed in the `SANITIZE_FIELD_NAMES` configuration array, per the spec this value will be completely redacted.
- The default value of `SANITIZE_FIELD_NAMES` will include the extra configuration values `['pw','pass','connect.sid']` (which we have currently proposed in the spec as `MAY` include values)
This preserves as much of the previous Node.js agent behavior as possible in order to prevent a major version bump. The one "controversial" behavior I see here is that, with the new feature behavior, `set-cookie` will be completely redacted by default -- whereas in previous versions of the agent only specific fields would be redacted. I'm choosing to interpret this as a new enhanced behavior vs. a breaking change.
Future Next Major Version PR
In order to be fully spec compliant I also propose that we revisit this when 4.0 rolls around. At this time we'll
- `['pw','pass','connect.sid']` from the `SANITIZE_FIELD_NAMES` defaults
- `transaction.context.request.cookies` and blanking out `transaction.context.request.headers.cookies`
- `transaction.context.request.cookies` per spec
- `set-cookie` http header
Original Description
This PR implements the Data sanitization spec.
This PR
- `sanitizeFeatureName` value
- `application/x-www-form-urlencoded` based on `sanitizeFeatureName`
Fields are "removed" from the payload object by setting their values to `undefined` in the JavaScript payload object. This results in them being skipped during JSON serialization and allows us to avoid any perf. issues with `delete`.
During testing we discovered that the agent does not capture `application/x-www-form-urlencoded` request body payloads for fastify, hapi, and koa. This is why those tests skip those fixtures.
Tests:
Additional packages (middleware body parsers for web frameworks) were added to our dev dependencies to allow integration testing of the `application/x-www-form-urlencoded` handling.
The `test/sanitize-field-names/main.js` file contains unit-ish tests for the new functions.
The rest of the tests are integration-ish tests where we make HTTP POSTs with each of our supported frameworks and different configured values to determine if the right items are removed from the payloads. Test fixtures are shared between all the different framework integration tests. Fixtures contain input values and expected output values, as well as which style of middleware should be used for parsing.
To Do
Checklist