-
Notifications
You must be signed in to change notification settings - Fork 115
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clarifies Behavior of SANITIZE_FIELD_NAMES #378
Changes from 12 commits
5966ca4
2f33854
f8b2b9f
18e3193
4981138
4ed30f4
218ccb6
a14bd8a
7629a1e
ff04c63
997594f
cf9732f
6a21885
bd32292
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,16 +1,50 @@ | ||
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", | ||
"SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to | ||
be interpreted as described in | ||
[RFC 2119](https://www.ietf.org/rfc/rfc2119.txt). | ||
|
||
## Data sanitization | ||
|
||
### `sanitize_field_names` configuration | ||
|
||
Sometimes it is necessary to sanitize, i.e., remove, | ||
sensitive data sent to Elastic APM. | ||
This config accepts a list of wildcard patterns of field names which should be sanitized. | ||
These apply to HTTP headers (including cookies) and `application/x-www-form-urlencoded` data (POST form fields). | ||
The query string and the captured request body (such as `application/json` data) will not get sanitized. | ||
|
||
This config accepts a list of wildcard patterns of field names which control | ||
how an agent will sanitize data. | ||
|
||
| | | | ||
|----------------|---| | ||
| Type | `List<`[`WildcardMatcher`](../../tests/agents/json-specs/wildcard_matcher_tests.json)`>` | | ||
| Default | `password, passwd, pwd, secret, *key, *token*, *session*, *credit*, *card*, authorization, set-cookie` | | ||
| Dynamic | `true` | | ||
| Central config | `true` | | ||
|
||
#### Configuration | ||
|
||
Agents MUST provide a minimum default configuration of | ||
|
||
[ 'password', 'passwd', 'pwd', 'secret', '*key', '*token*', '*session*', | ||
'*credit*','*card*', 'authorization', 'set-cookie'] | ||
|
||
for the `sanitize_field_names` configuration value. Agent's MAY include the | ||
following extra fields in their default configuration to avoid breaking changes | ||
|
||
['pw','pass','connect.sid'] | ||
|
||
## Sanitizing Values | ||
|
||
If a payload field's name (a header key, a form key) matches a configured | ||
wildcard, that field's _value_ MUST be redacted and the key itself | ||
MUST still be reported in the agent payload. Agents MAY choose the string | ||
they use to replace the value so long as it's consistent and does not reveal | ||
the value it has replaced. The replacement string SHOULD be `[REDACTED]`. | ||
|
||
Fields that MUST be sanitized are the HTTP Request headers, HTTP Response | ||
headers, and form fields in an `application/x-www-form-urlencoded` request | ||
body. No fields (including `set-cookie` headers) are exempt from this. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There's also There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Only transaction.context.request.cookies map should be filtered or those There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
For response headers, there's no equivalent to There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Should we include it in the spec? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @felixelastic @SergeyKleyman @basepi Thank you all. If I'm reading the above correctly, it sounds like there's three data structures/values to consider here
If I spec-ed this as something like
would that capture what we're after? Or is there more nuance here? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @SergeyKleyman Re:
Good question! Thinking that through -- I'm not sure this would achieve what we want. Here's my reasoning: If we added Also, I've been operating under the assumption that the default fields were decided on a while ago and would be a pain to change now after some agent teams have already landed this functionality or are midstream on it. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @felixbarny Thank you for sharing those data structures -- I learned something. It looks like the Node.js Agent doesn't capture More pertinent to our discussion though -- What do you think about this as final language for future proofing things?
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ah, I didn't know some agents don't send I like that wording! There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
The main reason for my suggestion is that it might look strange to the users that There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
If we decide to go with hardcoding
|
||
|
||
The query string and other captured request bodies (such as `application/json`) | ||
SHOULD NOT be sanitized. | ||
|
||
Agents SHOULD NOT sanitize fields based on the _value_ of a particular field. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fact that some agents have different defaults should be reflected in the remote config description.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Today I learned there's a remote config description :) These extra fields were/are for the Node.js agent specifically -- they're fields that we previously redacted that would be tricky/cumbersome (but not impossible) to change. However, since it sounds like there's downstream dependencies I'm learning towards just removing this and going with the defaults.
@eyalkoren (or anyone) A few follow up questions for you -- is the the remote config description used by the central configuration system?
That is -- with those defaults set in
general_settings.ts
does that mean the central configuration system will send these values by default? The follow on question that is -- do agents typically need to set these default values themselves, or can/should we rely on them being set by central configuration?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it is only a string for description of the default.
The central config does not send these values, agents need to set them themselves. After all, the central config is just one more configuration system for agents, which need to have proper defaults even if it is disabled, and you would typically use the same default regardless of the configuration source.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@astorm So you leaning towards having the apm-agent-nodejs drop
['pw','pass','connect.sid']
by default? I don't disagree. We'll need to defer doing so until a major ver bump, right?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @eyalkoren -- the description of this issue has been updated with a TO DO that reflects the need to update the remote config description.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@trentm I suspect we'll talk off of GitHub about this, but my intent was to spec
['pw','pass','connect.sid']
this way in order to give us some flexibility in keeping these extra keys redacted and not force a major version bump on us immediately.