Add sanitize_field_names spec #334

felixbarny · 2020-08-26T10:18:39Z

Supersedes #319

Note that the implementation of this spec also includes adding the option to Kibana.

Agent	Milestone	Link to agent implementation issue
.NET	n/a	already implemented according to spec
Go	?	elastic/apm-agent-go#821
Java	n/a	already implemented according to spec
Node.js	7.11	elastic/apm-agent-nodejs#1871
PHP	?	elastic/apm-agent-php#161
Python	7.11	elastic/apm-agent-python#904 (central config missing)
Ruby	7.10	elastic/apm-agent-ruby#857
RUM	n/a

apmmachine · 2020-08-26T10:21:42Z

💔 Build Failed

Expand to view the summary

Build stats

Build Cause: [Started by timer]
Start Time: 2020-09-14T05:06:00.762+0000
Duration: 3 min 31 sec

Steps errors

Expand to view the steps failures

Name: Shell Script
- Description: [2020-09-14T05:08:30.627Z] + git diff --name-only ae241d3759ec5e6b8c82e85f8ffac6b70a59fb53...cc3043d
- Duration: 0 min 0 sec
- Start Time: 2020-09-14T05:08:30.338+0000
- log

Log output

Expand to view the last 100 lines of log output

[2020-09-14T05:06:05.482Z] To remove this log message, move the test code outside of src/. To restore the previous behavior that allowed access to files in src/test/, pass -Dorg.jenkinsci.plugins.workflow.libs.SCMSourceRetriever.INCLUDE_SRC_TEST_IN_LIBRARIES=true to the java command used to start Jenkins.
[2020-09-14T05:06:21.088Z] Still waiting to schedule task
[2020-09-14T05:06:21.088Z] Waiting for next available executor on ‘linux&&immutable’
[2020-09-14T05:07:54.083Z] Running on apm-ci-immutable-ubuntu-1804-1600059969297340462 in /var/lib/jenkins/workspace/ared_apm-update-specs-mbp_PR-334
[2020-09-14T05:07:54.214Z] �[39;49m[INFO] Override default checkout�[0m
[2020-09-14T05:07:54.271Z] Sleeping for 10 sec
[2020-09-14T05:08:06.934Z] using credential f6c7695a-671e-4f4f-a331-acdce44ff9ba
[2020-09-14T05:08:06.957Z] Wiping out workspace first.
[2020-09-14T05:08:06.987Z] Cloning the remote Git repository
[2020-09-14T05:08:06.987Z] Using shallow clone with depth 4
[2020-09-14T05:08:06.987Z] Avoid fetching tags
[2020-09-14T05:08:07.011Z] Cloning repository [email protected]:elastic/apm.git
[2020-09-14T05:08:07.069Z]  > git init /var/lib/jenkins/workspace/ared_apm-update-specs-mbp_PR-334 # timeout=10
[2020-09-14T05:08:07.127Z] Fetching upstream changes from [email protected]:elastic/apm.git
[2020-09-14T05:08:07.127Z]  > git --version # timeout=10
[2020-09-14T05:08:07.134Z]  > git --version # 'git version 2.17.1'
[2020-09-14T05:08:07.135Z] using GIT_SSH to set credentials GitHub user @elasticmachine SSH key
[2020-09-14T05:08:07.163Z]  > git fetch --no-tags --progress -- [email protected]:elastic/apm.git +refs/heads/*:refs/remotes/origin/* # timeout=15
[2020-09-14T05:08:07.927Z] Cleaning workspace
[2020-09-14T05:08:07.954Z] Using shallow fetch with depth 4
[2020-09-14T05:08:07.954Z] Pruning obsolete local branches
[2020-09-14T05:08:08.541Z] Merging remotes/origin/master commit c51d0084047222398a517d7b02814ebb142cfa00 into PR head commit cc3043d1fb2a97c306250b017e3b615c0ca30da6
[2020-09-14T05:08:08.667Z] Merge succeeded, producing 6f1f38cc223f48aaf6a648e06a6958b4344e4374
[2020-09-14T05:08:08.667Z] Checking out Revision 6f1f38cc223f48aaf6a648e06a6958b4344e4374 (PR-334)
[2020-09-14T05:08:07.903Z]  > git config remote.origin.url [email protected]:elastic/apm.git # timeout=10
[2020-09-14T05:08:07.907Z]  > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
[2020-09-14T05:08:07.920Z]  > git config remote.origin.url [email protected]:elastic/apm.git # timeout=10
[2020-09-14T05:08:07.930Z]  > git rev-parse --verify HEAD # timeout=10
[2020-09-14T05:08:07.940Z] No valid HEAD. Skipping the resetting
[2020-09-14T05:08:07.941Z]  > git clean -fdx # timeout=10
[2020-09-14T05:08:07.959Z] Fetching upstream changes from [email protected]:elastic/apm.git
[2020-09-14T05:08:07.960Z] using GIT_SSH to set credentials GitHub user @elasticmachine SSH key
[2020-09-14T05:08:07.966Z]  > git fetch --no-tags --progress --prune -- [email protected]:elastic/apm.git +refs/pull/334/head:refs/remotes/origin/PR-334 +refs/heads/master:refs/remotes/origin/master # timeout=15
[2020-09-14T05:08:08.549Z]  > git config core.sparsecheckout # timeout=10
[2020-09-14T05:08:08.569Z]  > git checkout -f cc3043d1fb2a97c306250b017e3b615c0ca30da6 # timeout=15
[2020-09-14T05:08:08.603Z]  > git remote # timeout=10
[2020-09-14T05:08:08.621Z]  > git config --get remote.origin.url # timeout=10
[2020-09-14T05:08:08.635Z] using GIT_SSH to set credentials GitHub user @elasticmachine SSH key
[2020-09-14T05:08:08.640Z]  > git merge c51d0084047222398a517d7b02814ebb142cfa00 # timeout=10
[2020-09-14T05:08:08.657Z]  > git rev-parse HEAD^{commit} # timeout=10
[2020-09-14T05:08:08.670Z]  > git config core.sparsecheckout # timeout=10
[2020-09-14T05:08:08.681Z]  > git checkout -f 6f1f38cc223f48aaf6a648e06a6958b4344e4374 # timeout=15
[2020-09-14T05:08:12.285Z] Commit message: "Merge commit 'c51d0084047222398a517d7b02814ebb142cfa00' into HEAD"
[2020-09-14T05:08:12.301Z] First time build. Skipping changelog.
[2020-09-14T05:08:12.301Z] Cleaning workspace
[2020-09-14T05:08:12.289Z]  > git rev-list --no-walk 101fe3d82a65476e0b632980f4391061e3d426bd # timeout=10
[2020-09-14T05:08:12.303Z]  > git rev-parse --verify HEAD # timeout=10
[2020-09-14T05:08:12.312Z] Resetting working tree
[2020-09-14T05:08:12.312Z]  > git reset --hard # timeout=10
[2020-09-14T05:08:12.330Z]  > git clean -fdx # timeout=10
[2020-09-14T05:08:12.951Z] Masking supported pattern matches of $JOB_GCS_BUCKET or $NOTIFY_TO
[2020-09-14T05:08:12.984Z] Timeout set to expire in 3 hr 0 min
[2020-09-14T05:08:12.993Z] The timestamps step is unnecessary when timestamps are enabled for all Pipeline builds.
[2020-09-14T05:08:13.191Z] [INFO] 'shallow' is forced to be disabled when running on PullRequests
[2020-09-14T05:08:13.199Z] Running in /var/lib/jenkins/workspace/ared_apm-update-specs-mbp_PR-334/src/github.com/elastic/apm
[2020-09-14T05:08:13.209Z] [INFO] gitCheckout: Checkout master from [email protected]:elastic/apm.git with credentials f6c7695a-671e-4f4f-a331-acdce44ff9ba
[2020-09-14T05:08:13.224Z] [INFO] Override default checkout
[2020-09-14T05:08:13.248Z] Sleeping for 10 sec
[2020-09-14T05:08:23.388Z] using credential f6c7695a-671e-4f4f-a331-acdce44ff9ba
[2020-09-14T05:08:23.415Z] Cloning the remote Git repository
[2020-09-14T05:08:23.432Z] Cloning repository [email protected]:elastic/apm.git
[2020-09-14T05:08:23.462Z]  > git init /var/lib/jenkins/workspace/ared_apm-update-specs-mbp_PR-334/src/github.com/elastic/apm # timeout=10
[2020-09-14T05:08:23.470Z] Fetching upstream changes from [email protected]:elastic/apm.git
[2020-09-14T05:08:23.470Z]  > git --version # timeout=10
[2020-09-14T05:08:23.481Z]  > git --version # 'git version 2.17.1'
[2020-09-14T05:08:23.481Z] using GIT_SSH to set credentials GitHub user @elasticmachine SSH key
[2020-09-14T05:08:23.486Z]  > git fetch --tags --progress -- [email protected]:elastic/apm.git +refs/heads/*:refs/remotes/origin/* # timeout=10
[2020-09-14T05:08:24.123Z]  > git config remote.origin.url [email protected]:elastic/apm.git # timeout=10
[2020-09-14T05:08:24.127Z]  > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
[2020-09-14T05:08:24.135Z]  > git config remote.origin.url [email protected]:elastic/apm.git # timeout=10
[2020-09-14T05:08:24.143Z] Fetching upstream changes from [email protected]:elastic/apm.git
[2020-09-14T05:08:24.143Z] using GIT_SSH to set credentials GitHub user @elasticmachine SSH key
[2020-09-14T05:08:24.152Z]  > git fetch --tags --progress -- [email protected]:elastic/apm.git +refs/heads/*:refs/remotes/origin/* +refs/pull/*/head:refs/remotes/origin/PR/* # timeout=10
[2020-09-14T05:08:25.027Z] Checking out Revision c51d0084047222398a517d7b02814ebb142cfa00 (origin/master)
[2020-09-14T05:08:25.063Z] Commit message: "[CI] compare with the calculated SHA commit (#336)"
[2020-09-14T05:08:25.063Z] First time build. Skipping changelog.
[2020-09-14T05:08:25.022Z]  > git rev-parse origin/master^{commit} # timeout=10
[2020-09-14T05:08:25.030Z]  > git config core.sparsecheckout # timeout=10
[2020-09-14T05:08:25.041Z]  > git checkout -f c51d0084047222398a517d7b02814ebb142cfa00 # timeout=10
[2020-09-14T05:08:25.836Z] Masking supported pattern matches of $GIT_USERNAME or $GIT_PASSWORD
[2020-09-14T05:08:26.412Z] + git fetch https://****:****@github.com/elastic/apm.git +refs/pull/*/head:refs/remotes/origin/pr/*
[2020-09-14T05:08:26.722Z] Archiving artifacts
[2020-09-14T05:08:27.390Z] + git rev-parse HEAD
[2020-09-14T05:08:27.748Z] + git rev-parse HEAD
[2020-09-14T05:08:28.051Z] + git rev-parse origin/pr/334
[2020-09-14T05:08:28.083Z] [INFO] githubEnv: Found Git Build Cause: pr
[2020-09-14T05:08:28.327Z] Masking supported pattern matches of $GITHUB_TOKEN
[2020-09-14T05:08:29.348Z] [INFO] githubPrCheckApproved: Title: Add sanitize_field_names spec - User: felixbarny - Author Association: MEMBER
[2020-09-14T05:08:29.907Z] Stashed 387 file(s)
[2020-09-14T05:08:30.267Z] Running in /var/lib/jenkins/workspace/ared_apm-update-specs-mbp_PR-334/src/github.com/elastic/apm
[2020-09-14T05:08:30.627Z] + git diff --name-only ae241d3759ec5e6b8c82e85f8ffac6b70a59fb53...cc3043d1fb2a97c306250b017e3b615c0ca30da6
[2020-09-14T05:08:30.627Z] fatal: Invalid symmetric difference expression ae241d3759ec5e6b8c82e85f8ffac6b70a59fb53...cc3043d1fb2a97c306250b017e3b615c0ca30da6
[2020-09-14T05:08:30.678Z] Stage "Send Pull Request for BDD specs" skipped due to earlier failure(s)
[2020-09-14T05:08:30.697Z] Stage "Send Pull Request for JSON specs" skipped due to earlier failure(s)
[2020-09-14T05:08:30.903Z] Running on Jenkins in /var/lib/jenkins/workspace/ared_apm-update-specs-mbp_PR-334
[2020-09-14T05:08:30.990Z] [INFO] getVaultSecret: Getting secrets
[2020-09-14T05:08:31.061Z] Masking supported pattern matches of $VAULT_ADDR or $VAULT_ROLE_ID or $VAULT_SECRET_ID
[2020-09-14T05:08:31.632Z] + chmod 755 generate-build-data.sh
[2020-09-14T05:08:31.632Z] + ./generate-build-data.sh https://apm-ci.elastic.co/blue/rest/organizations/jenkins/pipelines/apm-shared/apm-update-specs-mbp/PR-334/ https://apm-ci.elastic.co/blue/rest/organizations/jenkins/pipelines/apm-shared/apm-update-specs-mbp/PR-334/runs/7 FAILURE 150610
[2020-09-14T05:08:32.183Z] INFO: curl https://apm-ci.elastic.co/blue/rest/organizations/jenkins/pipelines/apm-shared/apm-update-specs-mbp/PR-334/runs/7/steps/?limit=10000 -o steps-info.json

basepi

Looks good to me.

basepi · 2020-09-01T14:01:32Z

This will be a breaking change in python (adding globbing, which is implicit in the current implementation), so we'll have to do a major version. But I think it will be worth it to standardize across agents.

felixbarny · 2020-09-01T14:20:26Z

Another option that doesn't require a new major version right now would be to do the change in two steps:

Align the default with the other agents but retain the auto-glob behavior by automatically adding wildcards to the beginning and end of each pattern (unless there already is a wildcard).

Examples:

foo  -> *foo*
foo* -> *foo*

That behavior can be removed in the next major version to fully align.

mikker · 2020-09-02T11:45:06Z

Another thing is that �Python, �Ruby, Node.js agents currently check some values too for strings looking like credit card numbers.

Do we just keep this check as is, separate from the config option, or do we stop checking?

felixbarny · 2020-09-02T12:02:05Z

Oh, that's interesting, I didn't know about these checks.

I think it definitely makes sense to leave them in but they seem to be orthogonal to this config option anyway. Could you create a follow-up spec PR to make sure we do that check across agents?
Regex checks on each value can be expensive but we could do some lightweight pre-checking. For example, only check the regex if the value has a length of at 16-20 and the 4 first characters are a digit.

mikker · 2020-09-02T12:12:27Z

We could also consider removing the check. I'm skeptical that it provides any real world benefit? Compared to checking every value ever and how few may be sending credit card numbers in this way, under another key than those already checked? That's all just speculation, of course.

basepi · 2020-09-02T22:20:10Z

I'd be fine with removing the credit card checks, their value seems dubious to me.

mikker · 2020-09-03T09:52:00Z

So upgrade path is

add current defaults but with *s around each key
keep value checking outside of config option
on next major, remove *s and value checking?

felixbarny · 2020-09-03T11:18:35Z

With add current defaults you mean password, passwd, pwd, secret, *key, *token*, *session*, *credit*, *card*, authorization, set-cookie? Any of the values, including defaults and custom additions, will have * added around them.

This would apply to Python and Ruby. Other agents don't add the additional wildcards.

mikker · 2020-09-03T12:09:17Z

Yes, like this:
https://github.com/elastic/apm-agent-ruby/blob/a670c08df3451f23780b1d0f1356c7faea99e651/lib/elastic_apm/config.rb#L33-L36

With this addendum and the value filter:
https://github.com/elastic/apm-agent-ruby/blob/a670c08df3451f23780b1d0f1356c7faea99e651/lib/elastic_apm/transport/filters/hash_sanitizer.rb#L26-L33

beniwohli · 2020-09-07T09:53:57Z

Why don't we sanitize the query string? Sure, the querystring shouldn't be used for sensitive data, but the interweb isn't really known for following best practices...

I'm asking because in the Python agent we do currently sanitize the query string. Removing that would be a breaking change.

gregkalapos · 2020-09-07T10:05:59Z

Why don't we sanitize the query string? Sure, the querystring shouldn't be used for sensitive data, but the interweb isn't really known for following best practices...

I'm asking because in the Python agent we do currently sanitize the query string. Removing that would be a breaking change.

Both in the .NET and in the Java doc we quote owasp.org:

Data in the query string is considered non-sensitive, as sensitive information should not be sent in the query string. 
See https://www.owasp.org/index.php/Information_exposure_through_query_strings_in_url for more information

I can't give a very specific answer on the Why don't we sanitize.... question other than what we state in the doc, but I was told we definitely don't wanna do this when I implemented the sanitize_field_names setting for .NET.

One quote I remember is (not tagging the person :D ) :

People who put sensitive info there deserve that it's captured

Not sure how big the overhead would be to parse it... I'd guess it's not that bad.

I personally think that not sanitizing query string is fine.

SylvainJuge · 2020-09-08T09:13:05Z

While checking within the agent is definitely expensive with apparent low value for now, there is definitely a huge cost implied by any leak of sensitive data (credit card numbers, SSN, ...), and often implied deleting and/or re-indexing.

However, we could apply those checks/regexes at apm-server level, or even with an ingest pipeline to provide the same benefit to the user, while data would still be technically captured by agents, it won't be stored .

Security-minded people won't do stupid things, thus they won't use query string. However, we can't assume that everyone is one of them. If we could prevent users from shooting in their own foot with a reasonable cost (thus probably not within the agents), that would still be valuable.

vigneshshanmugam

RUM agent will not be introducing a new config option as there is no use-case ATM and the agent does not capture HTTP headers (set-cookie), post body, form encoded data. The only place we do sanitisation is on the URL where we already redact most of the parameters by default.

We will align with the agents on this config once we see any new use cases.

mikker · 2020-09-15T13:18:06Z

Declared good to merge by @graphaelli over Slack 😊

mikker · 2020-09-29T13:24:50Z

@elastic/apm-agent-devs Please update the table above with links to issues if applicable (some already have this behaviour.)

mikker · 2020-09-29T13:29:44Z

If your agent already works this way (or the setting is n/a) please state so in the table.

Add sanitize_field_names spec

cc3043d

felixbarny linked an issue Aug 26, 2020 that may be closed by this pull request

Align sanitize_field_names option + central config #319

Closed

This was referenced Aug 26, 2020

Align sanitize_field_names option + central config #319

Closed

Adding more config options to central config for backend agents #318

Closed

felixbarny linked an issue Aug 26, 2020 that may be closed by this pull request

Adding more config options to central config for backend agents #318

Closed

felixbarny assigned mikker Aug 26, 2020

basepi approved these changes Sep 1, 2020

View reviewed changes

mikker marked this pull request as ready for review September 7, 2020 09:05

mikker requested review from a team as code owners September 7, 2020 09:05

mikker approved these changes Sep 7, 2020

View reviewed changes

gregkalapos approved these changes Sep 7, 2020

View reviewed changes

eyalkoren approved these changes Sep 7, 2020

View reviewed changes

axw approved these changes Sep 8, 2020

View reviewed changes

beniwohli mentioned this pull request Sep 8, 2020

Remove query string sanitization in 6.0 elastic/apm-agent-python#920

Closed

SergeyKleyman approved these changes Sep 10, 2020

View reviewed changes

vigneshshanmugam approved these changes Sep 14, 2020

View reviewed changes

mikker merged commit d66d75f into elastic:master Sep 15, 2020

beniwohli pushed a commit to felixbarny/apm that referenced this pull request Sep 21, 2020

Add sanitize_field_names spec (elastic#334)

a5c6b0a

graphaelli added this to the 7.11 milestone Sep 23, 2020

SergeyKleyman mentioned this pull request Sep 29, 2020

[META 319] Configuration option: SANITIZE_FIELD_NAMES elastic/apm-agent-php#161

Closed

mikker mentioned this pull request Sep 29, 2020

Add sanitize_field_names to list of CentralConfig options #346

Closed

axw mentioned this pull request Sep 30, 2020

[META 319] Make SANITIZE_FIELD_NAMES centrally configurable elastic/apm-agent-go#821

Closed

felixbarny deleted the sanitize_field_names branch October 16, 2020 11:39

trentm mentioned this pull request Nov 4, 2020

[META 319] Align sanitize_field_names option + central config elastic/apm-agent-nodejs#1811

Closed

This was referenced Nov 12, 2020

[META 319] Add sanitize_field_name option to Kibana elastic/apm-agent-dotnet#1034

Closed

[META 319] Add sanitize_field_name option to Kibana elastic/apm-agent-java#1504

Closed

Add/adapt central config options for Agents in Kibana #369

Closed

basepi mentioned this pull request Dec 3, 2020

[META 319] Add sanitize_field_names to remote config elastic/apm-agent-python#904

Closed

3 tasks

astorm mentioned this pull request Dec 6, 2020

Clarifies Behavior of SANITIZE_FIELD_NAMES #378

Merged

1 task

basepi mentioned this pull request Jun 20, 2022

Add APM agent Breaking Changes doc #650

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add sanitize_field_names spec #334

Add sanitize_field_names spec #334

felixbarny commented Aug 26, 2020 •

edited by AlexanderWert

Loading

apmmachine commented Aug 26, 2020 •

edited

Loading

Build stats

basepi left a comment

basepi commented Sep 1, 2020

felixbarny commented Sep 1, 2020

mikker commented Sep 2, 2020 •

edited

Loading

felixbarny commented Sep 2, 2020

mikker commented Sep 2, 2020

basepi commented Sep 2, 2020

mikker commented Sep 3, 2020

felixbarny commented Sep 3, 2020

mikker commented Sep 3, 2020

beniwohli commented Sep 7, 2020

gregkalapos commented Sep 7, 2020

SylvainJuge commented Sep 8, 2020

vigneshshanmugam left a comment

mikker commented Sep 15, 2020

mikker commented Sep 29, 2020

mikker commented Sep 29, 2020

Add sanitize_field_names spec #334

Add sanitize_field_names spec #334

Conversation

felixbarny commented Aug 26, 2020 • edited by AlexanderWert Loading

apmmachine commented Aug 26, 2020 • edited Loading

💔 Build Failed

Build stats

Steps errors

Log output

basepi left a comment

Choose a reason for hiding this comment

basepi commented Sep 1, 2020

felixbarny commented Sep 1, 2020

mikker commented Sep 2, 2020 • edited Loading

felixbarny commented Sep 2, 2020

mikker commented Sep 2, 2020

basepi commented Sep 2, 2020

mikker commented Sep 3, 2020

felixbarny commented Sep 3, 2020

mikker commented Sep 3, 2020

beniwohli commented Sep 7, 2020

gregkalapos commented Sep 7, 2020

SylvainJuge commented Sep 8, 2020

vigneshshanmugam left a comment

Choose a reason for hiding this comment

mikker commented Sep 15, 2020

mikker commented Sep 29, 2020

mikker commented Sep 29, 2020

felixbarny commented Aug 26, 2020 •

edited by AlexanderWert

Loading

apmmachine commented Aug 26, 2020 •

edited

Loading

mikker commented Sep 2, 2020 •

edited

Loading