docs/agents: add sampling spec #307

Merged: 10 commits, Aug 24, 2020

Member

@axw axw commented Aug 4, 2020

Create a more detailed spec for sampling, including the new requirement to capture the sampling rate in transactions and spans.

Supersedes #270

| Agent | Milestone | Link to agent implementation issue |
|---|---|---|
| .NET | 7.10 | elastic/apm-agent-dotnet#906 |
| Go | 7.10 | elastic/apm-agent-go#787 |
| Java | 7.10 | elastic/apm-agent-java#1293 |
| Node.js | ? | elastic/apm-agent-nodejs#1797 |
| PHP | ? | elastic/apm-agent-php#88 |
| Python | 7.10 | elastic/apm-agent-python#888 |
| Ruby | 7.10 | elastic/apm-agent-ruby#840 (reference impl) |
| RUM | 7.10 | elastic/apm-agent-rum-js#845 |

@axw axw requested a review from felixbarny August 4, 2020 09:26
@felixbarny felixbarny added this to the 7.10 milestone Aug 4, 2020
felixbarny and others added 3 commits August 5, 2020 11:54
* Add section about tracestate

* Update docs/agents/sampling.md

* Update docs/agents/distributed-tracing.md

Co-authored-by: Andrew Wilkins <[email protected]>
@axw axw force-pushed the agent-spec-sampling-weight branch from af072f0 to a3e3eaf Compare August 5, 2020 05:06
Contributor

@hmdhk hmdhk left a comment

Thanks @axw. Regarding the propagation of the sampling rate via the tracestate header: I think it adds overhead for users, since the tracestate header needs to be configured for CORS. Furthermore, it should be considered a breaking change for the RUM agent, since existing setups will break once we add this header to requests.

- rename "elastic" to "es"
- add note about validating tracestate
@axw
Member Author

axw commented Aug 5, 2020

> Thanks @axw. Regarding the propagation of the sampling rate via the tracestate header: I think it adds overhead for users, since the tracestate header needs to be configured for CORS. Furthermore, it should be considered a breaking change for the RUM agent, since existing setups will break once we add this header to requests.

@jahtalab I don't see any alternative other than sending another header. I suppose it'll have to be opt-in for RUM. That means any traces originating from RUM will not have service map edge metrics unless the feature is enabled.
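
For illustration only, here is a minimal sketch of how a receiving service might read a sampling rate from the tracestate header. The `es` vendor key comes from this PR; the `s:<rate>` sub-entry layout, the `;` separator, and the function name `parse_es_sample_rate` are assumptions made for this example, not something confirmed in this conversation.

```python
from typing import Optional


def parse_es_sample_rate(tracestate: str) -> Optional[float]:
    """Return the sampling rate from an `es` tracestate entry, or None.

    Assumed layout (hypothetical): `es=s:0.25`, with `;` between
    sub-entries inside the `es` value.
    """
    for member in tracestate.split(","):
        key, sep, value = member.strip().partition("=")
        if not sep or key != "es":
            continue
        for pair in value.split(";"):
            name, colon, raw = pair.partition(":")
            if colon and name == "s":
                try:
                    rate = float(raw)
                except ValueError:
                    return None
                # Reject rates outside [0, 1] instead of guessing.
                return rate if 0 <= rate <= 1 else None
        return None
    return None
```

A request without the header (for example, from a RUM agent where the feature is not enabled) yields `None`, which matches the case above where the sampling rate is simply unavailable downstream.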

@axw axw marked this pull request as ready for review August 11, 2020 02:46
@axw axw requested review from a team as code owners August 11, 2020 02:46
@axw axw requested a review from felixbarny August 11, 2020 02:46
@nehaduggal nehaduggal left a comment

Looks good to me!

@felixbarny
Member

We've got all the required approvals now. This is scheduled to be merged next Monday unless there are objections.

The sampling rate will be used by the server for scaling transaction and span metrics.

Transaction metrics will be used by the UI to display transaction distributions and throughput,
from the perspective of the transaction's service (grouped by `service.name` and `transaction.name`).
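
As a rough sketch of what "scaling" could mean here, assume each sampled event stands for `1 / sample_rate` real events. That inverse-rate weighting and the function names below are assumptions for illustration; the server's actual computation is not shown in this thread.

```python
from typing import Iterable


def representative_count(sample_rate: float) -> float:
    """How many real transactions one sampled transaction stands for."""
    if not 0 < sample_rate <= 1:
        raise ValueError("sample_rate must be in (0, 1]")
    return 1.0 / sample_rate


def estimated_throughput(sample_rates: Iterable[float]) -> float:
    """Estimate total transactions from the rates recorded on sampled ones."""
    return sum(representative_count(rate) for rate in sample_rates)
```

With three sampled transactions recorded at rates 0.5, 0.5 and 0.25, this estimate counts them as 2 + 2 + 4 transactions.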
Member

@axw Probably no need to add this to the agents spec, but out of curiosity: what other dimensions are you planning to add? `outcome`? `result`? What's the cardinality limit for `transaction.name`? Are you planning to also track metrics without the `transaction.name` dimension to speed up queries that go across all transaction names? That could also be a special-case name like `transaction.name: "_all"`.

Member

What are the current plans regarding the detection of dynamic parts in the URL?

Member

How important is it that all agents report ${request.method} unknown route instead of ${request.method} ${request.path} for the metrics collection to work properly? Is the metrics collection also scheduled for 7.10?

Member Author

> @axw Probably no need to add this to the agents spec, more out of curiosity, what other dimensions are you planning to add? outcome? result?

result is already in there, and outcome will be added once we've finalised the spec. The full list is at https://github.com/elastic/apm-server/blob/5cb4101d705effdf2f54e2e45847ee92033e806e/x-pack/apm-server/aggregation/txmetrics/aggregator.go#L414

> What's the cardinality limit for transaction.name?

The server has a configurable limit on the number of transaction metric buckets, which resets after every reporting interval. It currently defaults to 1000. Once that limit is reached, the server will start recording single-value metrics for new buckets.
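
The behaviour described above can be sketched roughly as follows. This is not the apm-server implementation: the class name, method names, and data layout are hypothetical, and only the limit of 1000 and the reset per reporting interval come from the comment.

```python
from typing import Dict, List, Tuple


class TxMetricsSketch:
    """Toy aggregator with a cap on the number of metric buckets."""

    def __init__(self, max_buckets: int = 1000) -> None:
        self.max_buckets = max_buckets
        self.buckets: Dict[str, List[float]] = {}
        self.single_values: List[Tuple[str, float]] = []

    def record(self, key: str, duration_ms: float) -> None:
        bucket = self.buckets.get(key)
        if bucket is None:
            if len(self.buckets) >= self.max_buckets:
                # Limit reached: a new key is not aggregated, it is
                # recorded as a single-value metric instead.
                self.single_values.append((key, duration_ms))
                return
            bucket = self.buckets[key] = []
        bucket.append(duration_ms)

    def flush(self) -> Tuple[Dict[str, List[float]], List[Tuple[str, float]]]:
        """Return the interval's metrics and reset the bucket count."""
        result = (self.buckets, self.single_values)
        self.buckets, self.single_values = {}, []
        return result
```

Keys that already have a bucket keep aggregating after the limit is hit; only keys first seen after that point fall back to single values until the next flush.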

> Are you planning to also track metrics without the transaction.name dimension to speed up queries that go across all transaction names? Could also be a special-case name like transaction.name: "_all".

Not currently planning to do this in the server, but the UI might create transforms based off the transaction group metrics: elastic/kibana#74498

> What are the current plans regarding the detection of dynamic parts in the URL?

None that I'm aware of. RUM recently introduced a heuristic-based approach: elastic/apm-agent-rum-js#827

> How important is it that all agents report ${request.method} unknown route instead of ${request.method} ${request.path} for the metrics collection to work properly? Is the metrics collection also scheduled for 7.10?

Very important. If transaction names have such high cardinality, then distributions become less meaningful. I believe we're intending to preview the Metrics-based UI in 7.10.
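
To make the cardinality point concrete, here is a small sketch of the naming rule being discussed. The function name and the idea of passing the matched route template as an optional argument are assumptions for this example; only the `${request.method} unknown route` fallback comes from the thread.

```python
from typing import Optional


def transaction_name(method: str, route: Optional[str] = None) -> str:
    """Build a low-cardinality transaction name.

    `route` is the framework's route template (e.g. "/users/{id}"),
    never the raw request path.
    """
    if route:
        return f"{method} {route}"
    # No route matched: use a fixed name so raw paths such as
    # /users/1, /users/2, ... do not each create a new metric bucket.
    return f"{method} unknown route"
```

With this rule, requests to `/users/1` and `/users/2` that match no route both map to `GET unknown route`, so they share one transaction group.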

Member

I've added some remarks to the Kibana issue: elastic/kibana#74498 (comment)

> Very important. If transaction names have such high cardinality, then distributions become less meaningful. I believe we're intending to preview the Metrics-based UI in 7.10.

That's good info, I'll work with the agent teams to ensure we're aligned.

@apmmachine

apmmachine commented Aug 19, 2020

💔 Build Failed

Build stats

  • Build Cause: [Started by timer]

  • Start Time: 2020-08-24T05:44:00.669+0000

  • Duration: 3 min 36 sec

Steps errors

  • Name: Shell Script
    • Description: [2020-08-24T05:46:36.239Z] + git diff --name-only fb7c4170efbbcee84950b3cdacb0b327a8eacdb9...78d6ef8

    • Duration: 0 min 0 sec

    • Start Time: 2020-08-24T05:46:35.948+0000

Log output

[2020-08-24T05:44:23.115Z] All nodes of label ‘linux&&immutable’ are offline
[2020-08-24T05:45:59.697Z] Running on apm-ci-immutable-ubuntu-1804-1598247852002366181 in /var/lib/jenkins/workspace/ared_apm-update-specs-mbp_PR-307
[2020-08-24T05:45:59.806Z] [INFO] Override default checkout
[2020-08-24T05:45:59.849Z] Sleeping for 10 sec
[2020-08-24T05:46:12.681Z] using credential f6c7695a-671e-4f4f-a331-acdce44ff9ba
[2020-08-24T05:46:12.705Z] Wiping out workspace first.
[2020-08-24T05:46:12.746Z] Cloning the remote Git repository
[2020-08-24T05:46:12.746Z] Using shallow clone with depth 4
[2020-08-24T05:46:12.746Z] Avoid fetching tags
[2020-08-24T05:46:12.777Z] Cloning repository [email protected]:elastic/apm.git
[2020-08-24T05:46:12.828Z]  > git init /var/lib/jenkins/workspace/ared_apm-update-specs-mbp_PR-307 # timeout=10
[2020-08-24T05:46:12.893Z] Fetching upstream changes from [email protected]:elastic/apm.git
[2020-08-24T05:46:12.894Z]  > git --version # timeout=10
[2020-08-24T05:46:12.902Z]  > git --version # 'git version 2.17.1'
[2020-08-24T05:46:12.903Z] using GIT_SSH to set credentials GitHub user @elasticmachine SSH key
[2020-08-24T05:46:12.933Z]  > git fetch --no-tags --progress -- [email protected]:elastic/apm.git +refs/heads/*:refs/remotes/origin/* # timeout=15
[2020-08-24T05:46:13.697Z] Cleaning workspace
[2020-08-24T05:46:13.715Z] Using shallow fetch with depth 4
[2020-08-24T05:46:13.715Z] Pruning obsolete local branches
[2020-08-24T05:46:14.344Z] Merging remotes/origin/master commit 0c78d54ebc748c6593c475fb97f1e2adec0977db into PR head commit 001e8b9389e41e4686a5f6030f28a90a7d41e80c
[2020-08-24T05:46:14.444Z] Merge succeeded, producing 78d6ef8af34954544546c95f72a814af9ecf6c98
[2020-08-24T05:46:14.444Z] Checking out Revision 78d6ef8af34954544546c95f72a814af9ecf6c98 (PR-307)
[2020-08-24T05:46:13.675Z]  > git config remote.origin.url [email protected]:elastic/apm.git # timeout=10
[2020-08-24T05:46:13.681Z]  > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
[2020-08-24T05:46:13.691Z]  > git config remote.origin.url [email protected]:elastic/apm.git # timeout=10
[2020-08-24T05:46:13.700Z]  > git rev-parse --verify HEAD # timeout=10
[2020-08-24T05:46:13.706Z] No valid HEAD. Skipping the resetting
[2020-08-24T05:46:13.706Z]  > git clean -fdx # timeout=10
[2020-08-24T05:46:13.723Z] Fetching upstream changes from [email protected]:elastic/apm.git
[2020-08-24T05:46:13.723Z] using GIT_SSH to set credentials GitHub user @elasticmachine SSH key
[2020-08-24T05:46:13.729Z]  > git fetch --no-tags --progress --prune -- [email protected]:elastic/apm.git +refs/pull/307/head:refs/remotes/origin/PR-307 +refs/heads/master:refs/remotes/origin/master # timeout=15
[2020-08-24T05:46:14.354Z]  > git config core.sparsecheckout # timeout=10
[2020-08-24T05:46:14.362Z]  > git checkout -f 001e8b9389e41e4686a5f6030f28a90a7d41e80c # timeout=15
[2020-08-24T05:46:14.395Z]  > git remote # timeout=10
[2020-08-24T05:46:14.401Z]  > git config --get remote.origin.url # timeout=10
[2020-08-24T05:46:14.410Z] using GIT_SSH to set credentials GitHub user @elasticmachine SSH key
[2020-08-24T05:46:14.414Z]  > git merge 0c78d54ebc748c6593c475fb97f1e2adec0977db # timeout=10
[2020-08-24T05:46:14.435Z]  > git rev-parse HEAD^{commit} # timeout=10
[2020-08-24T05:46:14.448Z]  > git config core.sparsecheckout # timeout=10
[2020-08-24T05:46:14.454Z]  > git checkout -f 78d6ef8af34954544546c95f72a814af9ecf6c98 # timeout=15
[2020-08-24T05:46:18.038Z] Commit message: "Merge commit '0c78d54ebc748c6593c475fb97f1e2adec0977db' into HEAD"
[2020-08-24T05:46:18.050Z] First time build. Skipping changelog.
[2020-08-24T05:46:18.050Z] Cleaning workspace
[2020-08-24T05:46:18.043Z]  > git rev-list --no-walk bd3f71598ef3119950854262ea81a7b7793bea3d # timeout=10
[2020-08-24T05:46:18.054Z]  > git rev-parse --verify HEAD # timeout=10
[2020-08-24T05:46:18.057Z] Resetting working tree
[2020-08-24T05:46:18.057Z]  > git reset --hard # timeout=10
[2020-08-24T05:46:18.065Z]  > git clean -fdx # timeout=10
[2020-08-24T05:46:18.649Z] Masking supported pattern matches of $JOB_GCS_BUCKET or $NOTIFY_TO
[2020-08-24T05:46:18.678Z] Timeout set to expire in 3 hr 0 min
[2020-08-24T05:46:18.686Z] The timestamps step is unnecessary when timestamps are enabled for all Pipeline builds.
[2020-08-24T05:46:18.877Z] [INFO] 'shallow' is forced to be disabled when running on PullRequests
[2020-08-24T05:46:18.886Z] Running in /var/lib/jenkins/workspace/ared_apm-update-specs-mbp_PR-307/src/github.com/elastic/apm
[2020-08-24T05:46:18.896Z] [INFO] gitCheckout: Checkout master from [email protected]:elastic/apm.git with credentials f6c7695a-671e-4f4f-a331-acdce44ff9ba
[2020-08-24T05:46:18.911Z] [INFO] Override default checkout
[2020-08-24T05:46:18.934Z] Sleeping for 10 sec
[2020-08-24T05:46:29.069Z] using credential f6c7695a-671e-4f4f-a331-acdce44ff9ba
[2020-08-24T05:46:29.094Z] Cloning the remote Git repository
[2020-08-24T05:46:29.114Z] Cloning repository [email protected]:elastic/apm.git
[2020-08-24T05:46:29.140Z]  > git init /var/lib/jenkins/workspace/ared_apm-update-specs-mbp_PR-307/src/github.com/elastic/apm # timeout=10
[2020-08-24T05:46:29.153Z] Fetching upstream changes from [email protected]:elastic/apm.git
[2020-08-24T05:46:29.154Z]  > git --version # timeout=10
[2020-08-24T05:46:29.166Z]  > git --version # 'git version 2.17.1'
[2020-08-24T05:46:29.166Z] using GIT_SSH to set credentials GitHub user @elasticmachine SSH key
[2020-08-24T05:46:29.172Z]  > git fetch --tags --progress -- [email protected]:elastic/apm.git +refs/heads/*:refs/remotes/origin/* # timeout=10
[2020-08-24T05:46:29.795Z]  > git config remote.origin.url [email protected]:elastic/apm.git # timeout=10
[2020-08-24T05:46:29.799Z]  > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
[2020-08-24T05:46:29.811Z]  > git config remote.origin.url [email protected]:elastic/apm.git # timeout=10
[2020-08-24T05:46:29.820Z] Fetching upstream changes from [email protected]:elastic/apm.git
[2020-08-24T05:46:29.820Z] using GIT_SSH to set credentials GitHub user @elasticmachine SSH key
[2020-08-24T05:46:29.824Z]  > git fetch --tags --progress -- [email protected]:elastic/apm.git +refs/heads/*:refs/remotes/origin/* +refs/pull/*/head:refs/remotes/origin/PR/* # timeout=10
[2020-08-24T05:46:30.674Z] Checking out Revision 0c78d54ebc748c6593c475fb97f1e2adec0977db (origin/master)
[2020-08-24T05:46:30.712Z] Commit message: "Add SQL parsing performance examples (#186)"
[2020-08-24T05:46:30.712Z] First time build. Skipping changelog.
[2020-08-24T05:46:30.666Z]  > git rev-parse origin/master^{commit} # timeout=10
[2020-08-24T05:46:30.679Z]  > git config core.sparsecheckout # timeout=10
[2020-08-24T05:46:30.689Z]  > git checkout -f 0c78d54ebc748c6593c475fb97f1e2adec0977db # timeout=10
[2020-08-24T05:46:31.399Z] Masking supported pattern matches of $GIT_USERNAME or $GIT_PASSWORD
[2020-08-24T05:46:32.048Z] + git fetch https://****:****@github.com/elastic/apm.git +refs/pull/*/head:refs/remotes/origin/pr/*
[2020-08-24T05:46:32.093Z] Archiving artifacts
[2020-08-24T05:46:32.743Z] + git rev-parse HEAD
[2020-08-24T05:46:33.097Z] + git rev-parse HEAD
[2020-08-24T05:46:33.406Z] + git rev-parse origin/pr/307
[2020-08-24T05:46:33.441Z] [INFO] githubEnv: Found Git Build Cause: pr
[2020-08-24T05:46:33.725Z] Masking supported pattern matches of $GITHUB_TOKEN
[2020-08-24T05:46:34.736Z] [INFO] githubPrCheckApproved: Title: docs/agents: add sampling spec - User: axw - Author Association: MEMBER
[2020-08-24T05:46:35.343Z] Stashed 354 file(s)
[2020-08-24T05:46:35.815Z] Running in /var/lib/jenkins/workspace/ared_apm-update-specs-mbp_PR-307/src/github.com/elastic/apm
[2020-08-24T05:46:36.239Z] + git diff --name-only fb7c4170efbbcee84950b3cdacb0b327a8eacdb9...78d6ef8af34954544546c95f72a814af9ecf6c98
[2020-08-24T05:46:36.239Z] fatal: Invalid symmetric difference expression fb7c4170efbbcee84950b3cdacb0b327a8eacdb9...78d6ef8af34954544546c95f72a814af9ecf6c98
[2020-08-24T05:46:36.294Z] Stage "Send Pull Request for BDD specs" skipped due to earlier failure(s)
[2020-08-24T05:46:36.315Z] Stage "Send Pull Request for JSON specs" skipped due to earlier failure(s)
[2020-08-24T05:46:36.500Z] Running on worker-1095690 in /var/lib/jenkins/workspace/ared_apm-update-specs-mbp_PR-307
[2020-08-24T05:46:36.576Z] [INFO] getVaultSecret: Getting secrets
[2020-08-24T05:46:36.640Z] Masking supported pattern matches of $VAULT_ADDR or $VAULT_ROLE_ID or $VAULT_SECRET_ID
[2020-08-24T05:46:38.448Z] + chmod 755 generate-build-data.sh
[2020-08-24T05:46:38.448Z] + ./generate-build-data.sh https://apm-ci.elastic.co/blue/rest/organizations/jenkins/pipelines/apm-shared/apm-update-specs-mbp/PR-307/ https://apm-ci.elastic.co/blue/rest/organizations/jenkins/pipelines/apm-shared/apm-update-specs-mbp/PR-307/runs/3 FAILURE 156380
[2020-08-24T05:46:38.449Z] INFO: curl https://apm-ci.elastic.co/blue/rest/organizations/jenkins/pipelines/apm-shared/apm-update-specs-mbp/PR-307/runs/3/steps/?limit=10000 -o steps-info.json
[2020-08-24T05:46:39.904Z] INFO: curl https://apm-ci.elastic.co/blue/rest/organizations/jenkins/pipelines/apm-shared/apm-update-specs-mbp/PR-307/runs/3/tests/?status=FAILED -o tests-errors.json
[2020-08-24T05:46:40.604Z] INFO: curl https://apm-ci.elastic.co/blue/rest/organizations/jenkins/pipelines/apm-shared/apm-update-specs-mbp/PR-307/runs/3/log/ -o pipeline-log.txt

@felixbarny felixbarny merged commit f1e5a50 into elastic:master Aug 24, 2020
@felixbarny felixbarny linked an issue Sep 3, 2020 that may be closed by this pull request
Development

Successfully merging this pull request may close these issues.

Feature: destination service metrics
9 participants