sampling: add missing license check #4328

Merged
merged 5 commits into from
Oct 27, 2020


axw
Member

@axw axw commented Oct 19, 2020

Motivation/summary

Tail-based sampling will be a licensed feature; add code to enforce it.

We also introduce support for running ephemeral Elasticsearch containers using testcontainers-go. This is used to run a Basic-licensed Elasticsearch without modifying the docker-compose services; modifying them would be destructive and could interfere with other tests or general docker-compose usage.
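
The idea of a throwaway, per-test backend can be illustrated without Docker. The sketch below is not the testcontainers-go code from this PR: it swaps in Go's `net/http/httptest` as a stand-in that serves a `/_license` response shaped like Elasticsearch's, reporting a Basic license. The helper names `newFakeElasticsearch` and `fetchLicenseType` are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// licenseResponse mirrors only the fields of a GET /_license response
// that this sketch reads.
type licenseResponse struct {
	License struct {
		Status string `json:"status"`
		Type   string `json:"type"`
	} `json:"license"`
}

// newFakeElasticsearch (hypothetical helper) starts a throwaway HTTP server
// that reports the given license type. It is a stand-in for a real
// Elasticsearch container, not a replacement for one.
func newFakeElasticsearch(licenseType string) *httptest.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/_license", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		fmt.Fprintf(w, `{"license":{"status":"active","type":%q}}`, licenseType)
	})
	return httptest.NewServer(mux)
}

// fetchLicenseType (hypothetical helper) asks the server at baseURL for its
// license type.
func fetchLicenseType(baseURL string) (string, error) {
	resp, err := http.Get(baseURL + "/_license")
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status %s", resp.Status)
	}
	var body licenseResponse
	if err := json.NewDecoder(resp.Body).Decode(&body); err != nil {
		return "", err
	}
	return body.License.Type, nil
}

func main() {
	es := newFakeElasticsearch("basic")
	defer es.Close() // the server lives only for this run; nothing shared is modified

	licenseType, err := fetchLicenseType(es.URL)
	if err != nil {
		panic(err)
	}
	fmt.Println("license type:", licenseType)
}
```

Because the server is created and closed inside the test itself, no shared service has to be reconfigured, which is the same property the ephemeral container provides.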

Checklist

I have considered changes for:
- [ ] documentation

How to test these changes

  1. Run Elasticsearch with a trial license
  2. Enable tail-based sampling -- it should work
  3. Switch Elasticsearch to a Basic license (https://www.elastic.co/guide/en/elasticsearch/reference/current/start-basic.html)
  4. The server should stop working and start logging licensing errors
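
The behaviour these steps exercise can be sketched as a small gate. This is a minimal illustration, not the code in this PR: the `license` type, `errUnlicensed`, and `checkTailSamplingLicense` are hypothetical names, and the sketch accepts only the Platinum and Trial types named in the follow-up commit message. How other tiers are treated is not covered here.

```go
package main

import (
	"errors"
	"fmt"
)

// license is a hypothetical, simplified view of an Elasticsearch license.
type license struct {
	Type   string // for example "basic", "trial", "platinum"
	Active bool
}

// errUnlicensed is returned when tail-based sampling must not run.
var errUnlicensed = errors.New("tail-based sampling requires an active Platinum or Trial license")

// checkTailSamplingLicense returns nil when the license permits tail-based
// sampling and errUnlicensed otherwise.
func checkTailSamplingLicense(l license) error {
	if !l.Active {
		return errUnlicensed
	}
	switch l.Type {
	case "platinum", "trial":
		return nil
	}
	return errUnlicensed
}

func main() {
	// Steps 1-2: with a trial license, the check passes.
	fmt.Println(checkTailSamplingLicense(license{Type: "trial", Active: true}))
	// Steps 3-4: after switching to Basic, the check fails.
	fmt.Println(checkTailSamplingLicense(license{Type: "basic", Active: true}))
}
```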

Related issues

#4185

@axw axw added the v7.11.0 label Oct 19, 2020
@apmmachine
Contributor

apmmachine commented Oct 19, 2020

💔 Build Failed


Build stats

  • Build Cause: [Pull request #4328 updated]

  • Start Time: 2020-10-27T03:33:13.743+0000

  • Duration: 45 min 38 sec

Test stats 🧪

Test Results
Failed 0
Passed 4370
Skipped 145
Total 4515

Steps errors 3


  • Name: Compress

    • Description: tar --exclude=coverage-files.tgz -czf coverage-files.tgz coverage

    • Duration: 0 min 0 sec

    • Start Time: 2020-10-27T03:49:53.242+0000


  • Name: Run Linux tests

    • Description: ./script/jenkins/linux-test.sh

    • Duration: 19 min 25 sec

    • Start Time: 2020-10-27T03:42:54.119+0000


  • Name: Compress

    • Description: tar --exclude=system-tests-linux-files.tgz -czf system-tests-linux-files.tgz system-tests

    • Duration: 0 min 0 sec

    • Start Time: 2020-10-27T04:02:22.352+0000


Log output


[2020-10-27T04:02:19.341Z] === RUN   TestAPIKeyInvalidateName
[2020-10-27T04:02:19.341Z] --- PASS: TestAPIKeyInvalidateName (3.05s)
[2020-10-27T04:02:19.341Z] === RUN   TestAPIKeyInvalidateID
[2020-10-27T04:02:19.341Z] --- PASS: TestAPIKeyInvalidateID (2.01s)
[2020-10-27T04:02:19.341Z] === RUN   TestAPMServerEnvironment
[2020-10-27T04:02:19.341Z] === RUN   TestAPMServerEnvironment/container
[2020-10-27T04:02:19.341Z] === PAUSE TestAPMServerEnvironment/container
[2020-10-27T04:02:19.341Z] === RUN   TestAPMServerEnvironment/systemd
[2020-10-27T04:02:19.341Z] === PAUSE TestAPMServerEnvironment/systemd
[2020-10-27T04:02:19.341Z] === RUN   TestAPMServerEnvironment/macos_service
[2020-10-27T04:02:19.341Z] === PAUSE TestAPMServerEnvironment/macos_service
[2020-10-27T04:02:19.341Z] === RUN   TestAPMServerEnvironment/windows_service
[2020-10-27T04:02:19.341Z] === PAUSE TestAPMServerEnvironment/windows_service
[2020-10-27T04:02:19.341Z] === CONT  TestAPMServerEnvironment/container
[2020-10-27T04:02:19.341Z] === CONT  TestAPMServerEnvironment/macos_service
[2020-10-27T04:02:19.341Z] === CONT  TestAPMServerEnvironment/systemd
[2020-10-27T04:02:19.341Z] === CONT  TestAPMServerEnvironment/windows_service
[2020-10-27T04:02:19.341Z] --- PASS: TestAPMServerEnvironment (0.00s)
[2020-10-27T04:02:19.341Z]     --- PASS: TestAPMServerEnvironment/macos_service (0.48s)
[2020-10-27T04:02:19.341Z]     --- PASS: TestAPMServerEnvironment/container (0.48s)
[2020-10-27T04:02:19.341Z]     --- PASS: TestAPMServerEnvironment/windows_service (0.48s)
[2020-10-27T04:02:19.341Z]     --- PASS: TestAPMServerEnvironment/systemd (0.52s)
[2020-10-27T04:02:19.341Z] === RUN   TestAPMServerInstrumentation
[2020-10-27T04:02:19.341Z] --- PASS: TestAPMServerInstrumentation (3.39s)
[2020-10-27T04:02:19.341Z] === RUN   TestJaegerGRPC
[2020-10-27T04:02:19.341Z] --- PASS: TestJaegerGRPC (2.97s)
[2020-10-27T04:02:19.341Z] === RUN   TestJaegerGRPCSampling
[2020-10-27T04:02:19.341Z] --- PASS: TestJaegerGRPCSampling (2.44s)
[2020-10-27T04:02:19.341Z] === RUN   TestAPMServerRequestLoggingValid
[2020-10-27T04:02:19.341Z] --- PASS: TestAPMServerRequestLoggingValid (0.27s)
[2020-10-27T04:02:19.341Z] === RUN   TestAPMServerMonitoring
[2020-10-27T04:02:19.341Z] --- PASS: TestAPMServerMonitoring (1.51s)
[2020-10-27T04:02:19.341Z] === RUN   TestAPMServerMonitoringBuiltinUser
[2020-10-27T04:02:19.341Z] --- PASS: TestAPMServerMonitoringBuiltinUser (2.04s)
[2020-10-27T04:02:19.341Z] === RUN   TestAPMServerOnboarding
[2020-10-27T04:02:19.341Z] --- PASS: TestAPMServerOnboarding (2.48s)
[2020-10-27T04:02:19.341Z] === RUN   TestRUMXForwardedFor
[2020-10-27T04:02:19.341Z] --- PASS: TestRUMXForwardedFor (2.39s)
[2020-10-27T04:02:19.341Z] === RUN   TestKeepUnsampled
[2020-10-27T04:02:19.341Z] === RUN   TestKeepUnsampled/false
[2020-10-27T04:02:19.341Z] === RUN   TestKeepUnsampled/true
[2020-10-27T04:02:19.341Z]     sampling_test.go:65: 
[2020-10-27T04:02:19.341Z]         	Error Trace:	sampling_test.go:65
[2020-10-27T04:02:19.342Z]         	Error:      	"[{apm-8.0.0-transaction-000001 0mU3aHUB96GvjGMlwl-W %!s(float64=0.2876821) map[@timestamp:2020-10-27T04:01:23.978Z agent:map[name:go version:0.0.0] ecs:map[version:1.6.0] event:map[ingested:2020-10-27T04:01:26.166263922Z outcome:unknown] host:map[architecture:i386 hostname:beowulf ip:127.0.0.1 name:beowulf os:map[platform:minix]] observer:map[ephemeral_id:159d1750-380b-4ec7-a381-21ccdec73835 hostname:apm-ci-immutable-ubuntu-1804-1603770060961733359 id:46b4a9c0-97f8-4bd9-8422-90ebbd7fd9e5 type:apm-server version:8.0.0 version_major:%!s(float64=8)] process:map[args:[/tmp/go-build739674032/b001/systemtest.test -test.testlogfile=/tmp/go-build739674032/b001/testlog.txt -test.timeout=10m0s -test.v=true] pid:%!s(float64=1) title:systemtest.test] processor:map[event:transaction name:transaction] service:map[language:map[name:go version:2.0] name:systemtest node:map[name:beowulf] runtime:map[name:gc version:2.0]] timestamp:map[us:%!s(float64=1.603771283978361e+15)] trace:map[id:b2bf84d8ff9abe84520f8a2d54d98074] transaction:map[duration:map[us:%!s(float64=0)] id:b2bf84d8ff9abe84 name:sampled sampled:%!s(bool=true) span_count:map[dropped:%!s(float64=0) started:%!s(float64=0)] type:TestKeepUnsampled]] 
{"process":{"args":["/tmp/go-build739674032/b001/systemtest.test","-test.testlogfile=/tmp/go-build739674032/b001/testlog.txt","-test.timeout=10m0s","-test.v=true"],"pid":1,"title":"systemtest.test"},"agent":{"name":"go","version":"0.0.0"},"processor":{"name":"transaction","event":"transaction"},"observer":{"hostname":"apm-ci-immutable-ubuntu-1804-1603770060961733359","id":"46b4a9c0-97f8-4bd9-8422-90ebbd7fd9e5","ephemeral_id":"159d1750-380b-4ec7-a381-21ccdec73835","type":"apm-server","version":"8.0.0","version_major":8},"trace":{"id":"b2bf84d8ff9abe84520f8a2d54d98074"},"@timestamp":"2020-10-27T04:01:23.978Z","ecs":{"version":"1.6.0"},"service":{"node":{"name":"beowulf"},"name":"systemtest","runtime":{"name":"gc","version":"2.0"},"language":{"name":"go","version":"2.0"}},"host":{"hostname":"beowulf","os":{"platform":"minix"},"ip":"127.0.0.1","name":"beowulf","architecture":"i386"},"event":{"ingested":"2020-10-27T04:01:26.166263922Z","outcome":"unknown"},"transaction":{"duration":{"us":0},"name":"sampled","id":"b2bf84d8ff9abe84","span_count":{"dropped":0,"started":0},"type":"TestKeepUnsampled","sampled":true},"timestamp":{"us":1603771283978361}}}]" should have 2 item(s), but has 1
[2020-10-27T04:02:19.342Z]         	Test:       	TestKeepUnsampled/true
[2020-10-27T04:02:19.342Z]     server.go:168: log file: /var/lib/jenkins/workspace/pm-server_apm-server-mbp_PR-4328/src/github.com/elastic/apm-server/systemtest/logs/TestKeepUnsampled_true/apm-server
[2020-10-27T04:02:19.342Z] --- FAIL: TestKeepUnsampled (5.24s)
[2020-10-27T04:02:19.342Z]     --- PASS: TestKeepUnsampled/false (2.77s)
[2020-10-27T04:02:19.342Z]     --- FAIL: TestKeepUnsampled/true (2.47s)
[2020-10-27T04:02:19.342Z] === RUN   TestTailSampling
[2020-10-27T04:02:19.342Z]     sampling_test.go:130: waiting for 100 "parent" transactions
[2020-10-27T04:02:19.342Z]     sampling_test.go:130: waiting for 100 "child" transactions
[2020-10-27T04:02:19.342Z] --- PASS: TestTailSampling (3.97s)
[2020-10-27T04:02:19.342Z] === RUN   TestTailSamplingUnlicensed
[2020-10-27T04:02:19.342Z] 2020/10/27 04:01:30 Starting container id: 040f655c5bea image: quay.io/testcontainers/ryuk:0.2.3
[2020-10-27T04:02:19.342Z] 2020/10/27 04:01:30 Waiting for container id 040f655c5bea image: quay.io/testcontainers/ryuk:0.2.3
[2020-10-27T04:02:19.342Z] 2020/10/27 04:01:30 Container is ready id: 040f655c5bea image: quay.io/testcontainers/ryuk:0.2.3
[2020-10-27T04:02:19.342Z] 2020/10/27 04:01:30 Starting container id: eb13eb9fc573 image: docker.elastic.co/elasticsearch/elasticsearch:8.0.0-SNAPSHOT
[2020-10-27T04:02:19.342Z] 2020/10/27 04:01:31 Waiting for container id eb13eb9fc573 image: docker.elastic.co/elasticsearch/elasticsearch:8.0.0-SNAPSHOT
[2020-10-27T04:02:19.342Z] 2020/10/27 04:01:50 Container is ready id: eb13eb9fc573 image: docker.elastic.co/elasticsearch/elasticsearch:8.0.0-SNAPSHOT
[2020-10-27T04:02:19.342Z] --- PASS: TestTailSamplingUnlicensed (36.93s)
[2020-10-27T04:02:19.342Z] FAIL
[2020-10-27T04:02:19.342Z] FAIL	github.com/elastic/apm-server/systemtest	85.187s
[2020-10-27T04:02:19.342Z] === RUN   TestAPMServer
[2020-10-27T04:02:19.342Z] 2020/10/27 04:00:41 Building apm-server...
[2020-10-27T04:02:19.342Z] 2020/10/27 04:00:43 Built /var/lib/jenkins/workspace/pm-server_apm-server-mbp_PR-4328/src/github.com/elastic/apm-server/apm-server
[2020-10-27T04:02:19.342Z] --- PASS: TestAPMServer (5.14s)
[2020-10-27T04:02:19.342Z] === RUN   TestUnstartedAPMServer
[2020-10-27T04:02:19.342Z] --- PASS: TestUnstartedAPMServer (0.00s)
[2020-10-27T04:02:19.342Z] === RUN   TestExpvar
[2020-10-27T04:02:19.342Z] --- PASS: TestExpvar (0.42s)
[2020-10-27T04:02:19.342Z] PASS
[2020-10-27T04:02:19.342Z] ok  	github.com/elastic/apm-server/systemtest/apmservertest	5.610s
[2020-10-27T04:02:19.342Z] ?   	github.com/elastic/apm-server/systemtest/estest	[no test files]
[2020-10-27T04:02:19.342Z] FAIL
[2020-10-27T04:02:19.342Z] + cleanup
[2020-10-27T04:02:19.342Z] + rm -rf /tmp/tmp.UojEBYYRvT
[2020-10-27T04:02:19.342Z] + .ci/scripts/docker-get-logs.sh
[2020-10-27T04:02:20.801Z] Post stage
[2020-10-27T04:02:20.820Z] Running in /var/lib/jenkins/workspace/pm-server_apm-server-mbp_PR-4328/src/github.com/elastic/apm-server
[2020-10-27T04:02:20.874Z] Archiving artifacts
[2020-10-27T04:02:21.225Z] Recording test results
[2020-10-27T04:02:21.967Z] [WARN] tar: pathPrefix parameter is deprecated.
[2020-10-27T04:02:22.323Z] + tar --version
[2020-10-27T04:02:22.643Z] + tar --exclude=system-tests-linux-files.tgz -czf system-tests-linux-files.tgz system-tests
[2020-10-27T04:02:22.643Z] tar: system-tests: Cannot stat: No such file or directory
[2020-10-27T04:02:22.643Z] tar: Exiting with failure status due to previous errors
[2020-10-27T04:02:22.658Z] [INFO] system-tests-linux-files.tgz was not compressed or archived : script returned exit code 2
[2020-10-27T04:02:22.785Z] Failed in branch System and Environment Tests
[2020-10-27T04:17:30.719Z] [INFO] For detailed information see: https://apm-ci.elastic.co/job/apm-integration-tests-selector-mbp/job/master/11152/display/redirect
[2020-10-27T04:17:48.639Z] Copied 53 artifacts from "APM Integration Test MBP Selector » master" build number 11152
[2020-10-27T04:17:49.682Z] Post stage
[2020-10-27T04:17:49.691Z] Recording test results
[2020-10-27T04:17:50.741Z] Running on Jenkins in /var/lib/jenkins/workspace/pm-server_apm-server-mbp_PR-4328
[2020-10-27T04:17:50.798Z] [INFO] getVaultSecret: Getting secrets
[2020-10-27T04:17:51.029Z] Masking supported pattern matches of $VAULT_ADDR or $VAULT_ROLE_ID or $VAULT_SECRET_ID
[2020-10-27T04:17:51.753Z] + chmod 755 generate-build-data.sh
[2020-10-27T04:17:51.753Z] + ./generate-build-data.sh https://apm-ci.elastic.co/blue/rest/organizations/jenkins/pipelines/apm-server/apm-server-mbp/PR-4328/ https://apm-ci.elastic.co/blue/rest/organizations/jenkins/pipelines/apm-server/apm-server-mbp/PR-4328/runs/11 FAILURE 2677737
[2020-10-27T04:17:52.004Z] INFO: curl https://apm-ci.elastic.co/blue/rest/organizations/jenkins/pipelines/apm-server/apm-server-mbp/PR-4328/runs/11/steps/?limit=10000 -o steps-info.json
[2020-10-27T04:17:52.554Z] INFO: curl https://apm-ci.elastic.co/blue/rest/organizations/jenkins/pipelines/apm-server/apm-server-mbp/PR-4328/runs/11/tests/?status=FAILED -o tests-errors.json

@axw
Member Author

axw commented Oct 19, 2020

jenkins run the tests please

@codecov-io

codecov-io commented Oct 19, 2020

Codecov Report

Merging #4328 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master    #4328   +/-   ##
=======================================
  Coverage   79.71%   79.71%           
=======================================
  Files         162      162           
  Lines        9137     9137           
=======================================
  Hits         7284     7284           
  Misses       1853     1853           

@axw axw marked this pull request as ready for review October 21, 2020 02:06
@axw axw requested a review from a team October 21, 2020 02:06
@axw
Member Author

axw commented Oct 26, 2020

jenkins run the tests please

@axw
Member Author

axw commented Oct 27, 2020

jenkins run the tests please

@axw
Member Author

axw commented Oct 27, 2020

The new test is a bit unreliable; I'm looking into it.

10 seconds wasn't enough, due to connection retry backoffs
in apm-server. Increase to 1 minute.
@axw
Member Author

axw commented Oct 27, 2020

The latest test failure is unrelated, and will be addressed by #4353

@axw axw merged commit 1ffa9fa into elastic:master Oct 27, 2020
@axw axw deleted the tbs-platinum branch October 27, 2020 05:38
axw added a commit to axw/apm-server that referenced this pull request Dec 10, 2020
* Require platinum/trial license for tail-sampling
axw added a commit that referenced this pull request Dec 10, 2020
* Require platinum/trial license for tail-sampling
@simitt simitt self-assigned this Dec 23, 2020
@simitt
Contributor

simitt commented Dec 23, 2020

Tested with BC 1 with the suggested steps.
The APM Server starts logging license notes as expected:

{"log.level":"info","@timestamp":"2020-12-23T16:02:37.422+0100","log.logger":"license","log.origin":{"file.name":"licenser/es_callback.go","file.line":51},"message":"Elasticsearch license: Basic","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2020-12-23T16:02:37.437+0100","log.logger":"license","log.origin":{"file.name":"apm-server/main.go","file.line":102},"message":"Checking license for tail-based sampling","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2020-12-23T16:02:37.437+0100","log.logger":"license","log.origin":{"file.name":"licenser/check.go","file.line":35},"message":"License is active for Platinum","ecs.version":"1.6.0"}
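
When repeating this test, it can help to filter the server's JSON logs down to the license-related entries. The sketch below assumes only the field names visible in the log lines above (`log.level`, `log.logger`, `message`); the function `licenseMessages` is hypothetical, and the second sample line in `main` is made up for contrast.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// logEntry holds the fields of a JSON log line that this sketch reads.
type logEntry struct {
	Level   string `json:"log.level"`
	Logger  string `json:"log.logger"`
	Message string `json:"message"`
}

// licenseMessages returns the messages of all entries emitted by the
// "license" logger. Lines that are not valid JSON are skipped.
func licenseMessages(logs string) []string {
	var msgs []string
	scanner := bufio.NewScanner(strings.NewReader(logs))
	for scanner.Scan() {
		var e logEntry
		if err := json.Unmarshal(scanner.Bytes(), &e); err != nil {
			continue
		}
		if e.Logger == "license" {
			msgs = append(msgs, e.Message)
		}
	}
	return msgs
}

func main() {
	// The second line is an invented, unrelated entry used only for contrast.
	logs := `{"log.level":"info","log.logger":"license","message":"Elasticsearch license: Basic"}
{"log.level":"info","log.logger":"other","message":"unrelated"}`
	for _, m := range licenseMessages(logs) {
		fmt.Println(m)
	}
}
```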

Two observations:

  • When sending intake requests to the server it still responds with 202 until the queue is filled up.
  • Other events such as error or metric events also stop being ingested.

@axw - I assume that's OK, but I hadn't realized this earlier.
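
The first observation is what one would expect from a bounded queue that nothing drains. The sketch below is a generic illustration, not apm-server's implementation: `enqueue` is a hypothetical function, and the 503 returned once the queue is full is an assumption (the observation above only states that 202 is returned until the queue fills up).

```go
package main

import (
	"fmt"
	"net/http"
)

// enqueue tries to add an event to a bounded queue without blocking and
// returns the HTTP status an intake handler might respond with.
func enqueue(queue chan<- string, event string) int {
	select {
	case queue <- event:
		return http.StatusAccepted // 202: buffered, not yet published
	default:
		return http.StatusServiceUnavailable // 503 (assumed): queue is full
	}
}

func main() {
	// Nothing reads from the queue, as if publishing had stopped.
	queue := make(chan string, 2)
	for i := 1; i <= 3; i++ {
		fmt.Println(enqueue(queue, fmt.Sprintf("event-%d", i)))
	}
}
```

With a capacity of 2 and no consumer, the first two events are accepted and the third is rejected. Because the queue does not distinguish event types, errors and metrics are held up along with transactions, which matches the second observation.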

@axw
Member Author

axw commented Dec 24, 2020

@simitt It's just a backstop, so I think this somewhat awkward behaviour is OK.
