Run automated smoke tests on ESS #8303

Closed
simitt opened this issue Jun 3, 2022 · 17 comments · Fixed by #8600

@simitt
Contributor

simitt commented Jun 3, 2022

Problem

We want to own automated tests of the APM Server setup (with and without Elastic Agent) on Elastic ESS.

Solution

The APM Server systemtests should contain an ESS testing package that runs the tests specified under Validation Criteria. The tests mostly exercise the overall setup, verifying that all components involved on ESS work together as expected.

For now, there is no requirement to run the tests on every PR; running them daily is sufficient.

The tests have to create an ESS deployment against which a specified set of tests can then be run.
For the deployment creation we will receive support from the eng productivity team (cc @cachedout) and try to leverage some of the work that has been done for benchmarking.

Validation Criteria

  • A daily Jenkins job is set up to run a defined set of APM Server tests on the ESS infrastructure.

The test suite runs tests for:

  • standalone APM Server (created in 7.17)
  • managed APM Server (created in 8.0)
  • standalone APM Server migrated to managed APM Server (7.17)

We want to test upgrades for:

  • 7.17.x-1 upgrading to 7.17.x
  • 8.x: the last released version to the new minor version under development
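As a rough illustration, the two upgrade paths above could be derived mechanically from a list of released versions. This Go sketch assumes that list has already been fetched (for example from the artifacts API mentioned later in this thread) and contains only plain MAJOR.MINOR.PATCH strings; upgradePairs and its helper types are hypothetical names, not existing systemtests code.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// version is a minimal MAJOR.MINOR.PATCH representation; pre-release
// suffixes such as "-SNAPSHOT" are deliberately not handled.
type version struct{ major, minor, patch int }

func parseVersion(s string) (version, error) {
	parts := strings.Split(s, ".")
	if len(parts) != 3 {
		return version{}, fmt.Errorf("invalid version %q", s)
	}
	var n [3]int
	for i, p := range parts {
		v, err := strconv.Atoi(p)
		if err != nil {
			return version{}, fmt.Errorf("invalid version %q: %w", s, err)
		}
		n[i] = v
	}
	return version{n[0], n[1], n[2]}, nil
}

func (v version) String() string { return fmt.Sprintf("%d.%d.%d", v.major, v.minor, v.patch) }

// less orders versions by major, then minor, then patch.
func (v version) less(o version) bool {
	if v.major != o.major {
		return v.major < o.major
	}
	if v.minor != o.minor {
		return v.minor < o.minor
	}
	return v.patch < o.patch
}

// upgradePair is one "from -> to" upgrade the smoke tests should exercise.
type upgradePair struct{ from, to version }

// upgradePairs derives the two upgrade paths listed in the issue:
//   - 7.17.x-1 -> 7.17.x (the two highest released 7.17 patches)
//   - the last released 8.x version -> devMinor (the minor under development)
func upgradePairs(released []string, devMinor version) ([]upgradePair, error) {
	var v717, v8 []version
	for _, s := range released {
		v, err := parseVersion(s)
		if err != nil {
			return nil, err
		}
		switch {
		case v.major == 7 && v.minor == 17:
			v717 = append(v717, v)
		case v.major == 8 && v.less(devMinor):
			v8 = append(v8, v)
		}
	}
	sort.Slice(v717, func(i, j int) bool { return v717[i].less(v717[j]) })
	sort.Slice(v8, func(i, j int) bool { return v8[i].less(v8[j]) })

	var pairs []upgradePair
	if n := len(v717); n >= 2 {
		pairs = append(pairs, upgradePair{v717[n-2], v717[n-1]})
	}
	if n := len(v8); n >= 1 {
		pairs = append(pairs, upgradePair{v8[n-1], devMinor})
	}
	return pairs, nil
}
```

Treating 7.17.x-1 as the second-highest released 7.17 patch assumes the released list has no gaps; the development version is passed in explicitly because it is not yet a released artifact.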

We want to run the following test cases:

  • data ingestion into APM Server for backend and RUM agents, verifying that events are successfully stored in the expected data streams.
  • fetching index & component templates, ILM policies and index pipelines and verifying that they are set up in the expected version
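One possible shape for the version check on ingest pipelines: decode the GET _ingest/pipeline response, which is a JSON object keyed by pipeline ID, and compare each matching ID against the expected version. This Go sketch assumes versioned pipelines are named <name>-<version>, as Fleet-installed integration pipelines typically are; checkPipelineVersions and the sample IDs in the usage are hypothetical, and templates and ILM policies would need their own checks.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// checkPipelineVersions decodes a GET _ingest/pipeline response body (a JSON
// object keyed by pipeline ID) and verifies that every pipeline whose ID
// starts with prefix ends with "-<expected>".
// ASSUMPTION: versioned pipelines are named "<name>-<version>"; adjust the
// rule to whatever naming convention the installed assets really follow.
func checkPipelineVersions(body []byte, prefix, expected string) error {
	var pipelines map[string]json.RawMessage
	if err := json.Unmarshal(body, &pipelines); err != nil {
		return fmt.Errorf("decoding pipelines: %w", err)
	}
	found := 0
	for id := range pipelines {
		if !strings.HasPrefix(id, prefix) {
			continue
		}
		found++
		if !strings.HasSuffix(id, "-"+expected) {
			return fmt.Errorf("pipeline %q does not match expected version %s", id, expected)
		}
	}
	// An empty match is treated as a failure: a missing asset is also a bug.
	if found == 0 {
		return fmt.Errorf("no pipelines found with prefix %q", prefix)
	}
	return nil
}
```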

No-gos

In this first iteration we focus on ESS; ECE is out of scope.
We do not focus on triggering the tests on a PR basis for now.

@simitt simitt added this to the 8.4 milestone Jun 3, 2022
@cachedout
Contributor

Hi @simitt

For the deployment creation we will receive support from the eng productivity team (cc @cachedout) and try to leverage some of the work that has been done for benchmarking.

Regarding the ESS provisioning piece, there are a few options available:

  1. You can use the ESS Terraform provider directly. This will provide you with maximum control over the deployment and the way that it is configured for testing.
  2. You can use oblt-cli to create a cluster and then tear it back down again. Depending on your specific requirements, we may need to create a dedicated recipe to deploy the cluster.

Regarding the provisioning of RUM agents, could you please talk a bit more about what you envision here? Does it specifically need to be RUM data that's ingested or will any APM data be sufficient?

cc: @kuisathaverat for any additional thoughts.

@simitt
Contributor Author

simitt commented Jun 15, 2022

Regarding the provisioning of RUM agents, could you please talk a bit more about what you envision here? Does it specifically need to be RUM data that's ingested or will any APM data be sufficient?

Ideally, the test cases will send some events recorded by RUM agents, to ensure that events are also ingested as expected when they go through the apm-server internal RUM processors. Sending the concrete events (backend or RUM) is part of implementing the test logic, though, which the APM Server team will build. I don't expect this to impact any of the eng productivity team's work.

@cachedout and @kuisathaverat, @marclop is working on this on the APM Server team; please reach out to him and coordinate with him on the automation part.

@v1v
Member

v1v commented Jun 20, 2022

@amannocci, I just assigned this task to you. @marclop will contact you once the CI automation can be started; then you can both sync up on the specifics. Let me know if you need any help. Thanks!

@simitt
Contributor Author

simitt commented Jun 23, 2022

Update: I removed the work on RUM agents from the scope for 8.4 work.

@marclop
Contributor

marclop commented Jun 24, 2022

The first linked PR adds an initial set of smoke tests and assertions for versions 7.17 and latest (currently 8.3). It uses the artifacts API to obtain the latest versions.

The 7.17 smoke flow is slightly different from the latest flow:

  • A 7.17.latest-1 deployment is created, some data is ingested and asserted, then the deployment is upgraded to 7.17.latest and the same assertions are performed again. The assertions check that the indexed documents contain the expected observer.version (which matches the APM Server version).
  • An 8.2.3 deployment (the last version of the previous minor) is created and the same assertions as above are performed (on data streams this time); then the deployment is upgraded to the latest patch of the next minor (8.3.0).
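The observer.version assertion described above could look roughly like this in Go: decode an Elasticsearch _search response and check every hit's _source.observer.version. The struct models only the fields the check needs; assertObserverVersion is a hypothetical helper, not the actual systemtests code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// searchResponse models only the parts of an Elasticsearch _search response
// that the assertion needs: each hit's ID and _source.observer.version.
type searchResponse struct {
	Hits struct {
		Hits []struct {
			ID     string `json:"_id"`
			Source struct {
				Observer struct {
					Version string `json:"version"`
				} `json:"observer"`
			} `json:"_source"`
		} `json:"hits"`
	} `json:"hits"`
}

// assertObserverVersion checks that the search returned at least wantDocs
// documents and that every one of them carries observer.version == want,
// i.e. it was indexed by the APM Server version under test.
func assertObserverVersion(body []byte, want string, wantDocs int) error {
	var resp searchResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return fmt.Errorf("decoding search response: %w", err)
	}
	hits := resp.Hits.Hits
	if len(hits) < wantDocs {
		return fmt.Errorf("got %d documents, want at least %d", len(hits), wantDocs)
	}
	for _, h := range hits {
		if got := h.Source.Observer.Version; got != want {
			return fmt.Errorf("document %s: observer.version = %q, want %q", h.ID, got, want)
		}
	}
	return nil
}
```

After an upgrade, the same index or data stream also holds documents written by the previous version, so the query that produces body would have to be scoped to the events sent in that step (for example by timestamp) and request a large enough size, since _search returns 10 hits by default; that scoping is outside this sketch.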

The detailed flow looks like:

  • Deployment creation succeeds
  • Data is sent to APM Server
  • Each of the events sent to APM Server can be found in Elasticsearch.
  • Upgrade to the next version succeeds.
  • Data is sent to the APM Server
  • Each of the events sent to APM Server can be found in Elasticsearch.

We still need to add another flow:

And add more assertions to all flows:

  • Assert index templates (legacy standalone)
  • Assert index pipelines
  • Assert ILM Policies

Also we need to add some documentation on the smoke tests:

  • Update TESTING.md

@simitt The issue states that we're looking to verify that they're created with the expected version. Is that the only thing we're looking to assert?

@simitt
Contributor Author

simitt commented Jun 24, 2022

@simitt The issue states that we're looking to verify that they're created with the expected version, is that the only thing we're looking to assert?

For smoke tests, that is good enough for now. Over time we could expand the checks to cover a minimal set of fields required by the UI, but such checks can become stale and complex rather quickly, so let's not start with that.

If there is time for another set of upgrade-related tests, a test that upgrades from the latest 7.17 to 8.latest.latest in standalone mode and then switches to managed mode would also make sense.

@cachedout
Contributor

@marclop @simitt Are you ready to have the new make smoke {opts} target set up in Jenkins as its own pipeline, or would you prefer that we wait until all the flows and assertions are complete?

@marclop
Contributor

marclop commented Jun 24, 2022

@cachedout we are all good to go.

@cachedout
Contributor

@amannocci It's your time to shine. :) Please create the pipeline in Jenkins as described above and in #8458. Please co-ordinate with @v1v if you need any assistance.

@simitt
Contributor Author

simitt commented Jun 24, 2022

@marclop @amannocci please go ahead with the pipelines. Very excited about this! We can always add more test cases and iterate on them.

@cachedout
Contributor

Update: the PR which should resolve the automation side of this is #8499. We're just working through the review right now.

@amannocci
Contributor

Status

  • Smoke tests are merged.
  • The Jenkins job is available here.

@marclop marclop reopened this Jul 12, 2022
@marclop
Contributor

marclop commented Jul 12, 2022

The last remaining task, which I am working on at the moment, is validating that the APM integration package installs the assets we expect for APM Server:

  • Assert component templates (managed)
  • Assert index pipelines (managed)
  • Assert ILM Policies (managed)

@simitt
Contributor Author

simitt commented Jul 18, 2022

@amannocci would it also be possible to add a Slack integration so that the apm-server channel gets notified when the Jenkins job fails? (I'm not certain how much effort this is; let me know if we should create an extra ticket for it.)

@cachedout
Contributor

@simitt This is a very simple change. I will re-open this and see if we can sneak it in. (If re-opening this specific issue disturbs your project tracking, let me know and we'll put it elsewhere.)

@amannocci This should do the magic: https://github.com/elastic/apm-pipeline-library/tree/main/vars#notifybuildresult

@amannocci
Contributor

Status

  • We added a Slack notification for when a Jenkins job fails.

@cachedout
Contributor

This was previously reopened to allow for a small change. Now that the change is merged, I am re-closing this.
