[CI] Run Elastic Agent e2e test on PRs #20131

Closed
kuisathaverat opened this issue Jul 22, 2020 · 12 comments
Labels
Agent automation ci Stalled Team:Automation Label for the Observability productivity team

Comments

@kuisathaverat
Contributor

kuisathaverat commented Jul 22, 2020

Feature: Run Elastic Agent e2e test on PRs

  Scenario:  A developer creates a PR
    When the PR has changes for Filebeat, Metricbeat, or the Elastic Agent
    Then a new Docker image for the Elastic Agent is built
    And the e2e tests are launched for the docker image of the PR

  Scenario:  A developer creates a PR
    When the PR has changes for Filebeat, Metricbeat, or the Elastic Agent
    And the PR has the label `skip-ci`
    And the e2e tests are launched for the docker image of the PR

  Scenario: A developer pushes changes to a PR
    When the PR has changes for Filebeat, Metricbeat, or the Elastic Agent
    Then a new Docker image for the Elastic Agent is built
    And the e2e tests are launched for the docker image of the PR

  Scenario:  A developer pushes changes to a PR
    When the PR has changes for Filebeat, Metricbeat, or the Elastic Agent
    And the PR has the label `skip-ci`
    And the e2e tests are launched for the docker image of the PR

  Scenario: A developer pushes changes to a PR
    When the PR has changes for Filebeat, Metricbeat, or the Elastic Agent
    Then a new Docker image for the Elastic Agent is built
    And the e2e tests are launched for the docker image of the PR

  Scenario:  A developer creates a PR
    When a developer puts a comment `/test agent` on a PR
    Then a new Docker image for the Elastic Agent is built
    And the e2e tests are launched for the docker image of the PR

  Scenario:  A developer creates a PR
    When a developer manually triggers a job in Jenkins with the param `agent_e2e` enabled
    Then a new Docker image for the Elastic Agent is built
    And the e2e tests are launched for the docker image of the PR

cc @elastic/observablt-robots

@mdelapenya
Contributor

I'd rephrase one step to:

    And the e2e tests are launched for the docker image of the PR

Apart from that, super neat!

@EricDavisX
Contributor

@kuisathaverat hi - checking in, let us know any progress over the last week. Cheers.

@EricDavisX
Contributor

I'm transferring some sentiment and comments from a meta ticket to this implementation / discussion piece. I have some concerns / questions and want advice and help to finish this off as soon as is feasible. @mdelapenya @kuisathaverat @ph

My concerns currently, as briefly as possible:

Summary: we have 2 methods for testing Agent in the e2e tests: stand-alone (with the Agent Docker container) and fleet-mode, with a standard CentOS (or other distro) container where we deploy the Agent binary to a service from .tar.gz / .deb / .rpm.

The test approach discussed here uses the Agent Docker container, and as such would currently cover only a portion of the written tests (stand-alone mode versus all other fleet and endpoint tests). Further, while we certainly can and will enhance stand-alone tests to include more, they cannot include certain key feature tests that require a CentOS & package install method to complete them, as the usage is inherently different when using a container that has the Agent process as its main thread.

While we are nearly done with the work to use the Docker container here, I submit it's highly valuable to consider switching gears and researching whether we can build the Agent binary with mage, etc., from source, as part of the test (if it isn't already) and then consume it in the bare CentOS and Debian images. Without this change of direction, the PR CI won't be able to test any of the needed enrollment token and enrollment tests, or the stop / start of Agent or the container (since stopping a container is a very different concept). They are key areas that continue to regress and need coverage!

Note, the support for the 'latest green build' of a Beats PR to be available as an Agent Docker container is here: #20323

Even if we use this, I think we'll need a 2-stage approach to running the e2e-testing framework, because the 'latest' docker container won't be available until the current tests pass and the build is green, right? And one of the

I know Jenkins a little, but I'm no expert, looking for help here. :) Does this all make sense or am I misunderstanding something? I will post to slack to encourage faster discourse. I can schedule a meeting happily too. Thanks all.

@EricDavisX
Contributor

Separately, these 2 scenarios from the desc need to be inverted so they 'do not' test the e2e I think...
  Scenario: A developer creates a PR
    When the PR has changes for Filebeat, Metricbeat, or the Elastic Agent
    And the PR has the label `skip-ci`
    And the e2e tests are launched for the docker image of the PR

  Scenario: A developer pushes changes to a PR
    When the PR has changes for Filebeat, Metricbeat, or the Elastic Agent
    And the PR has the label `skip-ci`
    And the e2e tests are launched for the docker image of the PR
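
As a rough illustration of the inverted reading suggested here, the trigger rules from the description could be sketched as a single decision function. This is only a sketch under assumptions: the function name, the path prefixes, and the choice to let an explicit `/test agent` comment or the `agent_e2e` param override `skip-ci` are hypothetical and not taken from the actual pipeline.

```python
# Hypothetical path prefixes for the components named in the scenarios;
# the real layout of the Beats repo may differ.
WATCHED_PREFIXES = ("filebeat/", "metricbeat/", "x-pack/elastic-agent/")


def should_run_e2e(changed_files, labels=(), comment="", agent_e2e=False):
    """Decide whether to build the Agent image and launch the e2e tests."""
    # Assumption: an explicit request (PR comment or Jenkins param) always wins.
    if comment.strip() == "/test agent" or agent_e2e:
        return True
    # Inverted reading: the `skip-ci` label suppresses the e2e run.
    if "skip-ci" in labels:
        return False
    # Otherwise run only when a watched component was modified.
    return any(path.startswith(WATCHED_PREFIXES) for path in changed_files)
```

With this shape, the same check would serve both the "creates a PR" and "pushes changes" scenarios, since they differ only in the event that invokes it.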

@mdelapenya
Contributor

@elastic/observablt-robots, let's discuss it in our daily sync

@ph
Contributor

ph commented Sep 10, 2020

I do have the same concerns as @EricDavisX here; we should ideally test these different artifacts/workflows (docker, rpm).
Also, there are features or behaviors that won't be testable without an OS installation, mainly upgrades/downgrades or rollback on failures. The version in Docker containers will be fixed.

@EricDavisX
Contributor

I discussed with Manu and Ivan and we will want to do a little work to speed up the packaging job, but we think we have all the tools we need and are really close to getting this done. My prior concerns are alleviated: the 'packaging' job will build the Docker container and .tar.gz, .deb, .rpm files as needed, and we can consume them.

Confirming final details:
Ideally, we'd be able to run the test if Libbeat, Filebeat, Metricbeat or any Elastic Agent code changes in the Beats repo.

  • it is ok to start with just the Elastic Agent files changes, it will help get it in faster

We shall use the specific option to use the gcp storage bucket for the artifacts, instead of the main download site

The tests have been a little flaky recently and need some hardening; following some recent changes, Manu and Eric are working to get them fixed... When do we think the build will all be passing? If we enable it in PR CI and immediately block merges, that would be unhelpful. So, Ivan is going to start the changes and stage the PR CI pull-request, and we'll wait to merge until the Fleet & Agent team can confirm that the build is quite stable and passing.

@michalpristas
Contributor

we do include heartbeat in agent package so maybe it would make sense to include this as well

@mdelapenya
Contributor

I'm going to sum up what we discussed today:

We are going to start by launching the e2e-tests suite alongside the packaging job, which runs whenever a developer comments /package in the PR: we will add a step to trigger the tests right after the binaries and docker images for the PR are generated.

We will add certain logic to control which parts of Beats we want to package and test, based on the modified files. I.e., if any of metricbeat, filebeat, elastic-agent, libbeat or heartbeat (thanks Michal) is modified, the packaging job will build and test only those artifacts, resulting in faster feedback.
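
To make the idea concrete, here is a minimal sketch of how the modified files could be mapped to the artifacts to package and test. The directory prefixes and the function name are assumptions for illustration only; the real job may detect changes differently (for example, a libbeat change might need to rebuild everything that depends on it).

```python
# Hypothetical mapping from component name to its directory in the repo.
COMPONENT_DIRS = {
    "metricbeat": "metricbeat/",
    "filebeat": "filebeat/",
    "elastic-agent": "x-pack/elastic-agent/",
    "libbeat": "libbeat/",
    "heartbeat": "heartbeat/",
}


def components_to_package(changed_files):
    """Return the sorted component names touched by the changed files."""
    selected = set()
    for path in changed_files:
        for name, prefix in COMPONENT_DIRS.items():
            if path.startswith(prefix):
                selected.add(name)
    return sorted(selected)


if __name__ == "__main__":
    # Files outside the listed components are ignored.
    print(components_to_package(["filebeat/a.go", "libbeat/b.go", "README.md"]))
```

An empty result would mean nothing relevant changed, so the packaging and e2e steps could be skipped entirely.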

Does this sound good?

@mdelapenya
Contributor

mdelapenya commented Sep 15, 2020

Hey @michalpristas, we are not checking that the heartbeat process is started/stopped in the host for the e2e tests, as we do for filebeat and metricbeat. Is this something to take care of? If so, how is this process started: does it need an integration? does it happen by default?

@ph
Contributor

ph commented Sep 15, 2020

@mdelapenya Adding heartbeat is not necessary for now, I would wait until we officially support it.

@botelastic

botelastic bot commented Aug 16, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@botelastic botelastic bot added the Stalled label Aug 16, 2021
@botelastic botelastic bot closed this as completed Sep 15, 2021