This repository has been archived by the owner on Sep 17, 2024. It is now read-only.

chore: add resiliency when provisioning the stack #1990

Merged: 1 commit, Jan 11, 2022

mdelapenya
Contributor

What does this PR do?

It adds a retry-with-sleep when installing the Ansible roles that provision the stack (docker, kubectl...)
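
The retry-with-sleep idea can be sketched as follows. This is only an illustration in Python, not the change made in this PR, which is not shown on this page. The function name `retry_with_sleep`, the default attempt count and delay, and the injectable `run` and `sleep` parameters are all assumptions made for the example.

```python
import subprocess
import time
from typing import Callable, Sequence


def _run_command(command: Sequence[str]) -> int:
    """Run the command and return its exit code."""
    return subprocess.run(command).returncode


def retry_with_sleep(
    command: Sequence[str],
    attempts: int = 3,             # assumed default, not taken from the PR
    delay_seconds: float = 5.0,    # assumed default, not taken from the PR
    run: Callable[[Sequence[str]], int] = _run_command,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run `command` until it exits with 0 or `attempts` runs have failed.

    Returns the exit code of the last run.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    exit_code = 1
    for attempt in range(1, attempts + 1):
        exit_code = run(command)
        if exit_code == 0:
            return 0
        # Sleep only between attempts, not after the final failure.
        if attempt < attempts:
            sleep(delay_seconds)
    return exit_code
```

A call such as `retry_with_sleep(["ansible-galaxy", "install", "-r", "requirements.yml"])` would retry a role installation; the requirements file name is hypothetical. Passing `run` and `sleep` as parameters keeps the loop testable without touching the network.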

Why is it important?

We have recently seen instability when accessing the network, and we hope the retry strategy mitigates those issues.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have run the Unit tests (make unit-test), and they are passing locally
  • I have run the End-2-End tests for the suite I'm working on, and they are passing locally
  • I have noticed new Go dependencies (run make notice in the proper directory)

Author's Checklist

  • @adam-stokes do you foresee any other Ansible call downloading stuff?

@mdelapenya mdelapenya added labels Jan 11, 2022:

  • Team:Automation (Label for the Observability productivity team)
  • area:ci (Anything related to the CI)
  • backport-v7.16.0 (Automated backport with mergify)
  • backport-v7.17.0 (Automated backport with mergify)
  • backport-v8.0.0 (Automated backport with mergify)
  • impact:low (Long-term priority, unless it's a quick fix.)
  • priority:medium (Important work, but not urgent or blocking.)
  • size:S (less than 1 day)

@mdelapenya mdelapenya self-assigned this Jan 11, 2022
@mdelapenya mdelapenya requested review from tetianakravchenko and a team January 11, 2022 10:23
@elasticmachine
Contributor

elasticmachine commented Jan 11, 2022

❕ Build Aborted

Either there was a build timeout or someone aborted the build.


Build stats

  • Start Time: 2022-01-11T10:54:00.301+0000

  • Duration: 62 min 22 sec

  • Commit: 776c62a

Test stats 🧪

Test Results

  • Failed: 5
  • Passed: 211
  • Skipped: 0
  • Total: 216

Test errors 5


Initializing / End-To-End Tests / kubernetes_autodiscover_elastic-agent / Logs collection from running pod – elastic-agent standalone
    Step "elastic-agent" collects events with "kubernetes.pod.name:a-pod": failed to copy events from test-bdd4091e-c1ae-4779-adfd-1726b864182e/elastic-agent-th6s6:/tmp/beats-events: stat /tmp/test-1842835733/events: no such file or directory
  • no stacktrace
Initializing / End-To-End Tests / kubernetes_autodiscover_elastic-agent / Logs collection from a pod with an init container – elastic-agent standalone
    Step "elastic-agent" collects events with "kubernetes.container.name:container-in-pod"
  • no stacktrace
Initializing / End-To-End Tests / kubernetes_autodiscover_elastic-agent / Logs collection from short-living cronjobs – elastic-agent standalone
    Step "elastic-agent" collects events with "kubernetes.container.name:cronjob-container": failed to copy events from test-b0b44440-7460-4487-92e8-56ddecefa54e/elastic-agent-kkvqp:/tmp/beats-events: stat /tmp/test-730636594/events: no such file or directory
  • no stacktrace
Initializing / End-To-End Tests / kubernetes_autodiscover_elastic-agent / Logs collection from failing pod – elastic-agent standalone
    Step "elastic-agent" collects events with "kubernetes.pod.name:a-failing-pod": failed to copy events from test-6e94a588-0a50-40c3-982c-0279aaf41986/elastic-agent-grg76:/tmp/beats-events: stat /tmp/test-3931697739/events: no such file or directory
  • no stacktrace
Initializing / End-To-End Tests / kubernetes_autodiscover_elastic-agent / Metrics collection configured from targeted Redis Pod – elastic-agent standalone
    Step "elastic-agent" collects events with "kubernetes.pod.name:redis": failed to copy events from test-421a22ab-4039-44b6-b743-678b7e538e4f/elastic-agent-zbf4z:/tmp/beats-events: stat /tmp/test-919876881/events: no such file or directory
  • no stacktrace

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

@mdelapenya
Contributor Author

/test

@mdelapenya
Contributor Author

The 5 test failures are not related to this PR; they are reported in #1992

@tetianakravchenko tetianakravchenko left a comment

Thank you for the fix!

@mdelapenya mdelapenya merged commit 54e882b into elastic:main Jan 11, 2022
mergify bot pushed a commit that referenced this pull request Jan 11, 2022
mergify bot pushed a commit that referenced this pull request Jan 11, 2022
mdelapenya added a commit that referenced this pull request Jan 12, 2022
(cherry picked from commit 54e882b)

Co-authored-by: Manuel de la Peña <[email protected]>
mdelapenya added a commit that referenced this pull request Jan 12, 2022
(cherry picked from commit 54e882b)

Co-authored-by: Manuel de la Peña <[email protected]>
mdelapenya added a commit to mdelapenya/e2e-testing that referenced this pull request Jan 19, 2022
mdelapenya added a commit that referenced this pull request Jan 19, 2022
* chore: add resiliency when provisioning the stack (#1990) (#1993)

(cherry picked from commit 54e882b)

Co-authored-by: Manuel de la Peña <[email protected]>

* SLES15 enablement (#2007)

* SLES15 enablement

* fix: set ansible_user depending on OS

* fix: proper vars path

* fix: rename SLES distribution file

* fix: read distribution vars dynamically with include_vars

* fix: keep original behaviour for installing the stack on Debian

* fix: set vars correctly

* chore: debug ansible user

* Update .ci/.e2e-tests.yaml

Co-authored-by: Victor Martinez <[email protected]>

* Update .ci/ansible/vars/SLES.yml

Co-authored-by: Adam Stokes <[email protected]>

* Update .ci/ansible/playbook.yml

Co-authored-by: Victor Martinez <[email protected]>

* Fix package install for distro, update include_vars for all tasks

Signed-off-by: Adam Stokes <[email protected]>

* fix path to vars

Signed-off-by: Adam Stokes <[email protected]>

* try with ansible_playbook_vars_root

Signed-off-by: Adam Stokes <[email protected]>

* try var_files

Signed-off-by: Adam Stokes <[email protected]>

* typo

Signed-off-by: Adam Stokes <[email protected]>

* use full path and fix quoting

Signed-off-by: Adam Stokes <[email protected]>

* use include_vars

Signed-off-by: Adam Stokes <[email protected]>

* make include_vars first in task list for each block

Signed-off-by: Adam Stokes <[email protected]>

* dont include_vars on localhost execution

Signed-off-by: Adam Stokes <[email protected]>

* remove conflicting statements

Signed-off-by: Adam Stokes <[email protected]>

* have e2e-tests.yaml drive the login information

Signed-off-by: Adam Stokes <[email protected]>

* fix update cache on debian based systems

Signed-off-by: Adam Stokes <[email protected]>

* fix permission on output directory

Signed-off-by: Adam Stokes <[email protected]>

* fix group ownership in create test script

Signed-off-by: Adam Stokes <[email protected]>

* better os detection in ansible

Signed-off-by: Adam Stokes <[email protected]>

* fix chown in jenkinsfile

Signed-off-by: Adam Stokes <[email protected]>

Co-authored-by: Manuel de la Peña <[email protected]>
Co-authored-by: Adam Stokes <[email protected]>
Co-authored-by: Victor Martinez <[email protected]>

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: Julien Lind <[email protected]>
Co-authored-by: Adam Stokes <[email protected]>
Co-authored-by: Victor Martinez <[email protected]>
mdelapenya added a commit to mdelapenya/e2e-testing that referenced this pull request Jan 26, 2022
* main: (45 commits)
  feat: add CentOS 8 support (elastic#2034)
  fix: set default region for AWS cli (elastic#2053)
  chore: use Ansible's built-in replace instead of sed (elastic#2048)
  chore: split stack configuration and start into two tasks (elastic#2044)
  feat: enable SSH access to users for debugging cloud instances (elastic#2001)
  fix: use the right branch for 7.17 backports (elastic#2025)
  SLES15 enablement (elastic#2007)
  chore: bump stale agent for main (elastic#2014)
  Update `fetchBeatsBinary` to be reused in elastic-agent-poc (elastic#1984)
  chore: add resiliency when provisioning the stack (elastic#1990)
  chore: bump elastic-package to v0.32.1 (elastic#1959)
  feat: export Fetch&Download methods in the /pkg directory (elastic#1943)
  bump stack version 8.1.0-dbc834fd (elastic#1948)
  bump stack version 8.1.0-76902d39 (elastic#1946)
  chore: retire 7.15 adding 7.17 (elastic#1938)
  ci: use withAPMEnv (elastic#1917)
  Update main branch (elastic#1928)
  bump stack version 8.1.0-befff95a (elastic#1929)
  chore: properly evaluate how tests are skipped on CI when checking modified files (elastic#1924)
  bump stack version 8.1.0-60bffc32 (elastic#1921)
  ...
@mdelapenya mdelapenya deleted the ansible-resiliency branch July 13, 2022 09:33
4 participants