This repository has been archived by the owner on Sep 17, 2024. It is now read-only.

feat: enable SSH access to users for debugging cloud instances #2001

Merged
merged 26 commits into from
Jan 20, 2022


mdelapenya
Contributor

@mdelapenya mdelapenya commented Jan 13, 2022

  • chore: install ssh-import-id from pip
  • feat: do not destroy cloud resources if DEVELOPER_MODE is true
  • feat: install public SSH keys into the cloud instances
  • chore: enable @mdelapenya in SSH access
  • chore: mark EC2 instances with a proper label to be reaped
  • feat: add a pipeline that removes AWS cloud resources on Sundays
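The DEVELOPER_MODE commit amounts to a guard around the teardown step. The PR wires it up as a Jenkins parameter. Below is only a minimal shell sketch of the same idea, assuming the flag reaches the environment as the string `true`. The function name `destroy_unless_developer_mode` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a teardown command unless DEVELOPER_MODE is "true".
# Assumption: DEVELOPER_MODE is exported as the string "true" or "false".
destroy_unless_developer_mode() {
  if [[ "${DEVELOPER_MODE:-false}" == "true" ]]; then
    # Keep the instances so developers can SSH in; only log the decision.
    echo "DEVELOPER_MODE=true: keeping cloud resources, skipping: $*" >&2
    return 0
  fi
  # Otherwise run the teardown command exactly as given.
  "$@"
}
```

For example, `destroy_unless_developer_mode aws ec2 terminate-instances --instance-ids "$INSTANCE_ID"` (where `INSTANCE_ID` is a placeholder) terminates the instance only when developer mode is off.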

What does this PR do?

This PR does a few things:

  • it enables DEVELOPER_MODE as a Jenkins parameter: if checked, cloud instances won't be destroyed, and a log message will be shown instead
  • it adds a file where developers who want to SSH into the cloud instances should place their GitHub usernames, so that we can fetch their public SSH keys from GitHub using ssh-import-id (see https://github.com/dustinkirkland/ssh-import-id)
  • it adds a shell script that runs that tool: the "magic" file with usernames is read, and the tool is executed for each line in the file. This script is called while provisioning the cloud instances.
  • it adds @mdelapenya as the first user with SSH access
  • it adds a new AWS tag to the created instances, so that it will be possible to destroy all instances with a specific tag: ReaperMark=e2e-testing-vm
  • it adds a regular pipeline with a cron trigger (Sundays at 0:00) that uses the AWS CLI to destroy all cloud instances with that tag.
  • it enriches the tags used to create the AWS instances:
    • ReaperMark=e2e-testing-vm
    • Kind: the nodeLabel value (stack or suite_arch)
    • GitSHA: last commit in the PR
    • Name: e2e-$Kind-$run_id
    • BuildURL: Jenkins URL of the build
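The key-import script described above can be sketched as a loop over the usernames file. This is not the PR's actual .ci/scripts/import-ssh-keys.sh. The file format (one GitHub username per line, with `#` comments and blank lines ignored) and the `DRY_RUN` switch are assumptions.

The `gh:` prefix is how ssh-import-id selects GitHub as the key source. By default the tool writes to the invoking user's ~/.ssh/authorized_keys, which is presumably why the history includes "do not import ssh keys as root".

```shell
#!/usr/bin/env bash
# Hypothetical sketch of an SSH key import script; NOT the PR's actual file.
# Assumed input format: one GitHub username per line; blank lines and
# anything after '#' are ignored.

# list_github_users FILE: print the usernames found in FILE, one per line.
list_github_users() {
  local line
  # "|| [[ -n $line ]]" also keeps a last line that lacks a trailing newline.
  while IFS= read -r line || [[ -n "$line" ]]; do
    line="${line%%#*}"            # drop comments
    line="${line//[[:space:]]/}"  # strip all whitespace (usernames have none)
    if [[ -n "$line" ]]; then
      printf '%s\n' "$line"
    fi
  done < "$1"
}

# import_keys FILE: run ssh-import-id once per user; "gh:" selects GitHub.
# With DRY_RUN=1 the commands are printed instead of executed.
import_keys() {
  local user
  while IFS= read -r user; do
    if [[ "${DRY_RUN:-0}" == "1" ]]; then
      echo "ssh-import-id gh:${user}"
    else
      # </dev/null keeps the tool from reading the loop's input stream.
      ssh-import-id "gh:${user}" < /dev/null
    fi
  done < <(list_github_users "$1")
}
```

Run it as the login user, not root, for example `DRY_RUN=1 import_keys path/to/users-file` first, then without `DRY_RUN`. The file path is a placeholder.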

[Screenshot 2022-01-13 at 17.10.43]

Why is it important?

Developers need to be able to access the machines to troubleshoot test errors. Because we do not yet have full observability of test execution, we need to SSH into the machines to check logs, file states, etc.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have run the Unit tests (make unit-test), and they are passing locally
  • I have run the End-2-End tests for the suite I'm working on, and they are passing locally
  • I have noticed new Go dependencies (run make notice in the proper directory)

Author's Checklist

How to test this PR locally

To verify that the keys are installed, first back up your authorized_keys file, then install ssh-import-id, run the import script, and inspect the result:

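# back up the current file so it can be restored afterwards
cp ~/.ssh/authorized_keys ~/.ssh/authorized_keys_backup
# install the tool used to fetch public keys from GitHub
pip install ssh-import-id
# run the import script from the repository root
./.ci/scripts/import-ssh-keys.sh
cat ~/.ssh/authorized_keys
# check that mdelapenya's keys are copied there
# restore the original file
mv ~/.ssh/authorized_keys_backup ~/.ssh/authorized_keys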
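Instead of eyeballing the `cat` output, you can isolate the newly imported keys by comparing the file against the backup taken in the first step. This is a small sketch. `new_keys` is a hypothetical helper name, and it relies on bash process substitution plus the standard `sort` and `comm` tools.

```shell
#!/usr/bin/env bash
# new_keys BACKUP CURRENT: print the lines present in CURRENT but not in BACKUP,
# i.e. the keys added since the backup was taken.
new_keys() {
  # comm -13 suppresses lines unique to the first input and lines common to
  # both, leaving only lines unique to the second; comm needs sorted input.
  comm -13 <(sort "$1") <(sort "$2")
}
```

Call `new_keys ~/.ssh/authorized_keys_backup ~/.ssh/authorized_keys` before the final `mv` restores the backup.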

Related issues

@mdelapenya mdelapenya added backport-v7.16.0 Automated backport with mergify backport-v7.17.0 Automated backport with mergify backport-v8.0.0 Automated backport with mergify labels Jan 13, 2022
@mdelapenya mdelapenya self-assigned this Jan 13, 2022
@mdelapenya mdelapenya requested a review from a team January 13, 2022 11:46
.ci/aws-instances-reaper.groovy (outdated, resolved)
quietPeriod(10)
}
triggers {
cron '0 0 * * 0'
Contributor Author


I think we should run this daily, removing the DEVELOPER_MODE logic and never destroy the stack and runners.

Contributor


Sure, though what if there are 50 PR runs in a day? We may run into resource limits, and tests would fail.

Contributor Author


IIRC, Travis had a nice feature that let you connect to the machines, but only for 30 minutes.
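The thread above weighs a weekly sweep against the risk of exhausting resources, and it recalls a time-limited access window like Travis's. One middle ground is a reaper that only terminates tagged instances older than a threshold.

This is a hedged sketch, not the PR's Groovy pipeline. `MAX_AGE_HOURS`, `DRY_RUN` and the function names are assumptions. It needs a configured AWS CLI, and the age arithmetic relies on GNU date's `-d` option.

```shell
#!/usr/bin/env bash
# Hedged sketch of an age-aware reaper; NOT the PR's Groovy pipeline.
# Assumptions: a configured AWS CLI, GNU date (for -d), and the hypothetical
# knobs MAX_AGE_HOURS and DRY_RUN.

REAPER_TAG_KEY="ReaperMark"
REAPER_TAG_VALUE="e2e-testing-vm"

# is_expired LAUNCH_TIME NOW_EPOCH MAX_AGE_HOURS
# Succeeds when LAUNCH_TIME (ISO 8601) is more than MAX_AGE_HOURS before NOW_EPOCH.
is_expired() {
  local launch_epoch
  launch_epoch=$(date -u -d "$1" +%s) || return 2
  (( $2 - launch_epoch > $3 * 3600 ))
}

# reap: terminate running instances carrying the reaper tag that are too old.
reap() {
  local max_age="${MAX_AGE_HOURS:-24}" now id launched
  now=$(date -u +%s)
  aws ec2 describe-instances \
    --filters "Name=tag:${REAPER_TAG_KEY},Values=${REAPER_TAG_VALUE}" \
              "Name=instance-state-name,Values=running" \
    --query 'Reservations[].Instances[].[InstanceId,LaunchTime]' \
    --output text |
  while read -r id launched; do
    [[ "$id" == i-* ]] || continue               # skip empty or unexpected rows
    is_expired "$launched" "$now" "$max_age" || continue
    if [[ "${DRY_RUN:-0}" == "1" ]]; then
      echo "would terminate ${id} (launched ${launched})"
    else
      echo "terminating ${id} (launched ${launched})"
      # </dev/null keeps the CLI from reading the loop's input stream.
      aws ec2 terminate-instances --instance-ids "$id" >/dev/null </dev/null
    fi
  done
}
```

Source the file and call `DRY_RUN=1 reap` first to see which instances would be terminated. A daily cron with a threshold of a few hours would keep debugging sessions possible without letting instances pile up.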

@elasticmachine
Contributor

elasticmachine commented Jan 13, 2022

💔 Tests Failed



Build stats

  • Start Time: 2022-01-20T07:00:57.110+0000

  • Duration: 62 min 55 sec

  • Commit: f8ef5e5

Test stats 🧪

Test Results
Failed 1
Passed 217
Skipped 0
Total 218

Test errors 1


Initializing / End-To-End Tests / kubernetes_autodiscover_elastic-agent / [empty] – TEST-x86_64-kubernetes-autodiscover-1da92a12-2022-01-20-07:20:36.xml
  • no error details

     Test report file /var/lib/jenkins/workspace/PR-2001-20-a99ca7a8-81cb-4056-aee7-6194a0d2cb65/outputs/13.59.27.247/TEST-x86_64-kubernetes-autodiscover-1da92a12-2022-01-20-07:20:36.xml was length 0 
    

🐛 Flaky test report

❕ There are test failures but not known flaky tests.


Genuine test errors 1

💔 There are test failures but not known flaky tests, most likely a genuine test failure.

  • Name: Initializing / End-To-End Tests / kubernetes_autodiscover_elastic-agent / [empty] – TEST-x86_64-kubernetes-autodiscover-1da92a12-2022-01-20-07:20:36.xml

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

@mdelapenya
Contributor Author

The cloud instance has been created with the new tag!
[Screenshot 2022-01-13 at 13.30.47]

Provisioning from a local machine leaves buildURL unpopulated,
so we need a default value
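The tag set from the PR description, including the fallback this commit introduces for local runs, can be sketched as a small shell function. The `local` default values and the function name are assumptions for illustration. The PR's own provisioning code may compute these tags differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the instance tags listed in the PR description,
# one Key/Value pair per line. BUILD_URL and GIT_BASE_COMMIT are read from the
# environment; the "local" fallbacks (for runs outside Jenkins) are assumptions.
build_instance_tags() {
  local kind="$1" run_id="$2"
  # printf reuses its format string for each remaining pair of arguments.
  printf 'Key=%s,Value=%s\n' \
    ReaperMark e2e-testing-vm \
    Kind "$kind" \
    GitSHA "${GIT_BASE_COMMIT:-local}" \
    Name "e2e-${kind}-${run_id}" \
    BuildURL "${BUILD_URL:-local}"
}
```

Outside Jenkins, `build_instance_tags stack abc123` still yields a complete tag set, with `BuildURL` falling back to `local`.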
@mergify
Contributor

mergify bot commented Jan 19, 2022

This pull request is now in conflict. Could you fix it @mdelapenya? 🙏
To fix up this pull request, you can check it out locally. See the documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b ssh-access-to-cloud-vms upstream/ssh-access-to-cloud-vms
git merge upstream/main
git push upstream ssh-access-to-cloud-vms

@@ -93,6 +94,7 @@ pipeline {
githubCheckNotify('PENDING') // we want to notify the upstream about the e2e the soonest
stash allowEmpty: true, name: 'source', useDefaultExcludes: false
setEnvVar("GO_VERSION", readFile("${env.WORKSPACE}/${env.BASE_DIR}/.go-version").trim())
setEnvVar("LABELS_STRING", "buildURL=${env.BUILD_URL} gitSha=${env.GIT_BASE_COMMIT}")
Contributor Author


Define labels only once
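Defining the labels once means that a single `LABELS_STRING` (space-separated `key=value` pairs, as in the Groovy line above) is reused wherever tags are needed. Here is a sketch of how a consumer could split it in shell, assuming neither keys nor values contain spaces. `parse_labels` is a hypothetical name.

```shell
#!/usr/bin/env bash
# parse_labels "k1=v1 k2=v2": print each label as "key<TAB>value".
# Assumption: pairs are separated by whitespace and contain no spaces
# themselves; values may contain '=' (only the first '=' splits).
parse_labels() {
  local -a pairs
  local pair
  read -r -a pairs <<< "$1"   # split on whitespace without glob expansion
  for pair in "${pairs[@]}"; do
    printf '%s\t%s\n' "${pair%%=*}" "${pair#*=}"
  done
}
```

Splitting only on the first `=` keeps values such as URLs with query strings intact.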

@mdelapenya
Contributor Author

/test

@mdelapenya
Contributor Author

@adam-stokes I think these changes are correct: the machines are receiving the desired tags, and they are kept after the pipeline finishes. We may need to iterate on the reaper pipeline in case it does not work properly, but we can merge this one and open a new ticket for any such issues.

@mdelapenya mdelapenya merged commit fd75b80 into elastic:main Jan 20, 2022
@mdelapenya
Contributor Author

@Mergifyio refresh

mdelapenya added a commit to mdelapenya/e2e-testing that referenced this pull request Jan 20, 2022
…ic#2001)

* chore: install ssh-import-id from pip

It will allow importing public SSH keys from GitHub so we can easily provide
a way for developers to connect to the cloud instances.
See https://github.com/dustinkirkland/ssh-import-id

* feat: do not destroy cloud resources if DEVELOPER_MODE is true

* feat: install public SSH keys into the cloud instances

* chore: enable @mdelapenya in SSH access

* chore: mark EC2 instances with a proper label to be reaped

* feat: add a pipeline that removes AWS cloud resources on Sundays

* chore: force installation

* fix: install pip

* fix: run script after having the repo (order matters)

* chore: use base commit as RUN_ID

This way we will be able to correlate the cloud instances with a commit SHA
instead of a random UUID

* docs: update RUN_ID docs

* chore: add @adam-stokes as SSH user

* fix: do not import ssh keys as root

* chore: rename tag used on AWS reaper

* chore: add a label for node kind

* chore: add buildURL as tag

* fix: camelcase

* fix: pass buildURL when creating runners

* chore: use default value for buildURL

Provisioning from a local machine leaves buildURL unpopulated,
so we need a default value

* chore: simplify reaper pipeline removing git checkout

* chore: install python-pip for Suse and CentOS

* chore: back to a random UUID

* feat: add a VM label for the git base commit

* chore: define labels once

Co-authored-by: Adam Stokes <[email protected]>
mdelapenya added a commit to mdelapenya/e2e-testing that referenced this pull request Jan 20, 2022
…ic#2001)

@elasticmachine
Contributor

💔 Build Failed



Build stats

  • Start Time: 2022-01-20T08:57:03.266+0000

  • Duration: 32 min 47 sec

  • Commit: f8ef5e5

Test stats 🧪

Test Results
Failed 0
Passed 185
Skipped 0
Total 185

Test errors 0


Initializing / End-To-End Tests / kubernetes_autodiscover_elastic-agent / [empty] – TEST-x86_64-kubernetes-autodiscover-01b17299-2022-01-20-09:16:31.xml
  • no error details

     Test report file /var/lib/jenkins/workspace/PR-2001-21-3855f1cd-a5b1-4f27-a5b5-48da79e6c08c/outputs/13.59.26.25/TEST-x86_64-kubernetes-autodiscover-01b17299-2022-01-20-09:16:31.xml was length 0 
    

💚 Flaky test report

Tests succeeded.


Genuine test errors 1

💔 There are test failures but not known flaky tests, most likely a genuine test failure.

  • Name: Initializing / End-To-End Tests / kubernetes_autodiscover_elastic-agent / [empty] – TEST-x86_64-kubernetes-autodiscover-01b17299-2022-01-20-09:16:31.xml

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

mdelapenya added a commit that referenced this pull request Jan 20, 2022
#2031)

mdelapenya added a commit that referenced this pull request Jan 20, 2022
#2030)

@mergify
Contributor

mergify bot commented Jan 20, 2022

refresh

✅ Pull request refreshed

mergify bot pushed a commit that referenced this pull request Jan 20, 2022
(cherry picked from commit fd75b80)
mergify bot pushed a commit that referenced this pull request Jan 20, 2022
(cherry picked from commit fd75b80)
mergify bot pushed a commit that referenced this pull request Jan 20, 2022
(cherry picked from commit fd75b80)

# Conflicts:
#	.ci/Jenkinsfile
#	.ci/ansible/playbook.yml
#	.ci/ansible/tasks/install_deps.yml
mdelapenya added a commit that referenced this pull request Jan 24, 2022
… backport for 7.16 (#2032)

* chore: remove unused code (#1119)

* chore: remove unused code

* chore: remove all references to fleet server hostname

Because we assume it's a runtime dependency, provided by the initial
compose file, we do not need to calculate service names, or URIs for the
fleet-service endpoint. Instead, we assume it's listening in the 8220 port
in the "fleet-server" hostname, which is accessible from the network
created by docker-compose.

* fix: use HTTP to connect to fleet-server

* chore: remove fleet server policy code

We do not need it anymore, as the fleet server is already bootstrapped

* chore: remove all policies but system and fleet_server

* Update policies.go

* Update fleet.go

* Update stand-alone.go

Co-authored-by: Adam Stokes <[email protected]>

* fix: wrong resolve conflicts

* fix: wrong resolve conflicts

Co-authored-by: Adam Stokes <[email protected]>
mdelapenya added a commit to mdelapenya/e2e-testing that referenced this pull request Jan 26, 2022
* main: (45 commits)
  feat: add CentOS 8 support (elastic#2034)
  fix: set default region for AWS cli (elastic#2053)
  chore: use Ansible's built-in replace instead of sed (elastic#2048)
  chore: split stack configuration and start into two tasks (elastic#2044)
  feat: enable SSH access to users for debugging cloud instances (elastic#2001)
  fix: use the right branch for 7.17 backports (elastic#2025)
  SLES15 enablement (elastic#2007)
  chore: bump stale agent for main (elastic#2014)
  Update `fetchBeatsBinary` to be reused in elastic-agent-poc (elastic#1984)
  chore: add resiliency when provisioning the stack (elastic#1990)
  chore: bump elastic-package to v0.32.1 (elastic#1959)
  feat: export Fetch&Download methods in the /pkg directory (elastic#1943)
  bump stack version 8.1.0-dbc834fd (elastic#1948)
  bump stack version 8.1.0-76902d39 (elastic#1946)
  chore: retire 7.15 adding 7.17 (elastic#1938)
  ci: use withAPMEnv (elastic#1917)
  Update main branch (elastic#1928)
  bump stack version 8.1.0-befff95a (elastic#1929)
  chore: properly evaluate how tests are skipped on CI when checking modified files (elastic#1924)
  bump stack version 8.1.0-60bffc32 (elastic#1921)
  ...
@mdelapenya mdelapenya deleted the ssh-access-to-cloud-vms branch July 13, 2022 09:32
Labels
backport-v7.16.0 Automated backport with mergify backport-v7.17.0 Automated backport with mergify backport-v8.0.0 Automated backport with mergify
Development

Successfully merging this pull request may close these issues.

Re-enable DEV_MODE to enable access the VMs
4 participants