This repository has been archived by the owner on Sep 17, 2024. It is now read-only.

feat: enable SSH access to users for debugging cloud instances #2001

Merged: 26 commits, Jan 20, 2022
Changes from 6 commits
Commits
a6f0f3a
chore: install ssh-import-id from pip
mdelapenya Jan 13, 2022
03fe5fa
feat: do not destroy cloud resources if DEVELOPER_MODE is true
mdelapenya Jan 13, 2022
12de3e0
feat: install public SSH keys into the cloud instances
mdelapenya Jan 13, 2022
ffcb5e6
chore: enable @mdelapenya in SSH access
mdelapenya Jan 13, 2022
da45eda
chore: mark EC2 instances with a proper label to be reaped
mdelapenya Jan 13, 2022
39d9e7c
feat: add a pipeline that removes AWS cloud resources on Sundays
mdelapenya Jan 13, 2022
459c110
chore: force installation
mdelapenya Jan 13, 2022
8e90eb1
fix: install pip
mdelapenya Jan 13, 2022
55608f2
fix: run script after having the repo (order matters)
mdelapenya Jan 13, 2022
72841b6
chore: use base commit as RUN_ID
mdelapenya Jan 13, 2022
bd3a3e1
docs: update RUN_ID docs
mdelapenya Jan 13, 2022
5943bc1
chore: add @adam-stokes as SSH user
mdelapenya Jan 13, 2022
1f13255
fix: do not import ssh keys as root
mdelapenya Jan 13, 2022
36067d3
chore: rename tag used on AWS reaper
mdelapenya Jan 13, 2022
62448ec
chore: add a label for node kind
mdelapenya Jan 13, 2022
42859f2
chore: add buildURL as tag
mdelapenya Jan 13, 2022
dd69e47
fix: camelcase
mdelapenya Jan 13, 2022
e33ba84
fix: pass buildURL when creating runners
mdelapenya Jan 13, 2022
1a1d1b8
chore: use default value for buildURL
mdelapenya Jan 13, 2022
600eb0a
chore: simplify reaper pipeline removing git checkout
mdelapenya Jan 14, 2022
ce33f28
Merge branch 'main' into ssh-access-to-cloud-vms
adam-stokes Jan 18, 2022
f0967c8
Merge branch 'main' into ssh-access-to-cloud-vms
mdelapenya Jan 19, 2022
5450430
chore: install python-pip for Suse and CentOS
mdelapenya Jan 19, 2022
56f215f
chore: back to a random UUID
mdelapenya Jan 19, 2022
743ba78
feat: add a VM label for the git base commit
mdelapenya Jan 19, 2022
f8ef5e5
chore: define labels once
mdelapenya Jan 19, 2022
14 changes: 12 additions & 2 deletions .ci/Jenkinsfile
@@ -38,6 +38,7 @@ pipeline {
}
parameters {
booleanParam(name: 'Run_As_Main_Branch', defaultValue: false, description: 'Allow to run any steps on a PR, some steps normally only run on main branch.')
booleanParam(name: "DEVELOPER_MODE", defaultValue: false, description: "If checked, cloud resources won't be destroyed at the end of the pipeline. Default false")
booleanParam(name: "SKIP_SCENARIOS", defaultValue: true, description: "If it's needed to skip those scenarios marked as @skip. Default true")
booleanParam(name: "NIGHTLY_SCENARIOS", defaultValue: false, description: "If it's needed to include the scenarios marked as @nightly in the test execution. Default false")
string(name: 'runTestsSuites', defaultValue: '', description: 'A comma-separated list of test suites to run (default: empty to run all test suites)')
@@ -268,9 +269,14 @@ pipeline {
 script {
   def stackWorkspace = "${env.WORKSPACE}/${env.BASE_DIR}"
   def stackMachine = getMachineInfo(stackWorkspace, 'stack')
-  ansible(stackWorkspace,
+  if (params.DEVELOPER_MODE) {
+    def stackRunner = getNodeIp(stackWorkspace, 'stack')
+    log(level: 'DEBUG', text: "Stack instance won't be destroyed after the build. Please SSH into the stack machine on ${stackRunner.ip}")
+  } else {
+    ansible(stackWorkspace,
       env.RUN_ID.split('-')[0],
       "-t destroy --extra-vars=\"nodeLabel=stack nodeImage=${stackMachine.image} nodeInstanceType=${stackMachine.instance_type}\"")
+  }
 }
 }
 }
@@ -530,9 +536,13 @@ def generateFunctionalTestStep(Map args = [:]){
   "e2e-testing/outputs/TEST-*${runId}*.xml",
   "outputs/${testRunner.ip}/.")
 sh "ls -l outputs/${testRunner.ip}"
-ansible("${env.WORKSPACE}",
+if (params.DEVELOPER_MODE) {
+  log(level: 'DEBUG', text: "Cloud instance won't be destroyed after the build. Please SSH into the test runner machine on ${testRunner.ip}. ")
+} else {
+  ansible("${env.WORKSPACE}",
     runId,
     "-t destroy --extra-vars=\"nodeLabel=${platform} nodeImage=${machine.image} nodeInstanceType=${machine.instance_type}\"")
+}
 junit allowEmptyResults: true,
   keepLongStdio: true,
   testResults: "outputs/${testRunner.ip}/TEST-*${runId}*.xml"
1 change: 1 addition & 0 deletions .ci/ansible/github-ssh-keys
@@ -0,0 +1 @@
mdelapenya
10 changes: 10 additions & 0 deletions .ci/ansible/playbook.yml
@@ -63,6 +63,11 @@
tags:
- setup-stack

- name: Add SSH keys to stack
include_tasks: tasks/install_ssh_keys.yml
tags:
- setup-stack

- name: Start stack
shell: |
sed -i '' -e 's,http://elasticsearch,http://{{inventory_hostname}},g' /home/{{ansible_user}}/e2e-testing/cli/config/compose/profiles/fleet/default/kibana.config.yml
@@ -103,6 +108,11 @@
tags:
- setup-node

- name: Add SSH keys to runner instances
include_tasks: tasks/install_ssh_keys.yml
tags:
- setup-node

- name: Setup source code
include_tasks: tasks/copy_test_files.yml
tags:
4 changes: 4 additions & 0 deletions .ci/ansible/tasks/install_deps.yml
@@ -14,3 +14,7 @@
register: package_install_res
retries: 5
until: package_install_res is success

- name: Install ssh-import-id python package to copy public SSH keys from Github accounts
pip:
name: ssh-import-id
4 changes: 4 additions & 0 deletions .ci/ansible/tasks/install_ssh_keys.yml
@@ -0,0 +1,4 @@
---
- name: Install SSH keys
shell: |
/home/{{ansible_user}}/e2e-testing/.ci/scripts/import-ssh-keys.sh
1 change: 1 addition & 0 deletions .ci/ansible/tasks/runners.yml
@@ -18,6 +18,7 @@
image: '{{nodeImage}}'
instance_type: '{{nodeInstanceType}}'
instance_tags:
Kind: "e2e-testing-vm"
Name: "e2e-{{nodeLabel}}-{{runId}}"
count_tag:
Name: "e2e-{{nodeLabel}}-{{runId}}"
60 changes: 60 additions & 0 deletions .ci/aws-instances-reaper.groovy
@@ -0,0 +1,60 @@
#!/usr/bin/env groovy

@Library('apm@current') _

pipeline {
agent { label 'ubuntu-20' }
environment {
REPO = 'e2e-testing'
BASE_DIR = "src/github.com/elastic/${env.REPO}"
HOME = "${env.WORKSPACE}"
NOTIFY_TO = credentials('notify-to')
PIPELINE_LOG_LEVEL = 'INFO'
JOB_GIT_CREDENTIALS = "f6c7695a-671e-4f4f-a331-acdce44ff9ba"
AWS_PROVISIONER_SECRET = 'secret/observability-team/ci/elastic-observability-aws-account-auth'
AWS_EC2_INSTANCES_TAG= 'e2e-testing-vm'
}
options {
timeout(time: 1, unit: 'HOURS')
buildDiscarder(logRotator(numToKeepStr: '20', artifactNumToKeepStr: '20'))
timestamps()
ansiColor('xterm')
disableResume()
durabilityHint('PERFORMANCE_OPTIMIZED')
rateLimitBuilds(throttle: [count: 60, durationName: 'hour', userBoost: true])
quietPeriod(10)
}
triggers {
cron '0 0 * * 0'
Review comment (Contributor Author): I think we should run this daily, remove the DEVELOPER_MODE logic, and never destroy the stack and runners.
Review comment (Contributor): Sure, though what if there are 50 PR runs in a day? We may run into resource restrictions and tests would fail.
Review comment (Contributor Author): IIRC, Travis had a nice feature where you were able to connect to the machines, but only for 30 minutes.
}
stages {
stage('Checkout') {
steps {
deleteDir()
gitCheckout(basedir: "${BASE_DIR}",
branch: "main",
repo: "https://github.com/elastic/${REPO}.git",
credentialsId: "${JOB_GIT_CREDENTIALS}"
)
stash allowEmpty: true, name: 'source', useDefaultExcludes: false
}
}
stage('Reap AWS instances'){
environment {
HOME = "${env.WORKSPACE}/${BASE_DIR}"
}
steps {
deleteDir()
unstash 'source'
withAWSEnv(secret: "${env.AWS_PROVISIONER_SECRET}") {
sh("aws ec2 terminate-instances --instance-ids `aws ec2 describe-instances --filters Name=tag:Kind,Values=${env.AWS_EC2_INSTANCES_TAG} --query Reservations[].Instances[].InstanceId --output text`")
}
}
}
}
post {
cleanup {
notifyBuildResult()
}
}
}
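
The reaper stage above feeds the output of `aws ec2 describe-instances` straight into `aws ec2 terminate-instances`. If no instance carries the `Kind` tag, the ID list is empty and the terminate call would likely fail for lack of arguments. A minimal sketch of a guard follows, assuming an empty query result prints nothing with `--output text`. The function name `reap_instances` is hypothetical and not part of this PR.

```shell
#!/usr/bin/env bash

# reap_instances TAG_VALUE
# Hypothetical helper (not part of this PR): terminates the EC2 instances
# tagged Kind=TAG_VALUE, and does nothing when the lookup returns no IDs.
reap_instances() {
  local tag_value="$1"
  local ids

  # Same lookup as the pipeline: one whitespace-separated list of instance IDs.
  ids=$(aws ec2 describe-instances \
    --filters "Name=tag:Kind,Values=${tag_value}" \
    --query 'Reservations[].Instances[].InstanceId' \
    --output text)

  if [ -z "${ids}" ]; then
    echo "No instances tagged Kind=${tag_value}; nothing to reap"
    return 0
  fi

  # Word splitting is intentional here: each ID must be a separate argument.
  # shellcheck disable=SC2086
  aws ec2 terminate-instances --instance-ids ${ids}
}
```

In the pipeline this could be invoked as `reap_instances "${AWS_EC2_INSTANCES_TAG}"` inside the `withAWSEnv` block. Like the original command, it does not tell instances of finished builds apart from those of builds still in progress.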
22 changes: 22 additions & 0 deletions .ci/jobs/aws-instances-reaper.yml
@@ -0,0 +1,22 @@
---
- job:
name: e2e-tests/aws-instances-reaper
display-name: AWS Instances Reaper
description: Job to remove cloud resources on Sundays
view: Beats
project-type: pipeline
pipeline-scm:
script-path: .ci/aws-instances-reaper.groovy
scm:
- git:
url: git@github.com:elastic/e2e-testing.git
refspec: +refs/heads/*:refs/remotes/origin/*
wipe-workspace: true
name: origin
shallow-clone: true
credentials-id: f6c7695a-671e-4f4f-a331-acdce44ff9ba
reference-repo: /var/lib/jenkins/.git-references/e2e-testing.git
branches:
- main
triggers:
- timed: '0 0 * * 0'
19 changes: 19 additions & 0 deletions .ci/scripts/import-ssh-keys.sh
@@ -0,0 +1,19 @@
#!/usr/bin/env bash

## Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
## or more contributor license agreements. Licensed under the Elastic License;
## you may not use this file except in compliance with the Elastic License.

set -euxo pipefail

#
# Imports public SSH keys from Github profiles
#

BASEDIR=$(dirname "$0")

input="${BASEDIR}/../ansible/github-ssh-keys"
while IFS= read -r line
do
ssh-import-id "gh:$line"
done < "$input"
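
The loop above passes every line of `github-ssh-keys` to `ssh-import-id`. With a plain `read`, a last line without a trailing newline is silently skipped, and an empty line would produce a bare `gh:` argument. The sketch below shows one way to make the parsing more tolerant. It only prints the `gh:<username>` entries instead of importing keys, so it can be tried without network access. The function name `list_github_users` and the `#` comment convention are assumptions, not part of this PR.

```shell
#!/usr/bin/env bash

# list_github_users FILE
# Hypothetical helper (not part of this PR): prints one "gh:<username>" entry
# per usable line of FILE, skipping empty lines and lines starting with "#".
list_github_users() {
  local input="$1"
  local line

  # '|| [ -n "$line" ]' keeps a final line that has no trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    # Drop a trailing carriage return in case the file has CRLF line endings.
    line="${line%$'\r'}"
    case "$line" in
      '' | '#'*) continue ;;
    esac
    printf 'gh:%s\n' "$line"
  done < "$input"
}
```

Each printed entry could then be handed to `ssh-import-id`, as the script above does for every line.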
5 changes: 5 additions & 0 deletions e2e/TROUBLESHOOTING.md
@@ -5,6 +5,11 @@ The first step in determining the exact failure is to try and reproduce the test

Each test suite's documentation should contain the specifics to run the tests, but it boils down to executing `go test` or `godog` in the right directory.

### SSH into the Cloud machines
On CI, the Elastic Stack and all test suites run on AWS instances, so whenever a build fails we need to access those machines and inspect their state: logs, files, containers, and so on. For that, we enable SSH access to those ephemeral machines, which are kept for debugging purposes if and only if the DEVELOPER_MODE parameter is set in the Jenkinsfile. In the Jenkins UI, you can enable it by checking the DEVELOPER_MODE input parameter (default is false). After the build finishes, the cloud instances won't be destroyed.

To access the machines, you must first be granted access: please submit a PR adding your GitHub username, in alphabetical order, to [this file](../.ci/ansible/github-ssh-keys), keeping a blank line at the end of the file.
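
Before opening that PR, the two requirements on the file can be checked locally. The sketch below assumes that "a blank line as file ending" means the file ends with a newline character and that "alphabetical order" is case-insensitive. The function name `check_keys_file` is hypothetical and not part of the repository.

```shell
#!/usr/bin/env bash

# check_keys_file FILE
# Hypothetical helper (not part of the repository): reports whether FILE is
# sorted alphabetically (ignoring case) and ends with a newline character.
check_keys_file() {
  local file="$1"

  # 'sort -c' only checks the order; '-f' folds lowercase into uppercase.
  if ! LC_ALL=C sort -f -c "$file" 2>/dev/null; then
    echo "usernames are not in alphabetical order"
    return 1
  fi

  # Command substitution strips a trailing newline, so a non-empty result
  # means the last byte of the file is something other than a newline.
  if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
    echo "file does not end with a newline"
    return 1
  fi

  echo "ok"
}
```

From the repository root, this could be run as `check_keys_file .ci/ansible/github-ssh-keys`.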

### Tests fail because the product could not be configured or run correctly
This type of failure usually indicates that code for these tests itself needs to be changed. See the sections on how to run the tests locally in the specific test suite.
