infrastructure jenkins
The Open MPI community runs a Jenkins build server (and a fleet of build workers) to check both pull requests and new commits. The server runs on aws.open-mpi.org, with builders running at a number of institutions as well as on AWS. In addition, Mellanox and IBM each run their own Jenkins master, as there is a significant level of trust between master and slave. This page does not document how to use Jenkins; instead, it records notes on how Jenkins is currently configured. Brian and Howard have been doing most of the configuration lately.
If you're interested in hosting a build server on the community Jenkins, there are a couple of requirements:
- If you're providing a new architecture (say, the first MIPS64 machine), you must provide at least two build hosts, preferably with different fault characteristics (at a minimum, avoid rebooting both at the same time for maintenance). It's really a drag when development stops for a day because one of the build servers went down.
- Your build servers must be able to reach jenkins.open-mpi.org and github.com on both http and https connections. While it's possible to have a non-transparent proxy, Howard can attest that it makes the setup considerably more difficult.
- You (or your company, if this is hosted at a company) have to be willing to let the agent run. Keep in mind that the Jenkins builders run whatever arbitrary commands the Jenkins server gives them. Those commands can include `rm -rf /` or other destructive things if someone makes a mistake. Yes, the agent (hopefully) runs as an unprivileged user, but this can still be scary for some organizations.
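The connectivity requirement above can be checked from a candidate build host before any agent setup. The following is a minimal sketch using only the Python standard library; the function names (`candidate_urls`, `http_probe`, `check_reachability`) are hypothetical helpers, not part of any Open MPI tooling, and the sketch treats any HTTP response (even an error status) as proof that the host is reachable.

```python
import urllib.error
import urllib.request

# Hosts and schemes named in the requirements above.
HOSTS = ["jenkins.open-mpi.org", "github.com"]
SCHEMES = ["http", "https"]


def candidate_urls(hosts=HOSTS, schemes=SCHEMES):
    """Return every scheme/host combination the build host must reach."""
    return [f"{scheme}://{host}/" for host in hosts for scheme in schemes]


def http_probe(url, timeout=10):
    """Return True if the server answered at all (any HTTP status counts)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server responded, just not with a success status.
        return True
    except OSError:
        # Covers DNS failures, refused connections, timeouts, and TLS errors.
        return False


def check_reachability(probe=http_probe):
    """Map each required URL to True/False using the given probe function."""
    return {url: probe(url) for url in candidate_urls()}
```

Calling `check_reachability()` on the prospective build host and printing the resulting dictionary shows which of the four URLs failed. If the host sits behind a non-transparent proxy, expect this check to need the usual proxy environment variables, which is the extra setup difficulty mentioned above.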
If that didn't chase you away, you need to set up an agent.
EC2 builders run in the Open MPI Production account (the same place `aws.open-mpi.org` runs), using the Jenkins EC2 plugin, which handles auto scaling and stops or terminates instances when idle. The configuration is pretty straightforward and lives in the master configuration screen on Jenkins. `aws.open-mpi.org` has a role that grants all the permissions needed to modify EC2 instance state, so we don't embed AWS credentials in Jenkins but instead use the `Use EC2 instance profile to obtain credentials` option.
There are a number of disadvantages to starting/stopping an instance (roughly equivalent to turning off the machine, then turning it back on later), so we terminate instances when idle. This means we also build a new instance every time we need new workers. Initially, we did this with a long CloudInit script that installed all the packages, but this slowed new instance startup and sometimes made Jenkins angry. So we now custom-build AMIs for all our test platforms and skip the CloudInit script.
AMI | Description | Parent AMI | Default User |
---|---|---|---|
ami-3f130246 | Amazon Linux 17.03 | ami-8ca83fec | ec2-user |
ami-e0120399 | FreeBSD 11 | ami-6926f809 | ec2-user |
ami-6c100115 | RHEL 7.3 | ami-6f68cf0f | ec2-user |
ami-9c1302e5 | SLES 12 | ami-e4a30084 | ec2-user |
ami-ab1001d2 | Ubuntu 16.04 | ami-a58d0dc5 | ubuntu |
Follow the instructions in https://github.com/open-mpi/ompi-scripts/blob/master/jenkins/customize-ami.sh to build a new Jenkins builder AMI (the script works for all the OSes listed above; some updates may be needed for new OSes or releases). You must be using the ompi-aws-production (518752846868) account when building new AMIs; if you don't have access, talk to Brian, Jeff, or Howard. Add the AMI ID to the table above when the AMI is completed.
Configure Jenkins as below. Launch a test builder to verify that it becomes healthy. The RHEL AMI billing code doesn't support spot pricing, so we use a t2.medium instead of a spot m4.large for building. The t2 can run into credit exhaustion problems, but is significantly cheaper than the m4 at on-demand pricing.
Configuration (all AMIs except RHEL):
Instance Type: M4Large
Availability Zone: us-west-2c
Use Spot Instances: <yes>
Spot Max Bid Price: <approximately 2x on-demand price>
Security group names: sg-f87d7a9e
Remote FS root: /home/<DEFAULT_USER>
Remote user: <DEFAULT_USER>
AMI Type: unix
Root command prefix: sudo
Remote ssh port: 22
Labels: <LABELS FROM customize-ami.sh>
Idle termination time: -5
User Data: TBD
Number of Executors: 1
Stop/Disconnect on Idle Timeout: True
Subnet ID for VPC: subnet-a9e154f1
Instance Cap: 5
IAM Instance Profile: arn:aws:iam::518752846868:instance-profile/jenkins-worker
Associate Public IP: True
Configuration (RHEL):
Instance Type: T2Micro
Availability Zone: us-west-2c
Security group names: sg-f87d7a9e
Remote FS root: /home/<DEFAULT_USER>
Remote user: <DEFAULT_USER>
AMI Type: unix
Root command prefix: sudo
Remote ssh port: 22
Labels: <LABELS FROM customize-ami.sh>
Idle termination time: -5
User Data: TBD
Number of Executors: 1
Stop/Disconnect on Idle Timeout: True
Subnet ID for VPC: subnet-a9e154f1
Instance Cap: 5
IAM Instance Profile: arn:aws:iam::518752846868:instance-profile/jenkins-worker
Associate Public IP: True
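Only a few of the fields above change from one AMI to the next: the two user-dependent fields and whether spot instances are used. As an illustration, the sketch below derives those fields from the default users in the AMI table. The `DEFAULT_USERS` mapping and the `user_settings` function are hypothetical names for this example only; Jenkins itself is configured through its web UI, not through this code.

```python
# Default login user per platform, copied from the AMI table above.
DEFAULT_USERS = {
    "Amazon Linux 17.03": "ec2-user",
    "FreeBSD 11": "ec2-user",
    "RHEL 7.3": "ec2-user",
    "SLES 12": "ec2-user",
    "Ubuntu 16.04": "ubuntu",
}


def user_settings(platform):
    """Return the per-platform fields of the Jenkins EC2 configuration."""
    user = DEFAULT_USERS[platform]
    return {
        # <DEFAULT_USER> is substituted into both user-dependent fields.
        "Remote FS root": f"/home/{user}",
        "Remote user": user,
        # The RHEL AMI billing code does not support spot pricing.
        "Use Spot Instances": not platform.startswith("RHEL"),
    }
```

For example, `user_settings("Ubuntu 16.04")` gives a remote FS root of `/home/ubuntu` with spot instances enabled, while `user_settings("RHEL 7.3")` gives `/home/ec2-user` with spot instances disabled.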
The `Idle termination time: -5` setting is not a typo. EC2 bills in hour increments, so an instance stopped after running 30 minutes costs the same as one stopped after 59 minutes. A value of -5 means Jenkins won't stop an instance until 5 minutes before the hour is up, maximizing instance reuse if there are multiple builds in the hour.
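The arithmetic behind the negative value can be sketched as follows. This is an illustration of the behavior described above, not the EC2 plugin's actual implementation; `should_terminate` and `BILLING_PERIOD_MIN` are hypothetical names, and the sketch assumes hourly billing measured from instance launch.

```python
BILLING_PERIOD_MIN = 60  # assumes EC2 bills in whole hours, as described above


def should_terminate(uptime_min, is_idle, idle_termination_min=-5):
    """Decide whether an idle worker may be stopped under a negative setting.

    A setting of -N means: stop an idle instance only during the last
    N minutes of its current billing hour.
    """
    if idle_termination_min >= 0:
        raise ValueError("this sketch only models negative settings")
    if not is_idle:
        # A worker that is running a build is never stopped.
        return False
    # Minutes remaining until the current billing hour is used up.
    minutes_left = BILLING_PERIOD_MIN - (uptime_min % BILLING_PERIOD_MIN)
    return minutes_left <= -idle_termination_min
```

Under these assumptions, a worker that goes idle 30 minutes after launch is kept around (and can pick up another build) until minute 55, at which point stopping it costs nothing extra. The same holds in the second hour: an idle worker at minute 116 is eligible for termination, one at minute 100 is not.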