
Changes to kernelci-core are first made in kernelci-core-staging and tested on staging.kernelci.org. Then a PR is made to merge these changes into the production repository kernelci-core, which is used directly by Jenkins.

To create a staging -> prod PR

First, create a fork of kernelci-core. Then clone it and add the staging and prod remotes:

git clone git@github.com:<user-name>/kernelci-core.git
cd kernelci-core
git remote add staging git@github.com:kernelci/kernelci-core-staging.git
git remote update staging
git checkout staging/master -b staging-master
git remote add prod git@github.com:kernelci/kernelci-core.git
git remote update prod

Then create a new staging tag (replacing YYYYMMDD with the actual date), start a PR branch from it and rebase all the changes since the previous tag onto prod:

git checkout staging-master
git pull --ff-only
git tag -a staging-YYYYMMDD -m staging-YYYYMMDD
git push staging staging-YYYYMMDD
git checkout -b pr/staging-YYYYMMDD
git remote update prod
git rebase -i staging-<old-tag> --onto=prod/master
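
To find the name of the previous staging tag, one convenient way (assuming all staging tags follow the staging-YYYYMMDD pattern, so they sort chronologically) is to list the last two of them:

git tag -l 'staging-*' | sort | tail -n 2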

At this point, it's worth comparing the PR branch with the staging-master branch to check that all the changes are in the PR:

git diff staging-master pr/staging-YYYYMMDD

If it all looks OK then push it to your forked repo:

git push origin HEAD:staging-YYYYMMDD

Then visit your kernelci-core fork and create a PR as usual from there. Add a description with a summary of the changes in the PR.
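
Alternatively, the PR can be opened from the command line. This is a sketch using the GitHub CLI, which is not part of the original workflow; it assumes the gh tool is installed and authenticated:

gh pr create \
  --repo kernelci/kernelci-core \
  --base master \
  --head <user-name>:staging-YYYYMMDD \
  --title "staging-YYYYMMDD" \
  --body "Summary of the changes in this PR"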

To update production

Updating the code used in production can be disruptive, especially as kernelci-backend, kernelci-frontend and kernelci-core may need to be kept in sync and updated all together. The Jenkins jobs configuration may also need to be adjusted to match kernelci-core changes (list of parameters, overall jobs configuration). Several things need to be done in order to maximise the continuity of the kernelci.org service, as explained below. These things should gradually become either automated or simplified to reduce the amount of manual work and chances of missing a step.

Pause the kernel tree monitor job to flush Jenkins jobs

The kernel-tree-monitor job periodically looks for new changes in all the kernel trees. As updating the production code should be done atomically for all the components, it's necessary to wait until all the Jenkins jobs have completed. This can be achieved by pausing the monitor job and then waiting for any on-going or queued job to complete in Jenkins.
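
As a sketch, pausing the job and checking for remaining activity can also be done through the Jenkins REST API; the instance URL below is illustrative, not taken from the actual setup:

# disable the monitor job so no new runs get scheduled
curl -X POST -u "$JENKINS_USER:$JENKINS_API_TOKEN" \
  "https://<jenkins-instance>/job/kernel-tree-monitor/disable"
# check what is still waiting in the queue
curl -s -u "$JENKINS_USER:$JENKINS_API_TOKEN" \
  "https://<jenkins-instance>/queue/api/json"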

Ideally, all the LAVA jobs should also complete in case updating the kernelci-backend may cause some callbacks to fail. The email reports should also ideally all be sent before updating the kernelci-backend code, although scheduled reports will remain in the queue even if the code is updated. In practice, if the kernelci-backend code doesn't need to be updated or if the API hasn't changed, given the short downtime caused by the update it is typically fine not to wait for these things to complete (which may take several hours).

Merge the staging -> production PR

As explained above, a PR should be created to synchronise the staging branch with the production one. Once the Jenkins jobs are all done, it's time to merge it.

Go through all the Jenkins jobs and update their configuration if necessary. What needs updating should be explained in the PR release notes, but it depends on each case. (ToDo: keep the Jenkins job definitions in Git to automate this)

Update kernelci-backend and kernelci-frontend

If there have been any kernelci-backend or kernelci-frontend changes, ensure that a new version has been created with a tag. Also double check that the modified JavaScript files in kernelci-frontend have a new dated name to force web browsers to download them again (probably something to improve...). Then run the Ansible commands to update them, ensuring that all steps are done automatically (proper restart of the services, regeneration of the static files, etc.).

For example, to update the backend:

cd kernelci-backend-config
ansible-playbook \
  -i hosts site.yml \
  -l api.kernelci.org \
  -D \
  -b \
  --ask-sudo-pass \
  --skip-tags=secrets \
  -t app \
  -e git_head=master
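
The frontend can be updated in a similar way. This is a sketch assuming the kernelci-frontend-config playbooks follow the same layout as the backend ones; the host name is illustrative:

cd kernelci-frontend-config
ansible-playbook \
  -i hosts site.yml \
  -l kernelci.org \
  -D \
  -b \
  --ask-sudo-pass \
  --skip-tags=secrets \
  -t app \
  -e git_head=master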

Update the Docker images

If any changes have been made to the Dockerfile definitions, build and push them all again to the registry. Wait until this has completed and be sure the latest images are available to be pulled before carrying on. (see https://github.com/kernelci/kernelci-core-staging/pull/94)

If in doubt, this can be done every time: if there haven't been any changes, the Docker image builds should complete very quickly.
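
As a sketch, assuming per-image directories each containing a Dockerfile under jenkins/dockerfiles and images published under the kernelci namespace (both are assumptions about the repository layout):

cd kernelci-core/jenkins/dockerfiles
for dir in */; do
    name="kernelci/${dir%/}"
    docker build -t "$name" "$dir"
    docker push "$name"
done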

Update the root file systems

If there have been any changes to the debos recipes, or in order to get the latest version of the test suites, run a build of all the rootfs jobs (stretch, stretch-igt, stretch-v4l2...). Then update the test-configs.yaml file with the new URLs for these file systems directly on the master branch. Once that has been done, the patch should be cherry-picked onto the staging branch to keep them in sync. If any rootfs fails to build, keep the previous revision in test-configs.yaml and report the issue so it can be fixed for the next production update.

If in doubt, this can be done every time. It can take about 1h to complete, but it's usually a good idea to keep the test suites updated and built with the latest available revisions.
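
For example, the test-configs.yaml update and synchronisation could look roughly like this; the commit message and exact flow are illustrative, and it assumes push access to the staging repository (otherwise open a PR there):

git checkout prod/master -b rootfs-update
# edit test-configs.yaml to point at the newly built rootfs images, then:
git commit -s -m "test-configs.yaml: update rootfs URLs" test-configs.yaml
# once the change has landed on master, keep staging in sync:
git checkout staging-master
git cherry-pick <commit-sha>
git push staging staging-master:master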

Run a "pipe cleaner" job

Before enabling the tree monitor again, it's important to run a final check to verify that all the kernels are building correctly and all the tests are running as expected. The test coverage on the staging instance doesn't allow building over 200 kernel variants like production does, and some labs are only available to run tests in production but not on staging.

This can be done using any individual's tree listed in build-configs.yaml by updating a branch based on the latest stable release that is known to build and pass tests in production. Having a recent version is necessary to cover all the available hardware, and using a stable branch is necessary to avoid false positives (i.e. finding actual kernel problems rather than KernelCI infrastructure ones). It's however generally a good idea to build all the configs on all the architectures, rather than the reduced set normally built on stable branches. For example, see gtucker_stable in build-configs.yaml.
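
For instance, refreshing the pipe-cleaner branch from the latest stable release could be done along these lines; the stable branch, remote and branch names are illustrative and depend on the tree being used:

git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git \
    linux-5.0.y
git push <personal-remote> FETCH_HEAD:kernelci-stable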

The kernel tree monitor job can be scheduled manually with only one build config specified as a parameter (this requires enabling the job to start it and then disabling it again, or having a copy of the job). Wait for all the builds and tests to complete, and all the emails to be received. These should typically be sent to a limited audience, given the tree being built. If any issues arise, fix them if possible, or revert changes in the code to be able to restart production shortly. Re-run part or all of the pipe cleaner job after applying a fix to ensure things are working well before enabling the monitor job again.
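
A parameterised run can also be triggered through the Jenkins REST API, along these lines (the instance URL and the parameter name are assumptions, not taken from the actual job definition):

curl -X POST -u "$JENKINS_USER:$JENKINS_API_TOKEN" \
  "https://<jenkins-instance>/job/kernel-tree-monitor/buildWithParameters?BUILD_CONFIG=gtucker_stable"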

Enable the tree monitor job again

Re-enable the tree monitor job and manually start a run, to avoid potentially waiting for up to another hour until the next automated trigger occurs. Check that it works as expected and keep an eye on the results when they finally come in, to double check that no regression has been introduced in spite of all the precautions explained above.
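
With the same illustrative Jenkins API as above, this could look like:

# enable the monitor job again
curl -X POST -u "$JENKINS_USER:$JENKINS_API_TOKEN" \
  "https://<jenkins-instance>/job/kernel-tree-monitor/enable"
# trigger a run immediately with the default parameters
curl -X POST -u "$JENKINS_USER:$JENKINS_API_TOKEN" \
  "https://<jenkins-instance>/job/kernel-tree-monitor/buildWithParameters"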