Workflow
The master branch in kernelci-core is used as a development branch. The code used in production can be found on the kernelci.org branch, which is updated manually, typically once a week. This procedure is described below.
Updating the code used in production can be disruptive, especially as kernelci-backend, kernelci-frontend and kernelci-core may need to be kept in sync and updated all together. The Jenkins jobs configuration may also need to be adjusted to match kernelci-core changes (list of parameters, overall jobs configuration). Several things need to be done in order to maximise the continuity of the kernelci.org service, as explained below. These things should gradually become either automated or simplified to reduce the amount of manual work and chances of missing a step.
Tagging is a simple step to keep track of which versions get put into production. Create a kernelci-yyyymmdd tag with the date on the master branch in the kernelci-core project. Similarly, create a new version tag if needed in kernelci-backend and kernelci-frontend (see previous version updates in the history).
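As a minimal sketch, assuming origin points at the kernelci-core repository and using an example date:

git checkout master
git pull --ff-only origin master
git tag kernelci-20190506   # example date, use the current one
git push origin kernelci-20190506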
Send an email to the kernelci.org mailing list with a summary of all the changes going into production. This should be done at least a day ahead of time, in order to give anyone a chance to comment before things are rolled out into production.
The kernel-tree-monitor job periodically looks for new changes in all the kernel trees. As updating the production code should be done atomically for all the components, it is necessary to wait until all the Jenkins jobs have completed. This can be achieved by pausing the monitor job and then waiting for any ongoing or queued job to complete in Jenkins, for example as shown below.
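One possible way to pause and later re-enable the monitor job from the command line, assuming the Jenkins CLI jar is available and credentials are set up (the server URL here is illustrative; the same can be done from the job page in the web UI):

java -jar jenkins-cli.jar -s https://jenkins.example.org disable-job kernel-tree-monitor
# ... wait for running and queued jobs to drain, update production, then:
java -jar jenkins-cli.jar -s https://jenkins.example.org enable-job kernel-tree-monitor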
Ideally, all the LAVA jobs should also complete, in case updating kernelci-backend causes some callbacks to fail. The email reports should also ideally all be sent before updating the kernelci-backend code, although scheduled reports will remain in the queue even if the code is updated. In practice, if the kernelci-backend code doesn't need to be updated or if the API hasn't changed, given the short downtime caused by the update it is typically fine not to wait for these things to complete (which may take several hours).
Push the tagged revision from the kernelci-core master branch to the kernelci.org branch used in production. This branch should match exactly, so if for any reason the history was not linear then the kernelci.org branch needs to be force-pushed.
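For example, assuming origin is the kernelci-core repository and the tag created earlier is at the tip of master:

git push origin master:kernelci.org
# only if the history was rewritten and the branch diverged:
git push --force origin master:kernelci.org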
Go through all the Jenkins jobs and update their configuration if necessary (job parameters, etc.). This should be explained in the PR release notes, but it depends on each case. (ToDo: keep the Jenkins job definitions in Git to automate this.)
If there have been any kernelci-backend or kernelci-frontend changes, ensure that a new version has been created with a tag. Also double-check that the modified JavaScript files in kernelci-frontend have a new dated name to force web browsers to download them again (probably something to improve...). Then run the ansible commands to update them, ensuring that all steps are done automatically (proper restart of the services, regenerating the static files, etc.).
For example, to update the backend:
cd kernelci-backend-config
# Deploy the application from the kernelci-backend master branch on the
# production API server.  -b and --ask-sudo-pass are needed to become root
# on the target, -D shows a diff of the changes, -t app limits the run to
# the application update tasks and --skip-tags=secrets leaves the deployed
# secrets untouched.
ansible-playbook \
    -i hosts site.yml \
    -l api.kernelci.org \
    -D \
    -b \
    --ask-sudo-pass \
    --skip-tags=secrets \
    -t app \
    -e git_head=master
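A similar invocation from a checkout of kernelci-frontend-config, with -l pointing at the frontend host, should update the frontend in the same way, assuming the same playbook layout as in the backend config repository.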
If any changes have been made to the Dockerfile definitions, build and push them all again to the registry. Wait until this has completed and make sure the latest images are available to be pulled before carrying on (see https://github.com/kernelci/kernelci-core-staging/pull/94).
When in doubt, this can be done every time: if there haven't been any changes, the Docker image builds should complete very quickly.
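As a rough sketch, rebuilding and pushing a single image by hand could look like this; the directory and image name are illustrative, as the actual list of images is defined by the Dockerfile files in kernelci-core:

cd kernelci-core/jenkins/dockerfiles   # assumed location of the Dockerfile definitions
docker build -t kernelci/build-base:latest build-base
docker push kernelci/build-base:latest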
If there have been any changes to the debos recipes, or in order to get the latest version of the test suites, run a build of all the rootfs jobs (stretch, stretch-igt, stretch-v4l2...). Then update the test-configs.yaml file with the new URLs for these file systems directly on the master branch. Once that has been done, the patch should be cherry-picked onto the staging branch to keep them in sync, as sketched below. If any rootfs fails to build, keep the previous revision in test-configs.yaml and report the issue so it can be fixed for the next production update.
When in doubt, this can be done every time. It can take about 1h to complete, but it's usually a good idea to keep the test suites updated and built with the latest available revisions.
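A sketch of the branch bookkeeping for this step, assuming origin is the kernelci-core repository and with an example commit message:

git checkout master
# edit test-configs.yaml with the new rootfs URLs, then:
git commit -s -m "test-configs.yaml: update rootfs URLs" test-configs.yaml
git push origin master
git checkout staging
git cherry-pick master   # keep staging in sync
git push origin staging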
Before enabling the tree monitor again, it's important to run a final check to verify that all the kernels are building correctly and all the tests are running as expected. The test coverage on the staging instance doesn't allow building over 200 kernel variants like production does, and some labs are only available to run tests in production but not on staging.
This can be done using any individual's tree listed in build-configs.yaml, by updating a branch based on the latest stable release that is known to be building and passing tests in production. Having a recent version is necessary to cover all the available hardware, and using a stable branch is necessary to avoid false positives (i.e. finding actual kernel problems rather than KernelCI infrastructure ones). It's however generally a good idea to build all the configs on all the architectures, rather than the reduced set normally built on stable branches. For example, see gtucker_stable in build-configs.yaml.
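A sketch of updating such a branch from the latest stable release; the mytree remote, branch name and tag below are illustrative:

# fetch a stable tag known to build and pass tests in production
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git v5.1.3
# force-push its commit to the branch listed in build-configs.yaml
git push -f mytree FETCH_HEAD^{commit}:refs/heads/kernelci-pipe-cleaner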
The kernel tree monitor job can be scheduled manually with only one build config specified as a parameter (this requires enabling it to start the job and then disabling it again, or having a copy of the job). Wait for all the builds and tests to complete, and for all the emails to be received. These should typically be sent to a limited audience, given the tree being built. If any issues arise, fix them if possible, or revert changes in the code to be able to restart production shortly. Re-run parts or all of the pipe cleaner job after applying a fix, to ensure things are working well before enabling the monitor job again.
Re-enable the tree monitor job, and manually start one to avoid potentially waiting for another hour until the next automated trigger occurs. Check that it works as expected and keep an eye on the results when they come in, to double-check that no regression has been introduced in spite of all the precautions explained above.
KernelCI currently has a lot of moving parts, and all the information in this wiki is work in progress. This is a public wiki, feel free to contribute!