Workflow
Changes in kernelci-core are first made in kernelci-core-staging and tested on staging.kernelci.org. Then a PR is made to merge these changes with the production repository kernelci-core, which is used directly by Jenkins.
First, create a fork of kernelci-core. Then clone it and add the staging and prod remotes:
git clone git@github.com:<user-name>/kernelci-core.git
cd kernelci-core
git remote add staging git@github.com:kernelci/kernelci-core-staging.git
git remote update staging
git checkout staging/master -b staging-master
git remote add prod git@github.com:kernelci/kernelci-core.git
git remote update prod
Then create a new staging tag (replacing YYYYMMDD with the actual date), start a PR branch from it and rebase all the changes since the previous tag onto prod:
git checkout staging-master
git pull --ff-only
git tag -a staging-YYYYMMDD -m staging-YYYYMMDD
git push staging staging-YYYYMMDD
git checkout -b pr/staging-YYYYMMDD
git remote update prod
git rebase -i staging-<old-tag> --onto=prod/master
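Typing the date and looking up the previous tag by hand is easy to get wrong. The following is a minimal sketch, assuming that staging tags always follow the staging-YYYYMMDD pattern used above and that the UTC date is acceptable. The function names are hypothetical helpers and are not part of kernelci-core.

```shell
# Hypothetical helpers, not part of kernelci-core.

# Print today's staging tag name (UTC date), in the staging-YYYYMMDD form.
staging_tag_today() {
    printf 'staging-%s\n' "$(date -u +%Y%m%d)"
}

# Print the most recent staging-* tag reachable from the given commit.
# Defaults to the parent of HEAD so that a tag just created on HEAD is skipped.
previous_staging_tag() {
    git describe --tags --abbrev=0 --match 'staging-[0-9]*' "${1:-HEAD^}"
}

# Possible usage, run from the staging-master branch:
#   new_tag=$(staging_tag_today)
#   old_tag=$(previous_staging_tag)
#   git tag -a "$new_tag" -m "$new_tag"
#   git rebase -i "$old_tag" --onto=prod/master
```

It is still worth checking the tag printed by previous_staging_tag against the list of tags in the staging repository before rebasing.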
At this point it's worth comparing the staging branch for the PR with the staging-master branch to check all the changes are in the PR:
git diff staging-master pr/staging-YYYYMMDD
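A plain diff can be hard to read when production has diverged from staging. As a complementary check, git cherry can list the staging commits that have no patch-equivalent commit on the PR branch. This is a minimal sketch with hypothetical function names. Note that a commit whose conflicts were resolved by hand during the rebase will be reported as missing, because its patch is no longer identical.

```shell
# Hypothetical helpers, not part of kernelci-core.

# Keep only the lines that `git cherry` marks with "+": commits that have no
# patch-equivalent commit on the upstream side.
only_missing() {
    grep '^+ ' || true
}

# List the commits made on the staging branch since the old tag that are not
# in the PR branch.
# Usage: missing_from_pr pr/staging-YYYYMMDD staging-master staging-<old-tag>
missing_from_pr() {
    git cherry -v "$1" "$2" "$3" | only_missing
}
```

An empty output suggests that every staging commit since the old tag has an equivalent in the PR branch.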
If it all looks OK then push it to your forked repo:
git push origin HEAD:staging-YYYYMMDD
Then visit your kernelci-core fork and create a PR as usual from there. Add a description with a summary of the changes in the PR.
Updating the code used in production can be disruptive, especially as kernelci-backend, kernelci-frontend and kernelci-core may need to be kept in sync and updated all together. The Jenkins jobs configuration may also need to be adjusted to match kernelci-core changes (list of parameters, overall jobs configuration). Several things need to be done in order to maximise the continuity of the kernelci.org service, as explained below. These things should gradually become either automated or simplified to reduce the amount of manual work and chances of missing a step.
The kernel-tree-monitor job periodically looks for new changes in all the kernel trees. As updating the production code should be done atomically for all the components, it's necessary to wait until all the Jenkins jobs have completed. This can be achieved by pausing the monitor job and then waiting for any on-going or queued job to complete in Jenkins.
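Pausing the monitor and waiting for Jenkins to become idle can also be scripted against the Jenkins remote API. The following is a rough sketch rather than the procedure actually used on kernelci.org: JENKINS_URL, JENKINS_USER and JENKINS_TOKEN are assumptions that must be set for the Jenkins instance in question, authentication with an API token is assumed, and jq is assumed to be installed.

```shell
# Hypothetical sketch: JENKINS_URL, JENKINS_USER and JENKINS_TOKEN are
# assumptions and must be set for your own Jenkins instance.
MONITOR_JOB=kernel-tree-monitor

# Send an authenticated POST to a Jenkins path; with DRY_RUN=1 only print it.
jenkins_post() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "POST ${JENKINS_URL}$1"
    else
        curl --fail --silent --show-error -X POST \
            --user "${JENKINS_USER}:${JENKINS_TOKEN}" "${JENKINS_URL}$1"
    fi
}

# Fetch a Jenkins JSON API path and evaluate a jq expression on the result.
jenkins_count() {
    json=$(curl --fail --silent --show-error \
        --user "${JENKINS_USER}:${JENKINS_TOKEN}" "${JENKINS_URL}$1") || return 1
    printf '%s' "$json" | jq "$2"
}

# Disable the monitor job so that no new builds get triggered.
pause_monitor() {
    jenkins_post "/job/${MONITOR_JOB}/disable"
}

# Poll every minute until the build queue is empty and no executor is busy.
wait_until_idle() {
    while :; do
        queued=$(jenkins_count /queue/api/json '.items | length') || return 1
        busy=$(jenkins_count /computer/api/json '.busyExecutors') || return 1
        [ "$queued" -eq 0 ] && [ "$busy" -eq 0 ] && return 0
        echo "Waiting: ${queued} queued, ${busy} running"
        sleep 60
    done
}
```

Running DRY_RUN=1 pause_monitor prints the request without sending it, which is a cheap way to check the URL first. This sketch only looks at Jenkins and does not know about LAVA jobs or pending email reports.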
Ideally, all the LAVA jobs should also complete, because updating kernelci-backend may cause some callbacks to fail. The email reports should also ideally all be sent before updating the kernelci-backend code, although scheduled reports will remain in the queue even if the code is updated. In practice, if the kernelci-backend code doesn't need to be updated or if the API hasn't changed, the downtime caused by the update is short enough that it is typically fine not to wait for these things to complete, which may take several hours.
As explained above, a PR should be created to synchronise the staging branch with the production one. Once the Jenkins jobs are all done, it's time to merge it.
Go through all the Jenkins jobs and update their configuration if necessary. This should be explained in the PR release notes, but it depends on each case. (ToDo: keep Jenkins jobs definitions in Git to automate that)
If there have been any kernelci-backend or kernelci-frontend changes, ensure that a new version has been created with a tag. Also double check that the modified JavaScript files in kernelci-frontend have a new dated name to force web browsers to download them (probably something to improve). Then run the ansible commands to update them, ensuring that all steps are done automatically (proper restart of the services, regenerating the static files, etc.).
For example, to update the backend:
cd kernelci-backend-config
ansible-playbook \
-i hosts site.yml \
-l api.kernelci.org \
-D \
-b \
--ask-sudo-pass \
--skip-tags=secrets \
-t app \
-e git_head=master
If any changes have been made to the Dockerfile definitions, build and push them all again to the registry. Wait until this has completed and be sure the latest images are available to be pulled before carrying on. (see https://github.com/kernelci/kernelci-core-staging/pull/94)
When in doubt, this can be done every time: if there haven't been any changes, the Docker image builds should complete very quickly.
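Rebuilding every image can be done with a simple loop. The sketch below is only an illustration: the directory layout (one sub-directory per image, each holding a Dockerfile), the image prefix and the function names are all assumptions, and are not taken from kernelci-core.

```shell
# Hypothetical sketch: the directory layout and image prefix are assumptions.

# Derive an image name from a prefix and the directory holding a Dockerfile.
image_name() {
    printf '%s%s\n' "$1" "$(basename "$2")"
}

# Build and push one image per sub-directory containing a Dockerfile,
# stopping at the first failure.
# Usage: build_and_push_all <image-prefix> <dockerfiles-root>
build_and_push_all() {
    for dockerfile in "$2"/*/Dockerfile; do
        # Skip the unexpanded pattern when no Dockerfile is found.
        [ -f "$dockerfile" ] || continue
        dir=$(dirname "$dockerfile")
        image=$(image_name "$1" "$dir")
        docker build -t "$image" "$dir" || return 1
        docker push "$image" || return 1
    done
}
```

The loop visits directories in alphabetical order, so images that are based on other locally built images may need to be ordered by hand. Pulling one of the pushed images from another machine is a simple way to confirm that it is available in the registry.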
If there have been any changes to the debos recipes, or in order to get the latest version of the test suites, run a build of all the rootfs jobs (stretch, stretch-igt, stretch-v4l2...). Then update the test-configs.yaml file with the new URL for these file systems directly on the master branch. Once that has been done, the patch should be cherry-picked on the staging branch to keep them in sync. If any rootfs fails to build, keep the previous revision in test-configs.yaml and report the issue so it can be fixed for the next production update.
When in doubt, this can be done every time. It can take about 1h to complete but it's usually a good idea to keep the test suites updated and built with the latest available revisions.
Before enabling the tree monitor again, it's important to run a final check to verify that all the kernels are building correctly and all the tests are running as expected. The test coverage on the staging instance doesn't allow building over 200 kernel variants like production does, and some labs are only available to run tests in production but not on staging.
This can be done using any individual's tree listed in kernelci-builds.yaml by updating a branch based on the latest stable that is known to be building and passing tests in production. Having a recent version is necessary to cover all the available hardware, and using a stable branch is necessary to avoid false positives (i.e. finding actual kernel problems rather than KernelCI infrastructure ones). It's however generally a good idea to build all the configs on all the architectures rather than the reduced set normally built on stable branches. For example, see gtucker_stable in build-configs.yaml.
Wait for all the builds and tests to complete, and all the emails to be received (typically to a limited audience given the tree being built). If any issues arise, fix them if possible or revert changes in the code to be able to restart production shortly. Re-run parts of the pipe cleaner job to ensure things are working well before enabling the monitor job again.
Re-enable the tree monitor job, and manually start one now to avoid waiting potentially another hour until the next automated trigger. Check that it works as expected and keep an eye on the results when they finally come in, to double check there hasn't been any regression introduced in spite of all the precautions explained above.
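Re-enabling the monitor and starting one run can be done in the Jenkins web interface or through the Jenkins remote API. The following is a minimal sketch of the latter, assuming that JENKINS_URL, JENKINS_USER and JENKINS_TOKEN are set for the Jenkins instance in question and that an API token is used for authentication. These names are assumptions, not part of kernelci-core.

```shell
# Hypothetical sketch: JENKINS_URL, JENKINS_USER and JENKINS_TOKEN are
# assumptions and must be set for your own Jenkins instance.

# Send an authenticated POST to a Jenkins path; with DRY_RUN=1 only print it.
jenkins_post() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "POST ${JENKINS_URL}$1"
    else
        curl --fail --silent --show-error -X POST \
            --user "${JENKINS_USER}:${JENKINS_TOKEN}" "${JENKINS_URL}$1"
    fi
}

# Re-enable the monitor job, then queue one run straight away.
resume_monitor() {
    jenkins_post /job/kernel-tree-monitor/enable &&
        jenkins_post /job/kernel-tree-monitor/build
}
```

If the monitor job takes parameters, Jenkins may reject the plain build request, in which case the buildWithParameters endpoint would be needed instead.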
KernelCI currently has a lot of moving parts and all the information in this wiki is work in progress. This is a public wiki, feel free to contribute!