This document explains how to manage the QMK Compile service. It is assumed that you have access to the Rancher instance powering the QMK API. If you are using this guide to manage your personal installation you may need to make adjustments for your environment.
The QMK infrastructure is built on top of Rancher 1.6. We use github authentication for access to the environment. When you were given access you were also given a "farmhouse" URL- this is the main entrypoint into our operational infrastructure.
The "Stacks" tab at the top is where you'll spend most of your time. If this is empty you may need to select the "Default" environment from the dropdown to the left of Stacks.
The QMK API services are in the qmk-api
group. Please do not do anything with the other groups, they are Clueboard or infrastructure related and should not be touched without consulting @skullydazed.
This section will provide a brief overview of the services.
api
- The HTTP frontend to
api.qmk.fm
. All HTTP/HTTPS requests from the internet are handled here. - https://github.com/qmk/qmk_api
- The HTTP frontend to
bot
- HTTP frontend for the bot, only used to receive webhooks from github.
- https://github.com/qmk/qmk_bot
compiler
- Compiles a keymap for users. Pulls compile jobs from RQ, does the work, and then uploads the result to S3.
- https://github.com/qmk/qmk_compiler
redis
- Key/value store. Used to store API data and as a queue for RQ.
- https://redis.io/
- https://github.com/qmk/qmk_api_redis
tasks
- Service that performs ongoing tasks to keep the API healthy.
- https://github.com/qmk/qmk_api_tasks
For all of the services below you can drill down into them by clicking the service name from the dashboard. This will show you all the containers running that particular service. Each container has a hamburger menu, and from this menu you can do two of the most useful things- View Logs and Execute Shell.
Keep in mind that these are minimal docker containers. You may need to apt update; apt install procps
to do some basic troubleshooting.
This is a standard flask service, powered by gunicorn. You can tune the gunicorn startup parameters by "Upgrading" the service and setting these environment variables:
MAX_REQUESTS=1000
- Passed to
--max-requests
- Passed to
MAX_REQUESTS_JITTER=100
- Passed to
--max-requests-jitter
- Passed to
NUM_WORKERS=8
- Passed to
-w
- Passed to
TIMEOUT=60
- Passed to
-t
- Passed to
This is a standard flask service, powered by gunicorn. There are no tunables. Mostly this services chugs along unnoticed.
Most of your troubleshooting time will be spent at the shell for this service. Drill down to the service, click the hamburger menu for any one of the running containers, and "Execute Shell". From here you have access to several scripts to help you manage the compile job queue.
Currently there's no easy way to see what job each compile node is running. For that you have to go down the list of containers and look at the log to see what the last entry is.
This script lets you test the compiling infrastructure even without the HTTP frontend being up. It will insert a compile job and monitor its progress so you can make sure everything is working.
This script lists the jobs that are waiting to be compiled, grouped by IP. This does not show any jobs that are currently being compiled.
This script lists the jobs that are waiting to run. This does not show any jobs that are currently being compiled.
This script will cancel a single job, by job_id. The job_id is the GUID printed out by ls_jobs
.
This will set qmk_needs_update=True
in redis, which will cause qmk_api_tasks
to refresh the API data.
This will refresh the API data immediately. This is needed in some bootstrap situations after redis has been restarted.
This will flush the queue and delete all pending compiles. This will break anyone who is currently using QMK Configurator, please do not do this lightly.
Redis is our key/value store and queue for RQ. We have a custom configuration here: https://github.com/qmk/qmk_api_redis
If you need to restart redis for any reason the API data will need to be repopulated. Follow this procedure to restart redis:
- Pause the
api
service by clicking the service hamburger menu and selecting "Stop". The API will begin to return 500 errors. - In a new tab open up a shell on one of the
compiler
containers. Keep this shell open. - Run
./flush_queue
in the shell - Watch the logs of the
compiler
instances, wait until they're all idle - Run
python3 ./update_kb_redis.py
in the shell - Wait for the API data to be generated
- Use
./test_keyboard clueboard/66/rev3
to make sure the API data is good - Unpause the
api
service by clicking the service hamburger menu and selecting "Start". The API will resume working shortly. - Monitor the queue with
ls_jobs
to make sure things stay healthy.
The tasks
service does 3 main things- it inserts the update_qmk_firmware
job, it inserts the s3_cleanup
job, and it continually tests both infrastructure and keyboards to see if both are working. The results of these test compiles can be viewed here: https://yanfali.github.io/qmk_error_page/