Server administration
Team members with their SSH keys installed on the server can add this to their ~/.ssh/config file:
Host lzm
HostName monitoring.localzero.net
User monitoring
IdentityFile ~/.ssh/<private_key_file>
This enables easy login:
ssh lzm
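The alias also works with other SSH-based tools such as scp, for example (the file name here is just an illustration):
# copy a local file to the server's /tmp directory using the lzm alias
scp some-local-file.txt lzm:/tmp/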
We save nginx logs to /data/<production|testing>/<production|testing>-logs and can analyze them for usage. These logs are large (multiple GB), so make sure to restrict the commands you run to only the data you are interested in.
E.g. for the production server, you can count the number of unique IP addresses per day for October 2024 (note the 2024-10-* part) with this command:
egrep '^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+ ' /data/production/production-logs/2024-10-*/*.log \
| cut -d '"' -f 1,8 | sed -e 's/\[\(.*\):.*:.*:.*\]/\1/' \
| cut -d ' ' -f 4,5 | sort | uniq -c | cut -c 9-20 | sort | uniq -c \
> "usage-stats-$(date -I).txt"
Many more detailed statistics are possible by varying the last commands of the pipeline. Remove any files or directories you created when you are done.
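As a sketch of such a variation (assuming the default nginx combined log format), the following pipeline counts the total number of requests per day instead of the unique IP addresses:
# count requests per day for October 2024; extracts the date between '[' and the first ':'
egrep '^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+ ' /data/production/production-logs/2024-10-*/*.log \
    | cut -d '[' -f 2 | cut -d ':' -f 1 | sort | uniq -c \
    > "requests-per-day-$(date -I).txt"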
Copy related files to the server:
scp -C -r docker/reverseproxy/conf.d/ docker/reverseproxy/docker-compose.yml docker/reverseproxy/.env.server monitoring@monitoring.localzero.net:/tmp/reverseproxy
On the server:
cd ~/reverseproxy
cp -r /tmp/reverseproxy/. .
mv .env.server .env
docker network create testing_nginx_network
docker network create production_nginx_network
docker compose up -d
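A quick sanity check after starting the reverse proxy could look like this (a sketch; adjust to your needs):
# the reverse proxy container should be up and publishing ports 80/443
docker ps --filter name=reverse-proxy
# both sites should answer over HTTPS
curl -I https://monitoring.localzero.net/
curl -I https://monitoring-test.localzero.net/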
Note that only project maintainers and admins are authorized to deploy.
To deploy a new version to the testing environment, create a pull request from main to deploy-to-testing.
Merging the PR will trigger a deployment workflow that deploys the new version, imports the latest production data to the testing environment and applies migrations.
After deploying to testing, verify that the testing environment can be accessed in a browser, that admin login and changing data work as expected, and that new features work when deployed on the server.
Pull the production log and save it as a backup.
If everything looks good, deploy to production: create a PR from deploy-to-testing to deploy-to-production and merge it.
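If you prefer the command line, the two pull requests can also be created with the GitHub CLI (a sketch; assumes gh is installed and authenticated for this repository):
# PR from main into deploy-to-testing; merging it triggers the testing deployment
gh pr create --base deploy-to-testing --head main --title "Deploy to testing" --body ""
# PR from deploy-to-testing into deploy-to-production; merging it triggers the production deployment
gh pr create --base deploy-to-production --head deploy-to-testing --title "Deploy to production" --body ""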
Important: Check that everything works in production after every deployment!
Alternatively, it is possible to test the new front-end currently developed on the branch react-frontend on the testing system. In order to trigger this, create a pull request from react-frontend to newfe-to-testing and merge it.
Afterwards, deploy from the main branch again as described above in section "Testing". This is important to prevent the accidental deployment of the new front-end to the production system.
In order to view, manipulate and export the database in any of the environments (local, testing, production), the database webclient Cloudbeaver is started automatically together with the application.
The client can be accessed at http://localhost/dbeaver (or http://monitoring-test.localzero.net/dbeaver, http://monitoring.localzero.net/dbeaver depending on the environment) and the credentials can be found in the .env.local file. For testing and production, the credentials should be configured in the respective .env files on the server.
We currently use a single TLS certificate for both monitoring.localzero.net and monitoring-test.localzero.net. The certificate is issued by letsencrypt.org, and requesting and renewal are performed using acme.sh, which runs in a container. This solution allows us to keep almost all necessary code and config in the repo instead of only on the server.
The initial certificate was issued using the following command:
docker exec acme-sh --issue -d monitoring-test.localzero.net -d monitoring.localzero.net --standalone --server https://acme-v02.api.letsencrypt.org/directory --fullchain-file /acme.sh/fullchain.cer --key-file /acme.sh/ssl-cert.key
Renewal is performed automatically by acme.sh's internal cron job, which...
- checks if a renewal is necessary, and if so:
- requests a new certificate from letsencrypt,
- performs the challenge-response-mechanism to verify ownership of the domain
- and exports the full certificate chain and key to where nginx can find it.
A reload of the nginx config is independently triggered every four hours by our own cron job which can be found in crontab or by executing the following on the server:
crontab -l
This job runs a script which applies the latest certificate that acme.sh has produced. This means there can be some delay between renewal and application of the certificate, but since acme.sh performs renewal a few days before expiry, there should be enough time for nginx to reload the certificate.
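The crontab entry might look roughly like the following (hypothetical script name and path; check crontab -l on the server for the real entry):
# run the nginx reload script every four hours; path and file name are illustrative only
0 */4 * * * /home/monitoring/reverseproxy/reload-nginx-config.sh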
The configuration used by acme-sh's cronjob (not our nginx reload cronjob!), e.g. the renewal interval, can be changed in `reverseproxy/ssl_certificates/monitoring-test.localzero.net_ecc/` on the server.
The following commands might be executed on the server to debug and test the acme-sh configuration:
# view certificate creation date and next renew date
docker exec acme-sh --list
# tell acme-sh to run its cronjob now, using letsencrypt's test environment (to bypass rate limiting)
docker exec acme-sh --cron --staging
# tell acme-sh to run its cronjob now, using letsencrypt's PROD environment (affected by rate limiting - 5 certs every couple weeks...)
docker exec acme-sh --cron
# force a renewal via letsencrypt's PROD environment, even if renewal time hasn't been reached yet
docker exec acme-sh --cron --force
# change mail address that will receive expiry warnings (only one address supported as of acme.sh v3.0.6)
docker exec acme-sh --update-account --accountemail '<[email protected]>' --debug 2 --server https://acme-v02.api.letsencrypt.org/directory
When running locally, we instead use a certificate created for localhost. Since ownership of localhost cannot be certified, this is a single self-signed certificate instead of a full chain signed by a CA like on the server, and an exception must be added to your browser to trust it.
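For reference, a self-signed localhost certificate of this kind could be created roughly like this (a sketch only; the file names and validity period are assumptions, not necessarily what the repository uses):
# generate a self-signed certificate and key for localhost (illustrative parameters)
openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
    -subj '/CN=localhost' \
    -keyout ssl-cert.key -out fullchain.cer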
When the website isn't working as it should, first make sure all containers are running by SSHing into the server and executing docker ps, which should give output similar to this:
monitoring:~% docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
36f7e245232a klimaschutzmonitor-dbeaver:testing "./run-server.sh" 6 minutes ago Up 3 minutes 0.0.0.0:32770->8978/tcp, :::32770->8978/tcp dbeaver-testing
595c571072ca nginxinc/nginx-unprivileged:alpine "/docker-entrypoint.…" 6 minutes ago Up 3 minutes 0.0.0.0:32771->8080/tcp, :::32771->8080/tcp nginx-testing
3b62a190e679 cpmonitor:testing "gunicorn --log-leve…" 6 minutes ago Up 3 minutes 8000/tcp djangoapp-testing
a1bfc5a6804a nginxinc/nginx-unprivileged:alpine "/docker-entrypoint.…" 12 minutes ago Up 3 minutes 0.0.0.0:32768->8080/tcp, :::32768->8080/tcp nginx-production
24429bdab24b klimaschutzmonitor-dbeaver:production "./run-server.sh" 12 minutes ago Up 3 minutes 0.0.0.0:32769->8978/tcp, :::32769->8978/tcp dbeaver-production
cfbeac603091 cpmonitor:production "gunicorn --log-leve…" 13 minutes ago Up 3 minutes 8000/tcp djangoapp-production
c5e42e81888a nginxinc/nginx-unprivileged:alpine "/docker-entrypoint.…" 2 hours ago Up 3 minutes 0.0.0.0:80->80/tcp, :::80->80/tcp, 0.0.0.0:443->443/tcp, :::443->443/tcp, 8080/tcp reverse-proxy
81d2ffdd8e6d neilpang/acme.sh "/entry.sh daemon" 2 hours ago Up 3 minutes acme-sh
Check that the same 8 containers are running.
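A quick way to count the running containers (standard Docker CLI):
# should print 8 when everything is up
docker ps --format '{{.Names}}' | wc -l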
If some containers are missing, go to their respective directory (~/<testing|production|reverseproxy>) and run docker-compose up --detach --no-build to try and start them.
Probably the Docker Engine was reset somehow. The deployments have to be repeated:
- Check that the last successful deployments are from the current HEAD of deploy-to-testing and deploy-to-production:
  - In the GitHub web interface, go to the repo, then Actions > Deploy.
  - There should be a successful Action run for each of the two branches.
  - Check in the git history / graph that the two branches are still on the respective commits.
- On Actions > Deploy, select Run workflow and first select deploy-to-testing. Wait till the run has finished and check that the testing app is running.
- Then do the same with deploy-to-production.
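The same workflow dispatch can be triggered from the command line with the GitHub CLI (a sketch; assumes the workflow is named Deploy as shown in the Actions tab and that gh is authenticated):
# trigger the Deploy workflow for the testing branch first...
gh workflow run Deploy --ref deploy-to-testing
# ...and, once testing looks good, for the production branch
gh workflow run Deploy --ref deploy-to-production
# list recent runs to follow their status
gh run list --workflow Deploy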
This error can occur after restarting Docker containers or Compose stacks and looks like this:
Error response from daemon: driver failed programming external connectivity on endpoint nginx-testing
(70a3ed0163c3c86845b7ed8c89c8a38da7406647548566cfa68f30d75c63737f):
(iptables failed: iptables --wait -t filter -A DOCKER ! -i br-68c36a88c3e2 -o br-68c36a88c3e2 -p tcp -d 172.18.0.3 --dport 8080 -j ACCEPT:
iptables: No chain/target/match by that name.
Ask the GermanZero IT admins to please restart the Docker daemon using sudo systemctl restart docker. If you don't have their contact info, ask the other devs.
This can be caused by Docker tags disappearing after a system upgrade (a somewhat common Docker bug).
If tags like cpmonitor:production are missing, but a tag such as cpmonitor:production-2024-01-18-07-50-18 still exists, just recreate the missing tag using docker tag cpmonitor:production-2024-01-18-07-50-18 cpmonitor:production.
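To see which tags are still present locally before recreating one, a quick check (standard Docker CLI):
# list all local tags of the cpmonitor image
docker image ls cpmonitor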
If no other tags for the missing image remain, the image needs to be redeployed. Triggering a deployment via GitHub Actions (see Deploying a new Version) might work, but worst case, you'll need to execute some of the steps in deploy.sh manually.
This is often caused by running docker-compose up --detach without the --no-build option.
Add the --no-build option to your command.
When GermanZero IT staff upgrades the system our VM is running on, they often need to restart the VM. For some unknown reason, this makes the currently used Docker image tags disappear from our Docker installation. As a result, neither the testing nor the production app will be running after such an upgrade.
The staff will inform us by text chat before such upgrades, asking for a time slot in which to perform them.
After GermanZero IT staff has performed their upgrade, the last deployments to testing and production need to be re-run. To do this:
1. navigate to Actions > Deploy
2. find the latest successful run for the branch deploy-to-testing and click on it
3. in the top right, click Re-run all jobs and confirm
4. ensure the run completes successfully
5. check if https://monitoring-test.localzero.net/ is reachable
6. repeat steps 2-5 with deploy-to-production and https://monitoring.localzero.net/
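If you prefer the command line, the re-run can also be triggered with the GitHub CLI (a sketch; assumes gh is authenticated and the workflow is named Deploy):
# find the id of the latest Deploy run on the testing branch
gh run list --workflow Deploy --branch deploy-to-testing --limit 1
# re-run all jobs of that run (replace <run-id> with the id printed above)
gh run rerun <run-id>
# follow the run until it completes
gh run watch <run-id>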