
Server administration

Felix Lampe edited this page Dec 18, 2024 · 17 revisions

SSH access

Team members whose SSH keys are installed on the server can add the following to their ~/.ssh/config file:

Host lzm
    HostName monitoring.localzero.net
    User monitoring
    IdentityFile ~/.ssh/<private_key_file>

This enables easy login:

ssh lzm
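If you prefer not to edit the file by hand, the entry can also be appended with a small shell function. This is a minimal sketch: the function name add_lzm_host is hypothetical, and it assumes that an existing entry shows up as a line reading exactly "Host lzm".

```shell
# Hypothetical helper: append the "lzm" host entry shown above to an SSH
# config file unless a line "Host lzm" is already present.
add_lzm_host() {
  local config="$1" key_file="$2"  # e.g. "$HOME/.ssh/config" and your private key path
  if [ -f "$config" ] && grep -qx 'Host lzm' "$config"; then
    echo "Host lzm already present in $config" >&2
    return 0
  fi
  # Create the file with owner-only permissions if it does not exist yet.
  if [ ! -f "$config" ]; then
    touch "$config" && chmod 600 "$config" || return 1
  fi
  printf '\nHost lzm\n    HostName monitoring.localzero.net\n    User monitoring\n    IdentityFile %s\n' \
    "$key_file" >>"$config"
}
```

For example, add_lzm_host "$HOME/.ssh/config" "$HOME/.ssh/<private_key_file>" followed by ssh lzm. Running it a second time leaves the file unchanged.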

Get usage statistics

We save nginx logs to /data/<production|testing>/<production|testing>-logs and can analyze them for usage statistics. The logs are large (multiple GB), so restrict your commands to the files you are actually interested in.

For example, on the production server, the following command counts the unique IP addresses per day for October 2024 (note the 2024-10-* part):

grep -E '^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+ ' /data/production/production-logs/2024-10-*/*.log \
  | cut -d '"' -f 1,8 | sed -e 's/\[\(.*\):.*:.*:.*\]/\1/' \
  | cut -d ' ' -f 4,5 | sort | uniq -c | cut -c 9-20 | sort | uniq -c \
  > "usage-stats-$(date -I).txt"

Much more detailed statistics are possible by varying the later stages of the pipeline. Remove the generated file when you are done.
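As one example of such a variation, the same count can be written as a single awk program, which is easier to extend (for instance to count requests instead of unique IPs). This is a sketch under assumptions: it assumes nginx's default "main" log format, in which the fourth quoted field (field 8 when splitting on '"', as cut -d '"' -f 8 does in the command above) is $http_x_forwarded_for. The helper name unique_ips_per_day is hypothetical. Days are printed as dd/Mon/yyyy, so the final sort is only chronological within a single month.

```shell
# Hypothetical helper: read nginx log lines on stdin and print
# "<number of unique IPs> <dd/Mon/yyyy>" per day, sorted by the day string.
unique_ips_per_day() {
  awk -F'"' '
    # Skip lines that do not start with an IPv4 address.
    $1 !~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+ / { next }
    {
      # $1 looks like "<remote_addr> - <user> [dd/Mon/yyyy:hh:mm:ss zone] ".
      day = substr($1, index($1, "[") + 1, 11)
      # $8 is the fourth quoted field, assumed to be $http_x_forwarded_for.
      key = day " " $8
      # Count each (day, IP) pair only once.
      if (!(key in seen)) { seen[key] = 1; count[day]++ }
    }
    END { for (d in count) print count[d], d }
  ' | sort -k 2
}
```

Usage on the server, for example: cat /data/production/production-logs/2024-10-*/*.log | unique_ips_per_day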

Deploying the reverse proxy initially

Copy the relevant files to the server:

ssh monitoring@monitoring.localzero.net mkdir -p /tmp/reverseproxy
scp -C -r docker/reverseproxy/conf.d/ docker/reverseproxy/docker-compose.yml docker/reverseproxy/.env.server monitoring@monitoring.localzero.net:/tmp/reverseproxy

On the server:

mkdir -p ~/reverseproxy
cd ~/reverseproxy
cp -r /tmp/reverseproxy/. .
mv .env.server .env

docker network create testing_nginx_network
docker network create production_nginx_network
docker compose up -d
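If this setup has to be repeated, docker network create fails for networks that already exist. A small sketch that only creates missing networks (ensure_network is a hypothetical helper name, not part of the repository):

```shell
# Hypothetical helper: create the Docker network $1 only if it does not exist yet.
ensure_network() {
  # docker network inspect exits non-zero if the network is unknown.
  if docker network inspect "$1" >/dev/null 2>&1; then
    echo "network $1 already exists"
  else
    docker network create "$1"
  fi
}
```

For example, run ensure_network testing_nginx_network and ensure_network production_nginx_network before docker compose up -d.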

Deploying a new version

Note that only project maintainers and admins are authorized to deploy.

Testing

To deploy a new version to the testing environment, create a pull request from main to deploy-to-testing. Merging the PR triggers a deployment workflow that deploys the new version, imports the latest production data into the testing environment, and applies migrations.

Production

After deploying to testing, verify that the testing environment can be accessed in a browser, that admin login and changing data work as expected, and that new features work when deployed on the server.

Pull the production log and save it as a backup.

If everything looks good, deploy to production: create a PR from deploy-to-testing to deploy-to-production and merge it.
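Both pull requests can also be opened and merged from the command line. This is a minimal sketch, not part of the repository: it assumes the GitHub CLI (gh) is installed and authenticated, that it is run inside a clone of the repository, that branch protection allows merging without further review, and that a plain merge commit is the desired merge method. The function name promote and the PR title and body are hypothetical.

```shell
# Hypothetical helper: open a PR from branch $1 into branch $2 and merge it,
# which triggers the deployment workflow for the target branch.
promote() {
  local from="$1" to="$2" pr_url
  # gh pr create prints the URL of the new pull request on stdout.
  pr_url=$(gh pr create --base "$to" --head "$from" \
    --title "Deploy $from to $to" --body "Deploys $from to $to.") || return 1
  gh pr merge "$pr_url" --merge
}
```

For example, promote main deploy-to-testing deploys to testing; only after the checks described above have passed, run promote deploy-to-testing deploy-to-production.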

Important

Check that everything works in production after every deployment!

The New Front-End on Testing

Alternatively, the new front-end currently developed on the branch react-frontend can be tested on the testing system. To trigger this, create a pull request from react-frontend to newfe-to-testing and merge it.

Afterwards, deploy from the main branch again as described in the section "Testing" above. This is important to prevent accidentally deploying the new front-end to the production system.

Database Client

To view, edit, and export the database in any of the environments (local, testing, production), the database web client CloudBeaver is started automatically together with the application.

The client can be accessed at http://localhost/dbeaver (or http://monitoring-test.localzero.net/dbeaver, http://monitoring.localzero.net/dbeaver depending on the environment) and the credentials can be found in the .env.local file. For testing and production, the credentials should be configured in the respective .env files on the server.

TLS Certificate and Renewal

Overview

We currently use a single TLS certificate for both monitoring.localzero.net and monitoring-test.localzero.net. The certificate is issued by letsencrypt.org; it is requested and renewed using acme.sh, which runs in a container. This setup lets us keep almost all necessary code and configuration in the repo instead of only on the server.

Initial Issuance

The initial certificate was issued using the following command:

docker exec acme-sh  --issue -d monitoring-test.localzero.net  -d monitoring.localzero.net --standalone --server https://acme-v02.api.letsencrypt.org/directory --fullchain-file /acme.sh/fullchain.cer --key-file /acme.sh/ssl-cert.key

Renewal

Renewal is performed automatically by acme.sh's internal cron job, which...

  • checks if a renewal is necessary, and if so:
  • requests a new certificate from letsencrypt,
  • performs the challenge-response mechanism to verify ownership of the domains,
  • and exports the full certificate chain and key to where nginx can find them.

Independently of this, our own cron job triggers a reload of the nginx config every four hours. The job is listed in the crontab, which you can view by executing the following on the server:

crontab -l

This job runs a script that applies the latest certificate acme.sh has produced. There can therefore be some delay between renewal and application of the certificate, but since acme.sh renews certificates well before they expire (by default every 60 days, while Let's Encrypt certificates are valid for 90 days), nginx has plenty of time to pick up the new certificate.
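To confirm that a reload has actually picked up a renewed certificate, the certificate nginx currently serves can be inspected with openssl. A minimal sketch, assuming the openssl command-line tool is available where you run it; served_cert and valid_for_days are hypothetical helper names:

```shell
# Hypothetical helper: print the PEM certificate presented by host $1 on
# port 443, sending $1 as the SNI server name.
served_cert() {
  echo | openssl s_client -connect "$1:443" -servername "$1" 2>/dev/null \
    | openssl x509
}

# Hypothetical helper: read a PEM certificate on stdin and succeed only if it
# is still valid $1 days from now (-checkend exits non-zero if it expires sooner).
valid_for_days() {
  openssl x509 -noout -checkend "$(( $1 * 86400 ))"
}
```

For example, served_cert monitoring.localzero.net | openssl x509 -noout -enddate prints the expiry date of the served certificate, and served_cert monitoring.localzero.net | valid_for_days 14 fails if it expires within two weeks.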

acme-sh Configuration and Debugging

The configuration used by acme.sh's own cron job (not our nginx reload cron job!), e.g. the renewal interval, can be changed in reverseproxy/ssl_certificates/monitoring-test.localzero.net_ecc/ on the server.

The following commands can be run on the server to debug and test the acme.sh configuration:

# view certificate creation date and next renew date
docker exec acme-sh --list

# tell acme-sh to run its cronjob now, using letsencrypt's test environment (to bypass rate limiting)
docker exec acme-sh --cron --staging

# tell acme-sh to run its cronjob now, using letsencrypt's PROD environment (subject to rate limits, e.g. at most 5 certificates for the same set of domains per week)
docker exec acme-sh --cron

# force a renewal via letsencrypt's PROD environment, even if renewal time hasn't been reached yet
docker exec acme-sh --cron --force

# change mail address that will receive expiry warnings (only one address supported as of acme.sh v3.0.6)
docker exec acme-sh --update-account --accountemail '<email_address>' --debug 2 --server https://acme-v02.api.letsencrypt.org/directory

TLS Certificates and Running Locally

When running locally, we instead use a certificate created for localhost. Since ownership of localhost cannot be certified, this is a single self-signed certificate instead of a full chain signed by a CA like on the server, and an exception must be added to your browser to trust it.

Troubleshooting

When the website isn't working as it should, first make sure all containers are running by SSHing into the server and executing docker ps, which should give output similar to this:

monitoring:~% docker ps
CONTAINER ID   IMAGE                                   COMMAND                  CREATED          STATUS         PORTS                                                                                NAMES
36f7e245232a   klimaschutzmonitor-dbeaver:testing      "./run-server.sh"        6 minutes ago    Up 3 minutes   0.0.0.0:32770->8978/tcp, :::32770->8978/tcp                                          dbeaver-testing
595c571072ca   nginxinc/nginx-unprivileged:alpine      "/docker-entrypoint.…"   6 minutes ago    Up 3 minutes   0.0.0.0:32771->8080/tcp, :::32771->8080/tcp                                          nginx-testing
3b62a190e679   cpmonitor:testing                       "gunicorn --log-leve…"   6 minutes ago    Up 3 minutes   8000/tcp                                                                             djangoapp-testing
a1bfc5a6804a   nginxinc/nginx-unprivileged:alpine      "/docker-entrypoint.…"   12 minutes ago   Up 3 minutes   0.0.0.0:32768->8080/tcp, :::32768->8080/tcp                                          nginx-production
24429bdab24b   klimaschutzmonitor-dbeaver:production   "./run-server.sh"        12 minutes ago   Up 3 minutes   0.0.0.0:32769->8978/tcp, :::32769->8978/tcp                                          dbeaver-production
cfbeac603091   cpmonitor:production                    "gunicorn --log-leve…"   13 minutes ago   Up 3 minutes   8000/tcp                                                                             djangoapp-production
c5e42e81888a   nginxinc/nginx-unprivileged:alpine      "/docker-entrypoint.…"   2 hours ago      Up 3 minutes   0.0.0.0:80->80/tcp, :::80->80/tcp, 0.0.0.0:443->443/tcp, :::443->443/tcp, 8080/tcp   reverse-proxy
81d2ffdd8e6d   neilpang/acme.sh                        "/entry.sh daemon"       2 hours ago      Up 3 minutes                                                                                        acme-sh

Check that the same 8 containers are running.

If some containers are missing, go to their respective directory (~/<testing|production|reverseproxy>) and run docker-compose up --detach --no-build to try to start them.
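Comparing the docker ps output with the example above by eye is error-prone, so a small check can list the missing containers instead. This sketch hard-codes the eight container names from the example output (an assumption: update the list if the setup changes); missing_containers is a hypothetical helper name:

```shell
# Hypothetical helper: print the name of every expected container that is
# not currently running (prints nothing if all eight are up).
missing_containers() {
  local running name
  # Names of all running containers, one per line.
  running=$(docker ps --format '{{.Names}}') || return 1
  for name in reverse-proxy acme-sh \
      nginx-testing djangoapp-testing dbeaver-testing \
      nginx-production djangoapp-production dbeaver-production; do
    # -x: match whole lines only, -F: treat the name as a fixed string.
    if ! printf '%s\n' "$running" | grep -qxF "$name"; then
      echo "$name"
    fi
  done
}
```

Run missing_containers on the server; each name it prints belongs to a container that needs to be started.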

Known Issues and Solutions

The required images are missing and docker-compose up fails

Probably the Docker Engine was reset somehow. The deployments have to be repeated:

  • Check that the last successful deployments are from the current HEAD of deploy-to-testing and deploy-to-production:
    • In the GitHub web interface, open the repository and go to Actions > Deploy.
    • There should be two successful action runs for each of the branches.
    • Check in the git history/graph that the two branches are still on the respective commits.
  • On Actions > Deploy, select Run workflow and first choose deploy-to-testing.
  • Wait until the run has finished and check that the app is running.
  • Then do the same with deploy-to-production.
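The first check in the list above can be partly scripted. A minimal sketch, assuming the GitHub CLI (gh, in a version recent enough to support gh run list --status) is installed and authenticated, that it is run inside a clone of the repository with origin pointing to GitHub, and that the workflow is named Deploy as shown in the Actions tab; deployed_commit_matches is a hypothetical helper name:

```shell
# Hypothetical helper: succeed only if the latest successful "Deploy" run for
# branch $1 was built from the commit the branch currently points to.
deployed_commit_matches() {
  local branch="$1" tip last_ok
  git fetch --quiet origin "$branch" || return 1
  tip=$(git rev-parse "origin/$branch") || return 1
  # "// empty" yields no output (instead of "null") if there is no such run.
  last_ok=$(gh run list --workflow Deploy --branch "$branch" --status success \
    --limit 1 --json headSha --jq '.[0].headSha // empty') || return 1
  if [ -n "$last_ok" ] && [ "$tip" = "$last_ok" ]; then
    echo "$branch: last successful deployment matches $tip"
  else
    echo "$branch: branch is at $tip, last successful deployment was ${last_ok:-none}" >&2
    return 1
  fi
}
```

If both deployed_commit_matches deploy-to-testing and deployed_commit_matches deploy-to-production succeed, continue with Run workflow as described in the list.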

Docker complains about missing iptables rules

This error can occur after restarting Docker containers or compositions, and looks like this:

Error response from daemon: driver failed programming external connectivity on endpoint nginx-testing
(70a3ed0163c3c86845b7ed8c89c8a38da7406647548566cfa68f30d75c63737f):
(iptables failed: iptables --wait -t filter -A DOCKER ! -i br-68c36a88c3e2 -o br-68c36a88c3e2 -p tcp -d 172.18.0.3 --dport 8080 -j ACCEPT: 
iptables: No chain/target/match by that name.

Solution

Ask GermanZero IT admins to please restart the Docker daemon using sudo systemctl restart docker. If you don't have their contact info, ask the other devs.

Docker tries to pull one of our custom images (cpmonitor, klimaschutzmonitor-dbeaver)

This can be caused by Docker tags disappearing after a system upgrade (a somewhat common Docker bug).

Solution

If tags like cpmonitor:production are missing but a tag such as cpmonitor:production-2024-01-18-07-50-18 still exists, just recreate the missing tag using docker tag cpmonitor:production-2024-01-18-07-50-18 cpmonitor:production. If no other tags for the missing image remain, the image needs to be redeployed. Triggering a deployment via GitHub Actions (see Deploying a new version) might work, but in the worst case you'll need to execute some of the steps in deploy.sh manually.
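Finding the newest remaining timestamped tag can be scripted. This sketch assumes that timestamped tags follow the <environment>-YYYY-MM-DD-HH-MM-SS pattern of the example above, so that they sort chronologically as plain strings; restore_tag is a hypothetical helper name:

```shell
# Hypothetical helper: recreate "<image>:<environment>" from the newest
# "<image>:<environment>-YYYY-MM-DD-HH-MM-SS" tag that still exists.
restore_tag() {
  local image="$1" environment="$2" latest
  # Keep only timestamped tags of this environment; the last one after sorting is the newest.
  latest=$(docker images --format '{{.Tag}}' "$image" \
    | grep -E "^${environment}-[0-9]{4}(-[0-9]{2}){5}\$" | sort | tail -n 1)
  if [ -z "$latest" ]; then
    echo "no timestamped $environment tag left for $image; redeploy instead" >&2
    return 1
  fi
  docker tag "$image:$latest" "$image:$environment"
}
```

For example, restore_tag cpmonitor production or restore_tag klimaschutzmonitor-dbeaver testing.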

Docker tries to build one of our custom images (cpmonitor, klimaschutzmonitor-dbeaver)

This is often caused by running docker-compose up --detach without the --no-build option.

Solution

Add the --no-build option to your command.

Apps not running after VM upgrade by GermanZero IT staff

When GermanZero IT staff upgrades the system our VM is running on, they often need to restart the VM. For some unknown reason, this makes the currently used Docker image tags disappear from our Docker installation. As a result, neither the testing nor the production app will be running after such an upgrade.

The staff will inform us by text chat before such upgrades, asking for a time slot in which to perform them.

Solution

After GermanZero IT staff has performed their upgrade, the last deployments to testing and production need to be re-run. To do this,

  1. navigate to Actions > Deploy
  2. find the latest successful run for the branch deploy-to-testing and click on it
  3. in the top right, click Re-run all jobs and confirm
  4. ensure the run completes successfully
  5. check if https://monitoring-test.localzero.net/ is reachable
  6. repeat steps 2-5 with deploy-to-production and https://monitoring.localzero.net/
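The same steps can be performed with the GitHub CLI. A minimal sketch, assuming gh is installed and authenticated, that it is run inside a clone of the repository, that the workflow is named Deploy, and that curl is available; rerun_last_deploy is a hypothetical helper name. gh run rerun re-runs all jobs of the run, like Re-run all jobs in the web interface.

```shell
# Hypothetical helper: re-run the latest successful "Deploy" run of branch $1,
# wait for it to finish, then check that URL $2 responds.
rerun_last_deploy() {
  local branch="$1" url="$2" run_id
  run_id=$(gh run list --workflow Deploy --branch "$branch" --status success \
    --limit 1 --json databaseId --jq '.[0].databaseId // empty')
  if [ -z "$run_id" ]; then
    echo "no successful Deploy run found for $branch" >&2
    return 1
  fi
  gh run rerun "$run_id" || return 1
  # Short pause as a precaution so the re-run is registered before watching it.
  sleep 10
  # --exit-status makes gh exit non-zero if the run does not succeed.
  gh run watch "$run_id" --exit-status || return 1
  # -f fails on HTTP error codes; -sS hides progress output but shows errors.
  curl -fsS -o /dev/null "$url"
}
```

For example, run rerun_last_deploy deploy-to-testing https://monitoring-test.localzero.net/ first, then rerun_last_deploy deploy-to-production https://monitoring.localzero.net/.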