systemd: add housekeeping service unit
Problem: future housekeeping scripts should be run in a systemd cgroup
for reliable termination, logging to the local systemd journal, and
debugging using well-known systemd tools.

Add a systemd "oneshot" service unit for housekeeping.  The service
unit runs a user-provided /etc/flux/system/housekeeping script.
It is configured so that 'systemctl start housekeeping' blocks until
the run is complete and its exit code reflects the exit code of
the housekeeping script.
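
For illustration only, here is a minimal sketch of what a user-provided housekeeping script might contain. It is not part of this commit: the `housekeeping` function name, the scratch directory argument, and the `*.tmp` cleanup are all hypothetical. The point is that the script's exit status is what 'systemctl start housekeeping' would report.

```shell
#!/bin/sh
# Hypothetical housekeeping logic (assumption: not taken from this commit).
# Returns nonzero when the scratch directory is missing, so that a failure
# would surface as the exit status of the oneshot unit.
housekeeping() {
    scratch=$1
    echo "housekeeping: job=${FLUX_JOB_ID:-unknown} scratch=$scratch"
    if [ ! -d "$scratch" ]; then
        echo "housekeeping: $scratch is missing" >&2
        return 1
    fi
    # Remove stale temporary files left behind by the job.
    rm -f "$scratch"/*.tmp
}

# Demo run against a throwaway directory instead of a real node path.
demo_dir=$(mktemp -d)
touch "$demo_dir/stale.tmp"
housekeeping "$demo_dir"
echo "exit status: $?"
rm -rf "$demo_dir"
```

A real script installed at /etc/flux/system/housekeeping would presumably end by exiting with the status of its last check, rather than running a demo.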

Add a helper script that can be configured as an IMP run command.
It runs 'systemctl start housekeeping' and traps SIGTERM, which can
be sent to enforce a timeout.  Upon receipt of SIGTERM, it stops the
unit and exits with a nonzero code.
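
The trap-and-stop behavior can be tried without systemd. In this minimal sketch, a background `sleep` stands in for the blocking 'systemctl start housekeeping', and killing it stands in for 'systemctl stop housekeeping'. The temporary helper file and the variable names are illustrative only. Unlike the committed helper, the demo backgrounds the stand-in and uses `wait`, so the trap runs as soon as the signal arrives.

```shell
#!/bin/sh
# Write a stand-in helper to a temporary file (hypothetical, demo only).
helper=$(mktemp)
cat >"$helper" <<'EOF'
#!/bin/sh
terminate() {
    kill "$child" 2>/dev/null   # stands in for: systemctl stop housekeeping
    exit 1
}
trap terminate INT TERM
sleep 30 &                      # stands in for: systemctl start housekeeping
child=$!
wait "$child"
EOF

# Run the helper, then enforce a "timeout" by sending SIGTERM after 1 second.
sh "$helper" &
pid=$!
sleep 1
kill -TERM "$pid"

# Collect the helper's exit status without tripping 'set -e' if it is in effect.
status=0
wait "$pid" || status=$?
echo "helper exited with status $status"
rm -f "$helper"
```

The nonzero status is how the caller can tell a timed-out run apart from one that completed.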

To enable environment variables such as FLUX_JOB_ID to be passed
into the user-provided housekeeping script via the systemd unit,
the helper script dumps its environment into /run/housekeeping.env,
which is read in by the unit.
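
The environment hand-off can be sketched as follows. This is a demo under stated assumptions: a temporary file stands in for /run/housekeeping.env, the FLUX_JOB_ID value is made up, and a `sed` lookup stands in for systemd reading the file through EnvironmentFile=.

```shell
#!/bin/sh
# Temporary file standing in for /run/housekeeping.env (demo only).
envfile=$(mktemp)

# Made-up job ID; in practice the value would already be in the environment.
FLUX_JOB_ID=demo-job-123
export FLUX_JOB_ID

# Dump the environment as KEY=value lines, as the helper script does.
printenv >"$envfile"

# Look up one key from the file, roughly what the unit's script would see.
job_id=$(sed -n 's/^FLUX_JOB_ID=//p' "$envfile")
echo "housekeeping would see FLUX_JOB_ID=$job_id"
rm -f "$envfile"
```

Because the file is line-oriented, a variable whose value contains a newline would not survive this round trip intact.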

Also of note: the user-provided scripts are automatically idempotent
when run this way.  systemd never starts multiple instances of the
unit.  If one is running when a start request is received, the second
start blocks until the existing run finishes and reports its status.
garlick committed Jun 13, 2024
1 parent 6462184 commit ede3103
Showing 5 changed files with 27 additions and 2 deletions.
2 changes: 2 additions & 0 deletions configure.ac
@@ -599,6 +599,8 @@ AC_CONFIG_FILES( \
etc/flux-hostlist.pc \
etc/flux-taskmap.pc \
etc/flux.service \
+etc/housekeeping.service \
+src/cmd/flux-run-housekeeping \
doc/Makefile \
doc/test/Makefile \
t/Makefile \
4 changes: 3 additions & 1 deletion etc/Makefile.am
@@ -1,5 +1,7 @@
#if HAVE_SYSTEMD
-systemdsystemunit_DATA = flux.service
+systemdsystemunit_DATA = \
+flux.service \
+housekeeping.service
#endif

tmpfilesdir = $(prefix)/lib/tmpfiles.d
7 changes: 7 additions & 0 deletions etc/housekeeping.service.in
@@ -0,0 +1,7 @@
[Unit]
Description=Node Maintenance for Flux

[Service]
Type=oneshot
EnvironmentFile=-@X_RUNSTATEDIR@/housekeeping.env
ExecStart=@X_SYSCONFDIR@/flux/system/housekeeping
3 changes: 2 additions & 1 deletion src/cmd/Makefile.am
@@ -121,7 +121,8 @@ dist_fluxcmd_SCRIPTS = \
flux-imp-exec-helper \
py-runner.py \
flux-hostlist.py \
-flux-post-job-event.py
+flux-post-job-event.py \
+flux-run-housekeeping

fluxcmd_PROGRAMS = \
flux-terminus \
13 changes: 13 additions & 0 deletions src/cmd/flux-run-housekeeping.in
@@ -0,0 +1,13 @@
#!/bin/sh

terminate() {
systemctl stop housekeeping
exit 1
}

trap terminate INT TERM

umask 022
printenv >@X_RUNSTATEDIR@/housekeeping.env

systemctl start housekeeping --quiet
