feat: shutdown race resilience
A significant rewrite to ensure that we don't suffer from shutdown race
conditions as the prune condition is met and additional resources are
being created.

Previously this would remove resources that were still in use; now we
retry if we detect that new resources have been created within a window
of the prune condition triggering.

This supports the following new environment configuration settings:
- RYUK_REMOVE_RETRIES - The number of times to retry removing a resource.
- RYUK_REQUEST_TIMEOUT - The timeout for any Docker requests.
- RYUK_RETRY_OFFSET - The offset added to the start time of the prune
  pass that is used as the minimum resource creation time.
- RYUK_SHUTDOWN_TIMEOUT - The duration after shutdown has been requested
  when the remaining connections are ignored and prune checks start.

Update the README to correct the example: the health filter is only valid
for containers, not the other resources, so it would cause failures.

Enable race detection on CI tests.
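
As a rough illustration of what `go test -race` guards against, the sketch below shows a counter shared between goroutines. The `clientCounter` type is hypothetical and not taken from this commit. With the mutex in place the code is race free; if the locking were removed, the race detector would be expected to report the unsynchronised writes.

```go
package main

import (
	"fmt"
	"sync"
)

// clientCounter is a hypothetical counter shared between goroutines.
// The mutex serialises access so concurrent updates do not race.
type clientCounter struct {
	mtx sync.Mutex
	n   int
}

// add adjusts the counter by delta while holding the lock.
func (c *clientCounter) add(delta int) {
	c.mtx.Lock()
	defer c.mtx.Unlock()
	c.n += delta
}

// value returns the current count while holding the lock.
func (c *clientCounter) value() int {
	c.mtx.Lock()
	defer c.mtx.Unlock()
	return c.n
}

func main() {
	var c clientCounter
	var wg sync.WaitGroup

	// Simulate 100 concurrent client connections.
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.add(1)
		}()
	}

	wg.Wait()
	fmt.Println(c.value())
}
```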
stevenh committed Oct 4, 2024
1 parent 8aef324 commit 40b4aed
Showing 15 changed files with 1,706 additions and 742 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/build-docker-image.yml
Original file line number Diff line number Diff line change
@@ -30,7 +30,7 @@ jobs:
run: go build

- name: go-test
run: go test -v ./...
run: go test -race -v ./...

test-windows:
runs-on: windows-2022
6 changes: 6 additions & 0 deletions .gitignore
@@ -5,3 +5,9 @@

vendor/
bin/

# Binary
moby-ryuk

# VS Code
.vscode
89 changes: 63 additions & 26 deletions README.md
@@ -1,43 +1,80 @@
# Moby Ryuk

This project helps you to remove containers/networks/volumes/images by given filter after specified delay.
This project helps you to remove containers, networks, volumes and images by given filter after specified delay.

# Usage
## Building

1. Start it:
To build the binary only run:

$ RYUK_PORT=8080 ./bin/moby-ryuk
$ # You can also run it with Docker
$ docker run -v /var/run/docker.sock:/var/run/docker.sock -e RYUK_PORT=8080 -p 8080:8080 testcontainers/ryuk:0.9.0
```shell
go build
```

1. Connect via TCP:
To build the Linux docker container as the latest tag:

$ nc localhost 8080
```shell
docker build -f linux/Dockerfile -t testcontainers/ryuk:latest .
```

1. Send some filters:
## Usage

label=testing=true&health=unhealthy
ACK
label=something
ACK
To start it using the default settings:

1. Close the connection
```shell
docker run -v /var/run/docker.sock:/var/run/docker.sock -p 8080:8080 testcontainers/ryuk:latest
```

1. Send more filters with "one-off" style:
If you want to test local changes with the default settings:

printf "label=something_else" | nc localhost 8080
```shell
go run .
```

1. See containers/networks/volumes deleted after 10s:
You can then simulate a connection from a Testcontainers container using:

2018/01/15 18:38:52 Timed out waiting for connection
2018/01/15 18:38:52 Deleting {"label":{"something":true}}
2018/01/15 18:38:52 Deleting {"label":{"something_else":true}}
2018/01/15 18:38:52 Deleting {"health":{"unhealthy":true},"label":{"testing=true":true}}
2018/01/15 18:38:52 Removed 1 container(s), 0 network(s), 0 volume(s), 0 image(s)
```shell
nc -N localhost 8080 << EOF
label=testing=true&label=testing.sessionid=mysession
label=something
EOF
```

You can send additional session information for monitoring using:

```shell
printf "label=something_else" | nc -N localhost 8080
```

In the Ryuk window you'll see containers, networks and volumes deleted after 10s:

```log
time=2024-09-30T19:42:30.000+01:00 level=INFO msg=starting connection_timeout=1m0s reconnection_timeout=10s request_timeout=10s shutdown_timeout=10m0s remove_retries=10 retry_offset=-1s port=8080 verbose=false
time=2024-09-30T19:42:30.001+01:00 level=INFO msg=listening address=[::]:8080
time=2024-09-30T19:42:30.001+01:00 level=INFO msg="client processing started"
time=2024-09-30T19:42:38.002+01:00 level=INFO msg="client connected" address=127.0.0.1:56432 clients=1
time=2024-09-30T19:42:38.002+01:00 level=INFO msg="adding filter" type=label values="[testing=true testing.sessionid=mysession]"
time=2024-09-30T19:42:38.002+01:00 level=INFO msg="adding filter" type=label values=[something]
time=2024-09-30T19:42:38.002+01:00 level=INFO msg="client disconnected" address=127.0.0.1:56432 clients=0
time=2024-09-30T19:42:42.047+01:00 level=INFO msg="adding filter" type=label values=[something_else]
time=2024-09-30T19:42:42.047+01:00 level=INFO msg="client connected" address=127.0.0.1:56434 clients=1
time=2024-09-30T19:42:42.047+01:00 level=INFO msg="client disconnected" address=127.0.0.1:56434 clients=0
time=2024-09-30T19:42:52.051+01:00 level=INFO msg="prune check" clients=0
time=2024-09-30T19:42:52.216+01:00 level=INFO msg="client processing stopped"
time=2024-09-30T19:42:52.216+01:00 level=INFO msg=removed containers=0 networks=0 volumes=0 images=0
time=2024-09-30T19:42:52.216+01:00 level=INFO msg=done
```

## Ryuk configuration

- `RYUK_CONNECTION_TIMEOUT` - Environment variable that defines the timeout for Ryuk to receive the first connection (default: 60s). Value layout is described in [time.ParseDuration](https://golang.org/pkg/time/#ParseDuration) documentation.
- `RYUK_PORT` - Environment variable that defines the port where Ryuk will be bound to (default: 8080).
- `RYUK_RECONNECTION_TIMEOUT` - Environment variable that defines the timeout for Ryuk to reconnect to Docker (default: 10s). Value layout is described in [time.ParseDuration](https://golang.org/pkg/time/#ParseDuration) documentation.
- `RYUK_VERBOSE` - Environment variable that defines if Ryuk should print debug logs (default: false).
The following environment variables can be configured to change the behaviour:

| Environment Variable | Default | Format | Description |
| --------------------------- | ------- | ------- | ------------ |
| `RYUK_CONNECTION_TIMEOUT` | `60s` | [Duration](https://golang.org/pkg/time/#ParseDuration) | The duration without receiving any connections which will trigger a shutdown |
| `RYUK_PORT` | `8080` | `uint16` | The port to listen on for connections |
| `RYUK_RECONNECTION_TIMEOUT` | `10s` | [Duration](https://golang.org/pkg/time/#ParseDuration) | The duration after the last connection closes which will trigger resource clean up and shutdown |
| `RYUK_REQUEST_TIMEOUT` | `10s` | [Duration](https://golang.org/pkg/time/#ParseDuration) | The timeout for any Docker requests |
| `RYUK_REMOVE_RETRIES` | `10` | `int` | The number of times to retry removing a resource |
| `RYUK_RETRY_OFFSET` | `-1s` | [Duration](https://golang.org/pkg/time/#ParseDuration) | The offset added to the start time of the prune pass that is used as the minimum resource creation time. Any resource created after this calculated time will trigger a retry to ensure in use resources are not removed |
| `RYUK_VERBOSE` | `false` | `bool` | Whether to enable verbose aka debug logging |
| `RYUK_SHUTDOWN_TIMEOUT` | `10m` | [Duration](https://golang.org/pkg/time/#ParseDuration) | The duration after shutdown has been requested when the remaining connections are ignored and prune checks start |
14 changes: 14 additions & 0 deletions config.go
@@ -17,6 +17,17 @@ type config struct {
// resource clean up and shutdown.
ReconnectionTimeout time.Duration `env:"RYUK_RECONNECTION_TIMEOUT" envDefault:"10s"`

// RequestTimeout is the timeout for any Docker requests.
RequestTimeout time.Duration `env:"RYUK_REQUEST_TIMEOUT" envDefault:"10s"`

// RemoveRetries is the number of times to retry removing a resource.
RemoveRetries int `env:"RYUK_REMOVE_RETRIES" envDefault:"10"`

// RetryOffset is the offset added to the start time of the prune pass that is
// used as the minimum resource creation time. Any resource created after this
// calculated time will trigger a retry to ensure in use resources are not removed.
RetryOffset time.Duration `env:"RYUK_RETRY_OFFSET" envDefault:"-1s"`

// ShutdownTimeout is the maximum amount of time the reaper will wait
// for once signalled to shutdown before it terminates even if connections
// are still established.
@@ -34,7 +45,10 @@ func (c config) LogAttrs() []slog.Attr {
return []slog.Attr{
slog.Duration("connection_timeout", c.ConnectionTimeout),
slog.Duration("reconnection_timeout", c.ReconnectionTimeout),
slog.Duration("request_timeout", c.RequestTimeout),
slog.Duration("shutdown_timeout", c.ShutdownTimeout),
slog.Int("remove_retries", c.RemoveRetries),
slog.Duration("retry_offset", c.RetryOffset),
slog.Int("port", int(c.Port)),
slog.Bool("verbose", c.Verbose),
}
12 changes: 12 additions & 0 deletions config_test.go
@@ -34,6 +34,9 @@ func Test_loadConfig(t *testing.T) {
ConnectionTimeout: time.Minute,
ReconnectionTimeout: time.Second * 10,
ShutdownTimeout: time.Minute * 10,
RemoveRetries: 10,
RequestTimeout: time.Second * 10,
RetryOffset: -time.Second,
}

cfg, err := loadConfig()
@@ -47,13 +50,19 @@ func Test_loadConfig(t *testing.T) {
t.Setenv("RYUK_RECONNECTION_TIMEOUT", "3s")
t.Setenv("RYUK_SHUTDOWN_TIMEOUT", "7s")
t.Setenv("RYUK_VERBOSE", "true")
t.Setenv("RYUK_REQUEST_TIMEOUT", "4s")
t.Setenv("RYUK_REMOVE_RETRIES", "5")
t.Setenv("RYUK_RETRY_OFFSET", "-6s")

expected := config{
Port: 1234,
ConnectionTimeout: time.Second * 2,
ReconnectionTimeout: time.Second * 3,
ShutdownTimeout: time.Second * 7,
Verbose: true,
RemoveRetries: 5,
RequestTimeout: time.Second * 4,
RetryOffset: -time.Second * 6,
}

cfg, err := loadConfig()
@@ -67,6 +76,9 @@ func Test_loadConfig(t *testing.T) {
"RYUK_RECONNECTION_TIMEOUT",
"RYUK_SHUTDOWN_TIMEOUT",
"RYUK_VERBOSE",
"RYUK_REQUEST_TIMEOUT",
"RYUK_REMOVE_RETRIES",
"RYUK_RETRY_OFFSET",
} {
t.Run("invalid-"+name, func(t *testing.T) {
t.Setenv(name, "invalid")
18 changes: 18 additions & 0 deletions consts.go
@@ -0,0 +1,18 @@
package main

const (
// labelBase is the base label for testcontainers.
labelBase = "org.testcontainers"

// ryukLabel is the label used to identify reaper containers.
ryukLabel = labelBase + ".ryuk"

// fieldError is the log field key for errors.
fieldError = "error"

// fieldAddress is the log field key for a client or listening address.
fieldAddress = "address"

// fieldClients is the log field used for client counts.
fieldClients = "clients"
)
55 changes: 20 additions & 35 deletions go.mod
@@ -6,64 +6,49 @@ require (
github.com/caarlos0/env/v11 v11.2.2
github.com/docker/docker v27.2.0+incompatible
github.com/stretchr/testify v1.9.0
github.com/testcontainers/testcontainers-go v0.33.0
gopkg.in/matryer/try.v1 v1.0.0-20150601225556-312d2599e12e
)

require (
dario.cat/mergo v1.0.0 // indirect
github.com/AdaLogics/go-fuzz-headers v0.0.0-20230811130428-ced1acdcaa24 // indirect
github.com/AdaLogics/go-fuzz-headers v0.0.0-20240806141605-e8a1dd7889d6 // indirect
github.com/Azure/go-ansiterm v0.0.0-20210617225240-d185dfc1b5a1 // indirect
github.com/Microsoft/go-winio v0.6.2 // indirect
github.com/cenkalti/backoff/v4 v4.2.1 // indirect
github.com/cheekybits/is v0.0.0-20150225183255-68e9c0620927 // indirect
github.com/containerd/log v0.1.0 // indirect
github.com/containerd/platforms v0.2.1 // indirect
github.com/cpuguy83/dockercfg v0.3.1 // indirect
github.com/davecgh/go-spew v1.1.1 // indirect
github.com/distribution/reference v0.6.0 // indirect
github.com/docker/go-connections v0.5.0 // indirect
github.com/docker/go-units v0.5.0 // indirect
github.com/felixge/httpsnoop v1.0.4 // indirect
github.com/go-logr/logr v1.4.2 // indirect
github.com/go-logr/stdr v1.2.2 // indirect
github.com/go-ole/go-ole v1.2.6 // indirect
github.com/gogo/protobuf v1.3.2 // indirect
github.com/google/uuid v1.6.0 // indirect
github.com/klauspost/compress v1.17.4 // indirect
github.com/lufia/plan9stats v0.0.0-20211012122336-39d0f177ccd0 // indirect
github.com/magiconair/properties v1.8.7 // indirect
github.com/matryer/try v0.0.0-20161228173917-9ac251b645a2 // indirect
github.com/klauspost/compress v1.17.10 // indirect
github.com/kr/pretty v0.3.1 // indirect
github.com/moby/docker-image-spec v1.3.1 // indirect
github.com/moby/patternmatcher v0.6.0 // indirect
github.com/moby/sys/sequential v0.5.0 // indirect
github.com/moby/sys/user v0.1.0 // indirect
github.com/moby/sys/sequential v0.6.0 // indirect
github.com/moby/sys/user v0.3.0 // indirect
github.com/moby/sys/userns v0.1.0 // indirect
github.com/moby/term v0.5.0 // indirect
github.com/morikuni/aec v1.0.0 // indirect
github.com/opencontainers/go-digest v1.0.0 // indirect
github.com/opencontainers/image-spec v1.1.0 // indirect
github.com/pkg/errors v0.9.1 // indirect
github.com/pmezard/go-difflib v1.0.0 // indirect
github.com/power-devops/perfstat v0.0.0-20210106213030-5aafc221ea8c // indirect
github.com/shirou/gopsutil/v3 v3.23.12 // indirect
github.com/shoenig/go-m1cpu v0.1.6 // indirect
github.com/rogpeppe/go-internal v1.12.0 // indirect
github.com/sirupsen/logrus v1.9.3 // indirect
github.com/tklauser/go-sysconf v0.3.12 // indirect
github.com/tklauser/numcpus v0.6.1 // indirect
github.com/yusufpapurcu/wmi v1.2.3 // indirect
go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp v0.54.0 // indirect
go.opentelemetry.io/otel v1.29.0 // indirect
go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp v1.19.0 // indirect
go.opentelemetry.io/otel/metric v1.29.0 // indirect
go.opentelemetry.io/otel/sdk v1.19.0 // indirect
go.opentelemetry.io/otel/trace v1.29.0 // indirect
golang.org/x/crypto v0.24.0 // indirect
golang.org/x/net v0.26.0 // indirect
golang.org/x/sys v0.25.0 // indirect
golang.org/x/time v0.0.0-20220210224613-90d013bbcef8 // indirect
google.golang.org/genproto/googleapis/api v0.0.0-20240318140521-94a12d6c2237 // indirect
google.golang.org/genproto/googleapis/rpc v0.0.0-20240318140521-94a12d6c2237 // indirect
google.golang.org/protobuf v1.33.0 // indirect
github.com/stretchr/objx v0.5.2 // indirect
go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp v0.53.0 // indirect
go.opentelemetry.io/otel v1.28.0 // indirect
go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp v1.28.0 // indirect
go.opentelemetry.io/otel/metric v1.28.0 // indirect
go.opentelemetry.io/otel/sdk v1.28.0 // indirect
go.opentelemetry.io/otel/trace v1.28.0 // indirect
golang.org/x/net v0.27.0 // indirect
golang.org/x/sys v0.22.0 // indirect
golang.org/x/time v0.5.0 // indirect
google.golang.org/genproto/googleapis/rpc v0.0.0-20240725223205-93522f1f2a9f // indirect
google.golang.org/grpc v1.65.0 // indirect
gopkg.in/check.v1 v1.0.0-20201130134442-10cb98267c6c // indirect
gopkg.in/yaml.v3 v3.0.1 // indirect
gotest.tools/v3 v3.5.1 // indirect
)