[Thanos Rule] Rules Files not reloading after SIGHUP signal #4432
Comments
A possible workaround until this is fixed is to trigger the reload by calling the HTTP endpoint:
Yes, the workaround works.
Fixed by #4442. It's now covered by tests, so it's working 100% 💪
Thanos, Prometheus and Golang version used:
thanos, version 0.21.1 (branch: HEAD, revision: 3558f4a)
build user: root@744cf7ef4576
build date: 20210604-12:11:58
go version: go1.16.5
platform: linux/amd64
What happened:
After the rule files are updated, a SIGHUP signal is sent to Thanos so that it reloads the rules at runtime. Thanos Rule does not apply the update and keeps evaluating the same rules it loaded at startup.
Restarting the Thanos process does pick up the updated rules.
What you expected to happen:
Thanos should reload the rules at runtime after receiving SIGHUP signal, with no need to stop and start the Thanos process.
How to reproduce it (as minimally and precisely as possible):
Run Thanos Rule with a minimal configuration and a basic, valid rule_test.yaml file:
thanos rule --log.level debug --log.format logfmt --http-address 0.0.0.0:10902 --http-grace-period 2m --grpc-address 0.0.0.0:10901 --grpc-grace-period 2m --data-dir ./data --rule-file './*.yaml' --resend-delay 1m --eval-interval 30s --tsdb.block-duration 2h --tsdb.retention 2d --query thanos-query.domain:20902
Access the UI and check that the rules defined in the rule_test.yaml file are being applied.
Add a new rules file (rule_test_2.yaml) to the same folder.
Get the Thanos process ID:
thanos_pid=$(pgrep thanos)
Send SIGHUP (signal 1) to the Thanos process to trigger a reload:
kill -1 $thanos_pid
Access the UI again and check the rules.
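Because the `--rule-file` pattern is quoted, the shell does not expand it; Thanos expands the glob itself when it loads the rule files. Before sending SIGHUP, it can help to confirm that the newly added file actually falls under that pattern. The bash sketch below does this by expanding the same pattern in the shell. `list_rule_files` is a hypothetical helper, and the sketch assumes Thanos's own glob matching behaves like shell globbing for a simple pattern such as `*.yaml`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the files in a directory that match a glob
# pattern, one per line, to check that a newly added rule file
# (e.g. rule_test_2.yaml) is covered by the --rule-file pattern.
# Assumes Thanos matches simple patterns the same way the shell does.
list_rule_files() {
  local dir="$1" pattern="$2"
  (
    cd "$dir" || exit 1
    shopt -s nullglob          # an unmatched pattern expands to nothing
    for f in $pattern; do      # unquoted on purpose so the glob expands
      printf '%s\n' "$f"
    done
  )
}
```

For example, `list_rule_files . '*.yaml'` should print both `rule_test.yaml` and `rule_test_2.yaml`. A `*.yml` pattern would match neither of them.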
Full logs to relevant components:
After sending the SIGHUP, this is the log:
level=info ts=2021-07-09T15:43:27.658638469Z caller=main.go:180 msg="caught signal. Reloading." signal=hangup
level=info ts=2021-07-09T15:41:04.904147954Z caller=main.go:183 msg="reload dispatched."
level=debug ts=2021-07-09T15:43:49.605131566Z caller=promclient.go:398 component=rules msg="querying instant" url="http://10.103.69.158:30902/api/v1/query?......
Anything else we need to know:
I tested other Thanos versions, and the last version where this worked properly was v0.19.0.
Following the same steps with v0.19.0, the rule files are reloaded. The log is different: it reports that the rule files are being loaded:
level=info ts=2021-07-09T15:41:04.904087152Z caller=main.go:180 msg="caught signal. Reloading." signal=hangup
level=info ts=2021-07-09T15:41:04.904147954Z caller=main.go:183 msg="reload dispatched."
level=debug ts=2021-07-09T15:41:04.904165954Z caller=rule.go:820 component=rules msg="configured rule files" files=./*.yaml
level=info ts=2021-07-09T15:41:04.904271701Z caller=rule.go:843 component=rules msg="reload rule files" numFiles=2
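Comparing the two logs, the v0.19.0 output shows `component=rules` lines reporting the rule files being loaded after `reload dispatched.`. The v0.21.1 output goes straight on to query evaluation without them. A small check built on that difference can tell whether a SIGHUP actually reloaded the rules. This is a sketch: `rules_reloaded` is a hypothetical helper that reads logfmt lines on stdin and looks for the `msg="reload rule files"` message seen in the v0.19.0 log. Other Thanos versions may word this message differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read Thanos Rule logfmt output on stdin and succeed
# (exit 0) only if a 'reload rule files' message from the rules component
# appears after the 'caught signal. Reloading.' message. A rule load logged
# before the signal (e.g. at startup) does not count.
# The message texts are taken from the v0.19.0 log and may differ elsewhere.
rules_reloaded() {
  awk '
    /msg="caught signal\. Reloading\."/ { seen_signal = 1 }
    seen_signal && /component=rules/ && /msg="reload rule files"/ { found = 1 }
    END { exit (found ? 0 : 1) }
  '
}
```

Fed the v0.19.0 log above, the helper succeeds. Fed the v0.21.1 log, it fails, because only `caught signal. Reloading.` and `reload dispatched.` appear.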