You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
The timeout command's SIGTERM signal should always cause the consul-template to terminate.
Actual behavior
After running for sometime, intermittently, the consul-template doesn't get terminated upon timeout command's SIGTERM signal, thus it keeps running until something else happens, but not as expected on the given timeout.
Debug output
According to my investigation and debug, what we concluded is that the SIGTERM signal from the timeout command comes at exact really short time as a SIGUSR signal, internally produced by go and the SIGTERM signal is not handled, due to the signal channel buffer set to only 1:
The following logs are from strace of the timeout command running the consul-template and bellow is the log from consul-template itself at exactly the same 1/1000 of second - 12:29:49.789.
Just after that it shows that it had received SIGCONT - which is sent from the timeout command after the SIGTERM signal - thus the SIGTERM has not been handled by the consul-template process:
Run the consul-template as provided above with the timeout command - doesn't matter of the configuration - it just needs to do something, so the internal SIGUSR sent by the go routines is being sent to the consul-template handler
The above should be run in supervisor of your choosing so the process is started everytime it terminates
Monitor that after some point the consul-template is never terminated by the timeout command
References
There has been a complaint regarding the abundance of SIGUSR produced logs and an implementation has been provided to ignore them - but as far as I understand it - the implementation, doesn't account for the fact that the SIGUSR signals are still received by the go channel, which handles them, with buffer set to 1 - which seems, as described seems too small for the amount of signals received due to internal SIGUSR signals.
Just to add an update - we tried with the obvious fix of increasing the channel buffer - at first to 10, but the issue appeared again.
Increasing it to 100, seems to be solving the issue on our end - just to add our findings for completeness
I'm just going to bump the channel size to 100 as you have done. Makes almost no difference to consul-template except a small bit of added memory and should prevent this from being an issue for anyone in the future.
Hmm.. looking at the signal/ code we already have handy lists of the signals we care about. I'm going to look quickly into the idea of passing the list to Notify (limiting what we receive) instead of receiving everything and filtering them out locally.
Configuration
We use default configuration for the consul-template itself, the templates doesn't seem to be relevant and not needed details
Command
exec \ /usr/bin/timeout -s SIGTERM 120 \ /usr/local/consul-template/bin/consul-template \ -log-level info \ -kill-signal SIGTERM \ -reload-signal SIGHUP \ -vault-renew-token=false \ -max-stale 10m \ -config=/etc/consul-template/configs \ -config=/etc/consul-template/templates.hcl
Expected behavior
The timeout command's SIGTERM signal should always cause the consul-template to terminate.
Actual behavior
After running for sometime, intermittently, the consul-template doesn't get terminated upon timeout command's SIGTERM signal, thus it keeps running until something else happens, but not as expected on the given timeout.
Debug output
According to my investigation and debug, what we concluded is that the SIGTERM signal from the timeout command comes at exact really short time as a SIGUSR signal, internally produced by go and the SIGTERM signal is not handled, due to the signal channel buffer set to only 1:
consul-template/cli.go
Line 58 in be5fcd8
The following logs are from strace of the timeout command running the consul-template and bellow is the log from consul-template itself at exactly the same 1/1000 of second - 12:29:49.789.
Just after that it shows that it had received SIGCONT - which is sent from the timeout command after the SIGTERM signal - thus the SIGTERM has not been handled by the consul-template process:
Here I am running just slightly modified version - just changed logging level - of the latest consul-template in order to get only the needed logs:
Steps to reproduce
References
There has been a complaint regarding the abundance of SIGUSR produced logs and an implementation has been provided to ignore them - but as far as I understand it - the implementation, doesn't account for the fact that the SIGUSR signals are still received by the go channel, which handles them, with buffer set to 1 - which seems, as described seems too small for the amount of signals received due to internal SIGUSR signals.
The text was updated successfully, but these errors were encountered: