Enhanced supervisord to exit immediately if one of its managed processes crashes, which causes the respective Docker container to stop. The container is then restarted gracefully. #2208

Closed
1 change: 1 addition & 0 deletions dockers/docker-base/Dockerfile.j2
@@ -45,6 +45,7 @@ RUN mkdir -p /etc/supervisor
RUN mkdir -p /var/log/supervisor

COPY ["etc/supervisor/supervisord.conf", "/etc/supervisor/"]
COPY ["etc/supervisor/kill_supervisor.py", "/usr/bin/"]

RUN apt-get -y purge \
exim4 \
59 changes: 59 additions & 0 deletions dockers/docker-base/etc/supervisor/kill_supervisor.py
@@ -0,0 +1,59 @@
#!/usr/bin/python

# Event listener that makes supervisord exit as soon as one of its managed
# processes stops, exits, or enters the FATAL state.
# Documentation: http://supervisord.org/events.html

import os
import signal
import subprocess
import sys

from supervisor.childutils import listener


def write_stdout(s):
    # Only eventlistener protocol messages may be sent to stdout
    sys.stdout.write(s)
    sys.stdout.flush()


def write_stderr(s):
    sys.stderr.write(s)
    sys.stderr.flush()


def main():
    while True:
        # The first column of "supervisorctl avail" is the program name
        proc = subprocess.Popen("supervisorctl avail | cut -d' ' -f1", shell=True,
                                stdout=subprocess.PIPE, universal_newlines=True)
        (out, err) = proc.communicate()

        all_service_list = out.split()

        # "exception_service_list" contains the programs whose state changes must not stop supervisord.
        exception_service_list = ["start.sh", "enable_counters", "swssconfig", "arp_update", "ledinit", "fancontrol", "lm-sensors", "ledd", "xcvrd", "configdb-load.sh", "snmpd-config-updater"]

        service_list = [x for x in all_service_list if x not in exception_service_list]

        # Send READY and block until supervisord delivers the next event
        headers, body = listener.wait(sys.stdin, sys.stdout)
        body = dict([pair.split(":") for pair in body.split(" ")])

        write_stderr("Headers: %r\n" % (headers,))
        write_stderr("Body: %r\n" % (body,))

        process = body["processname"]
        # "PROCESS_STATE_EXITED" -> "EXITED"
        state = headers["eventname"].split('_')[2]
        if process in service_list:
            write_stderr("Process {} got {} !!! Time to kill Supervisord !!!\n".format(process, state))
            try:
                with open('/var/run/supervisord.pid', 'r') as pidfile:
                    pid = int(pidfile.readline())
                os.kill(pid, signal.SIGQUIT)
            except Exception as e:
                # Report on stderr: stdout is reserved for protocol messages,
                # and not every exception has a strerror attribute
                write_stderr('Could not kill supervisor: ' + str(e) + '\n')
        else:
            write_stderr("Process {} got {} !!! But no need to kill Supervisor !!!\n".format(process, state))

        # Transition from READY to ACKNOWLEDGED
        write_stdout("RESULT 2\nOK")


if __name__ == "__main__":
    main()
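
For readers unfamiliar with the event listener protocol that `listener.wait()` and the final `RESULT 2\nOK` write rely on, here is a minimal sketch of one READY, event, RESULT cycle. It does not use the `supervisor` package; it re-implements the exchange against in-memory streams so it can be run anywhere. The helper names (`parse_tokens`, `handle_one_event`) and the sample process name and pid are assumptions made for illustration and are not part of this PR.

```python
import io


def parse_tokens(line):
    """Turn 'key:value key:value' into a dict (used for header and payload)."""
    return dict(token.split(":", 1) for token in line.split())


def handle_one_event(stdin, stdout):
    """Run one READY -> event -> RESULT cycle and return (headers, payload)."""
    # Tell supervisord the listener can accept an event
    stdout.write("READY\n")
    stdout.flush()
    # The header line ends with len:<n>, the size of the payload that follows
    headers = parse_tokens(stdin.readline())
    payload = parse_tokens(stdin.read(int(headers["len"])))
    # Acknowledge the event: "RESULT <length of body>\n<body>"
    stdout.write("RESULT 2\nOK")
    stdout.flush()
    return headers, payload


# Simulated event (process name and pid are made up for this sketch)
payload_text = "processname:orchagent groupname:orchagent from_state:RUNNING expected:0 pid:42"
header_text = ("ver:3.0 server:supervisor serial:1 pool:kill_supervisor "
               "poolserial:1 eventname:PROCESS_STATE_EXITED len:%d\n" % len(payload_text))

fake_stdin = io.StringIO(header_text + payload_text)
fake_stdout = io.StringIO()
headers, payload = handle_one_event(fake_stdin, fake_stdout)
```

Because stdout carries the protocol itself, any diagnostic output from a listener has to go to stderr, which is why the script keeps separate `write_stdout` and `write_stderr` helpers.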

6 changes: 6 additions & 0 deletions dockers/docker-database/supervisord.conf
@@ -27,3 +27,9 @@ autorestart=false
startsecs=0
stdout_logfile=syslog
stderr_logfile=syslog

[eventlistener:kill_supervisor]
command=/usr/bin/kill_supervisor.py
events=PROCESS_STATE_STOPPED, PROCESS_STATE_EXITED, PROCESS_STATE_FATAL
stdout_logfile=syslog
stderr_logfile=syslog
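
The three event names subscribed to here are what `kill_supervisor.py` reduces to a state name before deciding whether to stop supervisord. The sketch below isolates that decision as pure functions so it can be tested without a running supervisord. The function names and the `managed` list are hypothetical; the exclusion list is copied from the script in this PR.

```python
# Programs whose state changes should not stop supervisord
# (copied from exception_service_list in kill_supervisor.py)
EXCLUDED_PROGRAMS = frozenset([
    "start.sh", "enable_counters", "swssconfig", "arp_update", "ledinit",
    "fancontrol", "lm-sensors", "ledd", "xcvrd", "configdb-load.sh",
    "snmpd-config-updater",
])


def event_state(eventname):
    """Extract the state from an event name such as 'PROCESS_STATE_EXITED'."""
    return eventname.split("_")[2]


def should_stop_supervisor(processname, managed_programs, excluded=EXCLUDED_PROGRAMS):
    """True when the process is managed by supervisord and not excluded."""
    return processname in managed_programs and processname not in excluded


# Illustrative program list; in the script it comes from "supervisorctl avail"
managed = ["start.sh", "orchagent", "portsyncd"]
```

One-shot programs such as `start.sh` are expected to exit, so they are excluded; any other managed program reaching one of the subscribed states triggers the shutdown.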
@@ -65,3 +65,9 @@ stderr_logfile=syslog
{% endfor %}
{% endif %}
{% endif %}

[eventlistener:kill_supervisor]
Review comment by @jleveque (Contributor), Oct 30, 2018:

I think the [eventlistener:...] section(s) should be listed immediately after the [supervisord] section; before any [program:...] sections. This applies to all supervisor config files. This way, the [eventlistener:...] section will be in a consistent position in the file for all containers.

command=/usr/bin/kill_supervisor.py
events=PROCESS_STATE_STOPPED, PROCESS_STATE_EXITED, PROCESS_STATE_FATAL
stdout_logfile=syslog
stderr_logfile=syslog
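
The ordering requested in the review comment above could be checked mechanically. The following is a minimal sketch, assuming the config file is plain INI text (files that still contain Jinja control blocks would need to be rendered first). The function name `eventlisteners_come_first` and the two sample configs are made up for illustration.

```python
import configparser


def eventlisteners_come_first(conf_text):
    """True if every [eventlistener:...] section precedes every [program:...] section."""
    parser = configparser.RawConfigParser()
    parser.read_string(conf_text)
    sections = parser.sections()  # section names in file order
    programs = [i for i, name in enumerate(sections) if name.startswith("program:")]
    listeners = [i for i, name in enumerate(sections) if name.startswith("eventlistener:")]
    if not programs or not listeners:
        return True
    return max(listeners) < min(programs)


GOOD_CONF = """\
[supervisord]
nodaemon=true

[eventlistener:kill_supervisor]
command=/usr/bin/kill_supervisor.py

[program:start.sh]
command=/usr/bin/start.sh
"""

BAD_CONF = """\
[supervisord]
nodaemon=true

[program:start.sh]
command=/usr/bin/start.sh

[eventlistener:kill_supervisor]
command=/usr/bin/kill_supervisor.py
"""
```

As written in this PR, every config appends the listener section at the end of the file, so a check like this would flag all of them.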
6 changes: 6 additions & 0 deletions dockers/docker-fpm-quagga/supervisord.conf
@@ -56,3 +56,9 @@ autorestart=false
startsecs=0
stdout_logfile=syslog
stderr_logfile=syslog

[eventlistener:kill_supervisor]
command=/usr/bin/kill_supervisor.py
events=PROCESS_STATE_STOPPED, PROCESS_STATE_EXITED, PROCESS_STATE_FATAL
stdout_logfile=syslog
stderr_logfile=syslog
6 changes: 6 additions & 0 deletions dockers/docker-lldp-sv2/supervisord.conf
@@ -47,3 +47,9 @@ autostart=false
autorestart=false
stdout_logfile=syslog
stderr_logfile=syslog

[eventlistener:kill_supervisor]
command=/usr/bin/kill_supervisor.py
events=PROCESS_STATE_STOPPED, PROCESS_STATE_EXITED, PROCESS_STATE_FATAL
stdout_logfile=syslog
stderr_logfile=syslog
6 changes: 6 additions & 0 deletions dockers/docker-orchagent/supervisord.conf
@@ -116,3 +116,9 @@ autostart=false
autorestart=false
stdout_logfile=syslog
stderr_logfile=syslog

[eventlistener:kill_supervisor]
command=/usr/bin/kill_supervisor.py
events=PROCESS_STATE_STOPPED, PROCESS_STATE_EXITED, PROCESS_STATE_FATAL
stdout_logfile=syslog
stderr_logfile=syslog
6 changes: 6 additions & 0 deletions dockers/docker-platform-monitor/supervisord.conf
@@ -54,3 +54,9 @@ autorestart=false
stdout_logfile=syslog
stderr_logfile=syslog
startsecs=0

[eventlistener:kill_supervisor]
command=/usr/bin/kill_supervisor.py
events=PROCESS_STATE_STOPPED, PROCESS_STATE_EXITED, PROCESS_STATE_FATAL
stdout_logfile=syslog
stderr_logfile=syslog
@@ -27,3 +27,9 @@ autostart=false
autorestart=false
stdout_logfile=syslog
stderr_logfile=syslog

[eventlistener:kill_supervisor]
command=/usr/bin/kill_supervisor.py
events=PROCESS_STATE_STOPPED, PROCESS_STATE_EXITED, PROCESS_STATE_FATAL
stdout_logfile=syslog
stderr_logfile=syslog
6 changes: 6 additions & 0 deletions dockers/docker-snmp-sv2/supervisord.conf
@@ -34,3 +34,9 @@ autostart=false
autorestart=false
stdout_logfile=syslog
stderr_logfile=syslog

[eventlistener:kill_supervisor]
command=/usr/bin/kill_supervisor.py
events=PROCESS_STATE_STOPPED, PROCESS_STATE_EXITED, PROCESS_STATE_FATAL
stdout_logfile=syslog
stderr_logfile=syslog
6 changes: 6 additions & 0 deletions dockers/docker-sonic-telemetry/supervisord.conf
@@ -34,3 +34,9 @@ autostart=false
autorestart=true
stdout_logfile=syslog
stderr_logfile=syslog

[eventlistener:kill_supervisor]
command=/usr/bin/kill_supervisor.py
events=PROCESS_STATE_STOPPED, PROCESS_STATE_EXITED, PROCESS_STATE_FATAL
stdout_logfile=syslog
stderr_logfile=syslog
6 changes: 6 additions & 0 deletions dockers/docker-teamd/supervisord.conf
@@ -34,3 +34,9 @@ autostart=false
autorestart=false
stdout_logfile=syslog
stderr_logfile=syslog

[eventlistener:kill_supervisor]
command=/usr/bin/kill_supervisor.py
events=PROCESS_STATE_STOPPED, PROCESS_STATE_EXITED, PROCESS_STATE_FATAL
stdout_logfile=syslog
stderr_logfile=syslog
2 changes: 2 additions & 0 deletions files/build_templates/database.service.j2
@@ -9,5 +9,7 @@ ExecStartPre=/usr/bin/{{docker_container_name}}.sh start
ExecStart=/usr/bin/{{docker_container_name}}.sh attach
ExecStop=/usr/bin/{{docker_container_name}}.sh stop

Restart=always

[Install]
WantedBy=multi-user.target
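
`Restart=always` is what brings the container back after supervisord exits: systemd restarts the service whenever its main process ends, whether or not the exit was clean. As a rough sketch of how one might confirm that a rendered unit file carries the setting in its `[Service]` section, the snippet below parses unit text with Python's `configparser`. The function name and the sample unit (including its `[Unit]` description and the concrete script paths) are assumptions for illustration, not content from this PR.

```python
import configparser


def restart_policy(unit_text):
    """Return the Restart= value from the [Service] section, or None if absent."""
    # strict=False tolerates repeated keys, which unit files allow
    parser = configparser.RawConfigParser(strict=False)
    parser.optionxform = str  # keep option names case-sensitive
    parser.read_string(unit_text)
    return parser.get("Service", "Restart", fallback=None)


# Hypothetical rendered unit file
SAMPLE_UNIT = """\
[Unit]
Description=Example container

[Service]
ExecStartPre=/usr/bin/example.sh start
ExecStart=/usr/bin/example.sh attach
ExecStop=/usr/bin/example.sh stop

Restart=always

[Install]
WantedBy=multi-user.target
"""
```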
2 changes: 2 additions & 0 deletions files/build_templates/dhcp_relay.service.j2
@@ -9,5 +9,7 @@ ExecStartPre=/usr/bin/{{ docker_container_name }}.sh start
ExecStart=/usr/bin/{{ docker_container_name }}.sh attach
ExecStop=/usr/bin/{{ docker_container_name }}.sh stop

Restart=always

[Install]
WantedBy=multi-user.target teamd.service
2 changes: 2 additions & 0 deletions files/build_templates/lldp.service.j2
@@ -9,5 +9,7 @@ ExecStartPre=/usr/bin/{{docker_container_name}}.sh start
ExecStart=/usr/bin/{{docker_container_name}}.sh attach
ExecStop=/usr/bin/{{docker_container_name}}.sh stop

Restart=always

[Install]
WantedBy=multi-user.target
2 changes: 2 additions & 0 deletions files/build_templates/pmon.service.j2
@@ -9,5 +9,7 @@ ExecStartPre=/usr/bin/{{docker_container_name}}.sh start
ExecStart=/usr/bin/{{docker_container_name}}.sh attach
ExecStop=/usr/bin/{{docker_container_name}}.sh stop

Restart=always

[Install]
WantedBy=multi-user.target
2 changes: 2 additions & 0 deletions files/build_templates/radv.service.j2
@@ -9,5 +9,7 @@ ExecStartPre=/usr/bin/{{ docker_container_name }}.sh start
ExecStart=/usr/bin/{{ docker_container_name }}.sh attach
ExecStop=/usr/bin/{{ docker_container_name }}.sh stop

Restart=always

[Install]
WantedBy=multi-user.target
2 changes: 2 additions & 0 deletions files/build_templates/snmp.service.j2
@@ -7,3 +7,5 @@ After=updategraph.service swss.service
ExecStartPre=/usr/bin/{{docker_container_name}}.sh start
ExecStart=/usr/bin/{{docker_container_name}}.sh attach
ExecStop=/usr/bin/{{docker_container_name}}.sh stop

Restart=always
2 changes: 2 additions & 0 deletions files/build_templates/swss.service.j2
@@ -15,5 +15,7 @@ Environment=sonic_asic_platform={{ sonic_asic_platform }}
ExecStart=/usr/local/bin/swss.sh start
ExecStop=/usr/local/bin/swss.sh stop

Restart=always

[Install]
WantedBy=multi-user.target
2 changes: 2 additions & 0 deletions files/build_templates/syncd.service.j2
@@ -20,5 +20,7 @@ Environment=sonic_asic_platform={{ sonic_asic_platform }}
ExecStart=/usr/local/bin/syncd.sh start
ExecStop=/usr/local/bin/syncd.sh stop

Restart=always

[Install]
WantedBy=multi-user.target
2 changes: 2 additions & 0 deletions files/build_templates/teamd.service.j2
@@ -9,5 +9,7 @@ ExecStartPre=/usr/bin/{{docker_container_name}}.sh start
ExecStart=/usr/bin/{{docker_container_name}}.sh attach
ExecStop=/usr/bin/{{docker_container_name}}.sh stop

Restart=always

[Install]
WantedBy=multi-user.target
2 changes: 2 additions & 0 deletions files/build_templates/telemetry.service.j2
@@ -9,5 +9,7 @@ ExecStartPre=/usr/bin/{{docker_container_name}}.sh start
ExecStart=/usr/bin/{{docker_container_name}}.sh attach
ExecStop=/usr/bin/{{docker_container_name}}.sh stop

Restart=always

[Install]
WantedBy=multi-user.target
5 changes: 5 additions & 0 deletions platform/barefoot/docker-syncd-bfn/supervisord.conf
@@ -27,3 +27,8 @@ autorestart=false
stdout_logfile=syslog
stderr_logfile=syslog

[eventlistener:kill_supervisor]
command=/usr/bin/kill_supervisor.py
events=PROCESS_STATE_STOPPED, PROCESS_STATE_EXITED, PROCESS_STATE_FATAL
stdout_logfile=syslog
stderr_logfile=syslog
6 changes: 6 additions & 0 deletions platform/broadcom/docker-syncd-brcm/supervisord.conf
@@ -34,3 +34,9 @@ autostart=false
autorestart=false
stdout_logfile=syslog
stderr_logfile=syslog

[eventlistener:kill_supervisor]
command=/usr/bin/kill_supervisor.py
events=PROCESS_STATE_STOPPED, PROCESS_STATE_EXITED, PROCESS_STATE_FATAL
stdout_logfile=syslog
stderr_logfile=syslog
6 changes: 6 additions & 0 deletions platform/cavium/docker-syncd-cavm/supervisord.conf
@@ -26,3 +26,9 @@ autostart=false
autorestart=false
stdout_logfile=syslog
stderr_logfile=syslog

[eventlistener:kill_supervisor]
command=/usr/bin/kill_supervisor.py
events=PROCESS_STATE_STOPPED, PROCESS_STATE_EXITED, PROCESS_STATE_FATAL
stdout_logfile=syslog
stderr_logfile=syslog
6 changes: 6 additions & 0 deletions platform/centec/docker-syncd-centec/supervisord.conf
@@ -26,3 +26,9 @@ autostart=false
autorestart=false
stdout_logfile=syslog
stderr_logfile=syslog

[eventlistener:kill_supervisor]
command=/usr/bin/kill_supervisor.py
events=PROCESS_STATE_STOPPED, PROCESS_STATE_EXITED, PROCESS_STATE_FATAL
stdout_logfile=syslog
stderr_logfile=syslog
5 changes: 5 additions & 0 deletions platform/marvell/docker-syncd-mrvl/supervisord.conf
@@ -27,3 +27,8 @@ autorestart=false
stdout_logfile=syslog
stderr_logfile=syslog

[eventlistener:kill_supervisor]
command=/usr/bin/kill_supervisor.py
events=PROCESS_STATE_STOPPED, PROCESS_STATE_EXITED, PROCESS_STATE_FATAL
stdout_logfile=syslog
stderr_logfile=syslog
6 changes: 6 additions & 0 deletions platform/mellanox/docker-syncd-mlnx/supervisord.conf
@@ -34,3 +34,9 @@ autostart=false
autorestart=false
stdout_logfile=syslog
stderr_logfile=syslog

[eventlistener:kill_supervisor]
command=/usr/bin/kill_supervisor.py
events=PROCESS_STATE_STOPPED, PROCESS_STATE_EXITED, PROCESS_STATE_FATAL
stdout_logfile=syslog
stderr_logfile=syslog
6 changes: 6 additions & 0 deletions platform/nephos/docker-syncd-nephos/supervisord.conf
@@ -26,3 +26,9 @@ autostart=false
autorestart=false
stdout_logfile=syslog
stderr_logfile=syslog

[eventlistener:kill_supervisor]
command=/usr/bin/kill_supervisor.py
events=PROCESS_STATE_STOPPED, PROCESS_STATE_EXITED, PROCESS_STATE_FATAL
stdout_logfile=syslog
stderr_logfile=syslog
6 changes: 6 additions & 0 deletions platform/vs/docker-sonic-vs/supervisord.conf
@@ -154,3 +154,9 @@ autostart=false
autorestart=false
stdout_logfile=syslog
stderr_logfile=syslog

[eventlistener:kill_supervisor]
command=/usr/bin/kill_supervisor.py
events=PROCESS_STATE_STOPPED, PROCESS_STATE_EXITED, PROCESS_STATE_FATAL
stdout_logfile=syslog
stderr_logfile=syslog
@@ -31,3 +31,8 @@ stdout_logfile=syslog
stderr_logfile=syslog


[eventlistener:kill_supervisor]
command=/usr/bin/kill_supervisor.py
events=PROCESS_STATE_STOPPED, PROCESS_STATE_EXITED, PROCESS_STATE_FATAL
stdout_logfile=syslog
stderr_logfile=syslog