Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[sflow] Failed to start with result 'start-limit-hit' (Reproduced intermittently) #7177

Closed
Hedgehog-Guru opened this issue Mar 29, 2021 · 2 comments

Comments

@Hedgehog-Guru
Copy link

Description
From time to time restarting of sflow.service fails:

Sep  2 12:44:40.228634 sonic-step-04 INFO sonic-ztp[9524]: Executing restart of service sflow...
Sep  2 12:44:40.243347 sonic-step-04 WARNING systemd[1]: sflow.service: Start request repeated too quickly.
Sep  2 12:44:40.244186 sonic-step-04 ERR systemd[1]: Failed to start sFlow container.
Sep  2 12:44:40.244876 sonic-step-04 NOTICE systemd[1]: sflow.service: Unit entered failed state.
Sep  2 12:44:40.245806 sonic-step-04 WARNING systemd[1]: sflow.service: Failed with result 'start-limit-hit'.
Sep  2 12:44:40.247692 sonic-step-04 INFO sonic-ztp[9524]: Job for sflow.service failed.
Sep  2 12:44:40.248548 sonic-step-04 INFO sonic-ztp[9524]: See "systemctl status sflow.service" and "journalctl -xe" for details.
Sep  2 12:44:40.249335 sonic-step-04 ERR config: Failed to execute restart of service sflow with error 1
Sep  2 12:44:40.370798 sonic-step-04 ERR sonic-ztp[22682]: configdb-json: Command '/usr/bin/config reload -y /tmp/config_dl.json' failed with exit code(1).

The issue was detected during ztp process only.
This issue is reproduced intermittently (approximately one time out of ten)

Steps to reproduce the issue
1.Boot sonic with ZTP enabled feature
2.Announce DHCP option 67 pointing to ztp_config.json (see at the bottom)
3.Put all necessary plugins and configuration files using in ZTP process on WEB server.
4. Start ZTP process

ztp erase
ztp run

5.Wait till ZTP process has finished.

Describe the results you received
During config reloading process sflow.service fails to start.

Output of show version

SONiC Software Version: SONiC.HEAD.0-fffee7e3
Distribution: Debian 9.12
Kernel: 4.9.0-11-2-amd64
Build commit: fffee7e3
Build date: Sun Jun 21 09:33:14 UTC 2020
Built by: ezrada@r-build-sonic02

Platform: x86_64-mlnx_msn2700-r0
HwSKU: Mellanox-SN2700
ASIC: mellanox
Serial Number: MT1646X05282            
Uptime: 13:36:03 up 12 min,  3 users,  load average: 0.41, 0.49, 0.50 

The issue was detected on SONiC Software Version: SONiC.SONIC.202012.53-aee4892c_Internal as well

SONiC Software Version: SONiC.SONIC.202012.53-aee4892c_Internal
Distribution: Debian 10.8
Kernel: 4.19.0-12-2-amd64
Build commit: aee4892c
Build date: Thu Mar 25 07:17:58 UTC 2021
Built by: sw-r2d2-bot@r-build-sonic-ci03

sonic_dump_DUT-1_20200902_133602.tar.gz

@anshuv-mfst
Copy link

Issue triage 3/31:

  1. BRCM team to help investigate (@adyeung)

@vivekrnv
Copy link
Contributor

Start Limit failing is an Expected behavior for Sflow

sflow systemd unit

StartLimitIntervalSec = 1200 and StartLimitBurst=3 which means it'll allow 3 attempts of starting the service within the last 1200 sec. This is obviously crossed in the logs attached.

Sep  2 11:59:57.325251 sonic-step-03 INFO sonic-ztp[3497]: Executing restart of service sflow...
Sep  2 12:40:02.891768 DUT-1 INFO sonic-ztp[9524]: Executing restart of service sflow...
Sep  2 12:42:54.425863 sonic-step-03 INFO sonic-ztp[9524]: Executing restart of service sflow...
Sep  2 12:44:40.228634 sonic-step-04 INFO sonic-ztp[9524]: Executing restart of service sflow...
Sep  2 12:44:40.247692 sonic-step-04 INFO sonic-ztp[9524]: Job for sflow.service failed.

This can be manually observed as well

root@sonic-dut:/home/admin# config feature state sflow enabled
root@sonic-dut:/home/admin# systemctl restart sflow
root@sonic-dut:/home/admin# systemctl restart sflow
root@sonic-dut:/home/admin# systemctl restart sflow
Job for sflow.service failed.
See "systemctl status sflow.service" and "journalctl -xe" for details.

The "start-limit" exceeded behavior can be avoided by running systemd reset-failed which flushes the timer values and start-limit won't hit.

root@sonic-dut:/home/admin# systemctl reset-failed sflow
root@sonic-dut:/home/admin# systemctl restart sflow
root@sonic-dut:/home/admin# systemctl reset-failed sflow
root@sonic-dut:/home/admin# systemctl restart sflow
root@sonic-dut:/home/admin# systemctl reset-failed sflow
root@sonic-dut:/home/admin# systemctl restart sflow
root@sonic-dut:/home/admin# systemctl reset-failed sflow
root@sonic-dut:/home/admin# systemctl restart sflow

Before this PR: #1999. reset-failed was not run on sflow service, during a config reload. The reason being config reload used to filter out those services which doesn't have "feature enabled" status. and sflow is by default disabled.

Sep  2 12:42:58.745206 sonic-step-03 INFO sonic-ztp[9524]: Executing stop of service swss...
Sep  2 12:43:34.441345 sonic-step-03 INFO sonic-ztp[9524]: Executing stop of service lldp...
Sep  2 12:43:44.746268 sonic-step-03 INFO sonic-ztp[9524]: Executing stop of service bgp...
Sep  2 12:43:49.161023 sonic-step-03 INFO sonic-ztp[9524]: Executing stop of service hostcfgd...
Sep  2 12:43:49.177712 sonic-step-03 INFO sonic-ztp[9524]: Executing stop of service nat...
Sep  2 12:43:59.501858 sonic-step-03 INFO sonic-ztp[9524]: Running command: /usr/local/bin/sonic-cfggen -j /etc/sonic/init_cfg.json -j /tmp/config_dl.json --write-to-db
Sep  2 12:43:59.824282 sonic-step-03 INFO sonic-ztp[9524]: Running command: /usr/bin/db_migrator.py -o migrate
Sep  2 12:44:00.135513 sonic-step-03 INFO sonic-ztp[9524]: Executing reset-failed of service bgp...
Sep  2 12:44:00.147184 sonic-step-03 INFO sonic-ztp[9524]: Executing reset-failed of service dhcp_relay...
Sep  2 12:44:00.159230 sonic-step-03 INFO sonic-ztp[9524]: Executing reset-failed of service hostcfgd...
Sep  2 12:44:00.171274 sonic-step-03 INFO sonic-ztp[9524]: Executing reset-failed of service hostname-config...
Sep  2 12:44:00.183062 sonic-step-03 INFO sonic-ztp[9524]: Executing reset-failed of service interfaces-config...
Sep  2 12:44:00.194888 sonic-step-03 INFO sonic-ztp[9524]: Executing reset-failed of service lldp...
Sep  2 12:44:00.206694 sonic-step-03 INFO sonic-ztp[9524]: Executing reset-failed of service ntp-config...
Sep  2 12:44:00.218487 sonic-step-03 INFO sonic-ztp[9524]: Executing reset-failed of service pmon...
Sep  2 12:44:00.230561 sonic-step-03 INFO sonic-ztp[9524]: Executing reset-failed of service radv...
Sep  2 12:44:00.242284 sonic-step-03 INFO sonic-ztp[9524]: Executing reset-failed of service rsyslog-config...
Sep  2 12:44:00.254115 sonic-step-03 INFO sonic-ztp[9524]: Executing reset-failed of service snmp...
Sep  2 12:44:00.266037 sonic-step-03 INFO sonic-ztp[9524]: Executing reset-failed of service swss...
Sep  2 12:44:00.278283 sonic-step-03 INFO sonic-ztp[9524]: Executing reset-failed of service syncd...
Sep  2 12:44:00.290116 sonic-step-03 INFO sonic-ztp[9524]: Executing reset-failed of service teamd...
Sep  2 12:44:00.301907 sonic-step-03 INFO sonic-ztp[9524]: Executing reset-failed of service nat...
Sep  2 12:44:00.314407 sonic-step-03 INFO sonic-ztp[9524]: Executing restart of service hostname-config...
Sep  2 12:44:00.652423 sonic-step-03 INFO sonic-ztp[9524]: Executing restart of service interfaces-config...
Sep  2 12:44:09.094233 sonic-step-03 INFO sonic-ztp[9524]: Executing restart of service ntp-config...
Sep  2 12:44:09.805182 sonic-step-03 INFO sonic-ztp[9524]: Executing restart of service rsyslog-config...
Sep  2 12:44:10.174881 sonic-step-04 INFO sonic-ztp[9524]: Executing restart of service swss...
Sep  2 12:44:12.983702 sonic-step-04 INFO sonic-ztp[9524]: Executing restart of service bgp...
Sep  2 12:44:16.270515 sonic-step-04 INFO sonic-ztp[9524]: Executing restart of service lldp...
Sep  2 12:44:27.779352 sonic-step-04 INFO sonic-ztp[9524]: Executing restart of service hostcfgd...
Sep  2 12:44:27.839321 sonic-step-04 INFO sonic-ztp[9524]: Executing restart of service nat...
Sep  2 12:44:40.228634 sonic-step-04 INFO sonic-ztp[9524]: Executing restart of service sflow...

The new PR restarts sonic.target which in turns restarts every service dependent on it (inclusing sflow) even though it's feature status is disabled) and backported to 202012 from this commit.

Hence, i certainly think that this issue can be closed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants