Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[monit/snmp] Tests fail against master image because snmp_subagent is not running #5502

Closed
daall opened this issue Sep 29, 2020 · 5 comments · Fixed by #5480
Closed

[monit/snmp] Tests fail against master image because snmp_subagent is not running #5502

daall opened this issue Sep 29, 2020 · 5 comments · Fixed by #5480
Assignees

Comments

@daall
Copy link
Contributor

daall commented Sep 29, 2020

Description
Many tests are failing during teardown on public image SONiC.master.425-4006ce71 because loganalyzer is flagging monit error messages due to snmp_subagent not running.

Steps to reproduce the issue:

  1. Run a set of nightly tests
  2. Check the failed test cases

Describe the results you received:

duthost = <tests.common.devices.SonicHost object at 0x7fdee906c590>
request = <SubRequest 'loganalyzer' for <Function test_crm_nexthop_group[False-2.2.2.0/24]>>

    @pytest.fixture(autouse=True)
    def loganalyzer(duthost, request):
        if request.config.getoption("--disable_loganalyzer") or "disable_loganalyzer" in request.keywords:
            logging.info("Log analyzer is disabled")
            yield
            return
    
        # Force rotate logs
        try:
            duthost.shell(
                "/usr/sbin/logrotate -f /etc/logrotate.conf > /dev/null 2>&1"
                )
        except RunAnsibleModuleFail as e:
            logging.warning("logrotate is failed. Command returned:\n"
                            "Stdout: {}\n"
                            "Stderr: {}\n"
                            "Return code: {}".format(e.results["stdout"], e.results["stderr"], e.results["rc"]))
    
        loganalyzer = LogAnalyzer(ansible_host=duthost, marker_prefix=request.node.name)
        logging.info("Add start marker into DUT syslog")
        marker = loganalyzer.init()
        logging.info("Load config and analyze log")
        # Read existed common regular expressions located with legacy loganalyzer module
        loganalyzer.load_common_config()
    
        yield loganalyzer
        # Skip LogAnalyzer if case is skipped
        if request.node.rep_call.skipped:
            return
>       loganalyzer.analyze(marker)

duthost    = <tests.common.devices.SonicHost object at 0x7fdee906c590>
loganalyzer = <tests.common.plugins.loganalyzer.loganalyzer.LogAnalyzer instance at 0x7fded5e1caa0>
marker     = 'test_crm_nexthop_group[False-2.2.2.0/24].2020-09-29-09:01:43'
request    = <SubRequest 'loganalyzer' for <Function test_crm_nexthop_group[False-2.2.2.0/24]>>

common/plugins/loganalyzer/__init__.py:42: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
common/plugins/loganalyzer/loganalyzer.py:254: in analyze
    self._verify_log(analyzer_summary)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <tests.common.plugins.loganalyzer.loganalyzer.LogAnalyzer instance at 0x7fded5e1caa0>
result = {'expect_messages': {'/tmp/syslog.2020-09-29-09:03:54': []}, 'match_files': {'/tmp/syslog.2020-09-29-09:03:54': {'expe...r/local/bin/bgpmon' is not running.\n"]}, 'total': {'expected_match': 0, 'expected_missing_match': 0, 'match': 4}, ...}

    def _verify_log(self, result):
        """
        Verify that total match and expected missing match equals to zero or raise exception otherwise.
        Verify that expected_match is not equal to zero when there is configured expected regexp in self.expect_regex list
        """
        if not result:
            raise LogAnalyzerError("Log analyzer failed - no result.")
        if result["total"]["match"] != 0 or result["total"]["expected_missing_match"] != 0:
>           raise LogAnalyzerError(result)
E           LogAnalyzerError: {'match_messages': {'/tmp/syslog.2020-09-29-09:03:54': ["Sep 29 09:02:35.496639 str2-dx010-acs-6 ERR monit[517]: 'snmp|snmp_subagent' status failed (1) -- 'python3.6 -m sonic_ax_impl' is not running.\n", "Sep 29 09:02:35.872821 str2-dx010-acs-6 ERR monit[517]: 'bgp|bgpmon' status failed (1) -- 'python /usr/local/bin/bgpmon' is not running.\n", "Sep 29 09:03:36.130296 str2-dx010-acs-6 ERR monit[517]: 'snmp|snmp_subagent' status failed (1) -- 'python3.6 -m sonic_ax_impl' is not running.\n", "Sep 29 09:03:36.476046 str2-dx010-acs-6 ERR monit[517]: 'bgp|bgpmon' status failed (1) -- 'python /usr/local/bin/bgpmon' is not running.\n"]}, 'total': {'expected_match': 0, 'expected_missing_match': 0, 'match': 4}, 'match_files': {'/tmp/syslog.2020-09-29-09:03:54': {'expected_match': 0, 'match': 4}}, 'expect_messages': {'/tmp/syslog.2020-09-29-09:03:54': []}, 'unused_expected_regexp': []}

result     = {'expect_messages': {'/tmp/syslog.2020-09-29-09:03:54': []}, 'match_files': {'/tmp/syslog.2020-09-29-09:03:54': {'expe...r/local/bin/bgpmon' is not running.\n"]}, 'total': {'expected_match': 0, 'expected_missing_match': 0, 'match': 4}, ...}
self       = <tests.common.plugins.loganalyzer.loganalyzer.LogAnalyzer instance at 0x7fded5e1caa0>

common/plugins/loganalyzer/loganalyzer.py:85: LogAnalyzerError

Describe the results you expected:
Tests should not run into any teardown failures due to loganalyzer.

Output of show version:

admin@str2-dx010-acs-6:~$ show ver

SONiC Software Version: SONiC.master.425-4006ce71
Distribution: Debian 10.6
Kernel: 4.19.0-9-2-amd64
Build commit: 4006ce71
Build date: Sun Sep 27 07:52:44 UTC 2020
Built by: johnar@jenkins-worker-8

Platform: x86_64-cel_seastone-r0
HwSKU: Celestica-DX010-C32
ASIC: broadcom
Serial Number: DX010F2B118711MS100005
Uptime: 18:01:43 up  3:22,  1 user,  load average: 1.48, 2.06, 2.22

Docker images:
REPOSITORY                    TAG                   IMAGE ID            SIZE
docker-teamd                  latest                20617ff1849b        391MB
docker-teamd                  master.425-4006ce71   20617ff1849b        391MB
docker-sonic-mgmt-framework   latest                04e712b877e4        486MB
docker-sonic-mgmt-framework   master.425-4006ce71   04e712b877e4        486MB
docker-router-advertiser      latest                793cb0e5c2fa        359MB
docker-router-advertiser      master.425-4006ce71   793cb0e5c2fa        359MB
docker-platform-monitor       latest                f867f6e00a19        434MB
docker-platform-monitor       master.425-4006ce71   f867f6e00a19        434MB
docker-lldp                   latest                2ec74daaf081        388MB
docker-lldp                   master.425-4006ce71   2ec74daaf081        388MB
docker-dhcp-relay             latest                56b4112e9427        366MB
docker-dhcp-relay             master.425-4006ce71   56b4112e9427        366MB
docker-database               latest                32598ba6c32e        359MB
docker-database               master.425-4006ce71   32598ba6c32e        359MB
docker-orchagent              latest                bd1d90cb1a3f        405MB
docker-orchagent              master.425-4006ce71   bd1d90cb1a3f        405MB
docker-nat                    latest                de766da20285        394MB
docker-nat                    master.425-4006ce71   de766da20285        394MB
docker-sonic-telemetry        latest                3d34711e992d        429MB
docker-sonic-telemetry        master.425-4006ce71   3d34711e992d        429MB
docker-fpm-frr                latest                b79fad0eb74e        407MB
docker-fpm-frr                master.425-4006ce71   b79fad0eb74e        407MB
docker-sflow                  latest                418e8407b585        395MB
docker-sflow                  master.425-4006ce71   418e8407b585        395MB
docker-snmp                   latest                075ee0d414ab        399MB
docker-snmp                   master.425-4006ce71   075ee0d414ab        399MB
docker-syncd-brcm             latest                b93785e0dca0        447MB
docker-syncd-brcm             master.425-4006ce71   b93785e0dca0        447MB
@daall
Copy link
Contributor Author

daall commented Sep 29, 2020

@yxieca FYI

@daall daall transferred this issue from sonic-net/sonic-mgmt Sep 29, 2020
@daall daall changed the title Many tests fail against master image because snmp_subagent is not running [monit/snmp] Tests fail against master image because snmp_subagent is not running Sep 29, 2020
@theasianpianist
Copy link
Contributor

I saw a previous issue involving SNMP over the weekend, I believe some transient error caused the snmp timer to not run, which meant the SNMP container never started. Not sure if they are somehow related, but I have not seen the error since.

@daall
Copy link
Contributor Author

daall commented Sep 29, 2020

From #5488: "For snmp|snmp_subagent and bgp|bgpmon logs, I believe they are caused by incorrect config for monit."

@daall
Copy link
Contributor Author

daall commented Sep 30, 2020

Adding some more info from the syslog:

Sep 30 22:43:20.164629 str2-dx010-acs-6 INFO snmp#snmp-subagent [ax_interface] INFO: OID registration complete. Waiting to receive PDUs...
Sep 30 22:43:21.776816 str2-dx010-acs-6 WARNING snmp#snmp-subagent [sonic_ax_impl] WARNING: Invalid interface name in *' in APP_DB, skipping
Sep 30 22:43:24.396925 str2-dx010-acs-6 INFO snmp#supervisord 2020-09-30 22:43:16,769 INFO spawned: 'snmp-subagent' with pid 24
Sep 30 22:43:24.396925 str2-dx010-acs-6 INFO snmp#supervisord 2020-09-30 22:43:17,823 INFO success: snmp-subagent entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
Sep 30 22:43:25.973206 str2-dx010-acs-6 WARNING snmp#snmp-subagent [sonic_ax_impl] WARNING: Invalid interface name in *' in APP_DB, skipping
Sep 30 22:43:25.973206 str2-dx010-acs-6 WARNING snmp#snmp-subagent [sonic_ax_impl] WARNING: Invalid interface name in TRANSCEIVER_INFO|*                      in STATE_DB, skipping
Sep 30 22:43:48.603654 str2-dx010-acs-6 INFO snmp#snmp-subagent [sonic_ax_impl] INFO: Recieved 'SIGTERM' signal, shutting down...
Sep 30 22:43:48.604442 str2-dx010-acs-6 INFO snmp#snmp-subagent [ax_interface] INFO: AgentX socket connection closed.
Sep 30 22:43:48.604938 str2-dx010-acs-6 INFO snmp#snmp-subagent [ax_interface] INFO: Run disabled. Connection loop stopping...
Sep 30 22:43:54.422641 str2-dx010-acs-6 INFO snmp#supervisord 2020-09-30 22:43:48,602 INFO waiting for snmpd, supervisor-proc-exit-listener, snmp-subagent, rsyslogd to die
Sep 30 22:43:54.422641 str2-dx010-acs-6 INFO snmp#supervisord 2020-09-30 22:43:51,607 INFO waiting for snmpd, supervisor-proc-exit-listener, snmp-subagent, rsyslogd to die
Sep 30 22:43:54.453291 str2-dx010-acs-6 INFO snmp#snmp-subagent [sonic_ax_impl] INFO: Goodbye!
Sep 30 22:43:59.432578 str2-dx010-acs-6 INFO snmp#supervisord 2020-09-30 22:43:54,514 INFO stopped: snmp-subagent (exit status 0)

@daall
Copy link
Contributor Author

daall commented Oct 2, 2020

admin@str2-dx010-acs-6:~$ ps -aux | grep sonic_ax_impl
admin    17378  0.0  0.0   6824   892 pts/0    S+   17:56   0:00 grep sonic_ax_impl
root     23075  7.0  0.8 121460 35916 pts/0    Sl   17:13   3:06 python3 -m sonic_ax_impl

@lguohan lguohan linked a pull request Oct 2, 2020 that will close this issue
3 tasks
@lguohan lguohan closed this as completed Oct 2, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants