merging master #4

ANISH-GOTTAPU · 2021-03-30T14:06:59Z

Description of PR

Summary:
Fixes # (issue)

Type of change

Bug fix
Testbed and Framework(new/improvement)
Test case(new/improvement)

Approach

What is the motivation for this PR?

How did you do it?

How did you verify/test it?

Any platform specific information?

Supported testbed topology if it's a new test case?

Documentation

Approach

What is the motivation for this PR?

If two pytest clients try to use the same testbed and connect to the ptf_nn_agent on the ptf, the second client fails to connect because ptf_nn_agent uses the PAIR pattern, but ptfadapter does not notify the user of this failure.

How did you do it?

Add an extra check, _check_ptf_nn_agent_availability, that tries to connect to the nanomsg socket exposed by ptf_nn_agent and verifies whether there is one established connection. If not, it raises an error.

Signed-off-by: Longxiang Lyu <[email protected]>

…un them (#3166) Signed-off-by: Danny Allen <[email protected]>

Approach

What is the motivation for this PR?

devutils raises an exception when pinging hosts that do not have the ansible_host attribute.

How did you do it?

Skip the ping action for hosts that do not have the ansible_host attribute.

How did you verify/test it?

Ping all hosts in an inventory that contains hosts without the ansible_host attribute.

Signed-off-by: Ying Xie <[email protected]>
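The skip described above can be sketched as follows. This is a minimal illustration, not the devutils code: the function name `hosts_to_ping` and the inventory layout (a dict of host name to attribute dict) are assumptions made for the example.

```python
def hosts_to_ping(inventory_hosts):
    """Return {host name: address} for hosts that define ansible_host.

    Hosts without the attribute are skipped instead of raising an error.
    The inventory layout is a hypothetical stand-in for the real one.
    """
    return {
        name: attrs["ansible_host"]
        for name, attrs in inventory_hosts.items()
        if attrs.get("ansible_host")
    }


# Hypothetical inventory: one host with an address, one without.
inventory = {
    "dut-1": {"ansible_host": "10.0.0.1"},
    "console-1": {},
}
pingable = hosts_to_ping(inventory)
print(pingable)
```

Filtering before the ping loop keeps the loop itself unchanged; only hosts with an address ever reach it.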

What is the motivation for this PR?

To cover the stress test case in the Dual ToR Orchagent test plan:
https://github.com/Azure/sonic-mgmt/blob/master/docs/testplan/dual_tor/dual_tor_orch_test_plan.md

How did you do it?

Changes:
* Added a new test script for the dualtor stress test.
* Added a new fixture mock_server_ip_mac_map in the dual tor mock utility.
* Used minigraph facts for getting the Vlan interface name and IP. The minigraph facts are cached, so this is faster than using config facts.
* Improved the topology check for applying the apply_mock_dual_tor_tables and apply_mock_dual_tor_kernel_configs fixtures.

How did you verify/test it?

* Tested on T0 KVM.
* Tested on dual ToR physical testbed.

Signed-off-by: Xin Wang <[email protected]>

…ck. (#3171)

Signed-off-by: Yong Zhao <[email protected]>

Summary:
This PR aims to increase the maximum value of Monit stable time in sanity check.
Fixes # (issue)

Type of change

[x] Bug fix
Testbed and Framework(new/improvement)
Test case(new/improvement)

Approach

What is the motivation for this PR?

When PR #2890 was run against the virtual testbed, it restarted the Monit service in the fixture after testing. This causes the sanity check for the next test to fail, since Monit does not have enough time to initialize the states of services.

How did you do it?

I increased the maximum value of Monit stable time in sanity check.

How did you verify/test it?

I verified this on the virtual testbed by running the script kvmtest.sh to make sure the pytest script passes.

Any platform specific information?

N/A

- Simplify logic for detecting traffic duplication/disruption
- Send per-server packet streams to allow measuring per-server impact
- More comprehensive test reporting

Co-authored-by: Lawrence Lee <[email protected]>

#3117)

Approach

What is the motivation for this PR?

- Add a new attribute modular_chassis, which is set to True if the DUT is a modular chassis and False otherwise.
- Add support to run tests/platform_tests/test_cpu_memory_usage.py and tests/platform_tests/test_port_toggle.py on T2 chassis, and change test_show_chassis_module to run on DUTs per hwsku.
- Add new tests to verify the thermal local and global state DB on VoQ chassis, introduced by PRs sonic-net/sonic-swss-common#395 and sonic-net/sonic-platform-daemons#101.

How did you do it?

Modifications to current tests and a new test file, as below:

tests/common/devices/sonic.py
Add a new attribute modular_chassis, which can be used to skip modular-chassis-only tests when the DUT is not a chassis. Possible values are True if the DUT is part of a modular chassis and False if it is not.

tests/platform_tests/cli/test_show_chassis_module.py
Change the test to select one DUT per hwsku instead of running on all DUTs.

tests/platform_tests/test_cpu_memory_usage.py
Change rand_one_dut_hostname to enum_rand_one_per_hwsku_hostname to test per hwsku.

tests/platform_tests/test_port_toggle.py
Change rand_one_dut_hostname to enum_rand_one_per_hwsku_hostname to test per hwsku.

tests/platform_tests/test_thermal_state_db.py
Add a new test for T2 chassis to verify thermal data in the local state DB for all thermal sensors. Verify that the global state DB in the supervisor has data for all sensors from all modules in the chassis.

How did you verify/test it?

Validated the modified tests against a chassis.

Signed-off-by: Yong Zhao [email protected]

Description of PR

Summary:
This PR aims to test the container checker feature. The PR link is sonic-net/sonic-buildimage#6251.
Fixes # (issue)

Type of change

Bug fix
Testbed and Framework(new/improvement)
[x] Test case(new/improvement)

Approach

What is the motivation for this PR?

This PR aims to test the container checker feature. The PR link of container checker is sonic-net/sonic-buildimage#6251. The container_checker script is run periodically by Monit and monitors the running status of each container. Currently the auto-restart feature is enabled: if a critical process exits unexpectedly, the container is restarted. If the container is restarted 3 times within 20 minutes, it will not run again unless the flag is cleared manually using the command sudo systemctl reset-failed <container_name>.

How did you do it?

This pytest script tests the container_checker script in the following steps:
1. Stop the containers explicitly.
2. Check whether the names of the stopped containers appear in the Monit alerting message.
3. Restart the containers with config_reload(...).
4. Post-check that all the critical processes are running and BGP sessions are established.

How did you verify/test it?

I tested the PR against the physical testbed (str-dx010-acs-1), which had an image built from the public master branch installed.

This PR defines a new dualtor topology, dualtor-120. The newly defined topo is intended to be applied on Arista-7260cx3 with SKU Arista-7260CX3-D108C8, which has 8 100G uplink ports and 112 50G downlink ports (only the first port in SFP is enabled).

Signed-off-by: bingwang <[email protected]>

Description of PR

Summary:
This PR implements a new test case, test_snmp_interfaces_mibs, according to the following test plan. The test should verify correct behavior of port MIB objects such as: ifIndex, ifMtu, ifSpeed, ifAdminStatus, ifOperStatus, ifAlias, ifHighSpeed, ifType.
Fixes # (issue)

Testplan:
1. Retrieve facts for a device using SNMP.
2. Get expected values for a device from the DUT for each port.
3. Verify that the facts received by SNMP are equal to the values received from the system.

Type of change

Bug fix
Testbed and Framework(new/improvement)
Test case(new/improvement)

Approach

What is the motivation for this PR?

Add a test case to verify that the Interfaces MIB works properly.

How did you do it?

Retrieve facts for a device using SNMP and compare them with data collected from the DUT for each port.

How did you verify/test it?

Ran the test on master and 201911 images, on t0 and t1 topologies.
snmp/test_snmp_interfaces.py::test_snmp_interfaces_mibs PASSED

Signed-off-by: Andrii-Yosafat Lozovyi <[email protected]>
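The comparison step of the test plan could look roughly like the sketch below. It is not the code of test_snmp_interfaces_mibs: the helper name `find_mismatches`, the dict layout, and the sample values are assumptions for illustration. Only the MIB object names come from the description above.

```python
# MIB objects named in the PR description.
MIB_FIELDS = (
    "ifIndex", "ifMtu", "ifSpeed", "ifAdminStatus",
    "ifOperStatus", "ifAlias", "ifHighSpeed", "ifType",
)


def find_mismatches(snmp_facts, dut_facts, fields=MIB_FIELDS):
    """Compare per-port SNMP facts with values collected from the DUT.

    Both arguments are hypothetical {port: {field: value}} dicts.
    Returns a list of (port, field, expected, actual) tuples.
    """
    mismatches = []
    for port, expected in dut_facts.items():
        actual = snmp_facts.get(port, {})
        for field in fields:
            if actual.get(field) != expected.get(field):
                mismatches.append(
                    (port, field, expected.get(field), actual.get(field))
                )
    return mismatches


# Hypothetical sample data: the MTU reported over SNMP differs from the DUT.
dut = {"Ethernet0": {"ifMtu": 9100, "ifAlias": "etp1"}}
snmp = {"Ethernet0": {"ifMtu": 1500, "ifAlias": "etp1"}}
result = find_mismatches(snmp, dut)
print(result)
```

Collecting every mismatch before failing gives one report for all ports rather than stopping at the first difference.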

Fix IPv6 dataplane ACL test cases. When IPv6 dataplane ACLs are installed, IPv6 ND/NS messages start getting dropped. As a result, the BGP TCP connection is not established and routes are not programmed for traffic forwarding. This issue is most likely to be hit in the reboot case, as the ND6 entries need to be re-learned.

* [pytest][common] Make backend port-channel check more robust (for MASIC platforms only)

Approach

What is the motivation for this PR?

- Teardowns in the mocking fixtures could not reset the ToR.
- Mocking test cases need run_garp_service to set up the MAC table.
- The tunnel server utility uses persistent configs, which do not include the configs set by the mocking fixtures.

How did you do it?

- Add the fixture cleanup_mocked_configs to do a load minigraph in the teardown of dualtor mocking test cases.
- Remove autouse from run_garp_service, since for dualtor mocking test cases a single T0 DUT does not have the mux cable table set in config DB. Users should now explicitly call run_garp_service after calling those mocking fixtures to set up the mux cable table.
- Modify the tunnel traffic monitor to use the running configs, which include the configs from the mocking fixtures.

Signed-off-by: Longxiang Lyu <[email protected]>

What is the motivation for this PR?

Add a check to only announce routes for non-fullmesh topo testbeds.

How did you do it?

Add an extra check.

Signed-off-by: Longxiang Lyu <[email protected]>

What is the motivation for this PR?

Without the fix, the OID of lane 1 will be:
.1.3.6.1.4.1.1718.3.2.3.1.31
There should be a '.' between the last 3 and the last 1.

How did you do it?

Add a '.' if has_lanes is True.
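The effect of the missing separator can be reproduced with plain string concatenation. The sketch below covers only the has_lanes-is-True case; the function names are hypothetical, and the base OID is taken from the example in the description above.

```python
# Base OID from the description above; the lane number is appended to it.
BASE_OID = ".1.3.6.1.4.1.1718.3.2.3.1.3"


def lane_oid_without_fix(base_oid, lane_id):
    """Concatenate directly, which fuses the lane number into the last arc."""
    return base_oid + str(lane_id)


def lane_oid_with_fix(base_oid, lane_id):
    """Insert the '.' separator before the lane number."""
    return base_oid + "." + str(lane_id)


broken = lane_oid_without_fix(BASE_OID, 1)
fixed = lane_oid_with_fix(BASE_OID, 1)
print(broken)
print(fixed)
```

Without the separator the final arc reads 31 instead of 3 followed by 1, so the request targets a different object.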

What is the motivation for this PR?

`FRR` checks whether a new neighbor is an already-defined neighbor within the dynamic range. `BGPNeighbor.start_session` always starts `exa_bgp` and then pushes the neighbor definitions to config_db. For t0 testbeds, the neighbor is within the subnet `192.168.0.0/21`, which happens to be the same as the one used by `BGPVac`. So after `BGPNeighbor.start_session` starts `exa_bgp`, `FRR` detects the new connections as `BGPVac`, and the subsequent neighbor definitions make `FRR` complain. See issue #3188.

How did you do it?

Start the exa_bgp session after pushing the neighbor definitions.

Signed-off-by: Longxiang Lyu <[email protected]>

Generating a mux server URL for a specific interface name fails because of the missing tbinfo argument when getting minigraph facts.

Signed-off-by: Lawrence Lee <[email protected]>

This PR is a continuation of #2269, where infra changes were made to support multi-asic platforms. In this PR:
- Fix an issue introduced in #2269 where asic_id is 0.
- The LLDP test is enhanced to run on multi-asic devices.

…t hit (#3194)

Some of the pfcwd tgen test cases restart swss multiple times within a short interval. This leads to a start-limit-hit failure, and swss fails to restart.

Signed-off-by: Neetha John <[email protected]>

How did you do it?

Add the 'reset-failed' command prior to attempting a restart.
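The ordering can be sketched as a small helper that builds the command sequence. The helper name `restart_commands` is hypothetical and the sketch only constructs strings. How the test actually runs them on the DUT is not shown in this description.

```python
def restart_commands(service):
    """Return the commands to run, in order, to restart a systemd service.

    reset-failed clears the unit's failure state (including the start-limit
    counter) so that the following restart is not rejected.
    """
    return [
        "sudo systemctl reset-failed {}".format(service),
        "sudo systemctl restart {}".format(service),
    ]


commands = restart_commands("swss")
for command in commands:
    print(command)
```

Running reset-failed unconditionally is harmless when the unit has not failed, which avoids having to detect the start-limit-hit state first.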

Enhance ebtable test case for multi-asic platforms.

This is used for debugging the tac_plus start issue.

Signed-off-by: Guohan Lu <[email protected]>

* Test PFC watchdog under many-to-one traffic pattern

Signed-off-by: Wei Bai [email protected]

How did you do it?

- Add a new traffic pattern "many to one" in tests/ixia/pfcwd/files/pfcwd_multi_node_helper.py.
- Add a new function __gen_m2o_traffic in tests/ixia/pfcwd/files/pfcwd_multi_node_helper.py to generate traffic, including a PFC pause storm sent from port_id and many data traffic items sent to port_id.
- Add a new test function test_pfcwd_m2o in tests/ixia/pfcwd/test_pfcwd_m2o.py.

How did you verify/test it?

I tested using SONiC switches and an IXIA chassis. The Tgen API version is 0.0.75. The IXIA Linux API server version is 9.10.

* Fix exception in ptf_adapter

1. Fix 'AttributeError: ports' when del self.ports
2. Fix logic in reinit.

Signed-off-by: bingwang <[email protected]>

)

This commit has the following changes:
- Changes to minigraph_facts to support gathering facts for multi-asic.
- Support in the switch_arptable module to get neighbor entries for each namespace in the kernel.
- Modify the tests in the following files to support multi-asic platforms:
  - test_arpall.py
  - test_neighbor_mac_noptf.py

Signed-off-by: Arvindsrinivasan Lakshminarasimhan <[email protected]>

In the current document for VS setup:
https://github.com/Azure/sonic-mgmt/blob/master/docs/testbed/README.testbed.VsSetup.md
the inventory file name `lab` is used in the example command line for deploying the minigraph to the VS setup:

$ ./testbed-cli.sh -t vtestbed.csv -m veos_vtb deploy-mg vms-kvm-t0 lab password.txt

The correct inventory file name for VS setup should be `veos_vtb`:

$ ./testbed-cli.sh -t vtestbed.csv -m veos_vtb deploy-mg vms-kvm-t0 veos_vtb password.txt

Signed-off-by: Xin Wang <[email protected]>

…3191)

- Wait 10 seconds to start the server after applying the config template
- Do not change user credentials during rw_user test

Signed-off-by: Danny Allen <[email protected]>

…ase (#3170)

What is the motivation for this PR?

Random test failures were encountered on different trap groups with a CoPP policer attached. When a test failed, rx_pps was higher than PPS_LIMIT_MAX. Investigation found that this is due to a deviation in the rx_pps calculation on a slower server.

Although we expect the PTF to finish sending all the packets within DEFAULT_SEND_INTERVAL_SEC (10s), a slower server may need more time, so an extra DEFAULT_RECEIVE_WAIT_TIME (3s) has already been added. But we should also consider the deviation in the rx_pps calculation in this case. In my test, the TCP dump on the DUT shows that ARP requests can still be received continuously and evenly in the 11th or 12th second, and in the worst case the 13th second. The rx_pps calculation uses DEFAULT_SEND_INTERVAL_SEC (10s), so in this worst-case scenario the deviation can be 30%. PPS_LIMIT_MAX should therefore be increased from 1.1 to 1.3 times PPS_LIMIT to cover the worst case.

How did you do it?

Increase PPS_LIMIT_MAX to PPS_LIMIT * 1.3.

How did you verify/test it?

Ran the copp test cases, and they pass.

Signed-off-by: Kebo Liu <[email protected]>
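The 30% figure follows from dividing packets received over 13 seconds by the 10-second send interval. A worked sketch of that arithmetic is below. The value chosen for PPS_LIMIT is a hypothetical placeholder, and the sketch assumes the policer passes exactly PPS_LIMIT packets per second for the whole receive window.

```python
DEFAULT_SEND_INTERVAL_SEC = 10   # interval used in the rx_pps calculation
WORST_CASE_RECEIVE_SEC = 13      # packets may still arrive in the 13th second
PPS_LIMIT = 600                  # hypothetical policer rate, for illustration only


def measured_rx_pps(packets_received, interval_sec=DEFAULT_SEND_INTERVAL_SEC):
    """rx_pps as computed by the test: packets divided by the send interval."""
    return packets_received / float(interval_sec)


# Assumption: the policer lets PPS_LIMIT packets through each second for 13 s.
packets = PPS_LIMIT * WORST_CASE_RECEIVE_SEC
rx_pps = measured_rx_pps(packets)
ratio = rx_pps / PPS_LIMIT

print("rx_pps = {}, ratio to PPS_LIMIT = {:.2f}".format(rx_pps, ratio))
print("exceeds old limit (1.1x):", rx_pps > PPS_LIMIT * 1.1)
```

Under these assumptions the measured rate is 13/10 of the true rate, which is above the old 1.1 multiplier and at the new 1.3 multiplier.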

Added new monit test case to capture log message for process not running for both single and multi-asic platforms.

The test is enhanced to run on multi-asic platforms. Optimized the test cases to pass lag_facts() as an argument instead of fetching it every time.

* Update for route and SLB test cases

What is the motivation for this PR?

Automate deployment and cleanup of topo_t0-56-po2vlan.yml.

How did you do it?

Add ptf_portchannel.yml and ptf_portchannel.py to support portchannel_configuration in ptf.

How did you verify/test it?

Create the topology topo_t0-56-po2vlan.yml and execute show int portchannel on the DUT. PortChannel101 should be UP.

Signed-off-by: Ze Gan <[email protected]>

What is the motivation for this PR?

We run a PTF script on the PTF to sniff traffic. The PTF script calls scapy's sendrecv.sniff, which by default captures on all the PTF interfaces, including the backplane interface used for announcing routes from the PTF to the VMs. On the VMs, the PTF backplane is the next hop for the announced routes, so packets sent by the DUT to the VMs are forwarded to the PTF backplane interface as well. On the PTF, the packets sent by the DUT to the VMs can then be captured both on the PTF interfaces tapped to the VMs and on the backplane interface. This results in packet duplication and fails the test.

How did you do it?

This change adds a capture filter to filter out all the packets destined to the PTF backplane interface.

Scapy 2.3 supports specifying a list of interfaces to sniff on. I tried scapy 2.3 to sniff only on the PTF ethx ports. The problem is that the v2.3 sniff function with an iface list argument needs a long time to start up (18 seconds on my test setup, probably different on other setups). This makes it difficult to decide when to start the sender thread after the sniffer thread is started. After some experiments, it is much easier to just use the current v2.2 scapy and update the capture filter to filter out all the traffic destined to the PTF backplane interface.

How did you verify/test it?

Ran a dual ToR script to send upstream traffic and examined the captured pcap file. No packets to the PTF backplane interface were captured.

Signed-off-by: Xin Wang <[email protected]>
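One way to express such an exclusion is as a BPF capture filter string. The sketch below is an assumption about the shape of the filter, not the filter used in the PR: it matches the backplane by destination MAC address, and both the helper name and the sample MAC are hypothetical.

```python
def build_capture_filter(base_filter, backplane_mac):
    """Append a BPF clause that drops frames addressed to the backplane MAC.

    base_filter may be empty. Matching on the destination MAC is an
    assumption made for this sketch.
    """
    exclude = "not ether dst {}".format(backplane_mac)
    if base_filter:
        return "({}) and {}".format(base_filter, exclude)
    return exclude


# Hypothetical values for illustration.
capture_filter = build_capture_filter("icmp", "02:42:ac:11:00:02")
print(capture_filter)
```

A string of this form could be passed as the `filter` argument of scapy's `sniff`, which keeps the capture on all interfaces while discarding the duplicated copies.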

Currently the topology infrastructure supports a different management gateway for VMs. The VM startup config templates for t0 topologies use {{ vm_mgmt_gw }} for vrf MGMT. If {{ vm_mgmt_gw }} is not defined, they fall back to {{ mgmt_gw }}. The variables "vm_mgmt_gw" and "mgmt_gw" can be defined in the test server's host var file under ansible/host_vars.

The VM startup config templates for t1 topologies do not have the same logic. They always use {{ mgmt_gw }} as the default gateway of vrf MGMT. This change aligns t1 with t0 by adding the same logic to the VM startup config templates for t1 topologies.

Signed-off-by: Xin Wang <[email protected]>
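The fallback rule can be stated in a few lines of Python. This is an illustration of the selection logic only, not the template code: the function name and the sample addresses are hypothetical, while the variable names vm_mgmt_gw and mgmt_gw come from the description above.

```python
def resolve_vm_mgmt_gw(host_vars):
    """Pick the VM management gateway: vm_mgmt_gw if defined, else mgmt_gw."""
    if "vm_mgmt_gw" in host_vars:
        return host_vars["vm_mgmt_gw"]
    return host_vars["mgmt_gw"]


# Hypothetical host vars for two test servers.
with_override = resolve_vm_mgmt_gw({"mgmt_gw": "10.250.0.1", "vm_mgmt_gw": "10.250.0.254"})
without_override = resolve_vm_mgmt_gw({"mgmt_gw": "10.250.0.1"})
print(with_override, without_override)
```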

Enhance the ACL test itself to run on multi-asic platforms.
- acl_table_ports will have the port binding for each namespace.
- Updated the mg_facts() library to associate external port-channels with namespaces.
- Updated interface_facts() to return data from all namespaces/asics on multi-asic platforms.
- Updated the port_toggle() utility to work for multi-asic/namespace.

…alizer states (#3184)

Catch reboot errors in the test. The advanced warm-reboot test is seen to pass even with failures or when the reboot never happened.

Changes included:
- Increase the paramiko command execution timeout from the default (10) to 30. This helps in retaining any errors that might happen during warm/fast reboot.
- Check warmboot finalizer states. Sometimes, after warm reboot, the finalizer never reaches the activating state, and this is missed by the test.

Rename the variable from sonic to vsonic, because in the future we will introduce the sonic container as a neighbor device. The variable name vsonic is clearer for differentiating a sonic VM from a sonic container.

Signed-off-by: Ze Gan <[email protected]>

What is the motivation for this PR?

Add initial connection_db setup and provision.

How did you do it?

- Modify creategraphy.py to include device meta for Server and Sonic devices in connection graph files.
- Add two Ansible lookup plugins:
  - graphfile: Finds the connection graph file that defines the DUTs listed in the testbed.
  - servercfgd_client: Dispatches calls to the remote functions in servercfgd.
- Add the Ansible role connection_db with the following three actions:
  - start_db: Installs Redis-related dependencies for connection_db and servercfgd.
  - stop_db: Removes Redis-related dependencies.
  - provision_db: Calls the provision_connection_db remote call of servercfgd to provision connection_db with the connection graph file that contains the DUTs defined in this testbed.

Signed-off-by: Longxiang Lyu <[email protected]>

In the mux sanity check, 'pytest_assert' is used to verify some conditions. The pytest_assert checks are protected using try...except AssertionError. However, 'pytest_assert' does not raise the exception AssertionError. It simply calls pytest.fail(msg) to fail the test early. Ideally, every sanity check function should be fully executed and only return results that indicate whether it was successful. The sanity check plugin then has a chance to recover the testbed after inspecting all the check results. This change simply replaces "pytest_assert" in the check_mux_simulator function with a plain "if" check.

Signed-off-by: Xin Wang <[email protected]>
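The pattern of returning a result instead of failing the test can be sketched as below. It is not the check_mux_simulator code: the function name `run_checks`, the result dict layout, and the sample conditions are assumptions for illustration.

```python
def run_checks(conditions):
    """Evaluate every condition with a plain 'if' and return a result dict.

    conditions: list of (description, passed) pairs; a hypothetical input
    format. No exception is raised, so the caller always gets a result and
    can decide whether to attempt recovery.
    """
    result = {"failed": False, "failed_reasons": []}
    for description, passed in conditions:
        if not passed:
            result["failed"] = True
            result["failed_reasons"].append(description)
    return result


# Hypothetical conditions: one passes, one fails.
outcome = run_checks([
    ("mux status could be read", True),
    ("all mux ports have the expected side", False),
])
print(outcome)
```

Because the function runs to completion, every failed condition is reported in one pass rather than only the first one.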

…lds (#3209)

After the new "inv_name" and "auto_recover" fields were added to the testbed.csv file, the testbed.py tool for generating the yaml format testbed file needs to be updated. The testbed-cli.sh tool also has some issues dealing with the yaml format testbed file. This PR fixes these issues.

Changes:
1. Update tests/common/testbed.py to support generating the testbed.yaml file based on the new testbed.csv file.
2. The testbed-cli.sh tool has an issue reading the yaml testbed file. If a testbed name is a substring of another testbed name, the tool may find multiple entries for the testbed with the shorter name while parsing the yaml testbed file. This PR changed the string "in" match to an "==" match.
3. Another issue is that the testbed-cli.sh tool does not support reading the new "inv_name" and "auto_recover" fields from the yaml testbed file.
4. PR #3127 added the 'announce-routes' function to the testbed-cli.sh tool. This new function depends on the new "inv_name" field. That PR added a new variable "inventory" to the read_file functions to store the value of the "inv_name" field. Unfortunately, some other functions, including deploy_minigraph and generate_minigraph, use the same "inventory" variable for storing the inventory file name passed in from the command line. There are potential conflicts under some scenarios. This change renamed the variable in read_file to "inv_file" to avoid this potential conflict.
5. Updated the vtestbed.yaml file to make it consistent with vtestbed.csv.

Signed-off-by: Xin Wang <[email protected]>
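The difference between the "in" match and the "==" match in change 2 can be shown with two testbed names where one is a prefix of the other. The names, the dict layout, and the function names below are hypothetical and only illustrate the matching behavior.

```python
def find_by_substring(testbeds, name):
    """Old behavior: 'in' also matches longer names that contain the name."""
    return [tb["name"] for tb in testbeds if name in tb["name"]]


def find_by_equality(testbeds, name):
    """New behavior: only the testbed whose name is exactly equal matches."""
    return [tb["name"] for tb in testbeds if tb["name"] == name]


# Hypothetical testbed entries; the first name is a substring of the second.
testbeds = [{"name": "vms-kvm-t0"}, {"name": "vms-kvm-t0-64"}]

loose = find_by_substring(testbeds, "vms-kvm-t0")
exact = find_by_equality(testbeds, "vms-kvm-t0")
print(loose)
print(exact)
```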

lolyu and others added 30 commits March 17, 2021 21:42

[tacacs] Check if commands exist on the device before attempting to run them (#3166)

066faa1

Dual ToR IO test enhancements (#3136)

087ed41


[pytest common] Make backend portchannel check more robust (#3165)

03f29e8


[start_topo] Announce route for non-fullmesh topos (#3190)

27e3c13


[snmp_pdu_controllers] Fix pdu oid issue (#3193)

f8a1b0b


[mux_sim_ctrl]: Fix url fixture when specifying interface (#3187)

da134ff


Enhance LLDP test case for multi-asic platforms (#3137)

aa0f670


Enhance ebtable test case for multi-asic platforms (#3169)

9847391


[tacacs]: log tac_plus to /var/log/tac_plus.log (#3205)

a47e6df


[framework] Fix exceptions in ptf_adapter (#3196)

6b714fc


[tacacs] Address race conditions during TACACS server setup/teardown (#3191)

35e634e

Enhancement to monit test case. (#3174)

cd0ae96


Enhanced test_lag_2.py for multi-asic platforms (#3168)

03bb16e


[DualTor] Update orch test for route cases (#3157)

8821ac8


Pterosaur and others added 9 commits March 29, 2021 14:28

ANISH-GOTTAPU merged commit 672a4f6 into ANISH-GOTTAPU:master Mar 30, 2021
