Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix a couple issues in change_lag_lacp_timer #15778

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

arista-nwolfe
Copy link
Contributor

#14469 was introduced to increase the LACP timeout of EOS neighbors during QOS tests.
This was required because the QOS tests would disable TX credits on the test's dest ports, without TX credits LACP packets won't egress and eventually the LAG will be brought down by the EOS neighbor due to LACP timeout.

This change adds a couple improvements to that existing PR:
-Updated to work with single-asic LCs
-Updated to change the LACP timeout multiplier for all 3 dst_port_id neighbors.

Also made testQosSaiPfcXonLimit use fixture change_lag_lacp_timer

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • Test case(new/improvement)

Back port request

  • 202012
  • 202205
  • 202305
  • 202311
  • 202405

Updated to work with single-asic LCs
Updated to work on all 3 dst_port_ids
Also made testQosSaiPfcXonLimit use fixture change_lag_lacp_timer
for port_ch, port_intf in dst_mgfacts['minigraph_portchannels'].items():
for member in port_intf['members']:
if member in dst_interfaces:
lag_names.append( port_ch )
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we get lag_names on all dst device, regardless of single-asic/multi-asic?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some QOS tests send packets to multiple destinations dst_port_id, dst_port_2_id, dst_port_3_id
We disable TX on all 3 of these ports:
https://github.com/sonic-net/sonic-mgmt/blob/master/tests/saitests/py3/sai_qos_tests.py#L2543
And any of these 3 ports could be a LAG so therefor needs to have the LACP multiplier increased on the neighboring device.

@wenyiz2021
Copy link
Contributor

@arista-nwolfe , could you paste the error/failures without this PR?
also can you run it on a T0 topo as well, to make sure it won't fail on single-asic device?

@arista-nwolfe
Copy link
Contributor Author

@arista-nwolfe , could you paste the error/failures without this PR? also can you run it on a T0 topo as well, to make sure it won't fail on single-asic device?

The failure will likely manifest as something like:
AssertionError: unexpectedly PFC counter not increase, after send a few packets to trigger PFC

This is because one of the destination port-channels went down too early causing the packets that we enqueued to be dropped instead of stored in the destination port's buffer.
Without the packet in the buffer we don't hit the xoff threshold and don't trigger PFCs.

@arista-nwolfe
Copy link
Contributor Author

@arista-nwolfe , could you paste the error/failures without this PR? also can you run it on a T0 topo as well, to make sure it won't fail on single-asic device?

Regarding the T0 results, all this code is guarded to only run on broadcom-dnx:
dutTestParams["basicParams"]["platform_asic"] == "broadcom-dnx"
So because we have no broadcom-dnx platforms at the T0 level it should have no effect.

@bingwang-ms
Copy link
Collaborator

@XuChen-MSFT Can you please help review?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: No status
Development

Successfully merging this pull request may close these issues.

3 participants