[DPB] Fix a potential command failure when break out a port that is a member of portchannel. #3106

Open
wants to merge 3 commits into
base: master
Conversation

sandyw777

What I did

Fix issue sonic-net/sonic-buildimage#10005, where DPB may fail to break out ports that are port-channel members. Analyzing the output of the “redis-cli monitor” command, we found that the Consumer cannot process the DEL port request in time, before the HSET admin_status and mtu operations from the teammgr removeLagMember function are executed.
This ordering mismatch causes the port deletion to fail, because Orchagent misses the DELETE message.

How I did it

The _deletePorts function was split into two parts:
Part1: Deletion of dependencies
Part2: Deletion of port itself
Additionally, a 1-second delay was added between Part1 and Part2 so that the two ConfigDB updates are processed in order.

Example:

  • Original deleted config:
    ConfigMgmt: Write in DB: {'PORT': {'Ethernet0': None}, 'PORTCHANNEL_MEMBER': None}

  • Separate the deleted config into two parts (solution):

part1). 'delete dependencies'(delConfigDepToLoad):
ConfigMgmt: Write in DB: {'PORTCHANNEL_MEMBER': None}

sleep 1 sec...

part2). 'delete port'(delConfigToLoad):
ConfigMgmt: Write in DB: {'PORT': {'Ethernet0': None}}
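
The split shown above can be sketched in Python as follows. This is a minimal illustration and not the code from this PR: the helper name split_delete_config is hypothetical, and it assumes the delete config is a dict keyed by table name in which the PORT table holds the ports themselves and every other table is a dependency.

```python
def split_delete_config(del_config):
    """Split a delete config into (dependencies, ports).

    Hypothetical helper for illustration only; it assumes every table
    other than 'PORT' is a dependency of the ports being deleted.
    """
    deps = {table: rows for table, rows in del_config.items() if table != "PORT"}
    ports = {"PORT": del_config["PORT"]} if "PORT" in del_config else {}
    return deps, ports


# Input taken from the "Original deleted config" example above.
original = {"PORT": {"Ethernet0": None}, "PORTCHANNEL_MEMBER": None}
del_config_dep_to_load, del_config_to_load = split_delete_config(original)
print(del_config_dep_to_load)  # part 1: written first
print(del_config_to_load)      # part 2: written after the delay
```

Keeping the two parts as separate dicts lets the caller write them in two distinct ConfigDB updates with a delay in between.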

How to verify it

Step1. Create portchannel with Ethernet0

Step2. Dynamic port breakout:
admin@sonic:~$ sudo config interface breakout Ethernet0 4x10G[10G,1G] -fy

Step3. Check console:
admin@sonic:~$ sudo config interface breakout Ethernet0 4x10G[10G,1G] -fy

Running Breakout Mode : 1x40G[40G,10G,1G]
Target Breakout Mode : 4x10G[10G,1G]

Ports to be deleted :
{
"Ethernet0": "40000"
}
Ports to be added :
{
"Ethernet0": "10000",
"Ethernet1": "10000",
"Ethernet2": "10000",
"Ethernet3": "10000"
}
Breakout process got successfully completed.
Please note loaded setting will be lost after system reboot. To preserve setting, run config save.

Step4. Check interface:

admin@sonic:~$ show interface status
  Interface            Lanes    Speed    MTU    FEC           Alias    Vlan    Oper    Admin    Type    Asym PFC
-----------  ---------------  -------  -----  -----  --------------  ------  ------  -------  ------  ----------
  Ethernet0               25      10G   9100    N/A          etp1-1  routed    down     down     N/A         N/A
  Ethernet1               26      10G   9100    N/A          etp1-2  routed    down     down     N/A         N/A
  Ethernet2               27      10G   9100    N/A          etp1-3  routed    down     down     N/A         N/A
  Ethernet3               28      10G   9100    N/A          etp1-4  routed    down     down     N/A         N/A
  Ethernet4      29,30,31,32      40G   9100    N/A    fortyGigE0/4  routed    down     down     N/A         N/A

Step5. Check syslog:
"Deleting Port Ethernet0" can be found. Orchagent received the DELETE message.

Step6. Check redis-cli monitor:
The Consumer's HSET admin_status/mtu and DEL port operations are now separated, and the DEL of Ethernet0 succeeds.

Producer:
1703584555.434850 [0 unix:/var/run/redis/redis.sock] "EVALSHA" "6875900592cdd1621c6191fe038ec3b29775aa13" "3" "PORT_TABLE_CHANNEL@0" "PORT_TABLE_KEY_SET" "_PORT_TABLE:Ethernet0" "G" "Ethernet0" "mtu" "9100"
1703584555.434862 [0 lua] "SADD" "PORT_TABLE_KEY_SET" "Ethernet0"
1703584555.434877 [0 lua] "HSET" "_PORT_TABLE:Ethernet0" "mtu" "9100"
1703584555.434889 [0 lua] "PUBLISH" "PORT_TABLE_CHANNEL@0" "G"

Write in DB: {'PORTCHANNEL_MEMBER': None}
1703584555.436248 [4 127.0.0.1:33054] "DEL" "PORTCHANNEL_MEMBER|PortChannel01|Ethernet0"

Producer:
1703584555.439558 [0 unix:/var/run/redis/redis.sock] "EVALSHA" "6875900592cdd1621c6191fe038ec3b29775aa13" "3" "PORT_TABLE_CHANNEL@0" "PORT_TABLE_KEY_SET" "_PORT_TABLE:Ethernet0" "G" "Ethernet0" "admin_status" "down"
1703584555.439577 [0 lua] "SADD" "PORT_TABLE_KEY_SET" "Ethernet0"
1703584555.439584 [0 lua] "HSET" "_PORT_TABLE:Ethernet0" "admin_status" "down"

Producer:
1703584555.471847 [0 unix:/var/run/redis/redis.sock] "EVALSHA" "6875900592cdd1621c6191fe038ec3b29775aa13" "4" "PORT_TABLE_CHANNEL@0" "PORT_TABLE_KEY_SET" "_PORT_TABLE:Ethernet0" "_PORT_TABLE:Ethernet0" "G" "Ethernet0" "admin_status" "down" "mtu" "9100"
1703584555.471914 [0 lua] "SADD" "PORT_TABLE_KEY_SET" "Ethernet0"
1703584555.471920 [0 lua] "HSET" "_PORT_TABLE:Ethernet0" "admin_status" "down"
1703584555.471937 [0 lua] "HSET" "_PORT_TABLE:Ethernet0" "mtu" "9100"

Consumer:
1703584556.044427 [0 unix:/var/run/redis/redis.sock] "EVALSHA" "88270a7c5c90583e56425aca8af8a4b8c39fe757" "3" "PORT_TABLE_KEY_SET" "PORT_TABLE:" "PORT_TABLE_DEL_SET" "1024" "_"
1703584556.044438 [0 lua] "SPOP" "PORT_TABLE_KEY_SET" "1024"
1703584556.044574 [0 lua] "SREM" "PORT_TABLE_DEL_SET" "Ethernet0"
1703584556.044606 [0 lua] "HGETALL" "_PORT_TABLE:Ethernet0"
1703584556.044614 [0 lua] "HSET" "PORT_TABLE:Ethernet0" "mtu" "9100"
1703584556.044631 [0 lua] "HSET" "PORT_TABLE:Ethernet0" "admin_status" "down"
1703584556.044645 [0 lua] "DEL" "_PORT_TABLE:Ethernet0"

Write in DB: {'PORT': {'Ethernet0': None}}
1703584556.439136 [4 127.0.0.1:33066] "DEL" "PORT|Ethernet0"

Producer:
1703584556.440402 [0 unix:/var/run/redis/redis.sock] "EVALSHA" "88ba6312b8de850b3506966425174d8899aadd93" "4" "PORT_TABLE_CHANNEL@0" "PORT_TABLE_KEY_SET" "_PORT_TABLE:Ethernet0" "PORT_TABLE_DEL_SET" "G" "Ethernet0" "''" "''"
1703584556.440419 [0 lua] "SADD" "PORT_TABLE_KEY_SET" "Ethernet0"
1703584556.440435 [0 lua] "SADD" "PORT_TABLE_DEL_SET" "Ethernet0"
1703584556.440449 [0 lua] "DEL" "_PORT_TABLE:Ethernet0"
1703584556.440453 [0 lua] "PUBLISH" "PORT_TABLE_CHANNEL@0" "G"

Consumer:
1703584556.869997 [0 unix:/var/run/redis/redis.sock] "EVALSHA" "88270a7c5c90583e56425aca8af8a4b8c39fe757" "3" "PORT_TABLE_KEY_SET" "PORT_TABLE:" "PORT_TABLE_DEL_SET" "1024" "_"
1703584556.870010 [0 lua] "SPOP" "PORT_TABLE_KEY_SET" "1024"
1703584556.870060 [0 lua] "SREM" "PORT_TABLE_DEL_SET" "Ethernet0"
1703584556.870251 [0 lua] "DEL" "PORT_TABLE:Ethernet0"
1703584556.870269 [0 lua] "HGETALL" "_PORT_TABLE:Ethernet0"
1703584556.870274 [0 lua] "DEL" "_PORT_TABLE:Ethernet0"
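
To check the spacing between the two ConfigDB deletes in a capture like the one above, the monitor lines can be parsed and their timestamps compared. The sketch below is an illustration only: the function names are hypothetical, the regular expression assumes the line layout seen in this capture, and only the two database 4 lines copied from the output above are used as input.

```python
import re
from decimal import Decimal

# Two CONFIG_DB (database 4) lines copied from the monitor output above.
MONITOR_LINES = [
    '1703584555.436248 [4 127.0.0.1:33054] "DEL" "PORTCHANNEL_MEMBER|PortChannel01|Ethernet0"',
    '1703584556.439136 [4 127.0.0.1:33066] "DEL" "PORT|Ethernet0"',
]

# Assumed layout: <timestamp> [<db> <client>] "<arg>" "<arg>" ...
LINE_RE = re.compile(r'^(?P<ts>\d+\.\d+) \[(?P<db>\d+) (?P<client>[^\]]+)\] (?P<args>.*)$')


def parse_monitor_line(line):
    """Return (timestamp, db index, list of command arguments)."""
    match = LINE_RE.match(line)
    if match is None:
        raise ValueError(f"not a monitor line: {line!r}")
    args = re.findall(r'"((?:[^"\\]|\\.)*)"', match.group("args"))
    return Decimal(match.group("ts")), int(match.group("db")), args


def find_del(lines, key_prefix):
    """Timestamp of the first DEL whose key starts with key_prefix."""
    for line in lines:
        ts, _db, args = parse_monitor_line(line)
        if args[:1] == ["DEL"] and len(args) > 1 and args[1].startswith(key_prefix):
            return ts
    raise LookupError(key_prefix)


dep_ts = find_del(MONITOR_LINES, "PORTCHANNEL_MEMBER|")
port_ts = find_del(MONITOR_LINES, "PORT|")
gap = port_ts - dep_ts  # seconds between dependency delete and port delete
print(gap)
```

Decimal is used instead of float so that the microsecond timestamps subtract exactly.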

Previous command output (if the output of a command-line utility has changed)

New command output (if the output of a command-line utility has changed)

@sandyw777
Author

@alexrallen @dgsudharsan @praveen-li @zhenggen-xu
Could you help to review the commit? Thank you!

@puffc
Contributor

puffc commented Jan 27, 2024

@alexrallen @dgsudharsan @praveen-li @zhenggen-xu Please review and merge this pr. thanks.

# -- Update deletion of ports in Config DB,
# -- verify Asic DB for port deletion,
# -- then update addition of ports in config DB.
self._shutdownIntf(delPorts)
self.writeConfigDB(delConfigDepToLoad)
tsleep(1)

Instead of a static value for sleep, can we check whether the PORT is removed from APPL_DB and proceed once it is deleted?

As I recall, delConfigToLoad should contain all the dependencies as well. Kindly paste an example showing how delConfigDepToLoad and delConfigToLoad differ. Thanks.

Line 466 verifies the deletion in ASIC_DB; is that not enough? Do we still need the extra sleep? sleep(1) is not deterministic.

Okie, got the gist now. Except sleep(1), rest from logic POV looks good to me.

Instead of a static value for sleep, can we check whether the PORT is removed from APPL_DB and proceed once it is deleted?

@vivekrnv Just to confirm: are you suggesting a busy-waiting loop to replace the sleep?
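
For illustration, the polling alternative raised in this thread could look like the bounded loop sketched below, which sleeps between checks rather than spinning. This is not code from the PR: wait_until_deleted and its key_exists callable are hypothetical, and in a real system key_exists would have to query the relevant database for the port key.

```python
import time


def wait_until_deleted(key_exists, timeout=5.0, interval=0.1,
                       clock=time.monotonic, sleep=time.sleep):
    """Poll until key_exists() returns False or the timeout expires.

    key_exists is a caller-supplied callable (hypothetical here).
    Returns True if the key disappeared in time, False otherwise.
    """
    deadline = clock() + timeout
    while True:
        if not key_exists():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Usage with a fake lookup that reports the key as gone on the third poll.
polls = iter([True, True, False])
deleted = wait_until_deleted(lambda: next(polls), timeout=1.0, interval=0.01)
print(deleted)
```

The timeout keeps the loop from waiting forever if the key is never removed, which is the main difference from an unbounded busy wait.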

@vivekrnv vivekrnv Aug 12, 2024

Can you paste an example showing the difference between delConfigDepToLoad and delConfigToLoad? If self.writeConfigDB(delConfigDepToLoad) is for dependencies, then there is no reliable way to check whether the corresponding object is deleted in the ASIC. In that case, a static sleep looks fine to me.

@sandyw777, kindly add a comment that 'delConfigToLoad' will contain only ports after this fix, if not done already.
It seems we have to live with the static sleep for now. @puffc @vivekrnv

@vivekrnv Here is the example of delConfigDepToLoad and delConfigToLoad:

  • delConfigDepToLoad: {'BUFFER_PG': {'Ethernet32|0': None, 'Ethernet32|1-2': None, 'Ethernet32|3-4': None, 'Ethernet32|5-7': None}, 'BUFFER_QUEUE': {'Ethernet32|0-2': None, 'Ethernet32|3-4': None, 'Ethernet32|5-7': None}, 'CABLE_LENGTH': {'AZURE': {'Ethernet32': None}}, 'PORTCHANNEL_MEMBER': {'PortChannel1|Ethernet32': None}, 'PORT_QOS_MAP': {'Ethernet32': None}, 'QUEUE': {'Ethernet32|0': None, 'Ethernet32|1': None, 'Ethernet32|2': None, 'Ethernet32|3': None, 'Ethernet32|4': None, 'Ethernet32|5': None, 'Ethernet32|6': None}}

  • delConfigToLoad: {'PORT': {'Ethernet32': None}}

In this example, we break out Ethernet32, which is a member of a portchannel.
Below is the ConfigMgmt log for this example:

admin@sonic:~$  show logging | grep "ConfigMgmt" 
Aug 19 07:04:48.170632 sonic INFO ConfigMgmt: Reading data from Redis configDb
Aug 19 07:04:48.271659 sonic INFO ConfigMgmt: delPorts ports:['Ethernet32'] force:True
Aug 19 07:04:48.271765 sonic INFO ConfigMgmt: Start Port Deletion
Aug 19 07:04:48.271843 sonic INFO ConfigMgmt: Find dependecies for port Ethernet32
Aug 19 07:04:48.274979 sonic INFO ConfigMgmt: Deleting /sonic-portchannel:sonic-portchannel/PORTCHANNEL_MEMBER/PORTCHANNEL_MEMBER_LIST[name='PortChannel1'][port='Ethernet32']/port
Aug 19 07:04:48.275250 sonic INFO ConfigMgmt: Deleting /sonic-buffer-pg:sonic-buffer-pg/BUFFER_PG/BUFFER_PG_LIST[port='Ethernet32'][pg_num='0']/port
Aug 19 07:04:48.276100 sonic INFO ConfigMgmt: Deleting /sonic-buffer-pg:sonic-buffer-pg/BUFFER_PG/BUFFER_PG_LIST[port='Ethernet32'][pg_num='1-2']/port
Aug 19 07:04:48.276939 sonic INFO ConfigMgmt: Deleting /sonic-buffer-pg:sonic-buffer-pg/BUFFER_PG/BUFFER_PG_LIST[port='Ethernet32'][pg_num='3-4']/port
Aug 19 07:04:48.277746 sonic INFO ConfigMgmt: Deleting /sonic-buffer-pg:sonic-buffer-pg/BUFFER_PG/BUFFER_PG_LIST[port='Ethernet32'][pg_num='5-7']/port
Aug 19 07:04:48.278513 sonic INFO ConfigMgmt: Deleting /sonic-buffer-queue:sonic-buffer-queue/BUFFER_QUEUE/BUFFER_QUEUE_LIST[port='Ethernet32'][qindex='0-2']/port
Aug 19 07:04:48.279201 sonic INFO ConfigMgmt: Deleting /sonic-buffer-queue:sonic-buffer-queue/BUFFER_QUEUE/BUFFER_QUEUE_LIST[port='Ethernet32'][qindex='3-4']/port
Aug 19 07:04:48.279827 sonic INFO ConfigMgmt: Deleting /sonic-buffer-queue:sonic-buffer-queue/BUFFER_QUEUE/BUFFER_QUEUE_LIST[port='Ethernet32'][qindex='5-7']/port
Aug 19 07:04:48.280442 sonic INFO ConfigMgmt: Deleting /sonic-cable-length:sonic-cable-length/CABLE_LENGTH/CABLE_LENGTH_LIST[name='AZURE']/CABLE_LENGTH[port='Ethernet32']/port
Aug 19 07:04:48.280766 sonic INFO ConfigMgmt: Deleting /sonic-port-qos-map:sonic-port-qos-map/PORT_QOS_MAP/PORT_QOS_MAP_LIST[ifname='Ethernet32']/ifname
Aug 19 07:04:48.281273 sonic INFO ConfigMgmt: Deleting /sonic-queue:sonic-queue/QUEUE/QUEUE_LIST[ifname='Ethernet32'][qindex='0']/ifname
Aug 19 07:04:48.283087 sonic INFO ConfigMgmt: Deleting /sonic-queue:sonic-queue/QUEUE/QUEUE_LIST[ifname='Ethernet32'][qindex='1']/ifname
Aug 19 07:04:48.284947 sonic INFO ConfigMgmt: Deleting /sonic-queue:sonic-queue/QUEUE/QUEUE_LIST[ifname='Ethernet32'][qindex='2']/ifname
Aug 19 07:04:48.286838 sonic INFO ConfigMgmt: Deleting /sonic-queue:sonic-queue/QUEUE/QUEUE_LIST[ifname='Ethernet32'][qindex='3']/ifname
Aug 19 07:04:48.288563 sonic INFO ConfigMgmt: Deleting /sonic-queue:sonic-queue/QUEUE/QUEUE_LIST[ifname='Ethernet32'][qindex='4']/ifname
Aug 19 07:04:48.290267 sonic INFO ConfigMgmt: Deleting /sonic-queue:sonic-queue/QUEUE/QUEUE_LIST[ifname='Ethernet32'][qindex='5']/ifname
Aug 19 07:04:48.291955 sonic INFO ConfigMgmt: Deleting /sonic-queue:sonic-queue/QUEUE/QUEUE_LIST[ifname='Ethernet32'][qindex='6']/ifname
Aug 19 07:04:48.318297 sonic INFO ConfigMgmt: Data Validation successful
Aug 19 07:04:48.332257 sonic INFO ConfigMgmt: Generate Final Config to write in DB
Aug 19 07:04:48.335846 sonic INFO ConfigMgmt: Deleting Port: Ethernet32
Aug 19 07:04:48.360377 sonic INFO ConfigMgmt: Data Validation successful
Aug 19 07:04:48.373320 sonic INFO ConfigMgmt: Generate Final Config to write in DB
Aug 19 07:04:48.376482 sonic INFO ConfigMgmt: Start Port Addition
Aug 19 07:04:48.376587 sonic INFO ConfigMgmt: addPorts Args portjson: {'PORT': {'Ethernet32': {'alias': 'Eth5-1', 'lanes': '217,218', 'speed': '100000', 'index': '5'}, 'Ethernet34': {'alias': 'Eth5-2', 'lanes': '219,220', 'speed': '100000', 'index': '5'}, 'Ethernet36': {'alias': 'Eth5-3', 'lanes': '221,222', 'speed': '100000', 'index': '5'}, 'Ethernet38': {'alias': 'Eth5-4', 'lanes': '223,224', 'speed': '100000', 'index': '5'}}} loadDefConfig: False
Aug 19 07:04:48.458260 sonic INFO ConfigMgmt: Data Validation successful
Aug 19 07:04:48.458499 sonic INFO ConfigMgmt: Generate Final Config to write in DB
Aug 19 07:04:48.463975 sonic INFO ConfigMgmt: shutdown Interfaces: {'PORT': {'Ethernet32': {'admin_status': 'down'}}}
Aug 19 07:04:48.464167 sonic INFO ConfigMgmt: Writing in Config DB
Aug 19 07:04:48.464353 sonic INFO ConfigMgmt: Write in DB: {'PORT': {'Ethernet32': {'admin_status': 'down'}}}
Aug 19 07:04:48.464493 sonic INFO ConfigMgmt: Writing in Config DB
Aug 19 07:04:48.465082 sonic INFO ConfigMgmt: Write in DB: {'BUFFER_PG': {('Ethernet32', '0'): None, ('Ethernet32', '1-2'): None, ('Ethernet32', '3-4'): None, ('Ethernet32', '5-7'): None}, 'BUFFER_QUEUE': {('Ethernet32', '0-2'): None, ('Ethernet32', '3-4'): None, ('Ethernet32', '5-7'): None}, 'CABLE_LENGTH': {'AZURE': {'Ethernet32': None}}, 'PORTCHANNEL_MEMBER': {('PortChannel1', 'Ethernet32'): None}, 'PORT_QOS_MAP': {'Ethernet32': None}, 'QUEUE': {('Ethernet32', '0'): None, ('Ethernet32', '1'): None, ('Ethernet32', '2'): None, ('Ethernet32', '3'): None, ('Ethernet32', '4'): None, ('Ethernet32', '5'): None, ('Ethernet32', '6'): None}}
Aug 19 07:04:49.476798 sonic INFO ConfigMgmt: Writing in Config DB
Aug 19 07:04:49.476964 sonic INFO ConfigMgmt: Write in DB: {'PORT': {'Ethernet32': None}}
Aug 19 07:04:49.477238 sonic INFO ConfigMgmt: Verify Port Deletion from Asic DB, Wait...
Aug 19 07:04:49.477688 sonic INFO ConfigMgmt: Check Key in Asic DB: ASIC_STATE:SAI_OBJECT_TYPE_PORT:oid:0x1000000000153
Aug 19 07:04:50.480308 sonic INFO ConfigMgmt: Check Key in Asic DB: ASIC_STATE:SAI_OBJECT_TYPE_PORT:oid:0x1000000000153
Aug 19 07:04:50.480524 sonic INFO ConfigMgmt: Writing in Config DB
Aug 19 07:04:50.480658 sonic INFO ConfigMgmt: Write in DB: {'PORT': {'Ethernet32': {'alias': 'Eth5-1', 'lanes': '217,218', 'speed': '100000', 'index': '5'}, 'Ethernet34': {'alias': 'Eth5-2', 'lanes': '219,220', 'speed': '100000', 'index': '5'}, 'Ethernet36': {'alias': 'Eth5-3', 'lanes': '221,222', 'speed': '100000', 'index': '5'}, 'Ethernet38': {'alias': 'Eth5-4', 'lanes': '223,224', 'speed': '100000', 'index': '5'}}}
admin@sonic:~$

@praveen-li praveen-li left a comment

@sandyw777 Kindly add new tests to verify that 'writeConfigDB' is called first with the dependencies and then with the ports. The rest looks good to me. Thanks for fixing it.
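
A test of the kind requested here could record the call order with unittest.mock, as in the sketch below. It is an illustration only: FakeDpb and its delete_ports method are hypothetical stand-ins that mirror the order described in this PR, not the real class or the real test suite, and the sleep function is injected so the check runs without a real delay.

```python
from unittest import mock


class FakeDpb:
    """Hypothetical stand-in for the class under test (not the real one)."""

    def _shutdownIntf(self, ports):
        pass

    def writeConfigDB(self, config):
        pass

    def delete_ports(self, ports, dep_config, port_config, sleep):
        # Mirrors the order described in the PR: dependencies, delay, ports.
        self._shutdownIntf(ports)
        self.writeConfigDB(dep_config)
        sleep(1)
        self.writeConfigDB(port_config)


def check_write_order():
    dpb = FakeDpb()
    deps = {"PORTCHANNEL_MEMBER": {"PortChannel1|Ethernet32": None}}
    ports = {"PORT": {"Ethernet32": None}}
    manager = mock.Mock()  # parent mock that records one shared call timeline
    with mock.patch.object(dpb, "writeConfigDB", manager.write), \
         mock.patch.object(dpb, "_shutdownIntf", manager.shutdown):
        dpb.delete_ports(["Ethernet32"], deps, ports, sleep=manager.sleep)
    return manager.mock_calls


calls = check_write_order()
expected = [
    mock.call.shutdown(["Ethernet32"]),
    mock.call.write({"PORTCHANNEL_MEMBER": {"PortChannel1|Ethernet32": None}}),
    mock.call.sleep(1),
    mock.call.write({"PORT": {"Ethernet32": None}}),
]
print(calls == expected)
```

Attaching all patched methods to one parent mock is what makes the relative order of the calls checkable, since each child call is appended to the parent's mock_calls list.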
