sonic-db-cli -n <asic-ns> CHASSIS_APP_DB EVAL fails intermittently after config reload or load-minigraph #17945

Closed
saksarav-nokia opened this issue Jan 30, 2024 · 6 comments
Labels
Chassis 🤖 Modular chassis support

Comments

saksarav-nokia commented Jan 30, 2024

Description

When config reload or load-minigraph is done, the swss dockers are restarted. When swss comes back up, the swss.sh script cleans the SYSTEM_NEIGH, SYSTEM_INTERFACE, SYSTEM_LAG_MEMBER_TABLE and SYSTEM_LAG_ID_TABLE entries added by this LC+asic from CHASSIS_APP_DB, using the sonic-db-cli -n <asic-ns> CHASSIS_APP_DB EVAL command.

Even though the EVAL command is called only after sonic-db-cli -n <asic-ns> CHASSIS_APP_DB PING succeeds, the EVAL command sometimes fails.

When we replaced "sonic-db-cli -n <asic-ns> CHASSIS_APP_DB EVAL" with "redis-cli -h 10.6.0.100 -p 6380 -n 12 EVAL", the issue was not seen.

Steps to reproduce the issue:

  1. Run the full OC test, or write a simple bash script that does config reload or load-minigraph 100 times with a 7-minute sleep between runs.
  2. Check the syslog for the "Invalid database name input : 'CHASSIS_APP_DB'" error.
  3. Check for an orchagent crash in the remote LCs.
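
A minimal sketch of the loop from step 1 and the syslog check from step 2. The default reload command, the syslog path and the function names are assumptions made for illustration, not taken from the actual test; adjust them for your setup.

```shell
# Repro sketch: repeat a reload command and count the error in syslog.
# RELOAD_CMD and SYSLOG are assumed defaults; override them as needed.
RELOAD_CMD="${RELOAD_CMD:-sudo config reload -y}"
SYSLOG="${SYSLOG:-/var/log/syslog}"
PATTERN="Invalid database name input : 'CHASSIS_APP_DB'"

# Print how many syslog lines contain the error message.
# Prints nothing if the syslog file cannot be read.
count_errors() {
    grep -cF -- "$PATTERN" "$SYSLOG" 2>/dev/null || true
}

# Run the reload command $1 times, sleeping $2 seconds after each run.
reload_loop() {
    rl_iterations="$1"
    rl_pause="$2"
    rl_i=1
    while [ "$rl_i" -le "$rl_iterations" ]; do
        echo "iteration $rl_i: $RELOAD_CMD"
        # Intentionally unquoted so the command string splits into words.
        $RELOAD_CMD || echo "iteration $rl_i: reload command returned non-zero"
        sleep "$rl_pause"
        echo "iteration $rl_i: error count so far: $(count_errors)"
        rl_i=$((rl_i + 1))
    done
}

# Example invocation (100 reloads, 7 minutes apart):
# reload_loop 100 420
```

Setting RELOAD_CMD to the load-minigraph variant exercises the other path from step 1. The orchagent check on the remote LCs from step 3 is not covered here.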

Describe the results you received:

  1. Errors in syslog: "Invalid database name input : 'CHASSIS_APP_DB'", together with "Unable to connect to redis: Cannot assign requested address".
  2. An orchagent crash in the remote LCs.

Describe the results you expected:

No errors in syslog and no crash

Output of show version:

SONiC Software Version: SONiC.20220532.54
SONiC OS Version: 11
Distribution: Debian 11.8
Kernel: 5.10.0-23-2-amd64
Build commit: b9e6caad98
Build date: Tue Jan 9 00:13:06 UTC 2024
Built by: cloudtest@95bebd0dc000000

judyjoseph commented Jan 31, 2024

@qiluo-msft, below is the question from Shakthi:

When we replaced "sonic-db-cli -n <asic-ns> CHASSIS_APP_DB EVAL" with "redis-cli -h 10.6.0.100 -p 6380 -n 12 EVAL", the issue was not seen.

Is there any difference between these two calls? This is the place we are referring to; it currently uses sonic-db-cli:

num_neigh=`$SONIC_DB_CLI CHASSIS_APP_DB EVAL "

@judyjoseph

@saksarav-nokia I just noticed that you see the error "Unable to connect to redis: Cannot assign requested address", which points to a local connectivity issue: the local interface is not available at that time.

Can you check whether the network-config service is still setting up the local interface?
In the namespace case we use macvlan; is it taking time to configure that locally?

saksarav-nokia commented Feb 1, 2024

@judyjoseph, if it is a connectivity issue, why does the redis-cli command not fail? It worked fine across 300 config reloads.
Also, ping from the namespace to the SUP midplane IP showed no packet loss while flooding 250K packets during config reload.

saksarav-nokia commented Feb 7, 2024

@judyjoseph, we realized that hostname-config.service is restarted every time config reload or load-minigraph is done. This service updates /etc/hosts, which conflicts with the chassis-db-cleanup done by the swss service. We did not see the issue after adding "After=hostname-config.service" in [email protected]. We will run the full OC this weekend with the change and update the PR if everything goes fine. The service replaces the file with these two moves:

mv -f /etc/hosts /etc/hosts.old
mv -f /etc/hosts.new /etc/hosts

@saksarav-nokia

admin@lc4:~$ sudo mv /etc/hosts /etc/hosts.old
admin@lc4:~$ sonic-db-cli -n asic0 CHASSIS_APP_DB PING
Invalid database name input : 'CHASSIS_APP_DB'
Unable to connect to redis: Cannot assign requested address
admin@lc4:~$ sonic-db-cli CHASSIS_APP_DB PING
Invalid database name input : 'CHASSIS_APP_DB'
Unable to connect to redis: Cannot assign requested address
admin@lc4:~$ sudo mv /etc/hosts.old /etc/hosts
admin@lc4:~$ sonic-db-cli CHASSIS_APP_DB PING
True
admin@lc4:~$ sonic-db-cli -n asic0 CHASSIS_APP_DB PING
True
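
The transcript above suggests the failure lines up with the window in which /etc/hosts is missing. One way to illustrate guarding against that window is to poll for the file before touching the database. This is only a sketch under that assumption; wait_for_file and its arguments are hypothetical names, and it is not the fix proposed in this thread (which orders the service after hostname-config.service).

```shell
# Poll until a file exists and is non-empty, or give up after N attempts.
# Arguments: path, max attempts, seconds between attempts.
# Returns 0 once the file is present and non-empty, 1 on timeout.
wait_for_file() {
    wf_path="$1"
    wf_tries="$2"
    wf_pause="$3"
    wf_i=0
    while [ "$wf_i" -lt "$wf_tries" ]; do
        if [ -s "$wf_path" ]; then
            return 0
        fi
        wf_i=$((wf_i + 1))
        sleep "$wf_pause"
    done
    return 1
}

# Example: only ping the database once /etc/hosts is back in place.
# if wait_for_file /etc/hosts 30 1; then
#     sonic-db-cli CHASSIS_APP_DB PING
# fi
```

Polling narrows the race but does not remove it, since the file can still be renamed between the check and the command. Ordering the services avoids that gap.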

rlhui pushed a commit that referenced this issue Apr 18, 2024
…re valid in CONFIG_DB before starting chassis-db-cleanup (#17962)

This PR fixes the issue reported in Issue #17945.
We noticed that the chassis db cleanup is sometimes skipped when the CHASSIS_APP_DB PING fails. Also, if host_name and asic_name are not written to CONFIG_DB, empty strings could be passed to the CHASSIS_APP_DB EVAL commands.
The service hostname-config.service is restarted whenever config-reload or load-minigraph is done, and this service renames the file /etc/hosts in order to update it with the new file. This interferes with [email protected]: when the swss.sh script accesses CHASSIS_APP_DB while the /etc/hosts file is renamed, the error "Unable to connect to redis: Cannot assign requested address" is seen and the CHASSIS_APP_DB EVAL command fails. As a result, the chassis db entries are not cleaned up, which causes an orchagent crash in the remote LCs.

---------

Signed-off-by: saksarav <[email protected]>
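
The commit message mentions that empty host_name and asic_name values could reach the EVAL commands. A minimal sketch of a guard against that follows. The function names are hypothetical, the echo stands in for the real cleanup, and this is not the code from the PR.

```shell
# Return 0 only if every argument is a non-empty string.
all_set() {
    for as_value in "$@"; do
        [ -n "$as_value" ] || return 1
    done
    return 0
}

# Hypothetical guard: refuse to start cleanup while either name is empty.
# The final echo is a placeholder for whatever issues the EVAL commands.
guarded_cleanup() {
    gc_host="$1"
    gc_asic="$2"
    if ! all_set "$gc_host" "$gc_asic"; then
        echo "skip: host_name or asic_name is empty"
        return 1
    fi
    echo "cleanup for host=$gc_host asic=$gc_asic"
}
```

A caller could retry guarded_cleanup until it returns 0, so that the cleanup is deferred rather than run with empty arguments.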
mssonicbld pushed a commit to mssonicbld/sonic-buildimage that referenced this issue Apr 23, 2024
mssonicbld pushed a commit that referenced this issue Apr 23, 2024
mlok-nokia pushed a commit to mlok-nokia/sonic-buildimage that referenced this issue Jun 5, 2024