[TH3] Skipp Control Plane Assist on WARM Reboot for TH3 HWSKUs #1861

Merged 5 commits on Oct 11, 2021

@gechiang gechiang commented Oct 6, 2021

What I did

Added code to skip Control Plane Assist (CPA) during the warm reboot operation on hardware that cannot handle CPA, such as TH3-based HW SKUs.

How to verify it

Run the test case "arp/test_wr_arp.py" and confirm that the operations that program the VxLAN tunnel and the mirror ACLs are all skipped.
They are skipped at setup time as well as at finalizer time.
The syslog below shows the skip being taken when a TH3 HW SKU is detected:

admin@z9332f:~$ show platform summary
Platform: x86_64-dellemc_z9332f_d1508-r0
HwSKU: DellEMC-Z9332f-M-O16C64
ASIC: broadcom
ASIC Count: 1
admin@SONiC:~$ show logging

...
Oct  7 23:12:47.298940 z9332f NOTICE swss#orchagent: :- addNeighbor: Created neighbor ip 192.168.0.23, 98:03:9b:03:22:15 on Vlan1000
Oct  7 23:12:47.300419 z9332f NOTICE swss#orchagent: :- addNextHop: Created next hop 192.168.0.23 on Vlan1000
Oct  7 23:12:47.501984 z9332f NOTICE swss#orchagent: :- addNeighbor: Created neighbor ip 192.168.0.24, 98:03:9b:03:22:16 on Vlan1000
Oct  7 23:12:47.503181 z9332f NOTICE swss#orchagent: :- addNextHop: Created next hop 192.168.0.24 on Vlan1000
Oct  7 23:12:47.706207 z9332f NOTICE swss#orchagent: :- addNeighbor: Created neighbor ip 192.168.0.25, 98:03:9b:03:22:17 on Vlan1000
Oct  7 23:12:47.707813 z9332f NOTICE swss#orchagent: :- addNextHop: Created next hop 192.168.0.25 on Vlan1000
Oct  7 23:12:47.900366 z9332f NOTICE admin: DellEMC-Z9332f-M-O16C64 Not capable to support CPA. Skipping gracefully ...
Oct  7 23:12:47.902892 z9332f NOTICE admin: Pausing orchagent ...
Oct  7 23:12:47.909771 z9332f NOTICE swss#orchagent: :- addNeighbor: Created neighbor ip 192.168.0.26, 98:03:9b:03:22:18 on Vlan1000
Oct  7 23:12:47.911524 z9332f NOTICE swss#orchagent: :- addNextHop: Created next hop 192.168.0.26 on Vlan1000
Oct  7 23:12:48.042208 z9332f NOTICE swss#orchagent_restart_check: :- main: Wait time for response from orchagent set to 2000 milliseconds
Oct  7 23:12:48.042467 z9332f NOTICE swss#orchagent_restart_check: :- main: Number of retries for the request to orchagent is set to 5
Oct  7 23:12:48.047630 z9332f INFO swss#orchagent_restart_check: :- subscribe: subscribed to RESTARTCHECKREPLY
Oct  7 23:12:48.047630 z9332f NOTICE swss#orchagent_restart_check: :- main: requested orchagent to do warm restart state check, retry count: 0
Oct  7 23:12:48.047841 z9332f NOTICE swss#orchagent: :- doTask: RESTARTCHECK notification for orchagent
Oct  7 23:12:48.047887 z9332f NOTICE swss#orchagent: :- doTask: orchagent|NoFreeze:false|SkipPendingTaskCheck:false
Oct  7 23:12:48.047931 z9332f NOTICE swss#orchagent: :- warmRestartCheck: Restart check result: READY
Oct  7 23:12:48.048088 z9332f NOTICE swss#orchagent_restart_check: :- main: RESTARTCHECK success, orchagent is frozen and ready for warm restart
...
Oct  7 23:15:57.249871 z9332f NOTICE bgp#fpmsyncd: :- main: Warm-Restart timer expired.
Oct  7 23:15:57.249871 z9332f NOTICE bgp#fpmsyncd: :- reconcile: Warm-Restart: Initiating reconciliation process for bgp application.
Oct  7 23:15:57.287145 z9332f NOTICE bgp#fpmsyncd: :- reconcile: Warm-Restart reconciliation: updating entry fe80::/64 { nexthop: :: | ifname: Ethernet104 }
Oct  7 23:15:57.368196 z9332f NOTICE bgp#fpmsyncd: :- setWarmStartState: bgp warm start state changed to reconciled
Oct  7 23:15:57.368196 z9332f NOTICE bgp#fpmsyncd: :- reconcile: Warm-Restart: Concluded reconciliation process for bgp application.
Oct  7 23:15:57.368196 z9332f NOTICE bgp#fpmsyncd: :- main: Warm-Restart reconciliation processed.
Oct  7 23:16:01.960005 z9332f NOTICE root: WARMBOOT_FINALIZER : Tearing down control plane assistant ...
Oct  7 23:16:05.173116 z9332f NOTICE root: WARMBOOT_FINALIZER : Save in-memory database after warm reboot ...
Oct  7 23:16:05.498247 z9332f INFO finalize-warmboot.sh[6750]: Running command: /usr/local/bin/sonic-cfggen -d --print-data > /etc/sonic/config_db.json
Oct  7 23:16:05.906786 z9332f NOTICE root: WARMBOOT_FINALIZER : Finalizing warmboot...
Oct  7 23:16:06.218787 z9332f INFO systemd[1]: warmboot-finalizer.service: Succeeded.
Oct  7 23:16:06.218964 z9332f INFO systemd[1]: Finished Monitor warm recovery and disable warmboot when done.
...

I also added new test cases that specifically exercise the new helper function.

@prsunny prsunny left a comment


Can the neighbor_advertiser be skipped while invoking warmboot instead of modifying the script?

gechiang commented Oct 6, 2021

> Can the neighbor_advertiser be skipped while invoking warmboot instead of modifying the script?

The reason I am doing it inside the script is that the warmboot setup and the finalizer are two separate files, and I think it is easier to keep this inside the script, which is called in both cases, than to spread it across multiple files.

prsunny commented Oct 7, 2021

> > Can the neighbor_advertiser be skipped while invoking warmboot instead of modifying the script?
>
> The reason I am doing it inside the script is that the warmboot setup and the finalizer are two separate files, and I think it is easier to keep this inside the script, which is called in both cases, than to spread it across multiple files.

If you fix this in the warmboot setup, isn't the finalizer a no-op? That is, even if the finalizer tries to remove the entries, it will not trigger anything since they do not exist, right?

gechiang commented Oct 7, 2021

> > > Can the neighbor_advertiser be skipped while invoking warmboot instead of modifying the script?
> >
> > The reason I am doing it inside the script is that the warmboot setup and the finalizer are two separate files, and I think it is easier to keep this inside the script, which is called in both cases, than to spread it across multiple files.
>
> If you fix this in the warmboot setup, isn't the finalizer a no-op? That is, even if the finalizer tries to remove the entries, it will not trigger anything since they do not exist, right?

Interesting idea. Let me try it on the warmboot setup alone to see if that works. If so, I will create a new pull request.
Thanks!
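
The no-op argument in this exchange can be illustrated with a toy model. This is only a sketch under assumptions: `PROGRAMMED`, `cpa_setup`, and `cpa_teardown` are hypothetical names that stand in for the real setup and finalizer steps, and they are not taken from the PR.

```shell
#!/bin/sh
# Toy model only: PROGRAMMED stands in for the state CPA setup would create.
PROGRAMMED=""

cpa_setup() {
    # $1 = "capable" when the platform can handle CPA (hypothetical flag)
    if [ "$1" = "capable" ]; then
        PROGRAMMED="vxlan_tunnel mirror_acl"
    fi
}

cpa_teardown() {
    removed=0
    for entry in ${PROGRAMMED}; do
        removed=$((removed + 1))   # one removal per entry that exists
    done
    PROGRAMMED=""
    echo "${removed}"              # number of entries actually removed
}

cpa_setup "not-capable"
cpa_teardown    # prints 0: nothing was programmed, so nothing is removed
```

If setup is skipped, the teardown loop has nothing to iterate over, which is the behavior the reviewer expects from the finalizer.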

@gechiang gechiang requested a review from prsunny October 7, 2021 23:34
gechiang commented Oct 7, 2021

@prsunny I have modified the code per your comment. Please help review the new changes.
Thanks!

vaibhavhd previously approved these changes Oct 8, 2021
@vaibhavhd vaibhavhd left a comment


Minor suggestions. LGTM otherwise.

@@ -487,6 +495,8 @@ case "$REBOOT_TYPE" in
;;
esac

HWSKU=$(show platform summary --json | python -c 'import sys, json; print(json.load(sys.stdin)["hwsku"])')
Contributor Author

Agreed. Changed just now.
Thanks!
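
For readers unfamiliar with the extraction in the diff line above, here is a small sketch of how it behaves on sample input. The sample JSON is an assumption: only the `hwsku` key comes from the diff, and the `platform` key is included purely for illustration. The sketch uses `python3` rather than `python`, which is also an assumption about the local environment.

```shell
#!/bin/sh
# Sample input: only the "hwsku" key is taken from the diff; the other
# key is an assumed illustration of the command's JSON shape.
sample_json='{"platform": "x86_64-dellemc_z9332f_d1508-r0", "hwsku": "DellEMC-Z9332f-M-O16C64"}'

# Same extraction as in the diff, fed from the sample instead of the CLI.
HWSKU=$(printf '%s' "${sample_json}" | python3 -c 'import sys, json; print(json.load(sys.stdin)["hwsku"])')

echo "HwSKU: ${HWSKU}"   # prints: HwSKU: DellEMC-Z9332f-M-O16C64
```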

debug "Setting up control plane assistant: ${ASSISTANT_IP_LIST} ..."
${ASSISTANT_SCRIPT} -s ${ASSISTANT_IP_LIST} -m set
# TH3 HW is not capable of VxLAN programming thus skipping TH3 platforms
if [[ "$HWSKU" != "DellEMC-Z9332f-M-O16C64" && "$HWSKU" != "DellEMC-Z9332f-M-O16C64-lab" ]]; then
Contributor

$HWSKU can be replaced by ${HWSKU} for consistency and probably bash safety too.

Contributor Author

Agreed. Changed just now.
Thanks!
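
A minimal sketch of the guard discussed in this thread, written with the `${HWSKU}`-style braces the reviewer suggested. The function names `is_cpa_capable` and `cpa_action` are hypothetical and are not the helper from the PR; the two SKU strings are the ones shown in the diff, and `Some-Other-HwSKU` is a made-up value.

```shell
#!/bin/sh
# Hypothetical helper (not from the PR): succeeds when the given HwSKU
# can handle Control Plane Assist (CPA).
is_cpa_capable() {
    hwsku="$1"
    case "${hwsku}" in
        DellEMC-Z9332f-M-O16C64|DellEMC-Z9332f-M-O16C64-lab)
            return 1 ;;   # TH3 SKUs named in the diff: no VxLAN programming
        *)
            return 0 ;;
    esac
}

# Prints the action a caller would take; stands in for the real setup call.
cpa_action() {
    if is_cpa_capable "$1"; then
        echo "setup"
    else
        echo "skip"
    fi
}

cpa_action "DellEMC-Z9332f-M-O16C64"   # prints: skip
cpa_action "Some-Other-HwSKU"          # prints: setup
```

Keeping the SKU list in one function means a new non-capable SKU would need only one extra pattern, though whether the PR's helper is structured this way is not shown in the conversation.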

gechiang commented Oct 8, 2021

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@vaibhavhd

The failures seem to come from a sonic-swss dvs switch creation error.
This was supposedly fixed yesterday in sonic-swss: sonic-net/sonic-swss#1949

The buildimage update is in progress: sonic-net/sonic-buildimage#8915

I suggest retrying after the buildimage PR is merged.

gechiang commented Oct 8, 2021

> The failures seem to come from a sonic-swss dvs switch creation error. This was supposedly fixed yesterday in sonic-swss: Azure/sonic-swss#1949
>
> The buildimage update is in progress: Azure/sonic-buildimage#8915
>
> I suggest retrying after the buildimage PR is merged.

Thanks! I was about to ask how to address the test failures, which are unrelated to my changes.
I will wait for that swss PR to be pulled into sonic-buildimage before retrying the azp run.

@gechiang

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@gechiang gechiang merged commit 1edf934 into sonic-net:master Oct 11, 2021
qiluo-msft pushed a commit that referenced this pull request Oct 12, 2021
* [TH3] Skip Control Plane Assist on WARM Reboot for TH3 HWSKUs
stepanblyschak pushed a commit to stepanblyschak/sonic-utilities that referenced this pull request Apr 18, 2022
8768089 (HEAD -> 202012, origin/202012) Remove exec from platform_reboot_plugin call to handle any hang issue. (sonic-net#1879)
ae5d90c Validate input of ```config mirror_session add``` (sonic-net#1825)
44d3a3b [show][config] fix the muxcable commands for interface naming mode (sonic-net#1862)
0a4933e [TH3] Skipp Control Plane Assist on WARM Reboot for TH3 HWSKUs (sonic-net#1861)

Signed-off-by: vaibhav-dahiya <[email protected]>