[201911] Warm boot fail from 201811 to 201911 on Broadcom platform #5274
Comments
Working on a fix.
Basically, the switch-internal OID should always be accounted for in the temp and current view comparison logic. It should never trigger a remove operation. The change is to always add the switch-internal OID to COLDVIDS (even in the case of warm boot). Signed-off-by: Abhishek Dosi <[email protected]>
@abdosi when the switch is created, discovery logic discovers all the OIDs; it does not matter whether it's a cold or warm boot.
… not accounted in previous image. (#654)
* Fix the issue sonic-net/sonic-buildimage#5274. Basically, the switch-internal OID should always be accounted for in the temp and current view comparison logic. It should never trigger a remove operation. The change is to always add the switch-internal OID to COLDVIDS (even in the case of warm boot). Signed-off-by: Abhishek Dosi <[email protected]>
* Fix the API that checks NonRemovableOID to check the internal OID first, before the cold-boot discovered OID. Also address review comments.
* Address review comments. Signed-off-by: Abhishek Dosi <[email protected]>
When doing a warm boot from the 201811 image (SAI 3.5) to the 201911 image (SAI 3.7), we see the errors below and the warm boot fails.
Root cause:
In the 201911 image, when syncd gets the internal OIDs, SAI 3.7 returns an OID for the attribute SAI_SWITCH_ATTR_DEFAULT_STP_INST_ID. This attribute was not supported in SAI 3.5. syncd does not handle this case correctly: when it compares the temp and current views, the comparison logic produces a remove operation for this internal OID, which causes the error seen in the logs below.
Fix:
Internal OIDs should always match in the comparison logic and should never trigger a remove operation.
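The intended behaviour can be sketched roughly as below. This is a simplified Python model, not the actual syncd C++ code: the function `plan_removals`, the set-based views, and the value `STALE_USER_VID` are hypothetical and exist only for illustration. Only the STP VID is taken from the logs in this issue.

```python
def plan_removals(current_view, temp_view, internal_oids):
    """Return the OIDs a view comparison would remove from the ASIC.

    current_view:  OIDs currently present (hypothetical model of the current view)
    temp_view:     OIDs requested after boot (hypothetical model of the temp view)
    internal_oids: OIDs discovered on the switch itself; never removable
    """
    # Objects in the current view that the temp view does not reference
    # are candidates for removal.
    candidates = current_view - temp_view
    # Switch-internal OIDs are always treated as matched, so they are
    # filtered out of the removal plan.
    return candidates - internal_oids


# VID of the default STP instance, taken from the logs in this issue.
DEFAULT_STP_VID = 0x100000000006B6
# Hypothetical VID standing in for an ordinary user-created object.
STALE_USER_VID = 0x26000000000999

current = {DEFAULT_STP_VID, STALE_USER_VID}
temp = set()  # neither object is referenced by the new view

# Internal OID not accounted for: the STP instance ends up in the removal plan.
without_fix = plan_removals(current, temp, internal_oids=set())
# Internal OID accounted for: only the user-created object is removed.
with_fix = plan_removals(current, temp, internal_oids={DEFAULT_STP_VID})

print(sorted(hex(oid) for oid in without_fix))
print(sorted(hex(oid) for oid in with_fix))
```

In this model the first plan contains the default STP instance, which corresponds to the failing remove in the logs, while the second plan leaves it alone.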
Logs:
Aug 29 18:39:56.343316 str-a7050-acs-1 ERR syncd#syncd: [none] brcm_sai_remove_stp:156 STP Instance 0x0000001000000001 cannot be removed due to 3 vlans present
Aug 29 18:39:56.343361 str-a7050-acs-1 ERR syncd#syncd: :- asic_handle_generic: remove SAI_OBJECT_TYPE_STP RID: oid:0x1000000001 VID oid:0x100000000006b6 failed: SAI_STATUS_OBJECT_IN_USE
Aug 29 18:39:56.343429 str-a7050-acs-1 ERR syncd#syncd: :- asic_process_event: failed to execute api: remove, key: SAI_OBJECT_TYPE_STP:oid:0x100000000006b6, status: SAI_STATUS_OBJECT_IN_USE
Aug 29 18:39:56.343576 str-a7050-acs-1 NOTICE syncd#syncd: :- executeOperationsOnAsic: asic apply took 0.001832 sec
Aug 29 18:39:56.343663 str-a7050-acs-1 ERR syncd#syncd: :- executeOperationsOnAsic: Error while executing asic operations, ASIC is in inconsistent state: :- asic_process_event: failed to execute api: remove, key: SAI_OBJECT_TYPE_STP:oid:0x100000000006b6, status: SAI_STATUS_OBJECT_IN_USE
Aug 29 18:39:56.608242 str-a7050-acs-1 NOTICE syncd#syncd: :- syncdApplyView: apply took 3.141233 sec
Aug 29 18:39:56.608562 str-a7050-acs-1 ERR syncd#syncd: :- syncd_main: Runtime error: :- asic_process_event: failed to execute api: remove, key: SAI_OBJECT_TYPE_STP:oid:0x100000000006b6, status: SAI_STATUS_OBJECT_IN_USE
Aug 29 18:39:56.608562 str-a7050-acs-1 NOTICE syncd#syncd: :- notify_OA_about_syncd_exception: sending switch_shutdown_request notification to OA
Aug 29 18:39:56.608787 str-a7050-acs-1 NOTICE swss#orchagent: :- handle_switch_shutdown_request: switch shutdown request