[201911] Warm boot fail from 201811 to 201911 on Broadcom platform #5274

Closed
abdosi opened this issue Aug 29, 2020 · 2 comments · Fixed by sonic-net/sonic-sairedis#654

abdosi commented Aug 29, 2020

When warm-booting from a 201811 image (SAI 3.5) to a 201911 image (SAI 3.7), we see the errors below and the warm boot fails.

Root cause:
In the 201911 image, when syncd retrieves the switch internal OIDs, it gets an OID for the attribute SAI_SWITCH_ATTR_DEFAULT_STP_INST_ID, which is supported in SAI 3.7. SAI 3.5 did not support this attribute. syncd does not handle this case correctly: when it compares the temp and current views, the comparison logic generates a remove operation for this internal OID, which causes the errors shown in the logs below.

Fix:
Internal OIDs should always match in the comparison logic and should never trigger a remove operation.
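To make the failure mode concrete, here is a hypothetical, heavily simplified model of the view comparison; it is not syncd's actual ComparisonLogic, which matches objects by their attributes. In this sketch an object counts as matched when its VID appears in both views, and every unmatched object in the current view becomes a remove operation unless a predicate marks it as non-removable. The names `Vid`, `computeRemoves`, and `isNonRemovable` are assumptions for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <set>
#include <vector>

// Hypothetical VID type; in sairedis VIDs are sai_object_id_t values.
using Vid = std::uint64_t;

// Simplified model: an object is "matched" when its VID is in both views.
// Every unmatched object in the current view becomes a remove operation,
// unless isNonRemovable() reports it as a switch internal/default object.
std::vector<Vid> computeRemoves(const std::set<Vid>& currentView,
                                const std::set<Vid>& tempView,
                                const std::function<bool(Vid)>& isNonRemovable)
{
    assert(isNonRemovable && "a non-removable predicate is required");

    std::vector<Vid> removes;
    for (Vid vid : currentView)
    {
        if (tempView.count(vid) != 0)
            continue;              // matched in the temp view: no operation
        if (isNonRemovable(vid))
            continue;              // internal OID: must never be removed
        removes.push_back(vid);    // unmatched and removable
    }
    return removes;
}
```

With a predicate that protects nothing, the STP VID oid:0x100000000006b6 lands in the remove list, mirroring the failing remove in the logs; with a predicate that recognizes it as an internal OID, only genuinely stale objects are removed.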

Logs:
Aug 29 18:39:56.343316 str-a7050-acs-1 ERR syncd#syncd: [none] brcm_sai_remove_stp:156 STP Instance 0x0000001000000001 cannot be removed due to 3 vlans present
Aug 29 18:39:56.343361 str-a7050-acs-1 ERR syncd#syncd: :- asic_handle_generic: remove SAI_OBJECT_TYPE_STP RID: oid:0x1000000001 VID oid:0x100000000006b6 failed: SAI_STATUS_OBJECT_IN_USE
Aug 29 18:39:56.343429 str-a7050-acs-1 ERR syncd#syncd: :- asic_process_event: failed to execute api: remove, key: SAI_OBJECT_TYPE_STP:oid:0x100000000006b6, status: SAI_STATUS_OBJECT_IN_USE
Aug 29 18:39:56.343576 str-a7050-acs-1 NOTICE syncd#syncd: :- executeOperationsOnAsic: asic apply took 0.001832 sec
Aug 29 18:39:56.343663 str-a7050-acs-1 ERR syncd#syncd: :- executeOperationsOnAsic: Error while executing asic operations, ASIC is in inconsistent state: :- asic_process_event: failed to execute api: remove, key: SAI_OBJECT_TYPE_STP:oid:0x100000000006b6, status: SAI_STATUS_OBJECT_IN_USE
Aug 29 18:39:56.608242 str-a7050-acs-1 NOTICE syncd#syncd: :- syncdApplyView: apply took 3.141233 sec
Aug 29 18:39:56.608562 str-a7050-acs-1 ERR syncd#syncd: :- syncd_main: Runtime error: :- asic_process_event: failed to execute api: remove, key: SAI_OBJECT_TYPE_STP:oid:0x100000000006b6, status: SAI_STATUS_OBJECT_IN_USE
Aug 29 18:39:56.608562 str-a7050-acs-1 NOTICE syncd#syncd: :- notify_OA_about_syncd_exception: sending switch_shutdown_request notification to OA
Aug 29 18:39:56.608787 str-a7050-acs-1 NOTICE swss#orchagent: :- handle_switch_shutdown_request: switch shutdown request

@abdosi abdosi self-assigned this Aug 29, 2020
abdosi commented Aug 29, 2020

Working on a fix.

abdosi added a commit to abdosi/sonic-sairedis that referenced this issue Aug 29, 2020
Basically, switch internal OIDs should always be accounted for in the
temp/current view comparison logic. They should never trigger a remove
operation.

The change always adds switch internal OIDs to COLDVIDS (even in the
warm-boot case).

Signed-off-by: Abhishek Dosi <[email protected]>
kcudnik commented Aug 30, 2020

@abdosi When the switch is created, discovery logic discovers all the OIDs, regardless of whether it is a cold or a warm boot.
So on the new instance (SAI 3.7), SAI_SWITCH_ATTR_DEFAULT_STP_INST_ID will be discovered as a new object, since it was not present on 3.5.
This should be handled differently: SAI_SWITCH_ATTR_DEFAULT_STP_INST_ID should be added to m_default_rid_map in SaiSwitch.cpp. isSwitchObjectDefaultRid() is called inside isNonRemovableRid(), and isNonRemovableRid() is called in ComparisonLogic, so that object will not be removed, because it is a default object.
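The suggestion above can be sketched as a small bookkeeping class. This is a hypothetical, simplified model, not the actual SaiSwitch class: only the names `m_default_rid_map` and `isSwitchObjectDefaultRid()` come from the comment. `SwitchDefaults`, `DefaultAttr`, `record()`, and the use of `std::nullopt` to represent an attribute the vendor SAI does not support are assumptions for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>

using Rid = std::uint64_t;  // hypothetical alias for a SAI real object ID

// Hypothetical stand-ins for SAI_SWITCH_ATTR_DEFAULT_* style attributes.
enum class DefaultAttr { CpuPort, DefaultVirtualRouterId, DefaultStpInstId };

// Simplified model of the default-RID bookkeeping suggested for SaiSwitch.cpp.
class SwitchDefaults {
public:
    // Called once per default-object attribute at switch init. std::nullopt
    // models an attribute the vendor SAI does not support (e.g. the default
    // STP instance on SAI 3.5), so nothing is recorded for it.
    void record(DefaultAttr attr, std::optional<Rid> rid) {
        if (rid)
            m_default_rid_map[attr] = *rid;
    }

    // True if rid is one of the switch's default objects.
    bool isSwitchObjectDefaultRid(Rid rid) const {
        assert(rid != 0 && "NULL object ID is not a valid RID");
        for (const auto& entry : m_default_rid_map)
            if (entry.second == rid)
                return true;
        return false;
    }

private:
    std::map<DefaultAttr, Rid> m_default_rid_map;
};
```

Because the map is filled from whatever the running SAI reports at switch init, a SAI 3.7 switch records the default STP instance even when the saved state came from a 3.5 image, so a check built on isSwitchObjectDefaultRid() can recognize it as a default object.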

abdosi added a commit to sonic-net/sonic-sairedis that referenced this issue Sep 2, 2020
… not accounted in previous image. (#654)

* Fix the issue sonic-net/sonic-buildimage#5274

Basically, switch internal OIDs should always be accounted for in the
temp/current view comparison logic. They should never trigger a remove
operation.

The change always adds switch internal OIDs to COLDVIDS (even in the
warm-boot case).

Signed-off-by: Abhishek Dosi <[email protected]>

* Fix the NonRemovableOID check API to check internal OIDs first,
before cold-boot discovered OIDs.

Also address review comments.

* Address review comments

Signed-off-by: Abhishek Dosi <[email protected]>
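The ordering change in the second bullet can be sketched as follows. The sketch assumes, as the commit message implies, that the earlier check returned "removable" for any RID not discovered at cold boot before it ever reached the internal-OID check, so an internal OID that first appears after a SAI upgrade during warm boot was treated as removable. `SwitchRids` and the set-based lookups are hypothetical; the real function may apply further per-object-type checks where this sketch simply returns `true`.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

using Rid = std::uint64_t;  // hypothetical alias for a SAI real object ID

// Hypothetical container for the two RID sets the check consults.
struct SwitchRids {
    std::set<Rid> defaultRids;         // switch internal/default objects
    std::set<Rid> coldBootDiscovered;  // discovered at cold boot, kept across warm boots
};

bool isNonRemovableRid(const SwitchRids& s, Rid rid) {
    assert(rid != 0 && "NULL object ID is not a valid RID");

    // 1. Internal/default objects first: after a SAI upgrade during warm boot,
    //    a new default object (e.g. the default STP instance) is discovered but
    //    is absent from the cold-boot set saved by the older image.
    if (s.defaultRids.count(rid) != 0)
        return true;

    // 2. Not discovered on cold boot: treated as removable.
    if (s.coldBootDiscovered.count(rid) == 0)
        return false;

    // 3. Discovered on cold boot: this sketch treats it as non-removable
    //    (any finer-grained per-type checks are omitted).
    return true;
}
```

Swapping steps 1 and 2 reproduces the reported bug in this model: the default STP instance RID is missing from the cold-boot set, so the early `return false` fires first.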
abdosi added a commit to sonic-net/sonic-sairedis that referenced this issue Sep 3, 2020
… not accounted in previous image. (#654)

pettershao-ragilenetworks pushed a commit to pettershao-ragilenetworks/sonic-sairedis that referenced this issue Nov 18, 2022
… not accounted in previous image. (sonic-net#654)
