Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Warmboot Issue when upgraded Image SAI return Switch Internal OID not accounted in previous image. #654

Merged
merged 3 commits into from
Sep 2, 2020

Conversation

abdosi
Copy link
Contributor

@abdosi abdosi commented Aug 29, 2020

Why I did:
Fix is for the issue as reported in sonic-net/sonic-buildimage#5274
Fix sonic-net/sonic-buildimage#5274

Basically Switch Internal OID should always be accounted in Temp and
Current Logic Comparison. It should never trigger remove operation.

Changes is to always add Switch Internal OID to COLDVIDS (even in case of
warm-boot)

Signed-off-by: Abhishek Dosi [email protected]

How I verfiy:

a) After change above issue is resolved
b) Dump COLDVIDS/ASIC_VIEW/HIDDEN Table and Switch Internal OID's are accounted in all 3 tables

Basically Switch Internal OID should always be accounted in Temp and
Current Logic Comparison. It should never trigger remove operation.

Changes is to always add Switch Internal OID to COLDVIDS (even in case of
warm-boot)

Signed-off-by: Abhishek Dosi <[email protected]>
@abdosi abdosi requested a review from kcudnik August 29, 2020 23:57
@abdosi abdosi added Bug Request for 201911 Branch Label for changes in master that applies for 201911 Branch also labels Aug 29, 2020
@kcudnik
Copy link
Collaborator

kcudnik commented Aug 30, 2020

Could you explain this more ? i dont understand this issue here


m_client->setDummyAsicStateObject(vid);

m_client->saveColdBootDiscoveredVids(m_switch_vid, coldVids);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when doing discovery on switch when switch is created, all those "default" objects are already discovered and put as cold vids to the database, why this is needed again ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kcudnik Closed as per below comments

@kcudnik
Copy link
Collaborator

kcudnik commented Aug 30, 2020

@abdosi when switch is created, there is discovery logic that discovers all the oids, it does not matter whether its cold or warm boot
so at new instance 3.7 SAI_SWITCH_ATTR_DEFAULT_STP_INST_ID will be discovered, as it was not present on 3.5
and this should be done in different way, SAI_SWITCH_ATTR_DEFAULT_STP_INST_ID should be added to m_default_rid_map in SaiSwitch.cpp, and then when function isSwitchObjectDefaultRid() inside isNonRemovableRid(), and isNonRemovableRid is called in ComparisonLogic, then that object will not be removed, since that is default object

syncd/SaiSwitch.h Outdated Show resolved Hide resolved
syncd/SaiSwitch.cpp Outdated Show resolved Hide resolved
@lguohan
Copy link
Contributor

lguohan commented Aug 31, 2020

can we have some kinds of unit test for this?

@abdosi
Copy link
Contributor Author

abdosi commented Aug 31, 2020

@abdosi when switch is created, there is discovery logic that discovers all the oids, it does not matter whether its cold or warm boot
so at new instance 3.7 SAI_SWITCH_ATTR_DEFAULT_STP_INST_ID will be discovered, as it was not present on 3.5
and this should be done in different way, SAI_SWITCH_ATTR_DEFAULT_STP_INST_ID should be added to m_default_rid_map in SaiSwitch.cpp, and then when function isSwitchObjectDefaultRid() inside isNonRemovableRid(), and isNonRemovableRid is called in ComparisonLogic, then that object will not be removed, since that is default object

@kcudnik : Thanks looks like logic is different in master vs 201911. I will rework this PR for 201911 branch. Master looks fine.

@kcudnik
Copy link
Collaborator

kcudnik commented Aug 31, 2020

Ok, so master works fine without any issues? just 201911 is affected?

as for unittests, this will be tricky to implement in current test mechanism, since it would require to have 2 different version of syncd (compiled with version A and then B) and cold boot from A to warm boot to B, or to have a new special configuration file for virtual switch that on warm boot it could be specified to "enable" new STP object to be existing on the VS

@abdosi
Copy link
Contributor Author

abdosi commented Aug 31, 2020

Ok, so master works fine without any issues? just 201911 is affected?

I revisited the code-path you mention and issue will be present both in master and 201911. I am verifying on 201911 as of now. Here are two issue in above approach -

Issue 1:

In Function isNonRemovableRid() we first check for isColdBootDiscoveredRid() and then isSwitchObjectDefaultRid(). In This particular case DefaultRid was dicovered as part of Warm-boot so first check will fail and return.
I think the order should be reverse. Let me know.

Issue 2:
Even after we reverse the order (as mention in Issue 1 above ) the API ComparisonLogic::checkMap() will fail. This is because since this discovered Switch Internal OID never gets updated to tempView.

I think my initial change where I add Internal OID to COLDVIDS (irrespective of Cold or Warm boot) make it clean because then populateExisting Object logic will take care of adding this to temp view with the existing logic.

as for unittests, this will be tricky to implement in current test mechanism, since it would require to have 2 different version of syncd (compiled with version A and then B) and cold boot from A to warm boot to B, or to have a new special configuration file for virtual switch that on warm boot it could be specified to "enable" new STP object to be existing on the VS

@abdosi
Copy link
Contributor Author

abdosi commented Sep 1, 2020

@kcudnik can you check my comments above

@kcudnik
Copy link
Collaborator

kcudnik commented Sep 1, 2020

ok, thats sounds good, just put a those comments in that change why we populate those tables in redis

before Cold Bott Discover OID.

Also address review Comments.
Signed-off-by: Abhishek Dosi <[email protected]>
@abdosi abdosi merged commit caa7ab2 into sonic-net:master Sep 2, 2020
@abdosi abdosi deleted the internal_oid_fix branch September 2, 2020 16:57
abdosi added a commit that referenced this pull request Sep 3, 2020
… not accounted in previous image. (#654)

* Fix the issue sonic-net/sonic-buildimage#5274

Basically Switch Internal OID should always be accounted in Temp and
Current Logic Comparison. It should never trigger remove operation.

Changes is to always add Switch Internal OID to COLDVIDS (even in case of
warm-boot)

Signed-off-by: Abhishek Dosi <[email protected]>

* Fix the API that check NonRemovableOID to check internal OID first
before Cold Bott Discover OID.

Also address review Comments.

* Address Review Comments

Signed-off-by: Abhishek Dosi <[email protected]>
pettershao-ragilenetworks pushed a commit to pettershao-ragilenetworks/sonic-sairedis that referenced this pull request Nov 18, 2022
… not accounted in previous image. (sonic-net#654)

* Fix the issue sonic-net/sonic-buildimage#5274

Basically Switch Internal OID should always be accounted in Temp and
Current Logic Comparison. It should never trigger remove operation.

Changes is to always add Switch Internal OID to COLDVIDS (even in case of
warm-boot)

Signed-off-by: Abhishek Dosi <[email protected]>

* Fix the API that check NonRemovableOID to check internal OID first
before Cold Bott Discover OID.

Also address review Comments.

* Address Review Comments

Signed-off-by: Abhishek Dosi <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Included in 201911 Branch Request for 201911 Branch Label for changes in master that applies for 201911 Branch also
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[201911] Warm boot fail from 201811 to 201911 on Broadcom platform
3 participants