[202012] [Mellanox] Fix issue: SFP eeprom corrupted after replacing cable with different sfp type #13543
@Junchao-Mellanox please provide a PR description for the change.
@Junchao-Mellanox please confirm that this is needed only for 202012 and not for 202205 and above.
@prgeor a kind reminder to review this PR.
@liat-grozovik I don't think it's Mellanox-specific, but it is worth checking whether this issue exists in 202205 and above.
There is no such issue on 202205 and above because xcvrd was refactored: the SFP state task was changed from a process to a thread, so all 3 tasks share the same memory space and always have the correct SFP type.
Why I did it
There are 3 tasks in xcvrd:
Let's assume the user replaces a QSFP with a QSFP-DD. There are two issues:
This PR fixes these two issues.
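There is no such issue on 202205 and above because xcvrd was refactored: the SFP state task was changed from a process to a thread, so all 3 tasks share the same memory space and always have the correct SFP type.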
How I did it
Backporting the latest xcvrd is difficult because it has gone through many refactors and gained many new features since the 202012 release, so doing so would be a huge effort. We therefore decided to fix the issue on the Nvidia platform API side.
The fix is to refresh the SFP type before any SFP API call that accesses the SFP EEPROM.
Refreshing the SFP type before every such call causes a small performance degradation. In my test on the 202012 branch, reading transceiver INFO and DOM INFO for 32 ports took 1.7 seconds before the change and 2.4 seconds after it. I consider this degradation acceptable.
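The idea of refreshing the SFP type before each EEPROM access can be sketched roughly as below. This is only an illustration and not the code in this PR: `FakeSfp`, `refresh_type`, the `refresh_sfp_type` decorator and the `hardware` dictionary are all hypothetical names, and the real platform API reads the type from the module rather than from a dictionary.

```python
import functools


def refresh_sfp_type(method):
    """Hypothetical decorator: re-detect the SFP type before an EEPROM access."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        self.refresh_type()  # drop any stale cached type first
        return method(self, *args, **kwargs)
    return wrapper


class FakeSfp:
    """Hypothetical stand-in for a platform SFP object (not the real Nvidia API)."""

    def __init__(self, hardware):
        self._hardware = hardware         # dict simulating the plugged-in module
        self.sfp_type = hardware["type"]  # cached type; goes stale after a cable swap
        self.refresh_count = 0

    def refresh_type(self):
        # Assumption: the real implementation would query the module here.
        self.sfp_type = self._hardware["type"]
        self.refresh_count += 1

    @refresh_sfp_type
    def get_transceiver_info(self):
        # Parsing depends on the type, so it must be current at this point.
        return {"type": self.sfp_type}


hardware = {"type": "QSFP"}
sfp = FakeSfp(hardware)
hardware["type"] = "QSFP-DD"  # the user replaces the cable
info = sfp.get_transceiver_info()
print(info)
```

Each decorated call pays for one extra type refresh, which is consistent with the kind of overhead measured above. Without the refresh, `get_transceiver_info` in this sketch would keep reporting the stale `QSFP` type.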
How to verify it
Which release branch to backport (provide reason below if selected)
Description for the changelog