-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[sfp-refactoring] Initial support for CMIS application initialization #876
Conversation
Example Unit Test Result: |
doc/sfp-cmis/cmis.md
Outdated
|
||
This document describes functional behavior of the QSFPDD CMIS support in SONiC. | ||
|
||
The QSFPDD Common Management Interface Specification (CMIS) version 4.0 was finalized |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Document above references the CMIS 5.0 spec, this section is referencing CMIS 4.0. There are minimal differences, but which is being used here? Will CMIS 3.x support be included?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CMIS 3 is not be supported, only CMIS 4 and 5 will be supported.
And thanks for the comments, the terms of 'version 4' will be removed.
doc/sfp-cmis/cmis.md
Outdated
in May of 2019 and provides a variety of features and support for different transceiver | ||
form factors. From a software perspective, a CMIS application initialization sequence | ||
is now mandatory as per the current dynamic port breakout mode activated. And these is | ||
also a new diagnostic support tha could be useful to SONiC users and developers. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: "tha"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks
doc/sfp-cmis/cmis.md
Outdated
|
||
## 1.3 Warm Boot Requirements | ||
|
||
Functionality should continue to work across warm boot. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An important aspect of the warm boot requirements is that the datapath initialization sequence should not be performed during WarmBoot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks
doc/sfp-cmis/cmis.md
Outdated
"ext_identifier": "N/A", | ||
"ext_rateselect_compliance": "N/A", | ||
"hardware_rev": "01", | ||
"lane_count": "4", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is media lane count, right? (host lane count would be 8)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, this is the effective lane counts in the line side
With CMIS application advertisement support, **show interfaces transceiver eeprom** is now as below. | ||
|
||
``` | ||
admin@sonic:~$ show interfaces transceiver eeprom Ethernet0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
current show command output is like this:
admin@sonic:~$ show interface transceiver eeprom Ethernet0
Ethernet0: SFP EEPROM detected
Application Advertisement: 400G CR8 - Copper cable
200GBASE-CR4 (Clause 136) - Copper cable
100GBASE-CR2 (Clause 136) - Copper cable
100GBASE-CR4 (Clause 92) - Copper cable
50GBASE-CR (Clause 126) - Copper cable
50GBASE-CR2 with no FEC - Copper cable
25GBASE-CR CA-N (Clause 110) - Copper cable
1000BASE -CX(Clause 39) - Copper cable
Connector: No separable connector
Encoding: Not supported for CMIS cables
Extended Identifier: Power Class 1(0.25W Max)
Extended RateSelect Compliance: Not supported for CMIS cables
Identifier: QSFP-DD Double Density 8X Pluggable Transceiver
Length Cable Assembly(m): 0.5
Nominal Bit Rate(100Mbs): Not supported for CMIS cables
Specification compliance: Not supported for CMIS cables
Vendor Date Code(YYYY-MM-DD Lot): 2021-05-14
Vendor Name: Mellanox
Vendor OUI: 00-02-c9
Vendor PN: MCP1660-W00AE30
Vendor Rev: A3
Vendor SN: MT2120VS03875
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the old one directly output the raw text fetched from the DB
doc/sfp-cmis/cmis-init.md
Outdated
- The **pmon#xcvrd** will detect the module type of the attched transceivers and post the | ||
information to the **STATE_DB** | ||
- If the transceiver is a CMIS module, the **pmon#xcvrd** will activate the appropriate | ||
application mode as per the current port mode. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I have commented before, I would suggest adding a knob here, not all the platforms need to config this from the NOS layer. please refer to the media setting function currently supported by xcvrd, it provides a configuration file, and only the platform which needs it will turn it on via setting the configuration file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, you're right, maybe we should enhance the media_setting.json to conditionally activate the application initialization
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the suggestion to have a per-vendor/MPN setting in media_settins or a "global" setting? I'd be fairly strongly against a per-module setting since it means we must have advance knowledge that the module will be inserted in order to do CMIS-compliant initialization, or a "default" setting for it.
Stepping back a minute - to do speed control (supporting anything other than the default) on active CMIS modules, this feature is required. I have seen some cases where the modules are not fully CMIS-compliant and we have needed to override the application parsing/selection behavior (and internally we have made some changes to media_settings.json to accommodate this).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the suggestion to have a per-vendor/MPN setting in media_settins or a "global" setting? I'd be fairly strongly against a per-module setting since it means we must have advance knowledge that the module will be inserted in order to do CMIS-compliant initialization, or a "default" setting for it.
I would suggest a "global" setting, which allows a vendor to disable(not start) this new function on its platform(s).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As per the last sync up with Prince, if I'm right about it, we'll drop these exception handlers from media_settings.json until there is a true requirement/request for this. Hence the references to media_settings.json have been removed in the latest code PRs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As per the last sync up with Prince, if I'm right about it, we'll drop these exception handlers from media_settings.json until there is a true requirement/request for this. Hence the references to media_settings.json have been removed in the latest code PRs.
then how to address the requirement that some platforms may not need to activate this function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, but I'm not following you, the CMIS init is a feature of the transceiver, it's not platform-specific, why do we want to have it disabled on some certain platform?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, but I'm not following you, the CMIS init is a feature of the transceiver, it's not platform-specific, why do we want to have it disabled on some certain platform?
a possibility is that some vendors have implemented this function in a lower layer, no extra configuration is needed from SONiC, so they can skip this function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copy that, thanks for the clarification, will add one additional flag into the platform.json for this
doc/sfp-cmis/cmis-init.md
Outdated
This document describes functional behavior of the CMIS application initialization support in SONiC. | ||
|
||
The Common Management Interface Specification (CMIS) provides a variety of features | ||
and support for different transceiver form factors. A |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: odd formatting (does not affect final markdown)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, and it's fixed
doc/sfp-cmis/cmis-init.md
Outdated
- The **pmon#xcvrd** will detect the module type of the attched transceivers and post the | ||
information to the **STATE_DB** | ||
- If the transceiver is a CMIS module, the **pmon#xcvrd** will activate the appropriate | ||
application mode as per the current port mode. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the suggestion to have a per-vendor/MPN setting in media_settins or a "global" setting? I'd be fairly strongly against a per-module setting since it means we must have advance knowledge that the module will be inserted in order to do CMIS-compliant initialization, or a "default" setting for it.
Stepping back a minute - to do speed control (supporting anything other than the default) on active CMIS modules, this feature is required. I have seen some cases where the modules are not fully CMIS-compliant and we have needed to override the application parsing/selection behavior (and internally we have made some changes to media_settings.json to accommodate this).
doc/sfp-cmis/cmis-init.md
Outdated
} | ||
} | ||
``` | ||
As of now, only **application_selection** is supported, and it defaults to **True** if unspecified |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does this interact with the module's advertising? I assume app selection would not be done for QSFP/SFP+ modules, but passive cables do not require app selection either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a couple of checkers in the CMIS Manager
- Skip if the cage/form-factor type is not QSFPDD (Or OSFP, but I'm not sure if it shares the same form-factor)
- Skip if the media type (i.e. byte 128 or 0 of the module) is not a QSFPDD
- Skip if the memory type of the module is FLAT
Upon port change events, as we support only dynamic-port-breakout, which means sub-ports will share the same config, hence the application code could be determined by the config of anyone of the sub-port in the PORT_SET events.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And we also have another sanity check to skip the application initialization if
- Defaults to application 1 if none of the applications matched the current speed and lane count of the host interface
- Skip the application init if no application code update while the current datapath is activated and module state is ready.
doc/sfp-cmis/cmis-init.md
Outdated
- PortChangeEvent.PORT_DEL | ||
Invoke the event callbacks upon **DEL** operations. | ||
|
||
- As of now, the number of CmisManagerTask is hard coded to 4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should take a strong look at xcvrd and consider creating a separate processing thread per xcvr. This allows for all slow i2c operations (DOM update, etc) to be done in a non-blocking manner.
Regardless, 4 is certainly not enough. If we assume 5 seconds per application selection and 32 modules, it would still take ~40 seconds to initialize the modules when it could actually be done in ~5s.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, agree, the CMIS transactions could take several seconds to finish (while it's possible to even scale up to minutes as per the CMIS SPEC, such modules are unlikely to be deployed in the SONIC, as of now)
doc/sfp-cmis/cmis-init.md
Outdated
|
||
- As of now, the number of CmisManagerTask is hard coded to 4 | ||
- Task 0 is dedicated to listen for **PortChangeEvent.PORT_SET** events, and put requests into the shared message queue | ||
- Task 1-N are responsible for handling the CMIS application initializations in parallel, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current design still takes lot of time in initializing all the 32 modules being served by few threads. This is because the thread is monitoring the datapath state transitioning. If you save the state machine context in sfp object, then you just need to trigger DP init and move on to next task (serve another module) and by the time you finish trigger DP init for all the module you can come to the first module to see if it has completed its DP init or failed. The basic idea is SW doesn't need to spend idle time waiting for a module to finish its DP init, instead it can do useful task.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For eg. In Xcvrd, we need to wait for 2 secs for management init to complete after each module is brought out of reset, before doing 1st i2c transaction, we don't wait 2 secs * 32 = 64 secs. We just spend 2 to 3 secs for all 32 modules to complete management init delay.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While the delay of management init is supposed to be done at per-port module level, and the accurate delay is also module-specific, hence it's better to either add a SFP check code validation in the xcvrd before kicking off the CMIS init, or just check the module state after the interrupt is asserted.
doc/sfp-cmis/cmis-init.md
Outdated
since all the requests are from the shared message queue, only one request will be processed | ||
by a certain task at a time. | ||
|
||
![](images/002.png) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please capture how this below CMIS spec highlighted requirement is met in your design?
Please do note, during Xcvrd boot, it waits for port config done i.e the NPU or ASIC initialization for the particular port is done, when Xcvrd does accessing the module. The SET event in CONFIG_DB for a port doesn't guarantee the corresponding NPU or ASIC initialization is completed. In short, I don't see the above spec requirement for DP init to happen when Tx signal coming from the ASIC/Gearbox is valid being ensured in current design.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the new statement in the CMIS5, it adds the OutputDisableTx=0xff in the application initialization, and the current implementation did not turn off the Tx power (It's not mention in appendix C of CMIS4) hence I'll later update the code for this
doc/sfp-cmis/cmis-init.md
Outdated
- get_cmis_application_selected() | ||
- get_cmis_application_matched() | ||
- set_cmis_application() | ||
- Add mutex lock to the follow routines to serialize CMIS application initializations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As per CMIS spec, each DP initialization should happen independent of other DP init, but then won't this reset() affect all Datapaths?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it will, but the main purpose of this PR is for the dynamic-port-breakout support, which is having all the lanes configured in the same application mode, hence a reset() is a simple and safest way to ensure the entire module is reset to a known state. I'll later update the code to remove the reset()
doc/sfp-cmis/cmis-init.md
Outdated
- Add support for CMIS application initialization | ||
- get_cmis_application_selected() | ||
- get_cmis_application_matched() | ||
- set_cmis_application() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is set_cmis_application() API per data path or entire module?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As communicated earlier, now we use state-based initialization, hence this routine is removed
|
||
## sonic-utilities/sfputil (modified) | ||
|
||
- Add supprot to show the CMIS application advertisement |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need a CLI to show current active application as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, will do
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how will the CLI look like? sample output?
doc/sfp-cmis/cmis-init.md
Outdated
|
||
## sonic-platform-common/sonic_platform_base/sonic_xcvr/api/public/cmis.py (modified) | ||
|
||
- Add support for software reset |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Module reset is already done by the platform, why is this done here? It will affect the case where we need to initialize individual datapath at different point in time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As communicated earlier, the software reset will be removed
doc/sfp-cmis/cmis-init.md
Outdated
**Example:** | ||
``` | ||
{ | ||
"CMIS_MEDIA_SETTINGS":{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SONiC will be supporting all CMIS compliant module. why this platform specific check? All platforms in SONiC should be using Xcvrd for initializing the CMIS compliant modules. I am checking with NVIDIA why other platforms should be explicit in specifying the data path init.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As communicated earlier, this module-specific configuration is removed
doc/sfp-cmis/cmis-init.md
Outdated
![](images/001.png) | ||
|
||
- Add **PortChangeEvent.PORT_SET** and **PortChangeEvent.PORT_DEL** events to **xcvrd_utilities/port_mapping.py** | ||
- PortChangeEvent.PORT_SET |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the design for handling default application or non-default application as well ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As communicated earlier, in the latest design, we'll support default application for now
Signed-off-by: Dante Su <[email protected]>
Signed-off-by: Dante Su <[email protected]>
Signed-off-by: Dante Su <[email protected]>
Signed-off-by: Dante Su <[email protected]>
Signed-off-by: Dante Su <[email protected]>
Signed-off-by: Dante Su <[email protected]>
Signed-off-by: Dante Su <[email protected]>
Signed-off-by: Dante Su <[email protected]>
Signed-off-by: Dante Su <[email protected]>
Signed-off-by: Dante Su <[email protected]>
Signed-off-by: Dante Su <[email protected]>
doc/sfp-cmis/cmis-init.md
Outdated
* [List of Tables](#list-of-tables) | ||
* [Revision](#revision) | ||
* [About This Manual](#about-this-manual) | ||
* [Scope](#scope) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If i click the "Scope" it doesn't take me to the section "Scope". Can you fix that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, will do
doc/sfp-cmis/cmis-init.md
Outdated
return "/sys/bus/i2c/devices/{}-0050/eeprom".format(bus_id) | ||
``` | ||
|
||
This feature could also be disabled by the platform-specific **pmon_daemon_control.json** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed, all sonic platform will have support for CMIS. I will check with platforms why it should be disabled, but for now, it should be enabled for all platforms.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, will do
|
||
Functionality should continue to work across warm boot. | ||
- The CMIS application initialization should not be performed during WarmBoot. | ||
- The CMIS application initialization should be skipped if no application code updates. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should reconfigure the datapath during fast reboot?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it's not necessary, the CmisMamagerTask will always double-check the current application mode against the desired application from the CONFIG_DB, hence this will be processed when the XCVR INSERTION events are replayed after fast-reboot.
|
||
## sonic-utilities/sfputil (modified) | ||
|
||
- Add supprot to show the CMIS application advertisement |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how will the CLI look like? sample output?
Signed-off-by: Dante Su <[email protected]>
Signed-off-by: Dante Su <[email protected]>
Signed-off-by: Dante Su [email protected]
Why I did it
Add initial support for CMIS application initialization and loopback
How I did it
1-1. sonic-xcvr/*/cmis.py: Add support for QSFP-DD application advertising and initialization
1-2. sonic-xcvr/sfp_optoe_base.py: Add support for error description and CMIS application selection
1-3. sonic_platform_base/sfp_base.py: Add stub routines for CMIS application initialization
How to verify it
Which release branch to backport (provide reason below if selected)
Description for the changelog
A picture of a cute animal (not mandatory but encouraged)