Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[device/celestica/x86_64-cel_silverstone-r0] Fix QSFP-DD support #92

Merged
merged 17 commits into from
Nov 20, 2020
Merged

[device/celestica/x86_64-cel_silverstone-r0] Fix QSFP-DD support #92

merged 17 commits into from
Nov 20, 2020

Conversation

ds952811
Copy link

@ds952811 ds952811 commented Nov 16, 2020

- What I did

  1. This is built on top of PR91, which carries forward the changes in PR91 with some modifications
    i.e. This is a replacement of PR91, and you don't need PR91 with this PR merged.
  2. switchboard.c: FPGA-I2C: Enable high-speed mode (i.e. BIT6)
  3. switchboard.c: FPGA-I2C: Fix the turnaround issues in jiffies comparison
  4. switchboard.c: FPGA-I2C: Yield the CPU resources during the busy-wait for I2C ready bit
  5. sfputil.py: Revise get_transceiver_change_event() to poll the presence flags and generate change events accordingly, such that SFP insertion events will surely be generated at every boot attempt of 'pmon' instead of physical transceiver insertion/removal.
  6. sfputil.py: Add QSFPDD initialization sequence for 'InnoLight T-DP4CNT-N00' to correctly enable 4x100G in case that HwSKU=Silverstone-128x100
  7. sfputil.py: Automatically reset the QSFPDD transceiver if its state != ModuleReady for 3+ seconds.
  8. sfputil.py: Add support of displaying QSFP-DD application advertisements, and the current application mode.
  9. Silverstone-128x100/port_config.ini: Fix the interface naming conflicts (It looks like typos)
    10.sfputil.py: Monitoring the QSFP-DD initialization state, and reinitiate it if necessary

- How I did it
See - What I did

- How to verify it

  1. Bring up the DUT with the default config, and ensure it's in 'Silverstone-128x100'
  2. Attach the QSFP-DD transceivers for the link up/down tests

- Description for the changelog

- A picture of a cute animal (not mandatory but encouraged)

@ds952811 ds952811 changed the title Silverstone qsfpdd [device/celestica/x86_64-cel_silverstone-r0] Fix QSFP-DD support Nov 16, 2020
@ds952811
Copy link
Author

Just a heads up, the following xcvrd crash observed after multiple attempts of QSFPDD reinsertions, and hence we'll also need PR5 of sonic-platform-common for this.

i.e.
zhenggen-xu/sonic-platform-common#5

Nov 18 00:57:55.488171 sonic INFO pmon#supervisord: xcvrd Traceback (most recent call last):
Nov 18 00:57:55.488171 sonic INFO pmon#supervisord: xcvrd File "/usr/bin/xcvrd", line 683, in
Nov 18 00:57:55.488171 sonic INFO pmon#supervisord: xcvrd sys.exit(main())
Nov 18 00:57:55.488171 sonic INFO pmon#supervisord: xcvrd File "/usr/bin/xcvrd", line 606, in main
Nov 18 00:57:55.488188 sonic INFO pmon#supervisord: xcvrd rc = post_port_sfp_info_to_db(logical_port, int_tbl)
Nov 18 00:57:55.488188 sonic INFO pmon#supervisord: xcvrd File "/usr/bin/xcvrd", line 265, in post_port_sfp_info_to_db
Nov 18 00:57:55.488206 sonic INFO pmon#supervisord: xcvrd port_info_dict = platform_sfputil.get_transceiver_info_dict(physical_port)
Nov 18 00:57:55.488206 sonic INFO pmon#supervisord: xcvrd File "/usr/local/lib/python2.7/dist-packages/sonic_platform_base/sonic_sfp/sfputilbase.py", line 726, in get_transceiver_info_dict
Nov 18 00:57:55.501232 sonic INFO pmon#supervisord: xcvrd sfp_vendor_pn_raw = self._read_eeprom_specific_bytes(sysfsfile_eeprom, (offset + OSFP_VENDOR_PN_OFFSET), XCVR_VENDOR_PN_WIDTH)
Nov 18 00:57:55.501232 sonic INFO pmon#supervisord: xcvrd File "/usr/local/lib/python2.7/dist-packages/sonic_platform_base/sonic_sfp/sfputilbase.py", line 314, in _read_eeprom_specific_bytes
Nov 18 00:57:55.501390 sonic INFO pmon#supervisord: xcvrd print("Error: reading sysfs file %s" % sysfs_sfp_i2c_client_eeprom_path)
Nov 18 00:57:55.501390 sonic INFO pmon#supervisord: xcvrd NameError: global name 'sysfs_sfp_i2c_client_eeprom_path' is not defined

In the current implementation, the SFP change events are actually software
simulated by polling the presence flags, and the INSERTION events will be
be generated upon pmon restart and reboot.

And hence, significant packet drops could be observed upon both pmon restart
and a warm-reboot, as we'll always fire a software reset to the transceivers.

i.e.
It's better to drop the QSFPDD software reset from the insertion event

Signed-off-by: Dante Su <[email protected]>
Signed-off-by: Dante Su <[email protected]>
@zhenggen-xu
Copy link
Owner

Can you please share the testing log for Innolight and Eoptolink optics 4x100G mode with Silverstone-128x100 HWSKU? I would like to see the at least link status (up), plug out/in still works, show interface transceiver eeprom -d etc.

if name.upper() != 'INNOLIGHT' or part.upper() != 'T-DP4CNT-N00':
return True

log_info("PORT {0}: _init_cmis_module".format(port_num))
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

log port_num and module name for better visibility.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack

@ds952811
Copy link
Author

ds952811 commented Nov 19, 2020 via email

1. Add sanity check to application advertisement decoder to prevent
   infinite loop

2. Create a function wrapper for module-specific CMIS init

Signed-off-by: Dante Su <[email protected]>
@ds952811
Copy link
Author

The UT result of 'sfputil.py: address review comments':

$ grep pmon /var/log/syslog
......
Nov 19 02:51:05.383744 sonic INFO pmon#xcvrd: Got SFP inserted event
Nov 19 02:51:05.707443 sonic INFO pmon#xcvrd: get sfp info successfully Ethernet246, push to db
Nov 19 02:51:05.707636 sonic INFO pmon#xcvrd: Got SFP inserted event
Nov 19 02:51:06.031486 sonic INFO pmon#xcvrd: get sfp info successfully Ethernet248, push to db
Nov 19 02:51:06.031686 sonic INFO pmon#xcvrd: Got SFP inserted event
Nov 19 02:51:06.355503 sonic INFO pmon#xcvrd: get sfp info successfully Ethernet250, push to db
Nov 19 02:51:06.355850 sonic INFO pmon#xcvrd: Got SFP inserted event
Nov 19 02:51:06.679629 sonic INFO pmon#xcvrd: get sfp info successfully Ethernet252, push to db
Nov 19 02:51:06.679854 sonic INFO pmon#xcvrd: Got SFP inserted event
Nov 19 02:51:07.003463 sonic INFO pmon#xcvrd: get sfp info successfully Ethernet254, push to db
Nov 19 02:51:07.003689 sonic INFO pmon#xcvrd: State transition from 0 to 1
Nov 19 02:51:48.627401 sonic INFO pmon#sfputil.py: PORT 31: INNOLIGHT T-DP4CNT-N00: _init_cmis_module_custom
root@sonic:#
root@sonic:
#
root@sonic:#
root@sonic:
#
root@sonic:# hexdump -C -n 256 -s 0x880 /sys/bus/i2c/devices/40-0050/eeprom
00000880 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000890 00 21 21 25 25 29 29 2d 2d ff 00 00 33 33 33 33 |.!!%%))--...3333|
000008a0 ff ff 22 22 22 22 00 00 00 00 22 22 22 22 00 00 |..""""....""""..|
000008b0 00 00 00 00 10 10 10 10 10 10 10 10 ff 00 00 33 |...............3|
000008c0 33 33 33 ff ff 22 22 22 22 00 00 00 00 22 22 22 |333..""""...."""|
000008d0 22 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |"...............|
000008e0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000900 44 44 44 44 00 00 ff 00 00 00 00 00 0f 00 0f 00 |DDDD............|
00000910 0f 00 0f 0f 00 00 00 00 00 00 3e db 3e 51 3c ca |..........>.>Q<.|
00000920 3d 09 00 00 00 00 00 00 00 00 9c 40 9c 40 9c 40 |=..........@.@.@|
00000930 9c 40 00 00 00 00 00 00 00 00 3a ad 54 00 31 b3 |.@........:.T.1.|
00000940 35 58 00 00 00 00 00 00 00 00 11 11 11 11 21 21 |5X............!!|
00000950 25 25 29 29 2d 2d ff 00 00 33 33 33 33 ff ff 22 |%%))--...3333.."|
00000960 22 22 22 00 00 00 00 22 22 22 22 00 00 00 00 00 |"""...."""".....|
00000970 11 12 13 14 00 00 00 00 11 12 13 14 00 00 00 00 |................|
00000980
root@sonic:
#

@zhenggen-xu
Copy link
Owner

You can drag and drop the txt files into comments or better in PR description, I would like to see the output of "show interface status", "show interface transciever eeprom -d" with multiple optical modules in place.

@ds952811
Copy link
Author

ds952811 commented Nov 19, 2020

  1. Can you remove both cmis-init.sh from the PR?
    Done, please refer to commit bf811ae
  2. Attached is the UT result
    QSFPDD_UT.log

Note:

  • Eoptolink: Only 2 out of 4 links were UP, as we only had 2 cables physically connected, and it's also evidence that all of the 100G links on the Eoptolink could work independently in the default application mode (i.e. AppSel=1, 1x400G mode)
  • InnoLight: Please check the last result of 'show interface status', while the Ethernet242 was down, Ethernet240, Ethernet244 and Ethernet246 were still up. And it was running at application 2, which is 4x100G mode.

@ds952811
Copy link
Author

Not sure why the inline response is not displayed on the conversation page, here is my response to the last question:

My question was, without later module inserts events, would this code even be called later? I,E, the module may stuck at bad state for long time?

The get_transceiver_change_event() is invoked by the infinite loop of the xcvrd, and no matter if a insertion/removal event generated or not, it will always poll the module state and trigger a software reset when the failure count > 2.
Note:

The insertion/removal event is generated at line 503, there is an in-memory presence table for the change events
Both the module state checker and the QSFPDD config error checker are active when the module is attached, it does not matter if the insertion/removal event is generated or not. It's outside the if-statement at line 502

zhenggen-xu pushed a commit that referenced this pull request Dec 15, 2020
e1842b2 Change STATE_DB key (PCIE_STATUS|PCIE_DEVICES -> PCIE_DEVICES) (#93)
a6c0071 [thermalctld] Fix issue: fan status should not be True when fan is absent (#92)
zhenggen-xu pushed a commit that referenced this pull request Dec 15, 2020
Commits included:

* src/sonic-py-swsssdk 748c404...1ea30d2 (1):
  > Fix bug: ConfigDBConnector.get_table does not work in python3 (#92)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants