[Chassis] Orchagent crashes with 400g link #108

Closed
arlakshm opened this issue Nov 12, 2024 · 3 comments

@arlakshm

During the nightly test, we see this orchagent crash after every reboot:

2024 Nov 11 22:16:04.834585 str3-7800-lc3-1 NOTICE pmon#xcvrd: Retrieving media settings for port Ethernet144 speed 400000 num_lanes 8, using key {'vendor_key': 'CLOUD LIGHT     -7123-G37-05     ', 'media_key': 'QSFP-DD-active_cable_media_interface', 'lane_speed_key': 'speed:400GAUI-8'}
2024 Nov 11 22:16:04.834700 str3-7800-lc3-1 NOTICE pmon#xcvrd: Publishing ASIC-side SI setting for port Ethernet144 in APP_DB:
2024 Nov 11 22:16:04.834700 str3-7800-lc3-1 NOTICE pmon#xcvrd: start_lane_idx + lane_count (8) is beyond length of {'lane0': '0x55', 'lane1': '0x55', 'lane2': '0x50', 'lane3': '0x50'}, default start_lane_idx to 0 as a best effort
2024 Nov 11 22:16:04.834853 str3-7800-lc3-1 NOTICE pmon#xcvrd: 0:(main,0x55,0x55,0x50,0x50)
2024 Nov 11 22:16:04.834924 str3-7800-lc3-1 NOTICE pmon#xcvrd: start_lane_idx + lane_count (8) is beyond length of {'lane0': '0xffffffe7', 'lane1': '0xffffffe7', 'lane2': '0xffffffe9', 'lane3': '0xffffffe9'}, default start_lane_idx to 0 as a best effort
2024 Nov 11 22:16:04.835020 str3-7800-lc3-1 NOTICE pmon#xcvrd: 1:(post1,0xffffffe7,0xffffffe7,0xffffffe9,0xffffffe9)
2024 Nov 11 22:16:04.835091 str3-7800-lc3-1 NOTICE pmon#xcvrd: start_lane_idx + lane_count (8) is beyond length of {'lane0': '0x0', 'lane1': '0x0', 'lane2': '0x0', 'lane3': '0x0'}, default start_lane_idx to 0 as a best effort
2024 Nov 11 22:16:04.835175 str3-7800-lc3-1 NOTICE pmon#xcvrd: 2:(post2,0x0,0x0,0x0,0x0)
2024 Nov 11 22:16:04.835299 str3-7800-lc3-1 NOTICE pmon#xcvrd: start_lane_idx + lane_count (8) is beyond length of {'lane0': '0x0', 'lane1': '0x0', 'lane2': '0x0', 'lane3': '0x0'}, default start_lane_idx to 0 as a best effort
2024 Nov 11 22:16:04.835319 str3-7800-lc3-1 NOTICE pmon#xcvrd: 3:(post3,0x0,0x0,0x0,0x0)
2024 Nov 11 22:16:04.835510 str3-7800-lc3-1 NOTICE pmon#xcvrd: start_lane_idx + lane_count (8) is beyond length of {'lane0': '0xfffffff9', 'lane1': '0xfffffff9', 'lane2': '0xfffffffb', 'lane3': '0xfffffffb'}, default start_lane_idx to 0 as a best effort
2024 Nov 11 22:16:04.835510 str3-7800-lc3-1 NOTICE pmon#xcvrd: 4:(pre1,0xfffffff9,0xfffffff9,0xfffffffb,0xfffffffb)
2024 Nov 11 22:16:04.835613 str3-7800-lc3-1 NOTICE pmon#xcvrd: start_lane_idx + lane_count (8) is beyond length of {'lane0': '0x0', 'lane1': '0x0', 'lane2': '0x0', 'lane3': '0x0'}, default start_lane_idx to 0 as a best effort
2024 Nov 11 22:16:04.835613 str3-7800-lc3-1 NOTICE pmon#xcvrd: 5:(pre2,0x0,0x0,0x0,0x0)
2024 Nov 11 22:16:04.841275 str3-7800-lc3-1 WARNING syncd0#syncd: [06:00.0] SAI_API_UNSPECIFIED:sai_bulk_object_get_stats:809 Unsupported object type type 21
2024 Nov 11 22:16:04.843759 str3-7800-lc3-1 NOTICE pmon#xcvrd: Notify media setting: Published ASIC-side SI setting for lport Ethernet144 in APP_DB
2024 Nov 11 22:16:04.844418 str3-7800-lc3-1 NOTICE syncd1#syncd: :- addObject: Rif Counter oid:0x77120600001077 does not has supported counters
2024 Nov 11 22:16:04.845081 str3-7800-lc3-1 NOTICE syncd1#syncd: :- addObject: Rif Counter oid:0x31000600001028 does not has supported counters
2024 Nov 11 22:16:04.855951 str3-7800-lc3-1 NOTICE syncd1#syncd: :- addObject: Rif Counter oid:0x1000600001016 does not has supported counters
2024 Nov 11 22:16:04.856613 str3-7800-lc3-1 NOTICE syncd1#syncd: :- addObject: Rif Counter oid:0x2000600001017 does not has supported counters
2024 Nov 11 22:16:04.857315 str3-7800-lc3-1 NOTICE syncd1#syncd: :- addObject: Rif Counter oid:0x3000600001018 does not has supported counters
2024 Nov 11 22:16:04.867402 str3-7800-lc3-1 NOTICE syncd1#syncd: :- addObject: Rif Counter oid:0x4000600001019 does not has supported counters
2024 Nov 11 22:16:04.868134 str3-7800-lc3-1 NOTICE syncd1#syncd: :- addObject: Rif Counter oid:0x500060000101a does not has supported counters
2024 Nov 11 22:16:04.878444 str3-7800-lc3-1 NOTICE syncd1#syncd: :- addObject: Rif Counter oid:0x600060000101b does not has supported counters
2024 Nov 11 22:16:04.879135 str3-7800-lc3-1 NOTICE syncd1#syncd: :- addObject: Rif Counter oid:0x700060000101c does not has supported counters
2024 Nov 11 22:16:04.879786 str3-7800-lc3-1 NOTICE syncd1#syncd: :- addObject: Rif Counter oid:0x800060000101d does not has supported counters
2024 Nov 11 22:16:04.885174 str3-7800-lc3-1 NOTICE syncd1#syncd: :- addObject: Rif Counter oid:0x900060000101e does not has supported counters
2024 Nov 11 22:16:04.891290 str3-7800-lc3-1 NOTICE syncd1#syncd: :- addObject: Rif Counter oid:0xa00060000101f does not has supported counters
2024 Nov 11 22:16:04.891895 str3-7800-lc3-1 NOTICE syncd1#syncd: :- addObject: Rif Counter oid:0xb000600001020 does not has supported counters
2024 Nov 11 22:16:04.902135 str3-7800-lc3-1 NOTICE syncd1#syncd: :- addObject: Rif Counter oid:0xc000600001021 does not has supported counters
2024 Nov 11 22:16:04.903685 str3-7800-lc3-1 NOTICE syncd1#syncd: :- addObject: Rif Counter oid:0xd000600001022 does not has supported counters
2024 Nov 11 22:16:04.904332 str3-7800-lc3-1 NOTICE syncd1#syncd: :- addObject: Rif Counter oid:0xe000600001023 does not has supported counters
2024 Nov 11 22:16:04.914358 str3-7800-lc3-1 NOTICE syncd1#syncd: :- addObject: Rif Counter oid:0x14120600001014 does not has supported counters
2024 Nov 11 22:16:04.914996 str3-7800-lc3-1 NOTICE syncd1#syncd: :- addObject: Rif Counter oid:0x11100600003011 does not has supported counters
2024 Nov 11 22:16:04.915623 str3-7800-lc3-1 NOTICE syncd1#syncd: :- addObject: Rif Counter oid:0x12100600003012 does not has supported counters
2024 Nov 11 22:16:04.925927 str3-7800-lc3-1 NOTICE syncd1#syncd: :- addObject: Rif Counter oid:0x13100600003013 does not has supported counters
2024 Nov 11 22:16:04.926551 str3-7800-lc3-1 NOTICE syncd1#syncd: :- addObject: Rif Counter oid:0x14100600003014 does not has supported counters
2024 Nov 11 22:16:04.927159 str3-7800-lc3-1 NOTICE syncd1#syncd: :- addObject: Rif Counter oid:0x15100600003015 does not has supported counters
2024 Nov 11 22:16:04.937201 str3-7800-lc3-1 NOTICE syncd1#syncd: :- addObject: Rif Counter oid:0x16100600003016 does not has supported counters
2024 Nov 11 22:16:04.937815 str3-7800-lc3-1 NOTICE syncd1#syncd: :- addObject: Rif Counter oid:0x17100600003017 does not has supported counters
2024 Nov 11 22:16:04.942739 str3-7800-lc3-1 NOTICE syncd1#syncd: :- addObject: Rif Counter oid:0x18100600003018 does not has supported counters
2024 Nov 11 22:16:04.948871 str3-7800-lc3-1 NOTICE swss1#orchagent: :- setHostTxReady: Setting host_tx_ready status = false, alias = Ethernet144, port_id = 0x101000000000001
2024 Nov 11 22:16:04.949081 str3-7800-lc3-1 NOTICE swss1#orchagent: :- setPortAdminStatus: Set admin status DOWN host_tx_ready to false for port Ethernet144
2024 Nov 11 22:16:04.970679 str3-7800-lc3-1 ERR syncd1#syncd: [07:00.0] SAI_API_PORT:brcm_sai_create_port_serdes:10926 Port lane count 4 is different from supported lane count 8
2024 Nov 11 22:16:04.970781 str3-7800-lc3-1 ERR syncd1#syncd: :- sendApiResponse: api SAI_COMMON_API_CREATE failed in syncd mode: SAI_STATUS_INVALID_ATTRIBUTE_MAX
2024 Nov 11 22:16:04.971359 str3-7800-lc3-1 ERR syncd1#syncd: :- processQuadEvent: attr: SAI_PORT_SERDES_ATTR_PORT_ID: oid:0x101000000000001
2024 Nov 11 22:16:04.971391 str3-7800-lc3-1 ERR syncd1#syncd: :- processQuadEvent: attr: SAI_PORT_SERDES_ATTR_TX_FIR_PRE1: 4:-7,-7,-5,-5
2024 Nov 11 22:16:04.971424 str3-7800-lc3-1 ERR syncd1#syncd: :- processQuadEvent: attr: SAI_PORT_SERDES_ATTR_TX_FIR_PRE2: 4:0,0,0,0
2024 Nov 11 22:16:04.971424 str3-7800-lc3-1 ERR syncd1#syncd: :- processQuadEvent: attr: SAI_PORT_SERDES_ATTR_TX_FIR_MAIN: 4:85,85,80,80
2024 Nov 11 22:16:04.971509 str3-7800-lc3-1 ERR syncd1#syncd: :- processQuadEvent: attr: SAI_PORT_SERDES_ATTR_TX_FIR_POST1: 4:-25,-25,-23,-23
2024 Nov 11 22:16:04.971509 str3-7800-lc3-1 ERR syncd1#syncd: :- processQuadEvent: attr: SAI_PORT_SERDES_ATTR_TX_FIR_POST2: 4:0,0,0,0
2024 Nov 11 22:16:04.971527 str3-7800-lc3-1 ERR syncd1#syncd: :- processQuadEvent: attr: SAI_PORT_SERDES_ATTR_TX_FIR_POST3: 4:0,0,0,0
2024 Nov 11 22:16:04.971884 str3-7800-lc3-1 ERR swss1#orchagent: :- create: create status: SAI_STATUS_INVALID_ATTRIBUTE_MAX
2024 Nov 11 22:16:04.971884 str3-7800-lc3-1 NOTICE syncd1#syncd: :- processNotifySyncd: Invoking SAI failure dump
2024 Nov 11 22:16:04.971944 str3-7800-lc3-1 ERR swss1#orchagent: :- setPortSerdesAttribute: Failed to create port serdes for port 0x101000000000001
2024 Nov 11 22:16:04.971969 str3-7800-lc3-1 ERR swss1#orchagent: :- handleSaiCreateStatus: Encountered failure in create operation, exiting orchagent, SAI API: SAI_API_PORT, status: SAI_STATUS_INVALID_ATTRIBUTE_MAX
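
The xcvrd lines above print the SI values as 32-bit hex strings, while syncd prints the same attributes as signed decimals. A minimal sketch for cross-checking the two, assuming the hex strings are two's-complement 32-bit values (the helper name `to_signed32` is hypothetical and not part of xcvrd):

```python
def to_signed32(hex_str: str) -> int:
    """Interpret a hex string as a two's-complement signed 32-bit integer."""
    value = int(hex_str, 16) & 0xFFFFFFFF
    # If the sign bit is set, subtract 2**32 to get the negative value.
    return value - (1 << 32) if value & 0x80000000 else value


# Per-lane values copied from the xcvrd log lines above.
xcvrd_values = {
    "main": ["0x55", "0x55", "0x50", "0x50"],
    "post1": ["0xffffffe7", "0xffffffe7", "0xffffffe9", "0xffffffe9"],
    "pre1": ["0xfffffff9", "0xfffffff9", "0xfffffffb", "0xfffffffb"],
}

decoded = {name: [to_signed32(v) for v in lanes] for name, lanes in xcvrd_values.items()}

for name, lanes in decoded.items():
    print(name, lanes)
```

The decoded lists line up with the `SAI_PORT_SERDES_ATTR_TX_FIR_*` values in the syncd error lines (for example, post1 decodes to -25, -25, -23, -23), so the values were passed through intact and only the lane count is at issue.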

If I remove the media_settings.json file from the hardware directory, this crash is no longer seen.
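
The log shows four per-lane values being published for a port with `num_lanes 8`, which SAI then rejects with "Port lane count 4 is different from supported lane count 8". A minimal sketch of the kind of lane-count check that would flag this before publishing, assuming the settings are held as a dict of per-lane dicts as printed in the log (the function name `find_lane_mismatches` is hypothetical, not an xcvrd API):

```python
def find_lane_mismatches(si_settings: dict, num_lanes: int) -> list:
    """Return the SI attribute names whose per-lane value count differs from num_lanes."""
    return sorted(
        name for name, lanes in si_settings.items() if len(lanes) != num_lanes
    )


# A subset of the per-lane tables from the log for Ethernet144 (speed 400000, num_lanes 8).
si_settings = {
    "main": {"lane0": "0x55", "lane1": "0x55", "lane2": "0x50", "lane3": "0x50"},
    "post1": {"lane0": "0xffffffe7", "lane1": "0xffffffe7",
              "lane2": "0xffffffe9", "lane3": "0xffffffe9"},
    "pre1": {"lane0": "0xfffffff9", "lane1": "0xfffffff9",
             "lane2": "0xfffffffb", "lane3": "0xfffffffb"},
}

mismatches = find_lane_mismatches(si_settings, num_lanes=8)
if mismatches:
    print(f"Ethernet144: {mismatches} do not cover all 8 lanes")
```

This only illustrates the mismatch; whether such a check belongs in xcvrd, orchagent, or the media settings file itself is a separate design question.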

@arlakshm

Image details
admin@str3-7800-lc3-1:~$ show vers

SONiC Software Version: SONiC.internal-202405.107587339-62ab6b6719
SONiC OS Version: 12
Distribution: Debian 12.6
Kernel: 6.1.0-22-2-amd64
Build commit: 62ab6b6719
Build date: Thu Nov 7 08:03:13 UTC 2024
Built by: azureuser@f4061679c000000

Platform: x86_64-arista_7800r3a_36dm2_lc
HwSKU: Arista-7800R3A-36DM2-D36
ASIC: broadcom
ASIC Count: 2
Serial Number: SGD22203294
Model Number: 7800R3A-36DM2-LC
Hardware Revision: 2a.05
Uptime: 01:35:51 up 3:22, 1 user, load average: 1.75, 2.00, 2.06
Date: Tue 12 Nov 2024 01:35:51

@arlakshm

@kenneth-arista, @patrickmacarthur for visibility.

lguohan pushed a commit to sonic-net/sonic-buildimage that referenced this issue Nov 15, 2024
Copied tuning values for sm_media_interface to nm_850_media_interface and active_cable_media_interface

This should fix issues like aristanetworks/sonic#108

Will ask MSFT to confirm this works with their cables which use active_cable_media_interface.
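
The commit message describes copying the tuning values of one media key to two others. A rough sketch of that operation on an in-memory dict, under loud assumptions: the `QSFP-DD-` prefix on the source and `nm_850` keys is inferred from the `media_key` in the log, the nesting under `speed:400GAUI-8` is a guess, and all tuning values are placeholders rather than the real Arista settings.

```python
import copy

# Assumed key names; only the active_cable key appears verbatim in the log.
SOURCE_KEY = "QSFP-DD-sm_media_interface"
TARGET_KEYS = [
    "QSFP-DD-nm_850_media_interface",
    "QSFP-DD-active_cable_media_interface",
]


def copy_tuning(media_settings: dict, source_key: str, target_keys: list) -> dict:
    """Give each target media key an independent copy of the source key's tuning block."""
    for key in target_keys:
        media_settings[key] = copy.deepcopy(media_settings[source_key])
    return media_settings


# Placeholder 8-lane tuning block (values are illustrative only).
media_settings = {
    SOURCE_KEY: {
        "speed:400GAUI-8": {
            "main": {f"lane{i}": "0x0" for i in range(8)},
        }
    }
}

copy_tuning(media_settings, SOURCE_KEY, TARGET_KEYS)
lanes = media_settings["QSFP-DD-active_cable_media_interface"]["speed:400GAUI-8"]["main"]
print(len(lanes))
```

With an 8-lane block available under the active cable key, the lookup shown in the log would no longer fall back to a 4-lane table for a 400G port.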
mssonicbld pushed a commit to mssonicbld/sonic-buildimage that referenced this issue Nov 15, 2024
mssonicbld pushed a commit to sonic-net/sonic-buildimage that referenced this issue Nov 15, 2024
aidan-gallagher pushed a commit to aidan-gallagher/sonic-buildimage that referenced this issue Nov 16, 2024
@kenneth-arista

Closing as sonic-net/sonic-buildimage#20774 has merged.
