Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

determine-reboot-cause fails due to TypeError seen during during boot up #6009

Closed
vaibhavhd opened this issue Nov 23, 2020 · 11 comments
Closed
Assignees

Comments

@vaibhavhd
Copy link
Contributor

vaibhavhd commented Nov 23, 2020

Description

Reboot cause determination fails and errors are seen in syslog.
The errors seem to be coming due to Python3 script is using Python2 based library from different vendors at
https://github.com/Azure/sonic-buildimage/tree/master/platform/broadcom

The Python3 jump for determine-reboot-cause and process-reboot-cause has been made with the platform specific lib still based out in Python2.

Steps to reproduce the issue:

  1. Install latest SONiC master image on Arista device. I hit the issue in 500
  2. Issue is seen in syslog at the beginning stage where DUT is booting into the new image.

Describe the results you received:

Nov 23 17:10:34 sonic root: SONiC version master.500-e0781f46 starting up...
Nov 23 17:10:34 sonic root: SONiC version master.500-e0781f46 starting up...
Nov 23 17:10:34 sonic rc.local[469]: + cut -d , -f2
Nov 23 17:10:34 sonic rc.local[469]: + cut -d , -f2
Nov 23 17:10:34 sonic determine-reboot-cause[599]: Traceback (most recent call last):
Nov 23 17:10:34 sonic determine-reboot-cause[599]: Traceback (most recent call last):
Nov 23 17:10:34 sonic determine-reboot-cause[599]:   File "/usr/local/bin/determine-reboot-cause", line 236, in <module>
Nov 23 17:10:34 sonic determine-reboot-cause[599]:   File "/usr/local/bin/determine-reboot-cause", line 236, in <module>
Nov 23 17:10:34 sonic determine-reboot-cause[599]:     main()
Nov 23 17:10:34 sonic determine-reboot-cause[599]:     main()
Nov 23 17:10:34 sonic determine-reboot-cause[599]:   File "/usr/local/bin/determine-reboot-cause", line 185, in main
Nov 23 17:10:34 sonic determine-reboot-cause[599]:   File "/usr/local/bin/determine-reboot-cause", line 185, in main
Nov 23 17:10:34 sonic determine-reboot-cause[599]:     (hardware_reboot_cause, additional_reboot_info) = find_hardware_reboot_cause()
Nov 23 17:10:34 sonic determine-reboot-cause[599]:     (hardware_reboot_cause, additional_reboot_info) = find_hardware_reboot_cause()
Nov 23 17:10:34 sonic determine-reboot-cause[599]:   File "/usr/local/bin/determine-reboot-cause", line 120, in find_hardware_reboot_cause
Nov 23 17:10:34 sonic determine-reboot-cause[599]:   File "/usr/local/bin/determine-reboot-cause", line 120, in find_hardware_reboot_cause
Nov 23 17:10:34 sonic determine-reboot-cause[599]:     hardware_reboot_cause_major, hardware_reboot_cause_minor = get_reboot_cause_from_platform()
Nov 23 17:10:34 sonic determine-reboot-cause[599]:     hardware_reboot_cause_major, hardware_reboot_cause_minor = get_reboot_cause_from_platform()
Nov 23 17:10:34 sonic determine-reboot-cause[599]:   File "/usr/local/bin/determine-reboot-cause", line 107, in get_reboot_cause_from_platform
Nov 23 17:10:34 sonic determine-reboot-cause[599]:   File "/usr/local/bin/determine-reboot-cause", line 107, in get_reboot_cause_from_platform
Nov 23 17:10:34 sonic determine-reboot-cause[599]:     platform  = sonic_platform.platform.Platform()
Nov 23 17:10:34 sonic determine-reboot-cause[599]:     platform  = sonic_platform.platform.Platform()
Nov 23 17:10:34 sonic determine-reboot-cause[599]:   File "/usr/lib/python3/dist-packages/arista/utils/sonic_platform/platform.py", line 16, in __init__
Nov 23 17:10:34 sonic determine-reboot-cause[599]:   File "/usr/lib/python3/dist-packages/arista/utils/sonic_platform/platform.py", line 16, in __init__
Nov 23 17:10:34 sonic determine-reboot-cause[599]:     self._chassis = Chassis(self._platform)
Nov 23 17:10:34 sonic determine-reboot-cause[599]:     self._chassis = Chassis(self._platform)
Nov 23 17:10:34 sonic determine-reboot-cause[599]:   File "/usr/lib/python3/dist-packages/arista/utils/sonic_platform/chassis.py", line 48, in __init__
Nov 23 17:10:34 sonic determine-reboot-cause[599]:   File "/usr/lib/python3/dist-packages/arista/utils/sonic_platform/chassis.py", line 48, in __init__
Nov 23 17:10:34 sonic determine-reboot-cause[599]:     self._sfp_list[index - 1] = Sfp(index, sfp)
Nov 23 17:10:34 sonic determine-reboot-cause[599]:     self._sfp_list[index - 1] = Sfp(index, sfp)
Nov 23 17:10:34 sonic determine-reboot-cause[599]:   File "/usr/lib/python3/dist-packages/arista/utils/sonic_platform/sfp.py", line 45, in __init__
Nov 23 17:10:34 sonic determine-reboot-cause[599]:   File "/usr/lib/python3/dist-packages/arista/utils/sonic_platform/sfp.py", line 45, in __init__
Nov 23 17:10:34 sonic determine-reboot-cause[599]:     self._eepromPath = EEPROM_PATH.format(sfp.addr.bus, sfp.addr.address)
Nov 23 17:10:34 sonic determine-reboot-cause[599]:     self._eepromPath = EEPROM_PATH.format(sfp.addr.bus, sfp.addr.address)
Nov 23 17:10:34 sonic determine-reboot-cause[599]:   File "/usr/lib/python3/dist-packages/arista/components/scd.py", line 47, in bus
Nov 23 17:10:34 sonic determine-reboot-cause[599]:   File "/usr/lib/python3/dist-packages/arista/components/scd.py", line 47, in bus
Nov 23 17:10:34 sonic determine-reboot-cause[599]:     return self.scd_.i2cOffset + self.bus_
Nov 23 17:10:34 sonic determine-reboot-cause[599]:     return self.scd_.i2cOffset + self.bus_
Nov 23 17:10:34 sonic determine-reboot-cause[599]: TypeError: unsupported operand type(s) for +: 'NoneType' and 'int'
Nov 23 17:10:34 sonic determine-reboot-cause[599]: TypeError: unsupported operand type(s) for +: 'NoneType' and 'int'

Describe the results you expected:
The TypeError should not be thrown and reboot cause should be detected.

Additional information you deem important (e.g. issue happens only occasionally):

**Output of `show version`:**
$ show version 

SONiC Software Version: SONiC.master.498-5fbb6ee6
Distribution: Debian 10.6
Kernel: 4.19.0-9-2-amd64
Build commit: 5fbb6ee6
Build date: Fri Nov 20 14:36:29 UTC 2020
Built by: johnar@jenkins-worker-3

Platform: x86_64-arista_7260cx3_64
HwSKU: Arista-7260CX3-Q64
ASIC: broadcom
**Attach debug file `sudo generate_dump`:**

```
(paste your output here)
```
@vaibhavhd
Copy link
Contributor Author

This now does not seem to be an Arista platform issue. The reboo-cause determination fails on other platforms too.
The errors seen also seem to be due to Python3 conversion done recently - #5933
https://github.com/Azure/sonic-buildimage/blob/master/src/sonic-host-services/scripts/determine-reboot-cause#L1

The error trace on Dell clearly points out at the Python3 error due to Python2 code:

Nov 24 22:38:43 sonic determine-reboot-cause: /proc/cmdline indicates reboot type: warm-reboot
Nov 24 22:38:43 sonic determine-reboot-cause[1240]: Traceback (most recent call last):
Nov 24 22:38:43 sonic determine-reboot-cause[1240]: Traceback (most recent call last):
Nov 24 22:38:43 sonic determine-reboot-cause[1240]:   File "/usr/local/bin/determine-reboot-cause", line 236, in <module>
Nov 24 22:38:43 sonic determine-reboot-cause[1240]:   File "/usr/local/bin/determine-reboot-cause", line 236, in <module>
Nov 24 22:38:43 sonic determine-reboot-cause[1240]:     main()
Nov 24 22:38:43 sonic determine-reboot-cause[1240]:     main()
Nov 24 22:38:43 sonic determine-reboot-cause[1240]:   File "/usr/local/bin/determine-reboot-cause", line 185, in main
Nov 24 22:38:43 sonic determine-reboot-cause[1240]:   File "/usr/local/bin/determine-reboot-cause", line 185, in main
Nov 24 22:38:43 sonic determine-reboot-cause[1240]:     (hardware_reboot_cause, additional_reboot_info) = find_hardware_reboot_cause()
Nov 24 22:38:43 sonic determine-reboot-cause[1240]:     (hardware_reboot_cause, additional_reboot_info) = find_hardware_reboot_cause()
Nov 24 22:38:43 sonic determine-reboot-cause[1240]:   File "/usr/local/bin/determine-reboot-cause", line 120, in find_hardware_reboot_cause
Nov 24 22:38:43 sonic determine-reboot-cause[1240]:   File "/usr/local/bin/determine-reboot-cause", line 120, in find_hardware_reboot_cause
Nov 24 22:38:43 sonic determine-reboot-cause[1240]:     hardware_reboot_cause_major, hardware_reboot_cause_minor = get_reboot_cause_from_platform()
Nov 24 22:38:43 sonic determine-reboot-cause[1240]:     hardware_reboot_cause_major, hardware_reboot_cause_minor = get_reboot_cause_from_platform()
Nov 24 22:38:43 sonic determine-reboot-cause[1240]:   File "/usr/local/bin/determine-reboot-cause", line 107, in get_reboot_cause_from_platform
Nov 24 22:38:43 sonic determine-reboot-cause[1240]:   File "/usr/local/bin/determine-reboot-cause", line 107, in get_reboot_cause_from_platform
Nov 24 22:38:43 sonic determine-reboot-cause[1240]:     platform  = sonic_platform.platform.Platform()
Nov 24 22:38:43 sonic determine-reboot-cause[1240]:     platform  = sonic_platform.platform.Platform()
Nov 24 22:38:43 sonic determine-reboot-cause[1240]:   File "/usr/local/lib/python3.7/dist-packages/sonic_platform/platform.py", line 25, in __init__
Nov 24 22:38:43 sonic determine-reboot-cause[1240]:   File "/usr/local/lib/python3.7/dist-packages/sonic_platform/platform.py", line 25, in __init__
Nov 24 22:38:43 sonic determine-reboot-cause[1240]:     self._chassis = Chassis()
Nov 24 22:38:43 sonic determine-reboot-cause[1240]:     self._chassis = Chassis()
Nov 24 22:38:43 sonic determine-reboot-cause[1240]:   File "/usr/local/lib/python3.7/dist-packages/sonic_platform/chassis.py", line 61, in __init__
Nov 24 22:38:43 sonic determine-reboot-cause[1240]:   File "/usr/local/lib/python3.7/dist-packages/sonic_platform/chassis.py", line 61, in __init__
Nov 24 22:38:43 sonic determine-reboot-cause[1240]:     self._eeprom = Eeprom()
Nov 24 22:38:43 sonic determine-reboot-cause[1240]:     self._eeprom = Eeprom()
Nov 24 22:38:43 sonic determine-reboot-cause[1240]:   File "/usr/local/lib/python3.7/dist-packages/sonic_platform/eeprom.py", line 51, in __init__
Nov 24 22:38:43 sonic determine-reboot-cause[1240]:   File "/usr/local/lib/python3.7/dist-packages/sonic_platform/eeprom.py", line 51, in __init__
Nov 24 22:38:43 sonic determine-reboot-cause[1240]:     total_length = (ord(eeprom[9]) << 8) | ord(eeprom[10])
Nov 24 22:38:43 sonic determine-reboot-cause[1240]:     total_length = (ord(eeprom[9]) << 8) | ord(eeprom[10])
Nov 24 22:38:43 sonic determine-reboot-cause[1240]: TypeError: ord() expected string of length 1, but int found
Nov 24 22:38:43 sonic determine-reboot-cause[1240]: TypeError: ord() expected string of length 1, but int found

@vaibhavhd vaibhavhd self-assigned this Nov 24, 2020
@vaibhavhd
Copy link
Contributor Author

Issue is also seen in 502.

@vaibhavhd
Copy link
Contributor Author

The sonic-platform-common library has recently made updates to support Python3. As part of the PR sonic-net/sonic-platform-common#127, read_eeprom() now returns a bytearray instead of a string.

The Dell S6100 error reported above is due to the Dell library is not updated to expect bytearray eeprom data.
https://github.com/Azure/sonic-buildimage/blob/master/platform/broadcom/sonic-platform-modules-dell/s6100/sonic_platform/eeprom.py#L51

  File "/usr/local/lib/python3.7/dist-packages/sonic_platform/eeprom.py", line 51, in __init__
    total_length = (ord(eeprom[9]) << 8) | ord(eeprom[10])
    total_length = (ord(eeprom[9]) << 8) | ord(eeprom[10])
  TypeError: ord() expected string of length 1, but int found

@jleveque should this fix be dealt by Dell?

@jleveque
Copy link
Contributor

Yes. The fix is here: #5960. I was waiting to merge it until after #5926 merges, but the PR has taken longer than expected to be merge-ready. It should now be ready to merge, just waiting on the vsimage test to pass, but since this is causing a regression, I can merge it now and rebase my PR on top of it.

@vaibhavhd
Copy link
Contributor Author

Thanks Joe!
I am now looking at Arista issue posted in this issue's description. That seems to have percolated after this commit in Arista repo: aristanetworks/sonic@05a02d7

Particularly this new line that got added:
https://github.com/aristanetworks/sonic/blob/master/arista/utils/sonic_platform/sfp.py#L45

Using some extra debug prints I found out that at the time of determine-reboot-cause (which takes places after rc.local completion), the kernel I2C buses are not ready. Which causes i2cBusFromName() to return None
https://github.com/aristanetworks/sonic/blob/master/arista/drivers/scd/driver.py#L29

This happens during driver setup after reboot
https://github.com/aristanetworks/sonic/blob/master/arista/drivers/scd/driver.py#L80

Hence self.scd_.i2cOffset is set to None which causes problem seen in the logs:

Nov 23 17:10:34 sonic determine-reboot-cause[599]:   File "/usr/lib/python3/dist-packages/arista/components/scd.py", line 47, in bus
Nov 23 17:10:34 sonic determine-reboot-cause[599]:     return self.scd_.i2cOffset + self.bus_

@jleveque should someone from Arista provide a Fix?

@jleveque
Copy link
Contributor

@Staphylo: Can you please investigate the issue in the comment above?

@Staphylo
Copy link
Collaborator

Sure Joe. The error seen above most often happens when the platform is not initialized yet but APIs are being called.
We have a systemd Before=process-reboot-cause rule to prevent this from happening but since the systemd service was renamed it doesn't apply anymore. We'll work on a fix.
https://github.com/aristanetworks/sonic/blob/3943e69510b471e65797d6054dd5f47cff3012c3/systemd/platform-arista-reboot-cause.service#L5

@Staphylo
Copy link
Collaborator

Just confirming that adding the dependency solves the issue.
However process-reboot-cause.service now crashes though it's not platform related.
I'll perform a submodule update with the fix in the coming days.

root@gd231:/home/admin# journalctl -u process-reboot-cause.service
-- Logs begin at Wed 2020-11-25 21:40:07 UTC, end at Wed 2020-11-25 21:41:44 UTC. --
Nov 25 21:41:38 gd231 systemd[1]: Started Reboot cause determination service.
Nov 25 21:41:38 gd231 systemd[4489]: process-reboot-cause.service: Failed to execute command: No such file or directory
Nov 25 21:41:38 gd231 systemd[4489]: process-reboot-cause.service: Failed at step EXEC spawning /usr/bin/process-reboot-cause: No such file or directory
Nov 25 21:41:38 gd231 systemd[1]: process-reboot-cause.service: Main process exited, code=exited, status=203/EXEC
Nov 25 21:41:38 gd231 systemd[1]: process-reboot-cause.service: Failed with result 'exit-code'.
root@gd231:/home/admin# journalctl -u determine-reboot-cause.service 
-- Logs begin at Wed 2020-11-25 21:40:07 UTC, end at Wed 2020-11-25 21:41:44 UTC. --
Nov 25 21:40:14 gd231 systemd[1]: Started Reboot cause determination service.
Nov 25 21:40:14 gd231 determine-reboot-cause[1087]: Starting up...
Nov 25 21:40:14 gd231 determine-reboot-cause[1087]: No reboot cause found from /proc/cmdline
Nov 25 21:40:14 gd231 determine-reboot-cause[1087]: Platform api returns reboot cause Non-Hardware, reboot, time: 2020-11-25 21:39:37 UTC
Nov 25 21:40:14 gd231 determine-reboot-cause[1087]: No reboot cause found from platform api
Nov 25 21:40:14 gd231 determine-reboot-cause[1087]: /host/reboot-cause/reboot-cause.txt indicates the reboot cause: User issued 'reboot' command [User: admin, Time: Wed 25 Nov 2020 09:39:29 PM UTC]
Nov 25 21:40:14 gd231 systemd[1]: determine-reboot-cause.service: Succeeded.

@Staphylo
Copy link
Collaborator

The path for ExecStart in process-reboot-cause.service should be /usr/local/bin/process-reboot-cause.
I just tried it and it seems to be working.

@jleveque
Copy link
Contributor

The path for ExecStart in process-reboot-cause.service should be /usr/local/bin/process-reboot-cause.
I just tried it and it seems to be working.

@sujinmkang: Can you please open a PR to fix this?

@vaibhavhd
Copy link
Contributor Author

Closing this issue now.
The error logged in this issue is not seen in both Arista and Dell platforms. However, the reboot cause determination still is flaky and incorrect (for different errors). I am tracking this in separate tickets now:
Dell #6082
More general issue - #6101

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants