From afd4e871cedcc54349dd01a5b148ef6ee55c1a66 Mon Sep 17 00:00:00 2001
From: Junchao-Mellanox <57339448+Junchao-Mellanox@users.noreply.github.com>
Date: Wed, 25 Mar 2020 15:28:07 +0800
Subject: [PATCH 1/3] Update psud design to a separate document and add PSU led enhancement

---
 doc/psud/PSU_daemon_design.md | 152 ++++++++++++++++++++++++++++++++++
 1 file changed, 152 insertions(+)
 create mode 100644 doc/psud/PSU_daemon_design.md

diff --git a/doc/psud/PSU_daemon_design.md b/doc/psud/PSU_daemon_design.md
new file mode 100644
index 0000000000..989d893345
--- /dev/null
+++ b/doc/psud/PSU_daemon_design.md
@@ -0,0 +1,152 @@
+# SONiC PSU Daemon Design #
+
+### Rev 0.1 ###
+
+### Revision ###
+
+ | Rev |     Date    |       Author       | Change Description                |
+ |:---:|:-----------:|:------------------:|-----------------------------------|
+ | 0.1 |             |    Chen Junchao    | Initial version                   |
+
+
+## 1. Overview
+
+When I write this document, SONiC already has a PSU daemon called psud running in pmon docker. There is already a high level design document which contains existing psud [design](https://github.com/Azure/SONiC/blob/master/doc/pmon/pmon-enhancement-design.md), but some of its content is out of date. This document will update the design of current PSU daemon implementation and add an enhancement about PSU led management.
+
+The purpose of PSU daemon is to collect platform PSU data and trigger proper actions if necessary. Major functions of psud include:
+
+- Collect constant PSU data during daemon boot up, such as PSU number.
+- Collect variable PSU data periodically.
+- Monitor PSU events, set the LED color, and trigger syslog according to the event type.
+
+## 2. PSU data collection
+
+PSU daemon data collection flow diagram:
+
+![](https://github.com/keboliu/SONiC/blob/master/doc/pmon/daemon-flow.svg)
+
+Now psud collects PSU data via platform API, and it also support platform plugin for backward compatible. All PSU data will be saved to redis database for further usage.
+
+## 3. DB schema for PSU
+
+PSU number is stored in chassis table. Please refer to this [document](https://github.com/Azure/SONiC/blob/master/doc/pmon/pmon-enhancement-design.md), section 1.5.2.
+
+PSU information is stored in PSU table:
+
+    ; Defines information for a psu
+    key                     = PSU_INFO|psu_name              ; information for the psu
+    ; field                 = value
+    presence                = BOOLEAN                        ; presence of the psu
+    model                   = STRING                         ; model name of the psu
+    serial                  = STRING                         ; serial number of the psu
+    status                  = BOOLEAN                        ; status of the psu
+    change_event            = STRING                         ; change event of the psu
+    fan                     = STRING                         ; fan_name of the psu
+    led_status              = STRING                         ; led status of the psu
+
+Currently, psud collects and updates only the "presence" and "status" fields.
+
+## 4. PSU command
+
+There is a subcommand "psustatus" under "show platform":
+
+```
+admin@sonic:~$ show platform ?
+Usage: show platform [OPTIONS] COMMAND [ARGS]...
+
+  Show platform-specific hardware info
+
+Options:
+  -?, -h, --help  Show this message and exit.
+
+Commands:
+  fan          Show fan status information
+  firmware     Show firmware status information
+  mlnx         Show Mellanox platform information
+  psustatus    Show PSU status information
+  ssdhealth    Show SSD Health information
+  summary      Show hardware platform information
+  syseeprom    Show system EEPROM information
+  temperature  Show device temperature information
+```
+
+The current output for "show platform psustatus" looks like:
+
+```
+admin@sonic:~$ show platform psustatus
+PSU    Status
+-----  --------
+PSU 1  OK
+PSU 2  OK
+```
+
+## 5. PSU LED management
+
+The purpose of PSU LED management is to notify the user about PSU events through the PSU LED or syslog. The PSU daemon psud needs to monitor PSU events (PSU voltage out of range, PSU too hot) and trigger proper actions if necessary.
+
+### 5.1 PSU event definition
+
+We define a few abnormal PSU events here. When any PSU event happens, syslog should be triggered with the "Alert Message" and the PSU LED should be set to the "PSU LED color"; when a PSU recovers from a previous abnormal state, syslog should be triggered with the "Recover Message". The PSU LED should be set to green only when no abnormal PSU event is active.
+
+#### 5.1.1 PSU voltage out of range
+
+    Alert Message: PSU voltage warning: voltage out of range, current voltage=, valid range=[, ].
+
+    PSU LED color: red.
+
+    Recover Message: PSU voltage warning cleared: voltage is back to normal.
+
+#### 5.1.2 PSU tenmperature too hot
+
+    Alert Message: PSU temperature warning: temperature too hot, temperature=, threshold=.
+
+    PSU LED color: red.
+
+    Recover Message: PSU temperature warning cleared: temperature is back to normal.
+
+#### 5.1.3 Power absence
+
+    Alert Message: Power absence warning: is out of power.
+
+    PSU LED color: red.
+
+    Recover Message: Power absence warning cleared: power is back to normal.
+
+#### 5.1.4 PSU absence
+
+    Alert Message: PSU absence warning: is not present.
+
+    PSU LED color: red. (PSU LED might not be available at this point)
+
+    Recover Message: PSU absence warning cleared: is inserted back.
+
+### 5.2 Platform API change
+
+Some abstract member methods need to be added to [psu_base.py](https://github.com/Azure/sonic-platform-common/blob/master/sonic_platform_base/psu_base.py), and vendors should implement them.
+
+```python
+
+class PsuBase(device_base.DeviceBase):
+    ...
+    def get_temperature(self):
+        raise NotImplementedError
+
+    def get_temperature_high_threshold(self):
+        raise NotImplementedError
+
+    def get_voltage_high_threshold(self):
+        raise NotImplementedError
+
+    def get_voltage_low_threshold(self):
+        raise NotImplementedError
+    ...
+
+```
+
+## 6. PSU daemon flow
+
+Supervisord takes charge of this daemon. The daemon loops every 3 seconds, gets the data from psuutil/platform API, and then writes it to the Redis DB.
+
+- The psu_num is stored in the "chassis_info" table. It is written only once, when the system boots up or reloads. The key is chassis_name, the field is "psu_num", and the value comes from get_psu_num().
+- The psu_status and psu_presence are stored in the "psu_info" table and updated every 3 seconds. The key is psu_name, the fields are "presence" and "status", and the values come from get_psu_presence() and get_psu_status().
+- The daemon queries PSU events every 10 seconds via the platform API. If any event is detected, it sets the PSU LED color accordingly and triggers the proper syslog message.

From 1e14514f1541f63ec8e1f572939c8a473af65995 Mon Sep 17 00:00:00 2001
From: Junchao-Mellanox <57339448+Junchao-Mellanox@users.noreply.github.com>
Date: Mon, 30 Mar 2020 10:06:06 +0800
Subject: [PATCH 2/3] 1. Using daemon flow image in community not personal fork; 2. Remove un-relevant description from overview and move it to PR description

---
 doc/psud/PSU_daemon_design.md | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/doc/psud/PSU_daemon_design.md b/doc/psud/PSU_daemon_design.md
index 989d893345..adbaa8aaa6 100644
--- a/doc/psud/PSU_daemon_design.md
+++ b/doc/psud/PSU_daemon_design.md
@@ -11,8 +11,6 @@
 
 ## 1. Overview
 
-When I write this document, SONiC already has a PSU daemon called psud running in pmon docker. There is already a high level design document which contains existing psud [design](https://github.com/Azure/SONiC/blob/master/doc/pmon/pmon-enhancement-design.md), but some of its content is out of date. This document will update the design of current PSU daemon implementation and add an enhancement about PSU led management.
-
 The purpose of PSU daemon is to collect platform PSU data and trigger proper actions if necessary. Major functions of psud include:
 
 - Collect constant PSU data during daemon boot up, such as PSU number.
@@ -23,7 +21,7 @@ The purpose of PSU daemon is to collect platform PSU data and trigger proper act
 
 PSU daemon data collection flow diagram:
 
-![](https://github.com/keboliu/SONiC/blob/master/doc/pmon/daemon-flow.svg)
+![](https://github.com/Azure/SONiC/blob/master/doc/pmon/daemon-flow.svg)
 
 Now psud collects PSU data via platform API, and it also support platform plugin for backward compatible. All PSU data will be saved to redis database for further usage.
 

From e0381d4ba03d3b04c83778d4e5f4ccabd9f0fdc8 Mon Sep 17 00:00:00 2001
From: Junchao-Mellanox <57339448+Junchao-Mellanox@users.noreply.github.com>
Date: Mon, 21 Sep 2020 09:03:13 +0800
Subject: [PATCH 3/3] Fix typo

---
 doc/psud/PSU_daemon_design.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/psud/PSU_daemon_design.md b/doc/psud/PSU_daemon_design.md
index adbaa8aaa6..cc65015e9f 100644
--- a/doc/psud/PSU_daemon_design.md
+++ b/doc/psud/PSU_daemon_design.md
@@ -94,7 +94,7 @@ We define a few abnormal PSU events here. When any PSU event happens, syslog sho
 
     Recover Message: PSU voltage warning cleared: voltage is back to normal.
 
-#### 5.1.2 PSU tenmperature too hot
+#### 5.1.2 PSU temperature too hot
 
     Alert Message: PSU temperature warning: temperature too hot, temperature=, threshold=.
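
As a rough illustration of the event handling described in section 5 of the design document in this series, the Python sketch below shows how one polling cycle could turn threshold checks into an LED color and into alert or recover messages. It is not the psud implementation. `FakePsu`, `evaluate_psu`, the `get_voltage()` getter, and the shortened message strings are assumptions made for this example; only the temperature and voltage threshold getters are taken from section 5.2.

```python
LED_GREEN = "green"
LED_RED = "red"


class FakePsu:
    """Hypothetical stand-in for a vendor PSU object; not a real SONiC class."""

    def __init__(self, temperature, temp_high, voltage, volt_low, volt_high):
        self._temperature = temperature
        self._temp_high = temp_high
        self._voltage = voltage
        self._volt_low = volt_low
        self._volt_high = volt_high

    def get_temperature(self):
        return self._temperature

    def get_temperature_high_threshold(self):
        return self._temp_high

    def get_voltage(self):
        # Assumed getter: section 5.2 lists only the threshold methods.
        return self._voltage

    def get_voltage_low_threshold(self):
        return self._volt_low

    def get_voltage_high_threshold(self):
        return self._volt_high


def evaluate_psu(psu, previous_events):
    """Return (current_events, led_color, messages) for one polling cycle.

    previous_events is the set of event names that were active after the
    previous cycle, so only state changes produce messages.
    """
    current_events = set()

    voltage = psu.get_voltage()
    low = psu.get_voltage_low_threshold()
    high = psu.get_voltage_high_threshold()
    if not low <= voltage <= high:
        current_events.add("voltage")

    if psu.get_temperature() > psu.get_temperature_high_threshold():
        current_events.add("temperature")

    messages = []
    # Newly raised events produce an alert message.
    for event in sorted(current_events - previous_events):
        messages.append("PSU {} warning".format(event))
    # Events that are no longer active produce a recover message.
    for event in sorted(previous_events - current_events):
        messages.append("PSU {} warning cleared".format(event))

    # The LED is green only when no abnormal event is active.
    led_color = LED_RED if current_events else LED_GREEN
    return current_events, led_color, messages
```

A caller would keep the returned event set between cycles and pass it back in, for example `events, led, msgs = evaluate_psu(psu, events)`, then hand `led` to the platform LED setter and `msgs` to syslog. Tracking the previous set is what lets the sketch emit a recover message only once, when a PSU returns to normal.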