Commit
Updated with 2 approaches to collect thermal info
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -320,18 +320,21 @@ Thermalctld is monitoring temperatures, monitoring fan speed and allowing polici | |
* Temperature sensors are on the control-card | ||
* Temperature sensors are on the line-card | ||
* Temperature sensors are on the SFMs. | ||
2. The FAN control is limited to the control-card | ||
2. All thermal sensor info should be available to the control-card. | ||
3. The FAN control is limited to the control-card. The Fan algorithm could be implemented as part of thermal-policy or by the platform. | ||
|
||
![Temperature and Fan Control](pmon-chassis-images/pmon-chassis-distributed-thermalctld.png) | ||
|
||
#### Proposal | ||
1. Thermalctld subscribes to the line-card up/down events notified by Chassisd. | ||
2. All local temperature sensors are recorded on both control and line-cards for monitoring. The control-card monitors temperature sensors of SFMs. | ||
3. Chassisd on control-card will periodically fetch the summary-info from each of the line-cards. Alternately, the thermalctld on control-card can subscribe to the line-card sensor updates. | ||
3. Chassisd on control-card will periodically fetch, or subscribe to, the thermal-sensor info from each of the line-cards. Alternately, the thermalctld on line-card can directly update the DB on the control-card. | ||
5. The local-temperatures of control-card, line-cards and fabric-cards are passed onto the Fan-Control algorithm. | ||
6. The fan-control algorithm can be implemented in PMON or in the platform-driver. | ||
|
||
Changes in thermalctld are to have a TemperatureUpdater class for each line-card. Each updater class will fetch the values for all temperature sensors of the line-card from the REDIS-DB of the line-card. | ||
Changes in thermalctld would follow one of the 2 approaches: | ||
1. Have a TemperatureUpdater class for each line-card. Each updater class will fetch the values for all temperature sensors of the line-card from the REDIS-DB of the line-card and update the DB on the control-card. | ||
||
2. The TemperatureUpdater class in each line-card will update the local-DB on its card as well as the global-DB on the control-card. | ||
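The second approach above can be illustrated with a minimal sketch. It is not the thermalctld implementation: `FakeDb` is a hypothetical in-memory stand-in for a REDIS-DB table, and the slot name and the `slot|sensor` key layout are assumptions chosen only to show one updater writing the same reading to both the local DB and the global DB.

```python
# Illustrative sketch only. FakeDb, the slot name and the key layout
# are hypothetical and do not reflect the real SONiC DB schema.
class FakeDb:
    """In-memory stand-in for a REDIS-DB table (key -> field dict)."""

    def __init__(self):
        self.table = {}

    def set(self, key, fields):
        self.table[key] = dict(fields)


class TemperatureUpdater:
    """Runs on a line-card and mirrors each reading into two DBs."""

    def __init__(self, slot, local_db, global_db):
        self.slot = slot
        self.local_db = local_db    # DB on this line-card
        self.global_db = global_db  # DB on the control-card

    def update(self, readings):
        for sensor, temperature in readings.items():
            fields = {"temperature": temperature}
            # Local copy, keyed by sensor name only.
            self.local_db.set(sensor, fields)
            # Global copy, prefixed with the slot so that sensors from
            # different line-cards do not collide.
            self.global_db.set(f"{self.slot}|{sensor}", fields)


local_db, global_db = FakeDb(), FakeDb()
updater = TemperatureUpdater("LINE-CARD1", local_db, global_db)
updater.update({"Thermal 1": 41.5, "Thermal 2": 38.0})
```

With this layout the control-card can read every line-card's sensors from a single table, at the cost of each line-card needing write access to the control-card's DB.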
shyam77git
Contributor
|
||
|
||
``` | ||
In src/sonic-platform-daemons/sonic-thermalctld/scripts/thermalctld: | ||
|
@@ -371,6 +374,9 @@ class ThermalInfo(ThermalPolicyInfoBase): | |
def collect(self, chassis): | ||
#Vendor specific calculation from all available sensor values on chassis | ||
``` | ||
|
||
There could be 2 approaches for where the Fan-Control algorithm could be implemented. | ||
|
||
In approach-1, the thermal_policy.json can provide an additional action to check whether a line-card temperature has exceeded its threshold, for example. The thermalctld.run_policy() will match the required condition and take the appropriate action to set the fan speed. | ||
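As a rough sketch of approach-1, the snippet below matches conditions against line-card temperatures and returns the fan speed of the first matching policy. The JSON layout, the type names `linecard.temp.over` and `fan.all.set_speed`, the threshold values and this `run_policy` helper are all hypothetical; they are not the actual thermal_policy.json schema or the thermalctld code.

```python
import json

# Hypothetical policy document; not the real thermal_policy.json schema.
POLICY_JSON = """
{
  "policies": [
    {"name": "line card over threshold",
     "condition": {"type": "linecard.temp.over", "threshold": 70.0},
     "action": {"type": "fan.all.set_speed", "speed": 100}},
    {"name": "default",
     "condition": {"type": "always"},
     "action": {"type": "fan.all.set_speed", "speed": 60}}
  ]
}
"""


def matches(condition, temps):
    """Return True if the condition holds for the given temperatures."""
    if condition["type"] == "always":
        return True
    if condition["type"] == "linecard.temp.over":
        return any(t > condition["threshold"] for t in temps.values())
    return False


def run_policy(policy_doc, temps):
    """Return the fan speed of the first matching policy, or None."""
    for policy in policy_doc["policies"]:
        if matches(policy["condition"], temps):
            return policy["action"]["speed"]
    return None


policy_doc = json.loads(POLICY_JSON)
hot_speed = run_policy(policy_doc, {"LINE-CARD1|Thermal 1": 75.0})
normal_speed = run_policy(policy_doc, {"LINE-CARD1|Thermal 1": 45.0})
```

Policies are evaluated in order, so the more specific over-threshold policy is listed before the catch-all default.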
|
||
In approach-2, the sensor information could be passed on to the platform-driver, which can then control the fan speed. | ||
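A minimal sketch of approach-2 follows, assuming the platform-driver exposes some hook that receives the collected temperatures. `PlatformFanDriver`, `on_sensor_update` and the temperature and speed limits are hypothetical names and values, and the linear mapping is only one possible vendor algorithm.

```python
# Illustrative sketch only. The class, method and constants below are
# hypothetical and are not part of any SONiC platform API.
class PlatformFanDriver:
    MIN_SPEED = 40    # percent, used at or below LOW_TEMP
    MAX_SPEED = 100   # percent, used at or above HIGH_TEMP
    LOW_TEMP = 40.0   # degrees Celsius
    HIGH_TEMP = 80.0  # degrees Celsius

    def __init__(self):
        self.speed = self.MIN_SPEED

    def on_sensor_update(self, temps):
        """Map the hottest reported sensor to a fan speed percentage."""
        if not temps:
            return self.speed  # nothing reported, keep the current speed
        hottest = max(temps.values())
        ratio = (hottest - self.LOW_TEMP) / (self.HIGH_TEMP - self.LOW_TEMP)
        ratio = min(1.0, max(0.0, ratio))  # clamp to [0, 1]
        self.speed = round(
            self.MIN_SPEED + ratio * (self.MAX_SPEED - self.MIN_SPEED)
        )
        return self.speed


driver = PlatformFanDriver()
mid_speed = driver.on_sensor_update(
    {"CONTROL-CARD|Thermal 1": 50.0, "LINE-CARD1|Thermal 1": 60.0}
)
```

Here thermalctld only forwards the readings, and the decision logic stays inside the vendor's driver.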
|
1 comment
on commit 106c511
Thanks for updating the SONiC PMON HLD for Modular chassis scenarios handling.
Can you please clarify and look into updating pmon-chassis-distributed-thermalctld.png workflow sequence in this PMON HLD?
a) Per the update in the thermalctld section, thermalctld (on the LC) is to update both the local REDIS-DB (on the LC) and the global REDIS-DB (on the CC/Supervisor). It's not a REDIS-LC sensors update to the CC's ThermalCTLd-CP.
b) Per the PMON HLD review meeting: thermalctld is for monitoring temperatures and managing fan ctrl algorithm. With thermalctld in place, temperature sensors would be displayed via show platform temperature.
As a result, show environment CLI would be deprecated.
In that case, show environment should be removed from this flow diagram.
c) There is no mention of voltage and current sensor categories anywhere.
Which entity (service) would cater to them?
d) Which DB and show CLI would cater to voltage and current sensors?
Recommend updating "update the DB on the control-card" to "update the global REDIS-DB on the control-card"