PMON enhancements for Chassis HLD #646
Initial version for review
Comment-1) Comment-2) Comment-3) Comment-4) Comment-5) |
Comment-6) Section 2.2.3 (Chassis monitoring daemon) Comment-7) Section 2.2.4 Chassis Midplane Connectivity Comment-8) Does SONiC support or need/plan to support config shutdown of a card (CC/LC/FC)? Comment-9) SONiC planning to support periodic punching of HW watchdog? |
This commit includes restructure of the document to include requirements and also detailed workflow to describe chassis specific callflows, etc.
Some inline comments. Comment1: Comment 2: Comment 3: Comment 4: Comment 5: Comment 6: Comment-7) Both to be implemented. Comment-8) That's right. SONiC will support per-linecard operations such as shutdown and reboot, wherever the platform driver supports them. Comment-9) SONiC planning to support periodic punching of HW watchdog? |
Addressing review comments from 8/19 chassis-subgroup meeting
Updated with 2 approaches to collect thermal info
This would be the maximum consumed power by the LC. So, would remain constant per LC-type. |
That is correct. Each LC is running a separate SONiC instance. 'show platform temperature' will cater to the local card information.
As concluded in our last HLD review, we will go with Option-2. We will introduce a "GLOBAL_STATE_DB" where all the LCs will push their information to. |
That is correct: pmon is mainly used for monitoring. Device-manager is a Nokia-specific placeholder for platform code. A vendor could have user-space drivers, kernel drivers, or a mix of both. If sysfs cannot be used, any IPC could be used to get/set the information.
Unfortunately, this is out of scope of this HLD at present. We have listed them in "Future Items" section for tracking. |
'Yes' for all of the questions. |
It is stored in the local Redis DB on the CC, using the existing SONiC state-db schema.
This will be provided by the existing module_base class. It has a type that indicates what kind of module it is; look for get_type().
Introducing chassisd to monitor status of cards on a modular chassis
HLD: sonic-net/SONiC#646
**-What I did**
Introducing a new process to monitor status of control, line and fabric cards.
**-How I did it**
- Support of monitoring of line-cards and fabric-cards. This runs in the main thread periodically. It updates the STATE_DB with the status information. 'show platform chassis-modules' will read from the STATE_DB.
- Support of handling configuration of moving the cards to administratively up/down state. The handling happens as part of a separate thread that waits on select() for config event from a CHASSIS_MODULE table in CONFIG_DB.
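The two-thread split described above can be sketched roughly as follows. This is a minimal illustration only, not the chassisd code: plain dicts stand in for STATE_DB, a `queue.Queue` stands in for the select() on the CHASSIS_MODULE table in CONFIG_DB, and `FakeModule`, `monitor_once`, `config_handler` and `run_demo` are hypothetical names.

```python
import queue
import threading


class FakeModule:
    """Hypothetical stand-in for a platform module object (line/fabric card)."""

    def __init__(self, name, oper_status="Online"):
        self.name = name
        self.oper_status = oper_status
        self.admin_up = True

    def get_oper_status(self):
        # An administratively-down card is reported as Offline in this sketch.
        return self.oper_status if self.admin_up else "Offline"

    def set_admin_state(self, up):
        self.admin_up = up


def monitor_once(modules, state_db):
    """One pass of the periodic status poll performed by the main thread."""
    for mod in modules.values():
        state_db[mod.name] = {"oper_status": mod.get_oper_status()}


def config_handler(modules, config_events, stop_event):
    """Worker thread: wait for admin up/down events and apply them."""
    while not stop_event.is_set():
        try:
            name, admin_status = config_events.get(timeout=0.1)
        except queue.Empty:
            continue
        if name in modules:
            modules[name].set_admin_state(admin_status == "up")
        config_events.task_done()


def run_demo():
    modules = {n: FakeModule(n) for n in ("LINE-CARD0", "FABRIC-CARD0")}
    state_db = {}                      # stand-in for STATE_DB
    config_events = queue.Queue()      # stand-in for CONFIG_DB notifications
    stop_event = threading.Event()
    worker = threading.Thread(
        target=config_handler, args=(modules, config_events, stop_event)
    )
    worker.start()

    monitor_once(modules, state_db)
    before = state_db["LINE-CARD0"]["oper_status"]

    config_events.put(("LINE-CARD0", "down"))
    config_events.join()               # wait until the handler applied it
    monitor_once(modules, state_db)

    stop_event.set()
    worker.join()
    return before, state_db
```

Keeping the poll and the config handling on separate threads means a slow platform call during monitoring does not delay an operator's admin up/down request, and vice versa.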
PSUd changes to compute power-budget for Modular chassis
HLD: sonic-net/SONiC#646
PSUd will introduce power requirements calculations. Platform APIs are introduced to provide consumers and total consumed power. Number of PSUs will help provide total supplied power.
**Output of STATE-DB:**
```
"CHASSIS_INFO|chassis_power_budget 1": {
    "expireat": 1603182970.639244,
    "ttl": -0.001,
    "type": "hash",
    "value": {
        "SUPERVISOR consumed_power": "80.0",
        "FABRIC-CARD consumed_power": "185.0",
        "FAN consumed_power": "999",
        "LINE-CARD consumed_power": "1000.0",
        "PSU supplied_power": "9000.0"
    }
},
```
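As a rough illustration of the arithmetic behind such a power budget, the sketch below sums the consumer figures from the STATE-DB output above and compares them with the supplied power. The function name `compute_power_budget` and the returned field names are hypothetical, and how PSUd actually aggregates per-PSU values is an assumption here (the 9000.0 W figure is passed in as a single aggregate entry).

```python
def compute_power_budget(consumed_watts, psu_supplied_watts):
    """Sum consumer and supplier power and report the remaining margin.

    consumed_watts: dict mapping a consumer name to its maximum consumed power.
    psu_supplied_watts: iterable of maximum supplied power values, one per PSU.
    """
    total_consumed = sum(float(w) for w in consumed_watts.values())
    total_supplied = sum(float(w) for w in psu_supplied_watts)
    return {
        "total_consumed_power": total_consumed,
        "total_supplied_power": total_supplied,
        "margin": total_supplied - total_consumed,
        "sufficient": total_supplied >= total_consumed,
    }


# Values taken from the STATE-DB output shown in this thread.
consumers = {
    "SUPERVISOR": "80.0",
    "FABRIC-CARD": "185.0",
    "FAN": "999",
    "LINE-CARD": "1000.0",
}
budget = compute_power_budget(consumers, ["9000.0"])
```

With these inputs the consumers add up to 2264.0 W against 9000.0 W supplied, so the budget is met with 6736.0 W to spare.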
Enhance thermalctld to write to chassis state-DB on a modular chassis HLD: sonic-net/SONiC#646 In a modular chassis, the thermal information from all line-cards will be updated to the chassis state-DB in the control-card. Additionally, minimum and maximum temperatures will be recorded. The fan control algorithm used by certain vendors will require this information.
sonic-platform-base: Changes to introduce APIs for modular chassis for power-consumption and supplied
HLD: sonic-net/SONiC#646
PSUd APIs for power requirement calculations
- get_maximum_supplied_power() - per PSU
- get_status_master_led() - get master psu led status. Class method.
- set_status_master_led() - set master psu led status. Class method.
- get_maximum_consumed_power(self) - per consumer API. Consumers are modules, Fans
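The shape of these APIs might look like the sketch below. Only the four method names come from the list above; the class names `SketchPsu` and `SketchModule`, the constructor arguments, the LED colour strings and the return values are assumptions made for illustration, not the sonic-platform-base definitions.

```python
class SketchPsu:
    """Hypothetical PSU class; only the method names come from the PR text."""

    # The master PSU LED is shared by every PSU, hence class-level state.
    _master_led_color = "off"

    def __init__(self, max_supplied_watts):
        self._max_supplied_watts = float(max_supplied_watts)

    def get_maximum_supplied_power(self):
        """Per-PSU maximum supplied power, in watts (unit is an assumption)."""
        return self._max_supplied_watts

    @classmethod
    def get_status_master_led(cls):
        return cls._master_led_color

    @classmethod
    def set_status_master_led(cls, color):
        cls._master_led_color = color
        return True


class SketchModule:
    """Hypothetical consumer (module or fan) exposing its maximum draw."""

    def __init__(self, max_consumed_watts):
        self._max_consumed_watts = float(max_consumed_watts)

    def get_maximum_consumed_power(self):
        return self._max_consumed_watts


psu_a, psu_b = SketchPsu(3000), SketchPsu(3000)   # illustrative ratings
SketchPsu.set_status_master_led("green")
led_seen_by_instance = psu_b.get_status_master_led()
```

Making the master LED accessors class methods reflects that there is one master LED for the whole PSU group, so setting it through the class is visible from every PSU instance.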
sonic-platform-base: Changes to introduce APIs for modular chassis for thermalctld
HLD: sonic-net/SONiC#646
Introducing thermal APIs to get min and max temperatures of each sensor
- get_minimum_recorded()
- get_maximum_recorded()
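One simple way a platform could back these two calls is to track the extremes on every read, as in the sketch below. The class name `RecordingThermal` and the injected `read_fn` callable are hypothetical; a vendor may instead read recorded min/max values straight from hardware.

```python
class RecordingThermal:
    """Hypothetical sensor wrapper that remembers the lowest and highest
    temperature it has returned so far."""

    def __init__(self, read_fn):
        self._read = read_fn      # callable returning the current reading
        self._min = None
        self._max = None

    def get_temperature(self):
        value = float(self._read())
        if self._min is None or value < self._min:
            self._min = value
        if self._max is None or value > self._max:
            self._max = value
        return value

    def get_minimum_recorded(self):
        return self._min

    def get_maximum_recorded(self):
        return self._max


# Feed three illustrative readings through the wrapper.
readings = iter([40.0, 35.5, 47.25])
sensor = RecordingThermal(lambda: next(readings))
latest = [sensor.get_temperature() for _ in range(3)][-1]
```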
I am adding some minor style comments found while reading through it.
Noticed that in this morning's community presentation the doc did not indicate which option was picked, even though the agreement is Option 2. Please update the doc to reflect this. |
Done. thanks. |
Updates after sonic community review
HLD: sonic-net/SONiC#646
Introducing chassisd process to monitor status of the control, line and fabric cards in a modular chassis.
- Why I did it
Modular Chassis has control-cards, line-cards and fabric-cards along with other peripherals. Chassisd will be a central entity that has visibility of the entire chassis.
- How I did it
Chassisd process will monitor cards in the main thread. Another configuation_handling_task is created to listen to CONFIG_DB for admin_status up/down events. The monitored status is persisted in REDIS-DB.
HLD: sonic-net/SONiC#646
In modular chassis, add CHASSIS_STATE_DB on control card
- Why I did it
Modular Chassis has control-cards, line-cards and fabric-cards along with other peripherals. Control-Card CHASSIS_STATE_DB will be the central DB to maintain any state information of cards that is accessible to control-card.
- How I did it
Adding another DB on an existing REDIS instance running on port 6380.
Enhance chassisd to monitor midplane status of the cards in modular chassis
HLD: sonic-net/SONiC#646
-What I did
Add monitoring of the midplane or internal ethernet network between supervisor and line-card modules.
-How I did it
Along with status monitoring, also monitor the midplane reachability between supervisor and modules. It updates the STATE_DB with the status information. 'show chassis-modules midplane-status' will read from the STATE_DB
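A reachability pass of this kind could be sketched as below. This is an assumption-laden illustration: the function `update_midplane_status`, the field names `ip_address` and `access`, and the addresses (taken from the 192.0.2.0/24 documentation range) are hypothetical, and the actual probe (ping, TCP connect, or a vendor call) is injected as a callable rather than implemented.

```python
def update_midplane_status(module_ips, is_reachable, state_db):
    """Probe each module's midplane address and record the result.

    module_ips: dict mapping a module name to its midplane IP address.
    is_reachable: callable taking an IP string and returning True or False.
    state_db: dict standing in for the STATE_DB table that the show command reads.
    """
    for name, ip in module_ips.items():
        state_db[name] = {
            "ip_address": ip,
            "access": str(bool(is_reachable(ip))),
        }
    return state_db


# Illustrative run: pretend only the first line card answers the probe.
midplane_ips = {"LINE-CARD0": "192.0.2.1", "LINE-CARD1": "192.0.2.2"}
reachable_now = {"192.0.2.1"}
midplane_table = update_midplane_status(
    midplane_ips, lambda ip: ip in reachable_now, {}
)
```

Injecting the probe keeps the bookkeeping testable without a real midplane network.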
@mprabhu-nokia, I see most of the comments resolved, Thanks. |
Fixed and referenced the preferred approach document in there. |
Looks good, but some minor nits.
LGTM,
@jleveque Would you take a quick look as well ..thanks |
@Staphylo, @keboliu, @Junchao-Mellanox: Please review, as well. |
Chassis subgroup meeting 5/12: |
This is a design document proposal for Chassis support and PMON enhancements for Chassis from the Nokia-SONiC team