Support external power source: BMC/IPMI/HMC #644
Comments
You mean adding a new model so that we can obtain power data from those devices and feed it into the current Kepler model? |
The power consumption of a platform (i.e. server) can be reported by the BMC/IPMI/HMC. In our current implementation of Kepler, we collect the platform power consumption from the motherboard sensor (HMC), which is available in most modern servers. This sensor provides data on the power consumption of components directly attached to the motherboard, such as the CPU and memory. However, it may not include the power consumption of components like disks and GPUs. Access to the motherboard sensor is possible via the ACPI interface within the machine or through IPMI, which reads the motherboard sensor via the BMC. We currently use the ACPI interface in Kepler, but in cases where the ACPI interface is disabled and IPMI is enabled, IPMI could be used instead. It's important to note that the power consumption data obtained through IPMI may differ if the source is BMC or |
OK, that makes sense to me. I appreciate the detailed info. |
Please see the joint message from IPMI promoters here. |
I am ok with any direction
We do extrapolation (current power * elapsed time) and aggregate it in Kepler. |
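The extrapolation described here (current power * elapsed time, aggregated over samples) can be sketched as below. This is only a minimal illustration, not Kepler's actual code: the `EnergyAccumulator` class and its method names are hypothetical, and the sample values are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EnergyAccumulator:
    """Hypothetical helper: turns periodic power samples into cumulative energy."""

    total_joules: float = 0.0
    last_timestamp: Optional[float] = None

    def add_sample(self, power_watts: float, timestamp: float) -> float:
        """Add (current power * elapsed time since the previous sample)."""
        if self.last_timestamp is not None:
            elapsed = timestamp - self.last_timestamp
            if elapsed > 0:
                self.total_joules += power_watts * elapsed
        # The first sample only sets the reference time.
        self.last_timestamp = timestamp
        return self.total_joules


acc = EnergyAccumulator()
acc.add_sample(389.0, 0.0)  # reference point, no energy added yet
acc.add_sample(389.0, 3.0)  # 389 W over 3 s
acc.add_sample(415.0, 6.0)  # 415 W over 3 s
print(acc.total_joules)
```

Because each sample is assumed to hold for the whole elapsed interval, a slow reporting source will smooth over short spikes.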
Let's focus on Redfish first. |
Some questions,
|
That out-of-band architecture looks more secure than giving BMC access to each node
If we consider external power source as anything that powers the machine (BMC for BM or hypervisor level power source for VM), then this architecture could work for both BM and VM |
A prior study on BMC power calibration can be found here |
|
05/09 meeting:
Potential implementations:
|
Is there any update on this issue? I've just compared the power values from Kepler and Redfish. Even though the difference between them is not large, I think it's better to close the gap if I can. Is anyone working on it? Brief report
|
Thanks @tiwatsuka, very interesting work. Which color is Kepler and Redfish? Blue and green, respectively? Is this the Kepler node power or sum of all containers? Can you share your prometheus query? The OTHER part is the total power from the motherboard sensor (using ACPI API) less the RAPL power. Given that you're running a CPU-intensive application, the "OTHER" part of the power consumption should ideally be minimal and relatively constant. A disk or network-intensive workload might potentially impact the "OTHER" power consumption if the power drawn by the disk and network components is being accounted for by the motherboard sensor. However, I haven't personally tested this scenario. |
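The "OTHER" relation mentioned above (motherboard sensor power less RAPL power) amounts to a single subtraction. The sketch below is illustrative only: the function name is hypothetical, the numbers are made up, and clamping at zero is an assumption added here to guard against the two sources being sampled at slightly different times.

```python
def other_power(platform_watts: float, rapl_watts: float) -> float:
    """Platform (motherboard sensor) power minus RAPL power.

    Clamping at zero is an assumption of this sketch, not something
    stated in the thread.
    """
    return max(platform_watts - rapl_watts, 0.0)


# Illustrative values only.
print(other_power(389.0, 300.0))
print(other_power(100.0, 120.0))
```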
Thank you @tiwatsuka! This is a very cool study. Kepler (blue) appears to match Redfish (green) most of the time, but there is some lag during major transitions. This is likely due to the difference in report intervals between the BMC and RAPL. On my setup (Dell), the report interval is 1 min.
# redfishtool -r xxxx -u xxxx -p xxxx raw GET /redfish/v1/Chassis/System.Embedded.1/Power/PowerControl
{
"@odata.context": "/redfish/v1/$metadata#Power.Power",
"@odata.id": "/redfish/v1/Chassis/System.Embedded.1/Power#/PowerControl/0",
"@odata.type": "#Power.v1_6_1.PowerControl",
"MemberId": "0",
"Name": "System Power Control",
"PowerAllocatedWatts": 1536,
"PowerAvailableWatts": 0,
"PowerCapacityWatts": 1536,
"PowerConsumedWatts": 389,
"PowerLimit": {
"CorrectionInMs": 0,
"LimitException": "HardPowerOff",
"LimitInWatts": 485
},
"PowerMetrics": {
"AverageConsumedWatts": 389,
"IntervalInMin": 1,
"MaxConsumedWatts": 415,
"MinConsumedWatts": 386
},
"PowerRequestedWatts": 1097,
"RelatedItem": [
{
"@odata.id": "/redfish/v1/Chassis/System.Embedded.1"
},
{
"@odata.id": "/redfish/v1/Systems/System.Embedded.1"
}
],
"RelatedItem@odata.count": 2
}
I have explored different ways of supporting Redfish, including the open API approach and gofish, but both appear to be overkill for our use case. I am going to support just the Power API in Kepler. |
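As a rough illustration of how little is needed from the Power API, the sketch below reads only the power fields from a payload shaped like the `PowerControl` output above. It is a minimal sketch, not the Kepler implementation: the function name and returned keys are hypothetical, and the sample is a trimmed copy of the response shown in this comment.

```python
import json

# Trimmed copy of the PowerControl response shown above.
SAMPLE = """
{
  "PowerConsumedWatts": 389,
  "PowerMetrics": {
    "AverageConsumedWatts": 389,
    "IntervalInMin": 1,
    "MaxConsumedWatts": 415,
    "MinConsumedWatts": 386
  }
}
"""


def read_power_control(payload: str) -> dict:
    """Extract the instantaneous and averaged power from a PowerControl entry."""
    doc = json.loads(payload)
    metrics = doc.get("PowerMetrics", {})
    return {
        "consumed_watts": doc["PowerConsumedWatts"],
        "average_watts": metrics.get("AverageConsumedWatts"),
        # IntervalInMin is the averaging window reported by the BMC.
        "interval_seconds": metrics.get("IntervalInMin", 0) * 60,
    }


reading = read_power_control(SAMPLE)
print(reading)
```

The interval matters when comparing against RAPL: an averaged value over a 1 min window will trail any faster-sampled source during load transitions.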
@marceloamaral Here is the query. I simply copied it from the Kepler dashboard.
AFAIK, power from the BMC is AC power consumption, while power from RAPL is DC power consumption. When the DC power required by the CPU increases, the AC-DC conversion loss also increases. If that is true and Kepler accounts for it, the loss should be included in the "OTHER" part, I guess.
@rootfs In my observation, power from the BMC usually lags by several seconds (even when I use ipmi-tool). However, I have neither verified this on much hardware nor found a specification for it. The lag might lead to wrong estimates when the load on a node changes frequently. |
@tiwatsuka thanks for the info. We are working on the BMC support. It is still early, but would you help review and test it in your environment? I don't have any HPE servers yet. |
BMC support is finished. |
Current Kepler Architecture
Out of band external power source support