Support external power source: BMC/IPMI/HMC #644

Closed
2 tasks done
rootfs opened this issue Apr 24, 2023 · 17 comments
Labels
kind/feature New feature or request

Comments

@rootfs
Contributor

rootfs commented Apr 24, 2023

Current Kepler Architecture

  • HMC
  • BMC
    image

Out of band external power source support

image

@rootfs rootfs converted this from a draft issue Apr 24, 2023
@jichenjc
Collaborator

Do you mean adding a new model so that we can obtain power data through those devices and feed it into the current Kepler model?
So previously we only considered the machine itself, but now we also need to consider the energy of related devices?

@marceloamaral
Collaborator

marceloamaral commented Apr 25, 2023

The power consumption of a platform (i.e. server) can be reported by the BMC/IPMI/HMC.

In our current implementation of Kepler, we collect the platform power consumption from the motherboard sensor (HMC), which is available in most modern servers. This sensor provides data on the power consumption of components directly attached to the motherboard, such as the CPU and memory. However, it may not include the power consumption of components like disks and GPUs.

Access to the motherboard sensor is possible via the ACPI interface within the machine or through IPMI, which reads the motherboard sensor via the BMC. We currently use the ACPI interface in Kepler, but in cases where the ACPI interface is disabled and IPMI is enabled, IPMI could be used instead.

It's important to note that the power consumption data obtained through IPMI may differ depending on whether the source is the BMC itself or an out-of-band management system that consolidates the power consumption of different components, including the platform, disk, and GPU.

@jichenjc
Collaborator

OK, that makes sense to me. I appreciate the detailed info~

@jiere
Collaborator

jiere commented May 9, 2023

Please see the joint message from IPMI promoters here.
Even though IPMI v2.0 is a spec that is more than 10 years old, there are various open-source projects related to IPMI metrics exporters.
Shall we directly support BMC-Redfish integration for OOB power monitoring?
Another question is about how the metrics are used: since BMC data is a kind of transient runtime power reading, not an aggregate, how could it be used in Kepler?

@marceloamaral
Collaborator

IPMI or Redfish

I am ok with any direction

BMC data is a kind of transient runtime power reading, not an aggregate, how could it be used in Kepler

We do extrapolation (current power * elapsed time) and aggregate it in Kepler.
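
As a rough illustration of that extrapolation (a sketch only, not Kepler's actual code; the function name and the sample readings are hypothetical), each instantaneous reading in watts can be multiplied by the time elapsed since the previous reading and added to a running total in joules:

```python
def accumulate_energy(samples):
    """Integrate instantaneous power readings into cumulative energy.

    samples: list of (timestamp_seconds, power_watts) tuples, sorted by time.
    Each reading is assumed to hold for the interval since the previous
    reading (current power * elapsed time), so the first sample only sets
    the starting timestamp.
    """
    total_joules = 0.0
    for (prev_t, _), (t, watts) in zip(samples, samples[1:]):
        total_joules += watts * (t - prev_t)
    return total_joules


# Hypothetical BMC readings taken every 3 seconds.
readings = [(0, 100.0), (3, 110.0), (6, 120.0)]
print(accumulate_energy(readings))
```

The longer the gap between readings, the more any short spike between two samples is missed, which is why the reporting interval of the power source matters.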

@rootfs
Contributor Author

rootfs commented May 9, 2023

Let's focus on Redfish first.

@eklee15

eklee15 commented May 9, 2023

Some questions,

  1. Are the users OK with giving Kepler (out-of-band) access to the BMC?
  2. Are we only assuming the BM Kepler use case?

@rootfs
Contributor Author

rootfs commented May 9, 2023

Some questions,

  1. Are the users OK with giving Kepler (out-of-band) access to the BMC?

That out-of-band architecture looks more secure than giving BMC access to each node.

  2. Are we only assuming the BM Kepler use case?

If we consider the external power source to be anything that powers the machine (the BMC for BM, or a hypervisor-level power source for VM), then this architecture could work for both BM and VM.

@rootfs
Contributor Author

rootfs commented May 9, 2023

A prior study on BMC power calibration can be found here

@eklee15

eklee15 commented May 9, 2023

  1. I'm a bit confused about the BMC access. AFAIK, out-of-band access would need to give Kepler access to the BMC. Would you please clarify which architecture you are referring to? Perhaps, we can deep dive into this during the community meeting.
  2. Yes, if we use out-of-band measurements, we can have both node and VM/BM power measurements through Kepler, but there is a chance that would double-count the idle power. VM-BM mapping should be carefully tracked so as not to double-count the idle power.

@rootfs
Contributor Author

rootfs commented May 10, 2023

05/09 meeting:

  • whether a sidecar can access the external source, read the stats, and calculate the power.
  • out-of-band access has a sync issue (ns vs ms vs s, depending on the BMC config or access delays).
  • DaemonSet-level BMC access has security and overhead issues.
  • Current HMC implementation: HMC provides an endpoint for access, so it can provide info to Kepler, and the energy is calculated inside Kepler.

Potential implementations:

  • Use Redfish BMC exporter (BMC models vary in terms of access overhead).
  • Investigate direct exporter access overhead and Prometheus access scalability, starting with Redfish.
  • Correlation between application usage and BMC metrics. Need to know what to report (accelerators/network/storage). Baseline method: based on ground truth (i.e. HW specs, but needs visibility of HW components including fan/board/CPU/GPU/DRAM).

@tiwatsuka
Contributor

Do you have any update on this issue?

I've just compared the power values from Kepler and Redfish. Even though the difference between them is not large, I think it would be better to close the gap if we can. Is anyone working on it?

Brief report

  • Environment

    • Node
      • HPE ProLiant ML30 Gen10 Plus
        • iLO Restful API for HPE iLO 5 is enabled
    • kepler
      • v0.5
      • minimum composition (no model server & no estimator)
  • Load

    • incrementally add load to CPU
      • by stress-ng -c <n> (n=1..4)
  • Value of power

    • Redfish
      • value of PowerConsumedWatts
    • Kepler
      • sum of all power values ("PKG", "DRAM" and "OTHER")
  • Findings

    • there was a time gap of around 15 seconds between Redfish and Kepler
      • Redfish was usually delayed
    • power values differed by around 3 W on average
      • the value from Kepler was higher than the one from Redfish when CPU load was low
        • the opposite was true when CPU load was high
      • the "OTHER" part of the Kepler value did not change with the load, but I would guess it should have
  • Screenshot of graph

    • please ignore yellow line
      Image

@marceloamaral
Collaborator

Thanks @tiwatsuka, very interesting work.

Which color is Kepler and Redfish? Blue and green, respectively?

Is this the Kepler node power or sum of all containers? Can you share your prometheus query?
Since the Prometheus query takes the average of a time window we can expect some variations.

The OTHER part is the total power from the motherboard sensor (using ACPI API) less the RAPL power. Given that you're running a CPU-intensive application, the "OTHER" part of the power consumption should ideally be minimal and relatively constant. A disk or network-intensive workload might potentially impact the "OTHER" power consumption if the power drawn by the disk and network components is being accounted for by the motherboard sensor. However, I haven't personally tested this scenario.
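
To make the subtraction concrete, here is a minimal sketch (not Kepler's implementation; the function name, the sample numbers, and the clamp at zero are all assumptions added for illustration) of deriving "OTHER" as the motherboard sensor power less the RAPL package and DRAM power:

```python
def other_power(platform_watts, pkg_watts, dram_watts):
    """Platform sensor power minus RAPL (package + DRAM) power.

    The clamp at zero is an assumption of this sketch: the two sources are
    sampled at different times, so the raw difference can briefly go negative.
    """
    return max(platform_watts - (pkg_watts + dram_watts), 0.0)


# Hypothetical readings in watts.
print(other_power(74.0, 30.0, 5.0))
print(other_power(10.0, 8.0, 5.0))
```

With a CPU-bound load, most of the increase shows up in the package term, so the remainder stays roughly flat unless the sensor also sees disk, network, or conversion losses.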

@rootfs
Contributor Author

rootfs commented Jun 13, 2023

thank you @tiwatsuka! This is a very cool study. Kepler (blue) appears to match with redfish (green) most of the time but when there are major transitions, there are some lags. This is likely due to the report interval differences between BMC and RAPL. On my setup (dell), the report interval is 1 min.

# redfishtool -r xxxx -u xxxx -p xxxx raw GET /redfish/v1/Chassis/System.Embedded.1/Power/PowerControl
{
    "@odata.context": "/redfish/v1/$metadata#Power.Power",
    "@odata.id": "/redfish/v1/Chassis/System.Embedded.1/Power#/PowerControl/0",
    "@odata.type": "#Power.v1_6_1.PowerControl",
    "MemberId": "0",
    "Name": "System Power Control",
    "PowerAllocatedWatts": 1536,
    "PowerAvailableWatts": 0,
    "PowerCapacityWatts": 1536,
    "PowerConsumedWatts": 389,
    "PowerLimit": {
        "CorrectionInMs": 0,
        "LimitException": "HardPowerOff",
        "LimitInWatts": 485
    },
    "PowerMetrics": {
        "AverageConsumedWatts": 389,
        "IntervalInMin": 1,
        "MaxConsumedWatts": 415,
        "MinConsumedWatts": 386
    },
    "PowerRequestedWatts": 1097,
    "RelatedItem": [
        {
            "@odata.id": "/redfish/v1/Chassis/System.Embedded.1"
        },
        {
            "@odata.id": "/redfish/v1/Systems/System.Embedded.1"
        }
    ],
    "RelatedItem@odata.count": 2
}

I have explored different ways of supporting Redfish, including the open API approach and gofish, but both appear to be overkill for our use case. I am going to just support the Power API in Kepler.
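
For readers who want to try the Power API directly, the following is a hedged Python sketch of reading only that one resource. It is not the Kepler implementation; the function names are hypothetical, and it assumes the BMC accepts HTTP basic authentication and exposes the chassis at `/redfish/v1/Chassis/<chassis_id>/Power` (for example `System.Embedded.1` in the output above).

```python
import base64
import json
import ssl
import urllib.request


def parse_power_control(power_resource):
    """Extract instantaneous and averaged readings from a Redfish Power resource."""
    readings = []
    for control in power_resource.get("PowerControl", []):
        metrics = control.get("PowerMetrics", {})
        readings.append({
            "member_id": control.get("MemberId"),
            "consumed_watts": control.get("PowerConsumedWatts"),
            "average_watts": metrics.get("AverageConsumedWatts"),
            "interval_min": metrics.get("IntervalInMin"),
        })
    return readings


def fetch_power(host, chassis_id, user, password, verify_tls=True):
    """GET the Power resource of one chassis and return the parsed readings."""
    url = f"https://{host}/redfish/v1/Chassis/{chassis_id}/Power"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    context = ssl.create_default_context()
    if not verify_tls:
        # Many BMCs ship self-signed certificates; only disable checks in a lab.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(request, context=context, timeout=10) as response:
        return parse_power_control(json.load(response))


# Offline example using a trimmed-down, made-up response body.
sample = {"PowerControl": [{"MemberId": "0", "PowerConsumedWatts": 389,
                            "PowerMetrics": {"AverageConsumedWatts": 389,
                                             "IntervalInMin": 1}}]}
print(parse_power_control(sample))
```

Keeping the parsing separate from the HTTP call makes it easy to test against captured responses from different vendors without needing BMC access.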

@rootfs rootfs mentioned this issue Jun 13, 2023
8 tasks
@tiwatsuka
Contributor

@marceloamaral
The blue line is Kepler and the green is Redfish.

Here is the query. I simply copied from the dashboard of Kepler.

sum(irate(kepler_container_package_joules_total{container_namespace=~"$namespace"}[1m])) +
sum(irate(kepler_container_dram_joules_total{container_namespace=~"$namespace"}[1m])) +
sum(irate(kepler_container_other_host_components_joules_total{container_namespace=~"$namespace"}[1m]))

AFAIK, the power from the BMC is AC power consumption and the power from RAPL is DC power consumption. When the DC power required by the CPU increases, the loss from AC-DC conversion also increases. If that is true and Kepler accounts for it, the loss should be included in the "OTHER" part, I guess.
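
A small worked example of that reasoning, as a sketch under a strong simplifying assumption: the power supply efficiency is treated as a constant 0.9, which is a made-up figure (real efficiency varies with load and hardware), and the function names are hypothetical.

```python
def ac_input_power(dc_watts, psu_efficiency):
    """AC power drawn at the input for a given DC load."""
    if not 0.0 < psu_efficiency <= 1.0:
        raise ValueError("psu_efficiency must be in (0, 1]")
    return dc_watts / psu_efficiency


def conversion_loss(dc_watts, psu_efficiency):
    """Power lost in AC-DC conversion, i.e. AC input minus DC output."""
    return ac_input_power(dc_watts, psu_efficiency) - dc_watts


# Doubling the DC load doubles the loss when efficiency is held constant.
for dc in (40.0, 80.0):
    print(dc, round(conversion_loss(dc, 0.9), 2))
```

Under this assumption the loss grows in proportion to the DC load, so a remainder computed as sensor power minus RAPL power would be expected to rise with CPU load if the sensor measures the AC side.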

@rootfs
The interval is 20 minutes in my setup. I think this affects only the average, max and min consumed watts; "PowerConsumedWatts" can differ from "AverageConsumedWatts".

    "PowerControl": [
        {
            "@odata.id": "/redfish/v1/Chassis/1/Power#PowerControl/0",
            "MemberId": "0",
            "PowerCapacityWatts": 500,
            "PowerConsumedWatts": 74,
            "PowerMetrics": {
                "AverageConsumedWatts": 39,
                "IntervalInMin": 20,
                "MaxConsumedWatts": 81,
                "MinConsumedWatts": 37
            }
        }
    ],

In my observation, the power reported by the BMC usually lags by several seconds (even when I use ipmi-tool). However, I have not verified this on much hardware, nor have I found a specification about it. The lag might lead to wrong estimates when the load on a node changes frequently.
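
One way to quantify such a lag, sketched here with made-up data and a hypothetical function name, is to shift the delayed series against the reference one and keep the shift with the smallest mean absolute difference. This assumes both series are sampled at the same fixed interval.

```python
def estimate_lag(reference, delayed, max_lag):
    """Return the shift (in samples) that best aligns `delayed` with `reference`.

    For each candidate lag k, delayed[i + k] is compared with reference[i],
    and the lag with the smallest mean absolute difference wins.
    """
    best_lag, best_error = 0, float("inf")
    for lag in range(max_lag + 1):
        pairs = list(zip(reference, delayed[lag:]))
        if not pairs:
            break
        error = sum(abs(r - d) for r, d in pairs) / len(pairs)
        if error < best_error:
            best_lag, best_error = lag, error
    return best_lag


# Made-up series: the second one has the same shape, three samples late.
kepler = [40, 40, 60, 60, 80, 80, 40, 40, 40, 40]
redfish = [40, 40, 40, 40, 40, 60, 60, 80, 80, 40]
print(estimate_lag(kepler, redfish, max_lag=5))
```

Multiplying the returned shift by the sampling interval gives the lag in seconds, which could then be compared across BMC models.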

@rootfs
Contributor Author

rootfs commented Jun 14, 2023

@tiwatsuka thanks for the info. We are working on BMC support. It is still early, but would you help review and test it in your environment? I don't have any HPE servers yet.

#734

@sunya-ch sunya-ch added the kind/feature New feature or request label Jun 22, 2023
@rootfs rootfs added this to the kepler-release-0.6 milestone Jun 22, 2023
@rootfs
Contributor Author

rootfs commented Jul 26, 2023

BMC support is finished.

Projects
Status: Done

7 participants