-
Notifications
You must be signed in to change notification settings - Fork 184
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support redfish #734
Support redfish #734
Conversation
1a22c96
to
7b004e2
Compare
02cb879
to
2bf807a
Compare
dc7d486
to
be078c5
Compare
pkg/collector/metric/node_metric.go
Outdated
@@ -61,6 +62,7 @@ type NodeMetrics struct { | |||
TotalEnergyInGPU *types.UInt64StatCollection | |||
TotalEnergyInOther *types.UInt64StatCollection | |||
TotalEnergyInPlatform *types.UInt64StatCollection | |||
TotalEnergyInRedfish *types.UInt64StatCollection |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking, if redfish, is a kind of platform energy api, then we may don't need a new var as TotalEnergyInRedfish
but reuse TotalEnergyInPlatform
?
as redfish api is a type of Platform API?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree.
Also, if redfish is enabled, we don't need to collect the platform power from other sources.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good, working on the refactoring
pkg/nodecred/csv_cred.go
Outdated
return true | ||
} | ||
|
||
func getNodeName() string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need this function?
kepler/pkg/collector/metric/utils.go
Line 101 in f0e6459
func getNodeName() string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's use redfish to update the Platform Energy.
So, we should verify in the function updatePlatformEnergy
if redfish is enabled, otherwise we use the ACPI or LPAR power sources.
The PR #748 refactor the code and make it easier to include the redfish support. |
pkg/power/redfish/util.go
Outdated
|
||
func getRedfishPower(access RedfishAccessInfo, system string) (*RedfishPowerModel, error) { | ||
var power RedfishPowerModel | ||
err := getRedfishModel(access, "/redfish/v1/Chassis/"+system+"/Power/PowerControl", &power) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/Power/PowerControl
is not specified as a resource in Redfish Resource and Schema Guide and HPE iLo Redfish API does not provide it.
cf. https://hewlettpackard.github.io/ilo-rest-api-docs/ilo5/#resource-map
Instead, all of /Power/#PowerControl
, /Power#PowerControl
and /Power#/PowerControl
are available on my machine. You can explore HPE environment at https://ilorestfulapiexplorer.ext.hpe.com/ .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you @tiwatsuka for the info! I am going to update the API model to work with your suggestion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tiwatsuka can you get the output of Power#/PowerControl
on your HPE? The simulator doesn't have it, thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't access on my machine right now, but it was just same as output of Power
. On second thoughts, it is just interpreted as a URL fragment that the part after #
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
{
"@odata.context": "/redfish/v1/$metadata#Power.Power",
"@odata.etag": "W/\"2613CD55\"",
"@odata.id": "/redfish/v1/Chassis/1/Power",
"@odata.type": "#Power.v1_3_0.Power",
"Id": "Power",
"Name": "PowerMetrics",
"Oem": {
"Hpe": {
"@odata.context": "/redfish/v1/$metadata#HpePowerMetricsExt.HpePowerMetricsExt",
"@odata.type": "#HpePowerMetricsExt.v2_3_0.HpePowerMetricsExt",
"BrownoutRecoveryEnabled": true,
"HasCpuPowerMetering": false,
"HasDimmPowerMetering": false,
"HasGpuPowerMetering": false,
"HasPowerMetering": true,
"HighEfficiencyMode": "Balanced",
"Links": {
"PowerMeter": {
"@odata.id": "/redfish/v1/Chassis/1/Power/PowerMeter"
},
"FastPowerMeter": {
"@odata.id": "/redfish/v1/Chassis/1/Power/FastPowerMeter"
},
"SlowPowerMeter": {
"@odata.id": "/redfish/v1/Chassis/1/Power/SlowPowerMeter"
},
"FederatedGroupCapping": {
"@odata.id": "/redfish/v1/Chassis/1/Power/FederatedGroupCapping"
}
},
"MinimumSafelyAchievableCap": null,
"MinimumSafelyAchievableCapValid": false,
"SNMPPowerThresholdAlert": {
"DurationInMin": 0,
"ThresholdWatts": 0,
"Trigger": "Disabled"
}
}
},
"PowerControl": [
{
"@odata.id": "/redfish/v1/Chassis/1/Power#PowerControl/0",
"MemberId": "0",
"PowerCapacityWatts": 500,
"PowerConsumedWatts": 44,
"PowerMetrics": {
"AverageConsumedWatts": 40,
"IntervalInMin": 20,
"MaxConsumedWatts": 78,
"MinConsumedWatts": 38
}
}
],
"PowerSupplies": [
{
"@odata.id": "/redfish/v1/Chassis/1/Power#PowerSupplies/0",
"FirmwareVersion": "1.00",
"LastPowerOutputWatts": 44,
"LineInputVoltage": 105,
"LineInputVoltageType": "ACLowLine",
"Manufacturer": "LTEON",
"MemberId": "0",
"Model": "865408-B21",
"Name": "HpeServerPowerSupply",
"Oem": {
"Hpe": {
"@odata.context": "/redfish/v1/$metadata#HpeServerPowerSupply.HpeServerPowerSupply",
"@odata.type": "#HpeServerPowerSupply.v2_0_0.HpeServerPowerSupply",
"AveragePowerOutputWatts": 44,
"BayNumber": 1,
"HotplugCapable": true,
"MaxPowerOutputWatts": 58,
"Mismatched": false,
"PowerSupplyStatus": {
"State": "Ok"
},
"iPDUCapable": false
}
},
"PowerCapacityWatts": 500,
"PowerSupplyType": "AC",
"SerialNumber": "5WBXK0FLLG25CW",
"SparePartNumber": "866729-001",
"Status": {
"Health": "OK",
"State": "Enabled"
}
},
{
"@odata.id": "/redfish/v1/Chassis/1/Power#PowerSupplies/1",
"MemberId": "1",
"Oem": {
"Hpe": {
"@odata.context": "/redfish/v1/$metadata#HpeServerPowerSupply.HpeServerPowerSupply",
"@odata.type": "#HpeServerPowerSupply.v2_0_0.HpeServerPowerSupply",
"BayNumber": 2
}
},
"Status": {
"Health": "Warning",
"State": "Absent"
}
}
]
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you @tiwatsuka this looks consistent with the new model definition here
I'll finish it next week and hopefully get into the next release in mid July.
pkg/power/redfish/redfish.go
Outdated
// calculate the elapsed time since the last power query in seconds | ||
elapsed := now.Sub(system.timestamp).Seconds() | ||
klog.V(5).Infof("power info: %+v\n", system) | ||
power[system.system] = float64(system.consumedWatts * 1000 * int(elapsed)) // convert to mW |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line seems to calculate value in Joule (consumedWatts
x elapsed
), so is it better to name to GetEnergyFromRedfish
rather than GetPower
in func name? (like platform/source/acpi.go)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It makes sense, @rootfs can you rename this function in the Class level?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Never-mind, the Class already defines the right function name: GetEnergyFromPlatform()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed function name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rootfs can you change the logic to get the redfish power as the Platform power?
You can follow the logic structure in hmc and the logic in acpi.
Also when we are using the redfish to collect the platform power (i.e. the total node power), we don't need to use the ACPI or HMC. So the platform package should verify if redfish is available and use this instead of ACPI or HMC.
This should be a similar logic as we collect the component energy that can come from different power sources.
pkg/power/redfish/redfish.go
Outdated
} | ||
|
||
// set a timer to check the power info every intervalInMin minutes | ||
if intervalInMin > 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IntervalInMin
in Redfish is irrelevant to the interval of updating the value of PowerConsumedWatts
.
I believe that PowerConsumedWatts
is updated every one or few seconds. So it would rather be collected more frequently than using IntervalInMin
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good
pkg/power/redfish/redfish.go
Outdated
rf.mutex.Lock() | ||
now := time.Now() | ||
// calculate the elapsed time since the last power query in seconds | ||
elapsed := now.Sub(system.timestamp).Seconds() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This timestamp
is based on the ticker defined at IsPowerSupported()
, but GetPower()
is called periodically based on another ticker. Is the energy value calculated correctly? I suspect that some energy is added multiple times or lost due to asynchronous two tickers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tiwatsuka you are right, the GetPower() happens asynchronously. Since the HTTP request could take time, it is not a good idea to put it in the critical path. So I let the redfish call happen in the background and Kepler collector just reads the existing reading without having to issue HTTP calls. The accuracy and synchronization can be further improved when using a smaller probe interval window or post processing calibration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice catch!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tiwatsuka thanks for the review. I have pushed the code. Will test it and ask for more reviews once I get a new system to test it again.
@marceloamaral redfish and ACPI are now in the same platform pkg. If redfish is available, then platform uses redfish reading, otherwise use ACPI. |
Signed-off-by: Huamin Chen <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made some suggestions to get energy correctly.
|
||
func getRedfishPower(access RedfishAccessInfo, system string) (*RedfishPowerModel, error) { | ||
var power RedfishPowerModel | ||
err := getRedfishModel(access, "/redfish/v1/Chassis/"+system+"/Power%23/PowerControl", &power) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is URL encoding required? It's not work in my environment and using '#' is working fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The unit test needs %23
but I haven't tested my setup yet.
host := access.Host | ||
|
||
// Create a HTTP client and set up the basic authentication header | ||
client := &http.Client{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't it required to set TLSClientConfig
to &tls.Config{InsecureSkipVerify: true}
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My old Dell setup uses http, but I haven't tried https yet. Please add it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is better to set InsecureSkipVerify
as an option (e.g. environment variable) because it is important to secure access.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tiwatsuka @YaSuenag sounds good. can you suggest a change and I'll merge it here. thanks
Co-authored-by: Takuya Iwatsuka <[email protected]> Signed-off-by: Huamin Chen <[email protected]>
Co-authored-by: Takuya Iwatsuka <[email protected]> Signed-off-by: Huamin Chen <[email protected]>
Co-authored-by: Takuya Iwatsuka <[email protected]> Signed-off-by: Huamin Chen <[email protected]>
Co-authored-by: Takuya Iwatsuka <[email protected]> Signed-off-by: Huamin Chen <[email protected]>
@tiwatsuka @YaSuenag thank you for reviewing. My setup will be ready later this week and run the tests on it. Will keep you updated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add config to skip SSL verification.
@YaSuenag Does this satisfy your request?
Co-authored-by: Takuya Iwatsuka <[email protected]> Signed-off-by: Huamin Chen <[email protected]>
Co-authored-by: Takuya Iwatsuka <[email protected]> Signed-off-by: Huamin Chen <[email protected]>
Co-authored-by: Takuya Iwatsuka <[email protected]> Signed-off-by: Huamin Chen <[email protected]>
Co-authored-by: Takuya Iwatsuka <[email protected]> Signed-off-by: Huamin Chen <[email protected]>
Co-authored-by: Takuya Iwatsuka <[email protected]> Signed-off-by: Huamin Chen <[email protected]>
Thanks @tiwatsuka to create the change, and @rootfs for merging! I completely agree with your change. BTW new parameters / environment variables will be documented? Will it happen on https://github.com/sustainable-computing-io/kepler-doc ? |
@YaSuenag yes, kepler-doc will have the redfish document. If you or your team can come up with more tests and instructions, please share with the community over there. Thanks! |
Signed-off-by: Huamin Chen <[email protected]>
if decErr != nil { | ||
if strings.HasPrefix(decErr.Error(), "json: unknown field ") { | ||
// ignore unknown field error | ||
fieldName := strings.TrimPrefix(decErr.Error(), "json: unknown field ") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dell returns some fields that are not defined in the power model, ignore them here
@marceloamaral redfish is now platform energy. PTAL Here is the output: |
Signed-off-by: Huamin Chen <[email protected]>
Signed-off-by: Huamin Chen <[email protected]>
@YaSuenag @tiwatsuka the kepler-doc update will be at this PR sustainable-computing-io/kepler-doc#74 |
redfish is now part of the platform energy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
Get power info from BMCs that support Redfish. Kepler only needs to access chassis and system to get the power info. Currently the existing redfish golang bindings based on OpenAPI client or gofish are overkill. This PR uses simple access models.
#644
Tasks: