🐛 Set unhealthyLimitKey for logging always #5110

Merged on Aug 26, 2021 (1 commit)
51 changes: 32 additions & 19 deletions controllers/machinehealthcheck_controller.go
@@ -58,6 +58,11 @@ const (
// EventRemediationRestricted is emitted in case when machine remediation
// is restricted by remediation circuit shorting logic.
EventRemediationRestricted string = "RemediationRestricted"

maxUnhealthyKeyLog = "max unhealthy"
unhealthyTargetsKeyLog = "unhealthy targets"
unhealthyRangeKeyLog = "unhealthy range"
totalTargetKeyLog = "total target"
)

// +kubebuilder:rbac:groups=core,resources=events,verbs=get;list;watch;create;patch
@@ -219,8 +224,6 @@ func (r *MachineHealthCheckReconciler) reconcile(ctx context.Context, logger log
healthy, unhealthy, nextCheckTimes := r.healthCheckTargets(targets, logger, *nodeStartupTimeout)
m.Status.CurrentHealthy = int32(len(healthy))

var unhealthyLimitKey, unhealthyLimitValue interface{}

// check MHC current health against MaxUnhealthy
remediationAllowed, remediationCount, err := isAllowedRemediation(m)
if err != nil {
@@ -231,28 +234,29 @@ func (r *MachineHealthCheckReconciler) reconcile(ctx context.Context, logger log
var message string

if m.Spec.UnhealthyRange == nil {
unhealthyLimitKey = "max unhealthy"
unhealthyLimitValue = m.Spec.MaxUnhealthy
logger.V(3).Info(
"Short-circuiting remediation",
totalTargetKeyLog, totalTargets,
maxUnhealthyKeyLog, m.Spec.MaxUnhealthy,
unhealthyTargetsKeyLog, len(unhealthy),
)
message = fmt.Sprintf("Remediation is not allowed, the number of not started or unhealthy machines exceeds maxUnhealthy (total: %v, unhealthy: %v, maxUnhealthy: %v)",
totalTargets,
len(unhealthy),
m.Spec.MaxUnhealthy)
} else {
unhealthyLimitKey = "unhealthy range"
unhealthyLimitValue = *m.Spec.UnhealthyRange
logger.V(3).Info(
"Short-circuiting remediation",
totalTargetKeyLog, totalTargets,
unhealthyRangeKeyLog, *m.Spec.UnhealthyRange,
unhealthyTargetsKeyLog, len(unhealthy),
)
message = fmt.Sprintf("Remediation is not allowed, the number of not started or unhealthy machines does not fall within the range (total: %v, unhealthy: %v, unhealthyRange: %v)",
totalTargets,
len(unhealthy),
*m.Spec.UnhealthyRange)
}

logger.V(3).Info(
Contributor

It seems like we could keep a single log call here, and another below, without adding an if/else block. The values can be set in variables as they were, with only one call out to the logger. The only thing missing from the existing code was making sure the key is set to something when remediationAllowed is true.
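
For illustration, the single-log-call approach described in this comment could look roughly like the sketch below. It is not code from this PR: `mhcSpec` is a simplified stand-in for the real spec type (with plain string fields), and `unhealthyLimitKV` is a hypothetical helper. The point is only that the key is a non-empty string on every path, whether or not remediation is allowed.

```go
package main

import "fmt"

// mhcSpec is a simplified, hypothetical stand-in for the MachineHealthCheck
// spec; the real fields have different types.
type mhcSpec struct {
	MaxUnhealthy   string
	UnhealthyRange *string
}

// unhealthyLimitKV (hypothetical helper) picks the log key and value once,
// so the key is never left unset regardless of which branch runs later.
func unhealthyLimitKV(spec mhcSpec) (string, interface{}) {
	if spec.UnhealthyRange != nil {
		return "unhealthy range", *spec.UnhealthyRange
	}
	return "max unhealthy", spec.MaxUnhealthy
}

func main() {
	r := "[3-5]"
	specs := []mhcSpec{
		{MaxUnhealthy: "40%"},
		{MaxUnhealthy: "40%", UnhealthyRange: &r},
	}
	for _, spec := range specs {
		key, value := unhealthyLimitKV(spec)
		// fmt.Println stands in for the single logger.V(3).Info(...) call.
		fmt.Println("Remediations are allowed", "total target", 10, key, value, "unhealthy targets", 1)
	}
}
```

The trade-off is the one raised elsewhere in this thread: fewer log calls, at the cost of the reader having to trace where the key and value were assigned.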

Member Author

@enxebre, Aug 24, 2021

Thanks for the feedback @stmcginnis. Is this suggesting just the opposite of #5110 (comment)? @vincepri

Seems @JoelSpeed finds it more readable with the if/else #5110 (review)

I don't really have a strong opinion, as long as we stop the controller from panicking.

@stmcginnis @JoelSpeed @vincepri please let me know if you want to merge it as it is or make any specific change. Feedback seems to go in both directions atm.

Contributor

I don't have a really strong opinion on this, but having seen both, I think I prefer the way it is now. While it's more code, it's also more explicit about what's happening and I personally find it easier to follow this way.

Also, thinking about how the logging works, with this approach the log line will include the actual line numbers of the log calls, which might make it easier to trace the logging and see which if-statements were entered. That is in contrast to today, where you have to work out the values that were substituted and then work out where they were set.
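
A minimal sketch of the line-number argument, assuming a logger that records the caller's line (whether the controller's logger does so depends on how it is configured). `logWithLine`, `explicit`, and `single` are hypothetical names used only for this illustration; `runtime.Caller` is from the Go standard library.

```go
package main

import (
	"fmt"
	"runtime"
)

// logWithLine is a hypothetical stand-in for a logger that annotates each
// entry with the line number of its call site. It returns that line.
func logWithLine(msg string) int {
	_, _, line, _ := runtime.Caller(1)
	fmt.Printf("line %d: %s\n", line, msg)
	return line
}

// explicit logs from a separate call site per branch, as in the if/else version.
func explicit(rangeSet bool) int {
	if rangeSet {
		return logWithLine("unhealthy range")
	}
	return logWithLine("max unhealthy")
}

// single picks the key first and logs from one shared call site.
func single(rangeSet bool) int {
	key := "max unhealthy"
	if rangeSet {
		key = "unhealthy range"
	}
	return logWithLine(key)
}

func main() {
	// With explicit branches, the recorded line identifies the branch taken.
	fmt.Println("explicit lines differ:", explicit(true) != explicit(false))
	// With a single call site, the line is the same for both branches.
	fmt.Println("single lines equal:", single(true) == single(false))
}
```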

"Short-circuiting remediation",
"total target", totalTargets,
unhealthyLimitKey, unhealthyLimitValue,
"unhealthy targets", len(unhealthy),
)

// Remediation not allowed, the number of not started or unhealthy machines either exceeds maxUnhealthy (or) not within unhealthyRange
m.Status.RemediationsAllowed = 0
conditions.Set(m, &clusterv1.Condition{
@@ -282,12 +286,21 @@ func (r *MachineHealthCheckReconciler) reconcile(ctx context.Context, logger log
return reconcile.Result{Requeue: true}, nil
}

logger.V(3).Info(
"Remediations are allowed",
"total target", totalTargets,
unhealthyLimitKey, unhealthyLimitValue,
"unhealthy targets", len(unhealthy),
)
if m.Spec.UnhealthyRange == nil {
logger.V(3).Info(
"Remediations are allowed",
totalTargetKeyLog, totalTargets,
maxUnhealthyKeyLog, m.Spec.MaxUnhealthy,
unhealthyTargetsKeyLog, len(unhealthy),
)
} else {
logger.V(3).Info(
"Remediations are allowed",
totalTargetKeyLog, totalTargets,
unhealthyRangeKeyLog, *m.Spec.UnhealthyRange,
unhealthyTargetsKeyLog, len(unhealthy),
)
}

// Remediation is allowed so unhealthyMachineCount is within unhealthyRange (or) maxUnhealthy - unhealthyMachineCount >= 0
m.Status.RemediationsAllowed = remediationCount