
🌱 Log when and why a machine is marked for remediation #3385

Merged (1 commit) on Jul 23, 2020

@benmoss commented Jul 23, 2020

What this PR does / why we need it:
Right now MHC logs almost nothing at the -v=0 log level. This adds a new message when a machine fails a health check and is being marked for remediation. We already store the reason the health check failed on the machine's condition, so we can use that to surface the failure in the log.

This also removes the "patching machine" log line for healthy machines; in my experience it is almost always useless.
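To make the verbosity point concrete, here is a toy model of leveled logging. It is a minimal sketch, assuming logr-style semantics in which a message logged through V(n) is emitted only when n is at most the -v value; `toyLogger`, `newToyLogger`, and their methods are hypothetical and are not the logr API. At the default -v=0, a V(3) line such as the existing "meets unhealthy criteria" message is dropped, while a plain Info call like the new remediation message is kept.

```go
package main

import (
	"fmt"
	"strings"
)

// toyLogger is a hypothetical stand-in for a logr-style logger: messages
// written at a level greater than the configured verbosity are dropped.
type toyLogger struct {
	verbosity int       // value of the -v flag
	level     int       // level this logger writes at (0 for plain Info)
	lines     *[]string // shared sink so copies returned by V() write to one buffer
}

func newToyLogger(verbosity int) toyLogger {
	return toyLogger{verbosity: verbosity, lines: &[]string{}}
}

// V returns a copy of the logger that writes at the given level.
func (l toyLogger) V(level int) toyLogger {
	l.level = level
	return l
}

// Info records msg followed by key=value pairs if the level is enabled.
// A trailing key without a value is ignored.
func (l toyLogger) Info(msg string, kv ...interface{}) {
	if l.level > l.verbosity {
		return // suppressed at this verbosity
	}
	var b strings.Builder
	b.WriteString(msg)
	for i := 0; i+1 < len(kv); i += 2 {
		fmt.Fprintf(&b, " %v=%v", kv[i], kv[i+1])
	}
	*l.lines = append(*l.lines, b.String())
}

func main() {
	log := newToyLogger(0) // default -v=0
	log.V(3).Info("Target meets unhealthy criteria, triggers remediation", "target", "ns/machine-a")
	log.Info("Target has failed health check, marking for remediation", "target", "ns/machine-a")
	for _, line := range *log.lines {
		fmt.Println(line) // only the V(0) message was recorded
	}
}
```

Running with a higher verbosity (for example `newToyLogger(3)`) would record both lines, which is why the V(3) message alone was invisible in default deployments.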

@k8s-ci-robot added the labels cncf-cla: yes (indicates the PR's author has signed the CNCF CLA) and size/XS (denotes a PR that changes 0-9 lines, ignoring generated files) on Jul 23, 2020
@k8s-ci-robot requested review from detiber and ncdc on July 23, 2020 at 13:25
@detiber (Member) left a comment

/lgtm

A tangential question (definitely not a blocker): should we add helper methods to the Condition type to help with logging conditions?

Comment on lines +253 to +254
condition := conditions.Get(t.Machine, clusterv1.MachineHealthCheckSuccededCondition)
logger.Info("Target has failed health check, marking for remediation", "target", t.string(), "reason", condition.Reason, "message", condition.Message)

Member:

Not directly related to this PR, but I'm wondering if it would be good to add helper methods to the Condition type to "stringify" a condition for purposes such as this.

Author:

Yeah, I looked for this and was surprised we didn't already have one. String() on the condition and maybe a conditions.String(c) method?

Member:

Yeah, that is along the lines of what I was thinking, but not sure if we can completely generalize it or should have a few different helpers for different purposes (failure vs success, brief vs verbose, etc). It might have the potential to be a bit of a rabbit hole, so likely best handled as a separate issue.
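One shape the proposed helper could take is sketched below: a brief `String()` method on a simplified, hypothetical `Condition` type (so it satisfies `fmt.Stringer`), plus a nil-safe package-level wrapper in the spirit of the `conditions.String(c)` idea. Neither exists in Cluster API as far as this thread indicates; the names and the output format are assumptions.

```go
package main

import "fmt"

// Condition is a simplified, hypothetical mirror of a Cluster API condition.
type Condition struct {
	Type    string
	Status  string
	Reason  string
	Message string
}

// String implements fmt.Stringer with a brief one-line form:
// "Type=Status", followed by " (Reason: Message)" when a reason is set.
func (c Condition) String() string {
	s := fmt.Sprintf("%s=%s", c.Type, c.Status)
	if c.Reason != "" {
		s += " (" + c.Reason
		if c.Message != "" {
			s += ": " + c.Message
		}
		s += ")"
	}
	return s
}

// conditionString is a hypothetical nil-safe helper, so callers can log the
// result of a lookup without checking for nil first.
func conditionString(c *Condition) string {
	if c == nil {
		return "<nil>"
	}
	return c.String()
}

func main() {
	failed := Condition{Type: "HealthCheckSucceeded", Status: "False", Reason: "ExampleReason", Message: "example message"}
	fmt.Println(failed) // fmt calls String() because Condition implements fmt.Stringer
	fmt.Println(conditionString(nil))
}
```

A verbose or success-oriented variant would need its own helper, which is the kind of split that makes this worth handling as a separate issue.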

Contributor:

Thanks @benmoss. Is there an example message from when this happened?

@k8s-ci-robot added the lgtm label ("Looks good to me", indicates that a PR is ready to be merged) on Jul 23, 2020

@ncdc (Contributor) commented Jul 23, 2020

/approve

@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ncdc

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) on Jul 23, 2020
@k8s-ci-robot merged commit 23aad1f into kubernetes-sigs:master on Jul 23, 2020
@@ -250,7 +250,8 @@ func (r *MachineHealthCheckReconciler) reconcile(ctx context.Context, logger log
// mark for remediation
errList := []error{}
for _, t := range unhealthy {
logger.V(3).Info("Target meets unhealthy criteria, triggers remediation", "target", t.string())
condition := conditions.Get(t.Machine, clusterv1.MachineHealthCheckSuccededCondition)

Member:

Is this guaranteed to be non-nil? I'd assume so; just double-checking.

Contributor:

Everything in unhealthy should have its condition set in needsRemediation().
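The answer relies on an invariant: every target that ends up in `unhealthy` has had its condition set first. The sketch below models that with hypothetical types and functions (`target`, `markFailed`, `condition`, `describeUnhealthy`) standing in for the reconciler, and also shows the defensive alternative of checking for nil instead of relying on the invariant. It is an illustration only, not the reconciler's code.

```go
package main

import "fmt"

// Condition is a simplified, hypothetical mirror of a Cluster API condition.
type Condition struct {
	Type, Status, Reason, Message string
}

// target is a hypothetical stand-in for an MHC target (a machine plus its conditions).
type target struct {
	name       string
	conditions []Condition
}

const healthCheckCondition = "HealthCheckSucceeded" // illustrative type name

// markFailed sets (or replaces) the failed health-check condition; it stands in
// for what needsRemediation() is said to do for every unhealthy target.
func (t *target) markFailed(reason, message string) {
	c := Condition{Type: healthCheckCondition, Status: "False", Reason: reason, Message: message}
	for i := range t.conditions {
		if t.conditions[i].Type == healthCheckCondition {
			t.conditions[i] = c
			return
		}
	}
	t.conditions = append(t.conditions, c)
}

// condition returns the condition of the given type, or nil if it is absent.
func (t *target) condition(condType string) *Condition {
	for i := range t.conditions {
		if t.conditions[i].Type == condType {
			return &t.conditions[i]
		}
	}
	return nil
}

// describeUnhealthy checks for nil rather than dereferencing a missing condition.
func describeUnhealthy(t *target) string {
	c := t.condition(healthCheckCondition)
	if c == nil {
		return fmt.Sprintf("target=%s reason=unknown", t.name)
	}
	return fmt.Sprintf("target=%s reason=%s message=%s", t.name, c.Reason, c.Message)
}

func main() {
	unhealthy := []*target{{name: "ns/machine-a"}}
	for _, tgt := range unhealthy {
		tgt.markFailed("ExampleReason", "example message") // invariant: set before logging
		fmt.Println(describeUnhealthy(tgt))
	}
}
```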

@vincepri (Member)

/milestone v0.3.8

@k8s-ci-robot added this to the v0.3.8 milestone on Jul 23, 2020
@benmoss deleted the mhc-logging branch on July 23, 2020 at 16:52

6 participants