
Improve logs according to new guidelines #6994

Closed · 12 tasks
fabriziopandini opened this issue Jul 29, 2022 · 12 comments
Labels

  • help wanted: Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines.
  • kind/cleanup: Categorizes issue or PR as related to cleaning up code, process, or technical debt.
  • triage/accepted: Indicates an issue or PR is ready to be actively worked on.

Comments

@fabriziopandini
Member

fabriziopandini commented Jul 29, 2022

We recently merged our logging guidelines and a first PR ensuring all the logs from a reconcile have consistent keys.

Now it is possible to make a pass on our controllers and improve existing log messages to take advantage of key-value pairs. A good way to do so is to focus on a simple workflow supported by CAPI and make sure the logs represent what happens, including all the dependencies across objects.

E.g., I created a management cluster with logging enabled using Tilt, created a cluster, and took a look at the logs documenting a machine being provisioned by using the following query: {app="capi-controller-manager",controller="machine"} | json | machine_name="classy1-23696-sxcsn-2jkvs"

msg="Bootstrap provider is not ready, requeuing" v=0
msg="Infrastructure provider is not ready, requeuing" v=0
msg="Cannot reconcile Machine's Node, no valid ProviderID yet" v=0
...
msg="Set Machine's NodeRef" v=0

These are OK, but we can do better by making it more explicit what we are waiting for and by adding more detail when provisioning completes. So I created a small PR that gives us the following output:

msg="Waiting for bootstrap provider to generate data secret and report status.ready, requeing" v=0 (with a key value pair for the bootstrap object)
msg="Waiting for infrastructure provider to create machine infrastructure and report status.ready, requeing" v=0 (with a key value pair for the infrastructure object)
msg="Waiting for infrastructure provider to report spec.ProviderID, requeing" v=0 (with a key value pair for the infrastructure object)
...
msg="Bootstrap provider generated data secret and reports status.ready" v=0 (with a key value pair for the bootstrap object and one for the secret)
...
msg="Infrastructure provider completed machine infrastructure provisioning and reports status.ready" v=0 (with a key value pair for the infrastructure object)
msg="Infrastructure provider reporting spec.ProviderID, Kubernetes node is now available" v=0 (with a key value pair for the infrastructure object, one for providerID and one for the node)

The idea behind this issue is to rally the community to create similar PRs, each one making small, incremental improvements to one of the Cluster API workflows tracked in the task list above.

It is also worth noting that this is a great opportunity for people who want to dig into CAPI and learn how things work.

/help wanted
/kind cleanup

@k8s-ci-robot k8s-ci-robot added the kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. label Jul 29, 2022
@fabriziopandini
Member Author

/help-wanted

@fabriziopandini fabriziopandini added the triage/accepted Indicates an issue or PR is ready to be actively worked on. label Jul 29, 2022
@fabriziopandini fabriziopandini self-assigned this Jul 29, 2022
@fabriziopandini fabriziopandini modified the milestone: v1.3 Jul 29, 2022
@fabriziopandini fabriziopandini added the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label Aug 1, 2022
@furkatgofurov7
Member

@fabriziopandini hi, I am happy to take up a few items from the list. To start, I can go with MD creating/deleting MS, if no one is working on it.

@sbueringer
Member

@furkatgofurov7 Sounds great! Just go ahead, first come, first served. I'll reserve the sub-task for you above.

@sbueringer
Member

@fabriziopandini Can we add another sub-task for the structuredmerge package?

I think it would be good if we had more logs there to make it easier to debug.
In my specific case I wanted to know what diff was detected by SSA, i.e. why there were changes but no spec changes.

xref: https://kubernetes.slack.com/archives/C8TSNPY4T/p1662462731560389
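For illustration, a minimal sketch of how such a diff could be surfaced at a high verbosity level (assuming go-cmp and the apimachinery unstructured converter; this is not the actual structuredmerge implementation, and the helper name is hypothetical):

```go
package structuredmerge

import (
	"context"

	"github.com/google/go-cmp/cmp"
	"k8s.io/apimachinery/pkg/runtime"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// logSSADiff logs the difference between the original object and the object
// returned by a server-side-apply dry run, at verbosity 5 so it only shows up
// when explicitly requested.
func logSSADiff(ctx context.Context, original, dryRunResult client.Object) error {
	log := ctrl.LoggerFrom(ctx)

	// Convert both objects to plain maps first so the diff works for any API
	// type without tripping over unexported struct fields.
	originalMap, err := runtime.DefaultUnstructuredConverter.ToUnstructured(original)
	if err != nil {
		return err
	}
	modifiedMap, err := runtime.DefaultUnstructuredConverter.ToUnstructured(dryRunResult)
	if err != nil {
		return err
	}

	if diff := cmp.Diff(originalMap, modifiedMap); diff != "" {
		log.V(5).Info("Server side apply detected changes", "diff", diff)
	}
	return nil
}
```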

@valaparthvi
Contributor

I'd like to pick up Cluster deletion! @sbueringer, can you please reserve that for me?

@sbueringer
Member

Sure, done!

@fabriziopandini
Member Author

This is a long tail of activity, not blocking for the milestone.

@fabriziopandini fabriziopandini removed this from the v1.3 milestone Nov 2, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 8, 2023
@sbueringer
Member

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 8, 2023
@fabriziopandini
Member Author

/unassign

@fabriziopandini
Member Author

/close
we are iteratively doing this

@k8s-ci-robot
Contributor

@fabriziopandini: Closing this issue.

In response to this:

/close
we are iteratively doing this

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
