
Improve logs according to new guidelines #6994

Closed · 12 tasks
fabriziopandini opened this issue Jul 29, 2022 · 12 comments
Labels

  • help wanted: Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines.
  • kind/cleanup: Categorizes issue or PR as related to cleaning up code, process, or technical debt.
  • triage/accepted: Indicates an issue or PR is ready to be actively worked on.

Comments

@fabriziopandini
Member

fabriziopandini commented Jul 29, 2022

We recently merged our logging guidelines and a first PR ensuring all the logs from a reconcile have consistent keys.

Now it is possible to make a pass on our controllers and improve existing log messages to take advantage of key-value pairs. A good way to do so is to focus on a simple workflow supported by CAPI and make sure the logs represent what happens, including all the dependencies across objects.

E.g., I created a management cluster with logging enabled using Tilt, created a cluster, and took a look at the logs documenting a machine being provisioned by using the following query: {app="capi-controller-manager",controller="machine"} | json | machine_name="classy1-23696-sxcsn-2jkvs"

msg="Bootstrap provider is not ready, requeuing" v=0
msg="Infrastructure provider is not ready, requeuing" v=0
msg="Cannot reconcile Machine's Node, no valid ProviderID yet" v=0
...
msg="Set Machine's NodeRef" v=0

These are OK, but we can do better by making it more explicit what we are waiting for and by adding more detail when provisioning completes. So I created a small PR that gives us the following output:

msg="Waiting for bootstrap provider to generate data secret and report status.ready, requeing" v=0 (with a key value pair for the bootstrap object)
msg="Waiting for infrastructure provider to create machine infrastructure and report status.ready, requeing" v=0 (with a key value pair for the infrastructure object)
msg="Waiting for infrastructure provider to report spec.ProviderID, requeing" v=0 (with a key value pair for the infrastructure object)
...
msg="Bootstrap provider generated data secret and reports status.ready" v=0 (with a key value pair for the bootstrap object and one for the secret)
...
msg="Infrastructure provider completed machine infrastructure provisioning and reports status.ready" v=0 (with a key value pair for the infrastructure object)
msg="Infrastructure provider reporting spec.ProviderID, Kubernetes node is now available" v=0 (with a key value pair for the infrastructure object, one for providerID and one for the node)

The idea behind this issue is to rally the community to create similar PRs, each one making small, incremental improvements to one of the Cluster API workflows tracked in the task list above.

It is also worth noting that this is a great opportunity for people who want to dig into CAPI and learn how things work.

/help wanted
/kind cleanup

@k8s-ci-robot k8s-ci-robot added the kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. label Jul 29, 2022
@fabriziopandini
Member Author

/help-wanted

@fabriziopandini fabriziopandini added the triage/accepted Indicates an issue or PR is ready to be actively worked on. label Jul 29, 2022
@fabriziopandini fabriziopandini self-assigned this Jul 29, 2022
@fabriziopandini fabriziopandini modified the milestone: v1.3 Jul 29, 2022
@fabriziopandini fabriziopandini added the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label Aug 1, 2022
@furkatgofurov7
Member

@fabriziopandini hi, I am happy to take up a few items from the list. To start, I can go with MD creating/deleting MS, if no one is working on it.

@sbueringer
Member

@furkatgofurov7 Sounds great! Just go ahead, first come, first served. I'll reserve the sub-task for you above.

@sbueringer
Member

@fabriziopandini Can we add another sub-task for the structuredmerge package?

I think it would be good if we had more logs there to make it easier to debug.
In my specific case I wanted to know what diff was detected by SSA, i.e. why there were changes but no spec changes.

xref: https://kubernetes.slack.com/archives/C8TSNPY4T/p1662462731560389
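For illustration, a minimal sketch of how such a diff could be surfaced at a high verbosity level (assuming go-cmp and the apimachinery unstructured converter; this is not the actual structuredmerge implementation, and the helper name is hypothetical):

```go
package structuredmerge

import (
	"context"

	"github.com/google/go-cmp/cmp"
	"k8s.io/apimachinery/pkg/runtime"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// logSSADiff logs the difference between the original object and the object
// returned by a server-side-apply dry run, at verbosity 5 so it only shows up
// when explicitly requested.
func logSSADiff(ctx context.Context, original, dryRunResult client.Object) error {
	log := ctrl.LoggerFrom(ctx)

	// Convert both objects to plain maps first so the diff works for any API
	// type without tripping over unexported struct fields.
	originalMap, err := runtime.DefaultUnstructuredConverter.ToUnstructured(original)
	if err != nil {
		return err
	}
	modifiedMap, err := runtime.DefaultUnstructuredConverter.ToUnstructured(dryRunResult)
	if err != nil {
		return err
	}

	if diff := cmp.Diff(originalMap, modifiedMap); diff != "" {
		log.V(5).Info("Server side apply detected changes", "diff", diff)
	}
	return nil
}
```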

@valaparthvi
Contributor

I'd like to pick up Cluster deletion! @sbueringer, can you please reserve that for me?

@sbueringer
Member

Sure, done!

@fabriziopandini
Member Author

This is a long tail of activity, not blocking for the milestone.

@fabriziopandini fabriziopandini removed this from the v1.3 milestone Nov 2, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 8, 2023
@sbueringer
Member

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 8, 2023
@fabriziopandini
Member Author

/unassign

@fabriziopandini
Member Author

/close
we are iteratively doing this

@k8s-ci-robot
Contributor

@fabriziopandini: Closing this issue.

In response to this:

/close
we are iteratively doing this

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
