Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add instructions for structured logging migration #4793

Merged
merged 1 commit into from
Jun 5, 2020
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
@@ -0,0 +1,383 @@
# Structured Logging migration instructions

This document describes instructions for migration proposed by [Structured Logging KEP]. It describes new structured
functions introduced in `klog` (Kubernetes logging library) and how log calls should be changed to utilize new features.
This document was written for the initial migration of `kubernetes/kubernetes` repository proposed for Alpha stage, but
should be applicable at later stages or for other projects using `klog` logging library.

[Structured Logging KEP]: https://github.com/kubernetes/enhancements/tree/master/keps/sig-instrumentation/1602-structured-logging

## Goal of Alpha migration

The first step is to introduce structure to the high percentage of logs generated in Kubernetes by changing only a
small number of logs API calls. Based on criteria described in the [selecting most important logs] section, the selected
22 log calls are estimated to impact 99.9% of log volume. The up to date list of these log calls is provided in the
[Enhancement Issue].

[Enhancement Issue]: https://github.com/kubernetes/enhancements/issues/1602
[selecting most important logs]: https://github.com/kubernetes/enhancements/tree/master/keps/sig-instrumentation/1602-structured-logging#selecting-most-important-logs

## Structured logging in Kubernetes

With this enhancement a set of new functions were added to `klog`. Structured logging functions the follow interface
based on [logr], which has a different design than other `klog` functions which are based on [glog]. It is recommended
to familiarize yourself with [logr].

[logr]: https://github.com/go-logr/logr
[glog]: https://github.com/golang/glog

Here are the prototypes of functions added to `klog` that will be utilized during migration:
```go
package klog

// InfoS structured logs to the INFO log.
// The msg argument used to add constant description to the log line.
// The key/value pairs would be join by "=" ; a newline is always appended.
//
// Examples:
// >> klog.InfoS("Pod status updated", "pod", klog.KObj(pod), "status", "ready")
// output:
// >> I1025 00:15:15.525108 1 controller_utils.go:116] "Pod status updated" pod="kube-system/kubedns" status="ready"
func InfoS(msg string, keysAndValues ...interface{})

// ErrorS structured logs to the ERROR, WARNING, and INFO logs.
// the err argument used as "err" field of log line.
// The msg argument used to add constant description to the log line.
// The key/value pairs would be join by "=" ; a newline is always appended.
//
// Examples:
// >> klog.ErrorS(err, "Failed to update pod status")
// output:
// >> E1025 00:15:15.525108 1 controller_utils.go:114] "Failed to update pod status" err="timeout"
func ErrorS(err error, msg string, keysAndValues ...interface{})

// KObj is used to create ObjectRef when logging information about Kubernetes objects
// Examples:
// >> klog.InfoS("Pod status updated", "pod", klog.KObj(pod), "status", "ready")
// output:
// >> I1025 00:15:15.525108 1 controller_utils.go:116] "Pod status updated" pod="kube-system/kubedns" status="ready"
func KObj(obj KMetadata) ObjectRef

// KRef is used to create ObjectRef when logging information about Kubernetes objects without access to metav1.Object
// Examples:
// >> klog.InfoS("Pod status updated", "pod", klog.KRef(podNamespace, podName), "status", "ready")
// output:
// >> I1025 00:15:15.525108 1 controller_utils.go:116] "Pod status updated" pod="kube-system/kubedns" status="ready"
func KRef(namespace, name string) ObjectRef

// ObjectRef represents a reference to a kubernetes object used for logging purpose
// In text logs it is serialized into "{namespace}/{name}" or "{name}" if namespace is empty
type ObjectRef struct {
Name string `json:"name"`
Namespace string `json:"namespace,omitempty"`
}

// KMetadata is a subset of the kubernetes k8s.io/apimachinery/pkg/apis/meta/v1.Object interface
// this interface may expand in the future, but will always be a subset of the
// kubernetes k8s.io/apimachinery/pkg/apis/meta/v1.Object interface
type KMetadata interface {
GetName() string
GetNamespace() string
}
```

## Migration

1. Change log functions to structured equivalent
1. Remove string formatting from log message
1. Name arguments
1. Use `klog.KObj` and `klog.KRef` for Kubernetes object references
1. Verify log output

## Change log functions to structured equivalent

Structured logging functions follow a different logging interface design than other functions in `klog`. They follow
minimal design from [logr] thus there is no one-to-one mapping.

Simplified mapping between functions:
* `klog.Infof`, `klog.Info`, `klog.Infoln`, `klog.InfoDepth` -> `klog.InfoS`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "depth" ones are challenging because they presume hat popping up the call-stack is possible. I count about 8 such files that make these calls in k/k. Do you link we need an InfoS variant that includes depth? That would be ugly to add to logr because it tries to abstract multiple implementations, but maybe that's something we should tackle in logr and klog, if it is a missing capability...

Copy link
Contributor

@DirectXMan12 DirectXMan12 May 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, the list is:

directxman12@💻 ~[go:kubernetes] $ git rev-parse HEAD
02637bb25016c5362dc945406df0b2114868ecf2
directxman12@💻 ~[go:kubernetes] $ ag 'klog\.\w+Depth'
staging/src/k8s.io/component-base/logs/logs.go
49:     klog.InfoDepth(1, string(data))

staging/src/k8s.io/kubectl/pkg/util/logs/logs.go
42:     klog.InfoDepth(1, string(data))

staging/src/k8s.io/kubectl/pkg/cmd/util/helpers.go
93:             klog.FatalDepth(2, msg)

staging/src/k8s.io/client-go/tools/clientcmd/config.go
486:            klog.FatalDepth(1, err)

staging/src/k8s.io/apiserver/pkg/storage/etcd3/logger.go
35:     klog.InfoDepth(klogWrapperDepth, args...)
39:     klog.InfoDepth(klogWrapperDepth, fmt.Sprintln(args...))
43:     klog.InfoDepth(klogWrapperDepth, fmt.Sprintf(format, args...))
47:     klog.WarningDepth(klogWrapperDepth, args...)
51:     klog.WarningDepth(klogWrapperDepth, fmt.Sprintln(args...))
55:     klog.WarningDepth(klogWrapperDepth, fmt.Sprintf(format, args...))
59:     klog.ErrorDepth(klogWrapperDepth, args...)
63:     klog.ErrorDepth(klogWrapperDepth, fmt.Sprintln(args...))
67:     klog.ErrorDepth(klogWrapperDepth, fmt.Sprintf(format, args...))
71:     klog.FatalDepth(klogWrapperDepth, args...)
75:     klog.FatalDepth(klogWrapperDepth, fmt.Sprintln(args...))
79:     klog.FatalDepth(klogWrapperDepth, fmt.Sprintf(format, args...))

staging/src/k8s.io/apiserver/pkg/server/httplog/httplog.go
161:                    klog.InfoDepth(1, fmt.Sprintf("verb=%q URI=%q latency=%v resp=%v UserAgent=%q srcIP=%q: %v%v",
168:                    klog.InfoDepth(1, fmt.Sprintf("verb=%q URI=%q latency=%v UserAgent=%q srcIP=%q: hijacked",
198:            klog.InfoDepth(1, fmt.Sprintf("Unable to convert %+v into http.Flusher", rl.w))

staging/src/k8s.io/apimachinery/pkg/util/runtime/runtime.go
114:    klog.ErrorDepth(2, err)

pkg/kubelet/kubeletconfig/util/log/log.go
36:     klog.ErrorDepth(1, fmt.Sprintf(logFmt, s))
48:     klog.InfoDepth(1, fmt.Sprintf(logFmt, s))

Going through those,

  1. component-base is an implementation of io.Writer --> klog that preserves line information. Mainly used to capture stdlib logs to klog.

  2. the first kubectl is a copy of component-base (actually, I think it was vice-versa, but whatever)

  3. Conditionally calls normal logger if the log verbosity is high enough.

  4. GetConfigOrDie, attempting to provide useful info as to where the error came from. Better off calling ErrorS with a sink that produces stack traces.

  5. Extra wrapper for the etcd client to log to klog.

  6. Helper "middleware" for logging http requests. Trying to report caller of the logging helper.

  7. Helper for runtime.HandleError. Just use an implementation that reports stack traces with errors. Take care not to contemplate runtime.HandleError too deeply, or it will make you very, very sad.

  8. Inserts a prefix at the beginning of every line. Really should just go away with proper named loggers.

Several of those are semi-global logging wrappers (1, 2, 5) that are better off just having an implementation that can do caller-skip (e.g. zap). I'm not sure we want the caller-skip at the interface level, though.

3, 4, and 7 and probably just need to go away -- they're better dealt-with with implementations that capture backtraces on error.

8 should be replaced by proper prefix/logger name support, so that a wrapper is no longer needed

That just leaves 6, the middleware logger. @thockin's suggestion @ https://github.com/kubernetes/community/pull/4793/files#r431353200 is a good one, but removes the middleware part (a comment in the file suggests that the middleware-link functionality is probably superfluous). Along the same lines, something like klog.InfoS("http request", httplog.Request(req)) would work too, where httplog.Request is analagous to log.KObj.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Several of those are semi-global logging wrappers (1, 2, 5) that are better off just having an implementation that can do caller-skip (e.g. zap). I'm not sure we want the caller-skip at the interface level, though.

These ones seem like valid use-cases to me. In fact, I wrote the first version of (5), I think. The ability to strip frames from the callstack seems pretty necessary for any sort of logging util helper or adapter.

I don't quite grok your comment. How would an implementation that can skip callframes be utilized under something like logr without a function for it in the interface?

In other words, should logr and klog grow new methods for InfoDepth(depth, msg, string...) and ErrorDepth(depth, err, msg, string...) ?

Copy link
Contributor

@DirectXMan12 DirectXMan12 May 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... let me rephrase a bit. I think I misinterpreted somethings, and miscommunicated others. I've rewritten this comment a few times, so it's possible there's something I'm not quite articulating correctly.

I agree that they're valid use cases, but:

All the mentioned numbers follow a pattern of "here's a set of helper functions where all the functions have a consistent caller skip". The caller skip isn't varying per-function in the file. I was originally (above) suggesting something like

var logWrapper = klog.WithSkip(4)

func Info(msg) {
  logWrapper.InfoS(prefix+msg)
}

However, that only works when we're not redirecting to logr. That's not too big a deal -- it preserves the status quo, punts on putting things into our interface, and should be doable with the way the internals of klog work.

We could consider adding the WithSkip to logr, which makes the caller skip compose and feels a lot more like the rest of logr's interface. However, this is a bit annoying to implement in some logging backends (see below the fold), and still feels a bit brittle (again, below the fold).

[below the fold] Why I'm kinda against adding the extra Depth methods

My objections/uneasiness with this has two parts:

The first is that adding depth functions feels a lot like leaking implementation details through to the user. This kinda paints us into the corner in terms of always explicitly supporting caller skip like this -- I could see a world where we decide to drop caller in favor of logger names + stack traces for errors or a world where we decide to do "capture caller into context".

The second is that that per-function caller skip is an easy interface to accidentally misuse:

  • It's easy to miss/forget the Depth in one of your functions.

  • It doesn't compose (you can't wrap helpers in helpers and still get useful information, which is something I've seen with similar types of functionality in codebases like controller-runtime).

  • It's sufficiently brittle that I'd actively encourage folks not to use it -- it relies on exact code structure not changing, which isn't something we can enforce with the compiler, lint for, etc. In the case of the etcd logger, if the internals of the etcd client change, we suddenly lose the usefulness of our logging information. That's not really an interface I want people writing against.

Moving on to practical matters: how much we care about people actually being able to implement this, or would we rather they just ignore it? Zap doesn't support per-function caller skip OOTB, but it's doable with some hacking (you'd call entry := logger.Check(...); entry.Caller = /* calculate caller yourself */, or maybe could finagle a custom zapcore.Core to do it), logrus doesn't even do caller skip AFAICT, etc. Do we want to continue to support delegating to logr? It's not clear how much of a goal that it from the KEP (I think we should, or at least have some similar functionality).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could consider adding the WithSkip to logr

The problem I have with this is that it doesn't feel like a property ofthe
logger object, but of the call-site. I guess if you are wrapping a Logger in
some other type, you can argue that this IS part of the logger (or as you
characterized - it doesn't change). If there's a one-off logging helper
function, you can always make a temporary object (though that can be
allocation-heav if it is done a lot). E.g.

func logHelper(log logr.Logger, foo Foo) {
  logr.SkipFrames(1).Info("foo message", "address", foo.Addr(), "phone", foo.Phone)
}

per-function caller skip is an easy interface to accidentally misuse:

OK, you convinced me.

how much we care about people actually being able to implement this, or would we rather they just ignore it?

I suspect some log libs will handle it and some will no-op it. I think that's
fine, honestly. There will be pressure or there won't, but at least it is
possible to DTRT

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it can be alloc-heavy if it is done a lot

I would hope escape analysis and friends would prevent a heap allocation here.

* `klog.V(N).Infof`, `klog.V(N).Info`, `klog.V(N).Infoln` -> `klog.V(N).InfoS`
* `klog.Warning`, `klog.Warningf`, `klog.Warningln`, `klog.WarningDepth` -> `klog.InfoS`
* `klog.Error`, `klog.Errorf`, `klog.Errorln`, `klog.ErrorDepth` -> `klog.ErrorS`
* `klog.Fatal`, `klog.Fatalf`, `klog.Fatalln`, `klog.FatalDepth` -> `klog.ErrorS`

### Removing Depth

Functions with depth (`klog.InfoDepth`, `klog.WarningDepth`, `klog.ErrorDepth`, `klog.FatalDepth`) are used to indicate
that the source of the log (added as metadata in log) is different than the invocation of logging library. This is
usually used when implementing logging util functions. As logr interface doesn't support depth, those functions should
return logging arguments instead of calling `klog` directly.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might be worth noting that there are situations where this isn't possible (mainly "adapters" and runtime.HandleError -- see my comment above), even if we decide to punt down the road on how to handle those cases

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe add a todo about expanding this section with the discussion above (we'll need it before beta)?


For example
```go
func Handle(w http.ReponseWriter, r *http.Request) {
logHTTPRequest(r)
handle(w, r)
}

func logHTTPRequest(r *http.Request) {
klog.InfoDepth(1, "Received HTTP %s request", r.Method)
}
```
should be replaced with
```go
func Handle(w http.ReponseWriter, r *http.Request) {
klog.Info("Received HTTP request", httpRequestLog(r)...)
handle(w, r)
}

func httpRequestLog(r *http.Request) []interface{} {
return []interface{}{
"verb", r.Method,
}
}

```

### Using ErrorS

With `klog` structured logging borrowing the interface from [logr] it also inherits it's differences in semantic of
error function. Logs generated by `ErrorS` command may be enhanced with additional debug information
(such as stack traces) or be additionally sent to special error recording tools. Errors should be used to indicate
unexpected behaviours in code, like unexpected errors returned by subroutine function calls.

Calling `ErrorS` with `nil` as error is semi-acceptable if there is error condition that deserves a stack trace at this
origin point. For expected errors (`errors` that can happen during routine operations) please consider using
`klog.InfoS` and pass error in `err` key instead.

### Replacing Fatal calls

Use of Fatal should be discouraged and it's not available in new functions. Instead of depending on the logger to exit
the process, you should call `os.Exit()` yourself.

## Remove string formatting from log message
serathius marked this conversation as resolved.
Show resolved Hide resolved

With structured logging, log messages are no longer formatted, leaving argument marshalling up to the logging client
implementation. This allows messages to be a static description of event.

All string formatting (`%d`, `%v`, `%w`, `%s`) should be removed and log message string simplified.
Describing arguments in log messages is no longer needed and should be removed leaving only a description of what
happened.

Additionally we can improve messages to comply with good practices:
* Start from a capital letter.
* Do not end the message with a period.
* Use active voice. Use complete sentences when there is an acting subject ("A could not do B") or omit the subject if
the subject would be the program itself ("Could not do B").
* Use past tense ("Could not delete B" instead of "Cannot delete B")
* When referring to an object, state what type of object it is. ("Deleted pod" instead of "Deleted")

For example
```go
klog.Infof("delete pod %s with propagation policy %s", ...)
```
should be changed to
```go
klog.Infof("Deleted pod", ...)
```

Some logs are constructed solely from string formats. In those cases a message needs to be derived from the context of
the log call.

For example http access logs
```go
func LogHTTP(r *http.Request) {
klog.Infof("%s %s: (%v) %v%v%v [%s %s]", ...)
}
```
should be changed to
```go
func LogHTTP(r *http.Request) {
klog.InfoS("Received HTTP request", ...)
}
```

### Name arguments
serathius marked this conversation as resolved.
Show resolved Hide resolved

Even though new structured logging functions have very similar function prototype `func (string, ...interface{})` it
has different meaning for variadic arguments. Instead of just passing arguments, now we are passing key value pairs of
argument name and argument value. This means when migrating a log call we need to add an additional string before each
argument, that will be used as it's name.

For example
```go
func LogHTTP(r *http.Request) {
klog.Infof("Received HTTP request, path: %s, method: %s", r.Path, r.Method)
}
```
should be changed to
```go
func LogHTTP(r *http.Request) {
klog.InfoS("Received HTTP request", "path", r.Path, "method", r.Method)
}
```

Names of arguments should use [lowerCamelCase] and be alphanumeric. Arguments names in one log call should be unique.
Names should be picked based on semantic meaning of value itself, not the context in which is used (log message should
imply the context). For example names like `status` should be used over (`desiredStatus`, `oldStatus`, `badStatus`) thus
allowing to query and join different log lines of the `status` field.

Kubernetes objects should be referenced using only their kind, no matter their api group or version. Example argument
serathius marked this conversation as resolved.
Show resolved Hide resolved
names: `deployment`, `pod`, `node`, `replicaSet`. For objects of unknown type, is ok to log them under `object` key with
addition of `apiVersion` and `kind` fields describing the k8s object type.

In situations when we want to the log value of the same meaning twice (e.g. transition between state) it is ok to name
an additional argument based on context, but leaving one most current/correct value with canonical name.

Examples of keys (strongly suggested, will be extended when pattern emerge, no standard schema yet):
* `err` - error when using `klog.InfoS`. Used for expected errors that are not `klog.ErrorS`.
* `object` - reference to k8s objects of unknown type. Should be used with `kind` and `apiVersion`.
* `kind` - kind of k8s object of unknown type.
* `apiVersion` - API version of k8s object of unknown type.

Example:

```go
func ChangeStatus(newStatus, currentStatus string) {
err := changeStatus(newStatus)
if err != nil {
klog.ErrorS(err, "Failed changing status", "desiredStatus", newStatus, "status", currentStatus)
}
klog.InfoS("Changed status", "previousStatus", currentStatus, "status", newStatus)
}
```

[lowerCamelCase]: https://en.wiktionary.org/wiki/lowerCamelCase

### Use `klog.KObj` and `klog.KRef` for Kubernetes objects

As part of structured logging migration we want to ensure that kubernetes objects references are consistent within the
codebase. Two new utility functions were introduced to klog `klog.KObj` and `klog.KRef`. Any reference
(name, uid, namespace) to Kubernetes Object (Pod, Node, Deployment, CRD) should be rewritten to utilize those functions.
In situations when object `UID` is would be beneficial for log, it should be added as separate field with `UID` suffix.

For example
```go
func updatePod(pod *covev1.Pod) {
...
klog.Infof("Updated pod %s in namespace %s", pod.Name, pod.Namespace)
}
```
should be changed to
```go
func updatePod(pod *covev1.Pod) {
...
klog.InfoS("Updated pod", "pod", klog.KObj(pod))
}
```
And
```go
func updatePod(pod *covev1.Pod) {
...
klog.Infof("Updated pod with uid: %s", pod.Uid)
}
```
should be changed to
```go
func updatePod(pod *covev1.Pod) {
...
klog.InfoS("Updated pod", "pod", klog.KObj(pod), "podUID", pod.Uid)
}
```

`klog.KObj` requires passing a kubernetes object (struct implementing `metav1.Object` interface). In situations where
the object is not available, we can use `klog.KRef`. Still it is suggested to rewrite the code to use object pointer
instead of strings where possible.

```go
func updatePod(podName, podNamespace string) {
...
klog.InfoS("Updated pod", "pod", klog.KRef(podNamespace, podName))
}
```

For non-namespaced object we can pass empty string to namespace argument

```go
func updateNode(nodeName string) {
...
klog.InfoS("Updated node", "node", klog.KRef("", nodeName))
}
```

### Verify log output

With the introduction of structured functions log arguments will be formatted automatically instead of depending on the
caller. This means that we can remove the burden of picking the format by caller and ensure greater log consistency, but during
migration it's important to ensure that we avoid degradation of log quality. We should ensure that during migration we
preserve properties like:
* meaning of event described by log
* verbosity of stored information

PRs migrating logs should include examples of outputted logs before and after the change, thus helping reviewers
understand the impact of change.

Example code to compare [httplog.go#168](https://github.com/kubernetes/kubernetes/blob/15c3f1b11/staging/src/k8s.io/apiserver/pkg/server/httplog/httplog.go#L168)
```
package main

import (
"fmt"
"k8s.io/klog/v2"
"net/http"
"time"
)

type respLogger struct {
status int
statusStack string
addedInfo string
req *http.Request
}

func (rl *respLogger) Log(latency time.Duration) {
klog.InfoDepth(1, fmt.Sprintf("verb=%q URI=%q latency=%v resp=%v UserAgent=%q srcIP=%q: %v%v",
rl.req.Method, rl.req.RequestURI,
latency, rl.status,
rl.req.UserAgent(), rl.req.RemoteAddr,
rl.statusStack, rl.addedInfo,
))
}

func (rl *respLogger) LogArgs(latency time.Duration) []interface{} {
return []interface{}{
"verb", rl.req.Method,
"URI", rl.req.RequestURI,
"latency", latency,
"resp", rl.status,
"userAgent", rl.req.UserAgent(),
"srcIP", rl.req.RemoteAddr,
}
}

func main() {
klog.InitFlags(nil)

// Setup
rl := respLogger{
status: 200,
req: &http.Request{
Method: "GET",
Header: map[string][]string{"User-Agent": {"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0. 2272.118 Safari/537.36."}},
RemoteAddr: "127.0.0.1",
RequestURI: "/metrics",
},
}
latency := time.Second

// Before migration
rl.Log(latency)

// After migration
klog.InfoS("Received HTTP request", rl.LogArgs(latency)...)
}
```

Log output before migration
```
I0528 19:15:22.737538 47512 logtest.go:52] verb="GET" URI="/metrics" latency=1s resp=200 UserAgent="Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0. 2272.118 Safari/537.36." srcIP="127.0.0.1":
```
After
```
I0528 19:15:22.737588 47512 logtest.go:55] "Received HTTP request" verb="GET" URI="/metrics" latency="1s" resp=200 userAgent="Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0. 2272.118 Safari/537.36." srcIP="127.0.0.1"
```