util/log: convert log from a singleton #1779

BramGruneir · 2015-07-23T15:39:29Z

Log being a singleton has caused a number of issues during testing. We should convert it from a singleton to a non-singleton and add it to the base context.

By the way, looking up a proper name for non-singleton:
https://stackoverflow.com/questions/233107/is-there-a-name-meaning-not-a-singleton

bdarnell · 2015-07-23T18:11:53Z

This will be very difficult to do; we currently have many places where we log but do not have access to a context object. It will also require changes to our dependencies; etcd/raft has a singleton logger which we bridge to our own, and we have a generic bridge to the stdlib's log module (not sure if anything actually uses this though).

tbg · 2015-07-24T12:17:08Z

an easy route is to make the logger mockable. All package-level methods become vars, the logger is restructured so that you can actually create one (but the package-level vars use a package-level singleton by default).
Then in tests where it matters, you can create your own logger, change the package level logger functions and get what you need. That way we don't need to mess with all the code just for some tests.

petermattis · 2016-02-12T18:31:35Z

@BramGruneir Is log being a singleton still problematic? Do you have a specific test it is causing issues for?

BramGruneir · 2016-02-12T18:39:54Z

I think we worked around the issue. If I recall correctly this was something about running the tests in parallel.

tbg · 2016-07-02T15:44:51Z

I think we should reconsider this. The logger should be passed around with the relevant Context objects (base.Context, storage.StoreContext etc, not necessarily (but maybe actually) context.Context).

The reason I think this is useful is because tests which run multiple servers concurrently in the same process are very hard to debug right now. With per-entity loggers, we can prefix the log messages. For example, #7600 is very hard to debug - five servers, all spewing stuff all over the place. A per-node logger would go a long way here. This is only going to be come more relevant with all the multinode stuff @andreimatei and @WillHaack are building.

As a quickfix, we can introduce a prefixed logger which still uses the global internally, but prefixes messages as configured. Phasing out the access to global log entities is then a boring refactor.

We could also standardize around logging functions which are of form logging.Method(ctx, msg, ...args) which use the logger encapsulated in the context, and a global logger otherwise. This would play together nicely with tracing (we want to consolidate there anyway - everything that goes to stderr should go to a trace too, if there is one around).

But the short term goal here has to be making our tests easier to disambiguate.

tbg · 2016-07-02T16:33:43Z

While I'm at it, I think we can build in better support for verbosity levels with lazy evaluation like this:

package main

import (
    "fmt"
)

func Log(arg interface{}) {
    if f, ok := arg.(func() string); ok {
        fmt.Println(f())
    }
}

func main() {
    Log(func() string { return "Hello, playground" })
}

This allows stuff like

logger.Log("hello %s", desc.String)
logger.V(3).Log("hello", expensiveFnWhichIsntEvaluatedUnlessV3)

The unified tracing/logging API could be something like this:

type Logger interface {
  Prefix(string) Logger
  V(int) Logger
  Depth(int) Logger // offset stack frames
  // Exact set here as usual is debatable. All of these also log to the trace if there is one in the context.
  Logf(context.Context, string, ...interface{})
  Errorf(context.Context, string, ...interface{})
  Fatalf(context.Context, string, ...interface{})
}

See also https://github.com/uber-go/zap (which however does completely structured logging and currently only has JSON output).

We definitely shouldn't try to reinvent the wheel here or spend too much time. The aftermath of #972 has shown that when most folks look at the logs in a terminal most of the time, structured logs catch a lot of hate.

tbg · 2016-07-02T22:58:36Z

Just going to leave this here.

Outputs
{"msg":"testing some stuff","level":"info","ts":1467498951386441514,"fields":{"NodeID":1,"desc":"asd"}}
but see uber-go/zap#59, after which the output could equally read
I160702 18:38:02.896342 NodeID=1 desc=asd: testing some stuff

package main

import (
    "github.com/uber-go/zap"
    "golang.org/x/net/context"
)

type loggerKeyType struct{}

var loggerKey loggerKeyType

func Info(ctx context.Context, msg string, fields ...zap.Field) {
    if logger := ctx.Value(loggerKey); logger != nil {
        logger.(zap.Logger).Info(msg, fields...)
    }
    // TODO: log to trace if there is one.
}

func ContextWithLogger(ctx context.Context, logger zap.Logger) context.Context {
    return context.WithValue(ctx, loggerKey, logger)
}

func main() {
    ctx := ContextWithLogger(context.Background(), zap.NewJSON(zap.Fields(zap.Int64("NodeID", 1))))
    Info(ctx, "testing some stuff", zap.String("desc", "asd"))
}

We would lose file logging, but that can easily be cribbed from glog (along with log rotation if we want to keep that). What's probably gone for good is separate outputs for log levels, --vmodule and --log-stacktrace-at (I won't miss them). What we get in exchange is lower overhead (our logger gets real slow at high throughput during verbose tests), very reasonable structured logging when required and some gimmicks (change settings at runtime, various hooks). Oh, and we'll probably phase out call sites (file.go:line), though we can also keep them if we really want to, at the expense of some allocs.

I wouldn't suggest this relatively sweeping change if we didn't have the consolidation of tracing+logging and unsingletonization on our plate. But with those items being worthwhile, it's probably also worthwhile throwing our logging (which I can only summarize as a zombie) out of the window.

petermattis · 2016-07-04T00:56:46Z

I haven't looked at zap (yet, I will), but I'm very anxious about structured logging. I think it is some imaginary beast developers chase after. We chased it ourselves for a while.

tbg · 2016-07-05T19:34:50Z

Structured logging is no doubt useful and we better support it eventually, but this isn't primarily about structured logging but usability. Looks to me that we had a bad experience in the past mostly because we look at the screen most of the time so that needs to look legible, which wasn't the case with our experiments and because we tried to home-brew all of this stuff on top of a glog zombie, which wasn't pretty (and isn't pretty now).

From a practical standpoint, the zap api makes a lot of sense. No more ever-repeating log.Warningf("range %s: this and that). Simply propagate a logger that wraps this "realm" along with the request. I'm fairly convinced that something similar to the above suggestion is going to be quite appealing in practice, be it on top of zap or not.

And again, tackling this rather sooner than later is advised, I think. It'll be good to have better process/request scoping in our log output - currently pretty lacking.

tbg · 2016-07-06T11:23:54Z

Just a heads up that upstream is going to provide a console output formatter soon: uber-go/zap#59 (comment)

petermattis · 2016-07-06T12:22:42Z

Eh? I've worked in huge systems which do not have structured logging and I'm not convinced we have to support it eventually. That said, zap does look interesting. Passing a logger as an argument down every code path that logs is a bit onerous, but that is idiomatic Go and symmetric with passing context.

tamird · 2016-07-14T15:57:11Z

We discussed this in this morning's stability war room. Things we covered:

When things go wrong, it's hard to figure out why.
- Logs are mostly useless because they often omit context
- No alerting hooks means that issues go unnoticed
  - Consistency checker is a prime candidate for external monitoring

Suggested end state:

Remove all unattached logging. All "logging" should happen via traces instead, with some hooks that we can attach to external monitoring. We need to figure out how to emit traces to stderr for unit test debugging.

@jordanlewis was interested in this, and may already be looking into a migration story.

tbg · 2016-07-14T16:12:54Z

@jordanlewis make sure to back through history. We've had code to log traces to stdout for testing a while ago. When doing so, it's important to log things as they happen (and not as they get finished) and to have a console-optimized format (or it just sucks to look at).

The good thing about logs is that they (can) go to files. As a corollary, we need a good file logging format for traces anyway (to preserve this sort of state for our failed clusters, but also for external users seeking advice). Ideally we'll pick some standard format here that could be fed into various tracer visualizations. I think that means staying close to the data structure it's stored in in basictracer since that is a format that can be pushed anywhere via various Recorder implementations. I think that's actually a protobuf upstream (if not it should be).

Rather than focusing on removing logging, I think the quick win here is to make sure nothing gets logged without also going to a trace (unless there's no trace in sight). The code snippets I suggest above are based on that idea: logging operates exclusively on a context, and everything goes to a trace when possible, and to logs based on verbosity (probably >= Warning).

If we can agree on that premise, then an intermediate step which avoids pulling out log is to allow only the log.{Level}c style families of functions (i.e. taking a context) where the context is only used to call log.Trace(ctx, stuff) so that every log goes to a trace when there is one.
I would still like to gravitate all the way towards embedding the logger in the trace as well, but the above would bring us most of the benefits already.

tbg · 2016-07-14T17:05:58Z

Oh, and whatever is left of the previous Errorc stuff (special fields in contexts, etc - that isn't used for anything and overripe for removal).

tamird · 2016-07-14T17:52:35Z

If we're going to use log.{Level}c, why not go straight to log.Trace? Emitting traces to disk is a prerequisite, obviously.

tbg · 2016-07-14T17:58:03Z

I was suggesting that (well, meant to) for existing logging only. Information which is too verbose for that can do log.Trace directly. I just want to avoid ripping out or dramatically changing the existing logging as that could be disruptive for development (which has had to rely on it for a while now).

tamird · 2016-07-14T18:06:26Z

Wouldn't that be addressed by allowing traces to go to stdout as well (in some form)?

tbg · 2016-07-14T18:12:01Z

That's pretty disruptive to the accustomed output format (you lose all the levels, all the messages look different from one day to the next). We had traces to stdout for a while and I ended up being the only person that ever used it (when I really had to), and generally changes to the log output caused more sorrow than happiness in general. No harm in making traces usable first and then ripping out the rest.

petermattis · 2016-07-14T20:08:15Z

I'm not sure if I agree with the statement that "logs are mostly useless", but their lack of context is often frustrating. I think we should take baby steps here as we've previously burned a lot of time on logging work that didn't turn out to be useful.

Btw, here is the current scope of our logging and tracing in the storage package:

~/Development/go/src/github.com/cockroachdb/cockroach/storage pmattis/gossip-first-range egrep -h -o 'log\.(Info|Error|Warning|Fatal|Trace)[^(]*' *.go | sort | uniq -c | sort -rn
  81 log.Infof
  48 log.Trace
  24 log.Warningf
  24 log.Errorf
  14 log.Fatalf
  10 log.Warningc
  10 log.Error
   8 log.Warning
   8 log.Errorc
   7 log.Infoc
   5 log.InfofDepth
   5 log.Fatalc
   5 log.ErrorfDepth
   4 log.Fatal
   2 log.WarningfDepth
   2 log.Info
   2 log.FatalfDepth

279 in total. There is an argument to be made to have some brave individual go and audit these log messages for consistency in the context provided.

RaduBerinde · 2016-07-20T15:15:13Z

I strongly agree with passing a context to logs whenever possible (and the exceptions should be rare).

The most basic "requirement" is that in go tests where we set up multiple nodes, it should be evident which log message comes from which fake node. So at the very least a node identifier needs to be available in a context.

Then of course anything that's directly related to a certain operation/query should have an operation id that can be used to associate related logs.

Finally, we should have a way to turn logs/traces on and off per-module. We are bound to hit problems which require heavy loads to reproduce and we need to be able to control the log volume.

This is a proposal for adding opaque "tags" to contexts that show up in log messages. See log_context_test for an example. Part of cockroachdb#1779.

RaduBerinde · 2016-07-23T14:25:18Z

The next step I'm working on is to deprecate all the logging functions that don't have a context; specifically we can rename them to InfofDeprecated, InfoDeprecated, etc.

Then we can create new versions for all of them that use contexts (we rename Infoc to Infof and add Info and InfofWithDepth that take contexts).

Then it will be an ongoing tail of work to get rid of the Deprecated instances. We can file multiple issues against different modules.

petermattis · 2016-07-23T15:17:10Z

@RaduBerinde In case you're not familiar with it, gorename is your friend for this type of work.

petermattis · 2016-07-23T15:18:39Z

There is also eg which could add a context.Background() parameter as the first argument to the log methods which don't currently have such an argument.

RaduBerinde · 2016-07-23T22:42:06Z

Thanks Peter! I was renaming the functions using gorename, but I like the eg idea: adding an argument will make it easier to fix up the call sites (and they will still be easy to find/audit, e.g. by using context.TODO()).

Mechanical change to require a context on all logging functions. I will file issues against various components for auditing the call sites to replace context.TODO() with a real context. Part of cockroachdb#1779.

petermattis · 2017-06-01T16:00:36Z

@RaduBerinde Is there anything left to do here? It is unlikely that we'll move to a log singleton, but we've plumbed context into all of the logging methods.

RaduBerinde · 2017-06-01T16:02:50Z

Yeah I don't think there's anything left.

petermattis changed the title ~~Convert Log from a Singleton~~ util/log: convert log from a singleton Feb 12, 2016

BramGruneir closed this as completed Feb 12, 2016

tbg reopened this Jul 2, 2016

tbg self-assigned this Jul 2, 2016

tbg mentioned this issue Jul 6, 2016

Add a TextEncoder for prettier console output uber-go/zap#59

Closed

tbg mentioned this issue Jul 6, 2016

storage: temporary block on logging output causes logic errors to appear #7646

Closed

petermattis assigned tamird and unassigned tbg Jul 10, 2016

tbg mentioned this issue Jul 13, 2016

ui: logs don't work #7779

Closed

petermattis mentioned this issue Jul 15, 2016

storage: Replica.String fosters deadlocks #7852

Closed

tbg mentioned this issue Jul 19, 2016

logging: entire db stalls on contended log I/O although data sits on non-contended storage #7906

Closed

tbg mentioned this issue Jul 19, 2016

util/log: rip out mostly defunct context logging, but make Logc family use trace #7914

Merged

tbg mentioned this issue Jul 21, 2016

Proposal for opaque log tags #7966

Merged

RaduBerinde added a commit to RaduBerinde/cockroach that referenced this issue Jul 22, 2016

Opaque log tags

1f74b56

This is a proposal for adding opaque "tags" to contexts that show up in log messages. See log_context_test for an example. Part of cockroachdb#1779.

RaduBerinde mentioned this issue Jul 23, 2016

logging: consolidate logging functions to always take a context #7990

Merged

This was referenced Jul 24, 2016

sql: fix logging contexts #7992

Closed

kv: fix logging contexts #7993

Closed

storage: fix logging contexts #7994

Closed

misc: fix logging contexts #7995

Closed

This was referenced Aug 1, 2016

Check for context.TODO() in functions that have a context.Context argument. golang/lint#227

Closed

cleanup: replaces context.TODO() in places where a context.Context already exists #8194

Merged

petermattis added this to the Later milestone Feb 22, 2017

RaduBerinde closed this as completed Jun 1, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

util/log: convert log from a singleton #1779

util/log: convert log from a singleton #1779

BramGruneir commented Jul 23, 2015

bdarnell commented Jul 23, 2015

tbg commented Jul 24, 2015

petermattis commented Feb 12, 2016

BramGruneir commented Feb 12, 2016

tbg commented Jul 2, 2016 •

edited

Loading

tbg commented Jul 2, 2016

tbg commented Jul 2, 2016

petermattis commented Jul 4, 2016

tbg commented Jul 5, 2016 •

edited

Loading

tbg commented Jul 6, 2016

petermattis commented Jul 6, 2016

tamird commented Jul 14, 2016

tbg commented Jul 14, 2016

tbg commented Jul 14, 2016

tamird commented Jul 14, 2016

tbg commented Jul 14, 2016 •

edited

Loading

tamird commented Jul 14, 2016

tbg commented Jul 14, 2016 •

edited

Loading

petermattis commented Jul 14, 2016

RaduBerinde commented Jul 20, 2016

RaduBerinde commented Jul 23, 2016

petermattis commented Jul 23, 2016

petermattis commented Jul 23, 2016

RaduBerinde commented Jul 23, 2016

petermattis commented Jun 1, 2017

RaduBerinde commented Jun 1, 2017

util/log: convert log from a singleton #1779

util/log: convert log from a singleton #1779

Comments

BramGruneir commented Jul 23, 2015

bdarnell commented Jul 23, 2015

tbg commented Jul 24, 2015

petermattis commented Feb 12, 2016

BramGruneir commented Feb 12, 2016

tbg commented Jul 2, 2016 • edited Loading

tbg commented Jul 2, 2016

tbg commented Jul 2, 2016

petermattis commented Jul 4, 2016

tbg commented Jul 5, 2016 • edited Loading

tbg commented Jul 6, 2016

petermattis commented Jul 6, 2016

tamird commented Jul 14, 2016

tbg commented Jul 14, 2016

tbg commented Jul 14, 2016

tamird commented Jul 14, 2016

tbg commented Jul 14, 2016 • edited Loading

tamird commented Jul 14, 2016

tbg commented Jul 14, 2016 • edited Loading

petermattis commented Jul 14, 2016

RaduBerinde commented Jul 20, 2016

RaduBerinde commented Jul 23, 2016

petermattis commented Jul 23, 2016

petermattis commented Jul 23, 2016

RaduBerinde commented Jul 23, 2016

petermattis commented Jun 1, 2017

RaduBerinde commented Jun 1, 2017

tbg commented Jul 2, 2016 •

edited

Loading

tbg commented Jul 5, 2016 •

edited

Loading

tbg commented Jul 14, 2016 •

edited

Loading

tbg commented Jul 14, 2016 •

edited

Loading