Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

util/log: convert log from a singleton #1779

Closed
BramGruneir opened this issue Jul 23, 2015 · 26 comments
Closed

util/log: convert log from a singleton #1779

BramGruneir opened this issue Jul 23, 2015 · 26 comments
Assignees
Milestone

Comments

@BramGruneir
Copy link
Member

Log being a singleton has caused a number of issues during testing. We should convert it from a singleton to a non-singleton and add it to the base context.

By the way, looking up a proper name for non-singleton:
https://stackoverflow.com/questions/233107/is-there-a-name-meaning-not-a-singleton

@bdarnell
Copy link
Contributor

This will be very difficult to do; we currently have many places where we log but do not have access to a context object. It will also require changes to our dependencies; etcd/raft has a singleton logger which we bridge to our own, and we have a generic bridge to the stdlib's log module (not sure if anything actually uses this though).

@tbg
Copy link
Member

tbg commented Jul 24, 2015

an easy route is to make the logger mockable. All package-level methods become vars, the logger is restructured so that you can actually create one (but the package-level vars use a package-level singleton by default).
Then in tests where it matters, you can create your own logger, change the package level logger functions and get what you need. That way we don't need to mess with all the code just for some tests.

@petermattis petermattis changed the title Convert Log from a Singleton util/log: convert log from a singleton Feb 12, 2016
@petermattis
Copy link
Collaborator

@BramGruneir Is log being a singleton still problematic? Do you have a specific test it is causing issues for?

@BramGruneir
Copy link
Member Author

I think we worked around the issue. If I recall correctly this was something about running the tests in parallel.

@tbg
Copy link
Member

tbg commented Jul 2, 2016

I think we should reconsider this. The logger should be passed around with the relevant Context objects (base.Context, storage.StoreContext etc, not necessarily (but maybe actually) context.Context).

The reason I think this is useful is because tests which run multiple servers concurrently in the same process are very hard to debug right now. With per-entity loggers, we can prefix the log messages. For example, #7600 is very hard to debug - five servers, all spewing stuff all over the place. A per-node logger would go a long way here. This is only going to be come more relevant with all the multinode stuff @andreimatei and @WillHaack are building.

As a quickfix, we can introduce a prefixed logger which still uses the global internally, but prefixes messages as configured. Phasing out the access to global log entities is then a boring refactor.

We could also standardize around logging functions which are of form logging.Method(ctx, msg, ...args) which use the logger encapsulated in the context, and a global logger otherwise. This would play together nicely with tracing (we want to consolidate there anyway - everything that goes to stderr should go to a trace too, if there is one around).

But the short term goal here has to be making our tests easier to disambiguate.

@tbg tbg reopened this Jul 2, 2016
@tbg tbg self-assigned this Jul 2, 2016
@tbg
Copy link
Member

tbg commented Jul 2, 2016

While I'm at it, I think we can build in better support for verbosity levels with lazy evaluation like this:

package main

import (
    "fmt"
)

func Log(arg interface{}) {
    if f, ok := arg.(func() string); ok {
        fmt.Println(f())
    }
}

func main() {
    Log(func() string { return "Hello, playground" })
}

This allows stuff like

logger.Log("hello %s", desc.String)
logger.V(3).Log("hello", expensiveFnWhichIsntEvaluatedUnlessV3)

The unified tracing/logging API could be something like this:

type Logger interface {
  Prefix(string) Logger
  V(int) Logger
  Depth(int) Logger // offset stack frames
  // Exact set here as usual is debatable. All of these also log to the trace if there is one in the context.
  Logf(context.Context, string, ...interface{})
  Errorf(context.Context, string, ...interface{})
  Fatalf(context.Context, string, ...interface{})
}

See also https://github.com/uber-go/zap (which however does completely structured logging and currently only has JSON output).

We definitely shouldn't try to reinvent the wheel here or spend too much time. The aftermath of #972 has shown that when most folks look at the logs in a terminal most of the time, structured logs catch a lot of hate.

@tbg
Copy link
Member

tbg commented Jul 2, 2016

Just going to leave this here.

Outputs
{"msg":"testing some stuff","level":"info","ts":1467498951386441514,"fields":{"NodeID":1,"desc":"asd"}}
but see uber-go/zap#59, after which the output could equally read
I160702 18:38:02.896342 NodeID=1 desc=asd: testing some stuff

package main

import (
    "github.com/uber-go/zap"
    "golang.org/x/net/context"
)

type loggerKeyType struct{}

var loggerKey loggerKeyType

func Info(ctx context.Context, msg string, fields ...zap.Field) {
    if logger := ctx.Value(loggerKey); logger != nil {
        logger.(zap.Logger).Info(msg, fields...)
    }
    // TODO: log to trace if there is one.
}

func ContextWithLogger(ctx context.Context, logger zap.Logger) context.Context {
    return context.WithValue(ctx, loggerKey, logger)
}

func main() {
    ctx := ContextWithLogger(context.Background(), zap.NewJSON(zap.Fields(zap.Int64("NodeID", 1))))
    Info(ctx, "testing some stuff", zap.String("desc", "asd"))
}

We would lose file logging, but that can easily be cribbed from glog (along with log rotation if we want to keep that). What's probably gone for good is separate outputs for log levels, --vmodule and --log-stacktrace-at (I won't miss them). What we get in exchange is lower overhead (our logger gets real slow at high throughput during verbose tests), very reasonable structured logging when required and some gimmicks (change settings at runtime, various hooks). Oh, and we'll probably phase out call sites (file.go:line), though we can also keep them if we really want to, at the expense of some allocs.

I wouldn't suggest this relatively sweeping change if we didn't have the consolidation of tracing+logging and unsingletonization on our plate. But with those items being worthwhile, it's probably also worthwhile throwing our logging (which I can only summarize as a zombie) out of the window.

@petermattis
Copy link
Collaborator

I haven't looked at zap (yet, I will), but I'm very anxious about structured logging. I think it is some imaginary beast developers chase after. We chased it ourselves for a while.

@tbg
Copy link
Member

tbg commented Jul 5, 2016

Structured logging is no doubt useful and we better support it eventually, but this isn't primarily about structured logging but usability. Looks to me that we had a bad experience in the past mostly because we look at the screen most of the time so that needs to look legible, which wasn't the case with our experiments and because we tried to home-brew all of this stuff on top of a glog zombie, which wasn't pretty (and isn't pretty now).

From a practical standpoint, the zap api makes a lot of sense. No more ever-repeating log.Warningf("range %s: this and that). Simply propagate a logger that wraps this "realm" along with the request. I'm fairly convinced that something similar to the above suggestion is going to be quite appealing in practice, be it on top of zap or not.

And again, tackling this rather sooner than later is advised, I think. It'll be good to have better process/request scoping in our log output - currently pretty lacking.

@tbg
Copy link
Member

tbg commented Jul 6, 2016

Just a heads up that upstream is going to provide a console output formatter soon: uber-go/zap#59 (comment)

@petermattis
Copy link
Collaborator

Eh? I've worked in huge systems which do not have structured logging and I'm not convinced we have to support it eventually. That said, zap does look interesting. Passing a logger as an argument down every code path that logs is a bit onerous, but that is idiomatic Go and symmetric with passing context.

@tamird
Copy link
Contributor

tamird commented Jul 14, 2016

We discussed this in this morning's stability war room. Things we covered:

  • When things go wrong, it's hard to figure out why.
    • Logs are mostly useless because they often omit context
    • No alerting hooks means that issues go unnoticed
      • Consistency checker is a prime candidate for external monitoring

Suggested end state:

Remove all unattached logging. All "logging" should happen via traces instead, with some hooks that we can attach to external monitoring. We need to figure out how to emit traces to stderr for unit test debugging.

@jordanlewis was interested in this, and may already be looking into a migration story.

@tbg
Copy link
Member

tbg commented Jul 14, 2016

@jordanlewis make sure to back through history. We've had code to log traces to stdout for testing a while ago. When doing so, it's important to log things as they happen (and not as they get finished) and to have a console-optimized format (or it just sucks to look at).

The good thing about logs is that they (can) go to files. As a corollary, we need a good file logging format for traces anyway (to preserve this sort of state for our failed clusters, but also for external users seeking advice). Ideally we'll pick some standard format here that could be fed into various tracer visualizations. I think that means staying close to the data structure it's stored in in basictracer since that is a format that can be pushed anywhere via various Recorder implementations. I think that's actually a protobuf upstream (if not it should be).

Rather than focusing on removing logging, I think the quick win here is to make sure nothing gets logged without also going to a trace (unless there's no trace in sight). The code snippets I suggest above are based on that idea: logging operates exclusively on a context, and everything goes to a trace when possible, and to logs based on verbosity (probably >= Warning).

If we can agree on that premise, then an intermediate step which avoids pulling out log is to allow only the log.{Level}c style families of functions (i.e. taking a context) where the context is only used to call log.Trace(ctx, stuff) so that every log goes to a trace when there is one.
I would still like to gravitate all the way towards embedding the logger in the trace as well, but the above would bring us most of the benefits already.

@tbg
Copy link
Member

tbg commented Jul 14, 2016

Oh, and whatever is left of the previous Errorc stuff (special fields in contexts, etc - that isn't used for anything and overripe for removal).

@tamird
Copy link
Contributor

tamird commented Jul 14, 2016

If we're going to use log.{Level}c, why not go straight to log.Trace? Emitting traces to disk is a prerequisite, obviously.

@tbg
Copy link
Member

tbg commented Jul 14, 2016

I was suggesting that (well, meant to) for existing logging only. Information which is too verbose for that can do log.Trace directly. I just want to avoid ripping out or dramatically changing the existing logging as that could be disruptive for development (which has had to rely on it for a while now).

@tamird
Copy link
Contributor

tamird commented Jul 14, 2016

Wouldn't that be addressed by allowing traces to go to stdout as well (in some form)?

@tbg
Copy link
Member

tbg commented Jul 14, 2016

That's pretty disruptive to the accustomed output format (you lose all the levels, all the messages look different from one day to the next). We had traces to stdout for a while and I ended up being the only person that ever used it (when I really had to), and generally changes to the log output caused more sorrow than happiness in general. No harm in making traces usable first and then ripping out the rest.

@petermattis
Copy link
Collaborator

I'm not sure if I agree with the statement that "logs are mostly useless", but their lack of context is often frustrating. I think we should take baby steps here as we've previously burned a lot of time on logging work that didn't turn out to be useful.

Btw, here is the current scope of our logging and tracing in the storage package:

~/Development/go/src/github.com/cockroachdb/cockroach/storage pmattis/gossip-first-range egrep -h -o 'log\.(Info|Error|Warning|Fatal|Trace)[^(]*' *.go | sort | uniq -c | sort -rn
  81 log.Infof
  48 log.Trace
  24 log.Warningf
  24 log.Errorf
  14 log.Fatalf
  10 log.Warningc
  10 log.Error
   8 log.Warning
   8 log.Errorc
   7 log.Infoc
   5 log.InfofDepth
   5 log.Fatalc
   5 log.ErrorfDepth
   4 log.Fatal
   2 log.WarningfDepth
   2 log.Info
   2 log.FatalfDepth

279 in total. There is an argument to be made to have some brave individual go and audit these log messages for consistency in the context provided.

@RaduBerinde
Copy link
Member

I strongly agree with passing a context to logs whenever possible (and the exceptions should be rare).

The most basic "requirement" is that in go tests where we set up multiple nodes, it should be evident which log message comes from which fake node. So at the very least a node identifier needs to be available in a context.

Then of course anything that's directly related to a certain operation/query should have an operation id that can be used to associate related logs.

Finally, we should have a way to turn logs/traces on and off per-module. We are bound to hit problems which require heavy loads to reproduce and we need to be able to control the log volume.

RaduBerinde added a commit to RaduBerinde/cockroach that referenced this issue Jul 22, 2016
This is a proposal for adding opaque "tags" to contexts that show up in log
messages. See log_context_test for an example.

Part of cockroachdb#1779.
@RaduBerinde
Copy link
Member

The next step I'm working on is to deprecate all the logging functions that don't have a context; specifically we can rename them to InfofDeprecated, InfoDeprecated, etc.

Then we can create new versions for all of them that use contexts (we rename Infoc to Infof and add Info and InfofWithDepth that take contexts).

Then it will be an ongoing tail of work to get rid of the Deprecated instances. We can file multiple issues against different modules.

@petermattis
Copy link
Collaborator

@RaduBerinde In case you're not familiar with it, gorename is your friend for this type of work.

@petermattis
Copy link
Collaborator

There is also eg which could add a context.Background() parameter as the first argument to the log methods which don't currently have such an argument.

@RaduBerinde
Copy link
Member

Thanks Peter! I was renaming the functions using gorename, but I like the eg idea: adding an argument will make it easier to fix up the call sites (and they will still be easy to find/audit, e.g. by using context.TODO()).

RaduBerinde added a commit to RaduBerinde/cockroach that referenced this issue Jul 24, 2016
Mechanical change to require a context on all logging functions.

I will file issues against various components for auditing the call sites to
replace context.TODO() with a real context.

Part of cockroachdb#1779.
@petermattis petermattis added this to the Later milestone Feb 22, 2017
@petermattis
Copy link
Collaborator

@RaduBerinde Is there anything left to do here? It is unlikely that we'll move to a log singleton, but we've plumbed context into all of the logging methods.

@RaduBerinde
Copy link
Member

Yeah I don't think there's anything left.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants