Rate limited errors #195

Open · wants to merge 6 commits into base: master

Conversation

joe-elliott
Contributor

This PR attempts to address concerns in #192. It provides

  • A rate limited logger (see the sketch after this list)
  • A method of configuring the logger for high volume http errors through the server
  • Fixed possible race condition

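For context, here is a rough sketch of the shape of such a wrapper. The type and field names (rateLimitedLogger, next, limiter) match the diff excerpts quoted later in this thread; the constructor and the Errorf method shown here are assumptions for illustration, not the exact PR code.

package logging

import "golang.org/x/time/rate"

// rateLimitedLogger forwards log lines to the wrapped logger only while the
// configured rate allows it; lines over the limit are silently dropped.
type rateLimitedLogger struct {
	next    Interface
	limiter *rate.Limiter
}

// NewRateLimitedLogger is a hypothetical constructor: at most logsPerSecond
// lines per second (with a burst of burstSize) reach the wrapped logger.
func NewRateLimitedLogger(next Interface, logsPerSecond float64, burstSize int) Interface {
	return &rateLimitedLogger{
		next:    next,
		limiter: rate.NewLimiter(rate.Limit(logsPerSecond), burstSize),
	}
}

// Errorf is one example of the Interface methods the wrapper would implement.
func (l *rateLimitedLogger) Errorf(format string, args ...interface{}) {
	if l.limiter.Allow() {
		l.next.Errorf(format, args...)
	}
}
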
While working on this PR I noticed this:

func (l Log) logWithRequest(r *http.Request) logging.Interface {
	traceID, ok := ExtractTraceID(r.Context())
	if ok {
		l.Log = l.Log.WithField("traceID", traceID)
	}

	return user.LogWith(r.Context(), l.Log)
}

Particularly the line:
l.Log = l.Log.WithField("traceID", traceID)
I'm unsure why the struct var is being overwritten in logWithRequest. This seems like a possible race condition to me? I assume there's one instance of the middleware that is shared by all requests in the pipeline?

@bboreham
Collaborator

why the struct var is being overwritten

it's a local variable, so not likely to cause a problem, but this was made less confusing in #193
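
To illustrate that point, here is a standalone toy example (not code from this repo) showing that with a value receiver the method operates on a per-call copy, so assigning to a field does not mutate the shared struct:

package main

import "fmt"

type holder struct{ name string }

// rename has a value receiver, so h is a per-call copy of the struct.
func (h holder) rename(n string) { h.name = n }

func main() {
	h := holder{name: "original"}
	h.rename("changed")
	fmt.Println(h.name) // prints "original": the caller's struct was never written to
}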

} else if statusCode == http.StatusBadGateway || statusCode == http.StatusServiceUnavailable {
	l.logHighVolumeError(r, "%s %s (%d) %s", r.Method, uri, statusCode, time.Since(begin))
	if l.LogRequestHeaders && headers != nil {
		l.logHighVolumeError(r, "ws: %v; %s", IsWSHandshakeRequest(r), string(headers))

Collaborator

do you think we want this on a different line to the method?
(I see it did this in the code before, but only at debug level)

Contributor Author

i think it's ok. this way the user can configure LogRequestHeaders and HighVolumeErrorLog independently and any combination works.

Collaborator

What I'm thinking is: suppose you want to look at logs for a specific request; you can grep for the URI. But the headers are on the next line, which is awkward.

Contributor Author

oh, that's a good point. i hadn't considered that. i'll push a small change to unify them
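
One possible way to unify them, sketched here using the identifiers from the hunk above (the exact change pushed to the PR may differ):

if l.LogRequestHeaders && headers != nil {
	l.logHighVolumeError(r, "%s %s (%d) %s ws: %v; %s",
		r.Method, uri, statusCode, time.Since(begin), IsWSHandshakeRequest(r), string(headers))
} else {
	l.logHighVolumeError(r, "%s %s (%d) %s", r.Method, uri, statusCode, time.Since(begin))
}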

func (l *rateLimitedLogger) WithFields(f Fields) Interface {
	return &rateLimitedLogger{
		next:    l.next.WithFields(f),
		limiter: rate.NewLimiter(l.limiter.Limit(), 0),

Collaborator

Why do we get a new limiter here?
Also, does it work with a burst size of zero?

Contributor Author
joe-elliott commented Sep 14, 2020

Yes, the burst limit was wrong. Good catch. Added a test and fixed.

As for the new limiter, I figured calling WithFields() created a new logger that should have an independent rate limit, but I can go either way on this.
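
As a side note on the burst size, here is a tiny standalone illustration (not code from the PR) of why a zero burst is a problem with golang.org/x/time/rate: with a burst of 0, Allow never returns true unless the limit is rate.Inf, so every log line would be dropped.

package main

import (
	"fmt"

	"golang.org/x/time/rate"
)

func main() {
	noBurst := rate.NewLimiter(rate.Limit(10), 0)   // 10/s, burst 0
	withBurst := rate.NewLimiter(rate.Limit(10), 1) // 10/s, burst 1
	fmt.Println(noBurst.Allow())   // false: a single event never fits in a zero burst
	fmt.Println(withBurst.Allow()) // true: the first event fits within the burst
}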

Collaborator

Thing is, code often calls WithFields() on every operation, e.g. the example you gave in the description, or this one:

entry := user.LogWith(ctx, s.Log).WithFields(logging.Fields{"method": info.FullMethod, "duration": time.Since(begin)})

I expect the case you're trying to hit is where a high volume of logs comes from lots of operations, so you would want the same rate limit to be applied across them all. Try it!

Contributor Author

You're right. Thanks man 👍

Fixed.
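
For clarity, the direction agreed on above would look roughly like this (a sketch inferred from the discussion, not necessarily the exact committed code): the derived logger reuses the parent's limiter so one rate limit is shared across all WithFields copies.

func (l *rateLimitedLogger) WithFields(f Fields) Interface {
	return &rateLimitedLogger{
		next:    l.next.WithFields(f),
		limiter: l.limiter, // shared with the parent, not a fresh limiter per derived logger
	}
}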

@bboreham
Collaborator

Sorry I haven't been back here for a while. Code looks ok now; do you still want to merge it?
Have you tried it out for real?

@bboreham
Collaborator

bboreham commented Dec 1, 2020

@joe-elliott ping on questions

@joe-elliott
Contributor Author

Honestly, this has kind of fallen off my radar. The initial need for this came from something I am no longer involved in. Having said that, I do think the rate limited logger has some value, perhaps in weaveworks/common, if we want to keep it.

The original reason this existed was to catch errors returned from Loki in the cortex-gw when the query frontend crashed. @cyriltovena @owen-d what do you think? is there still a need for this?

@bboreham
Collaborator

bboreham commented Mar 1, 2021

Right now I have a high rate of errors like this:

level=warn ts=2021-03-01T11:35:02.01609324Z caller=grpc_logging.go:38 method=/cortex.Ingester/Push duration=2.741131ms err="rpc error: code = Code(429) desc = user=redacted: per-metric series limit (local limit: 300000 global limit: 0 actual local limit: 300000) exceeded for series {redacted}" msg="gRPC\n"

I think this PR as it stands would not help me, because it is focused on one specific http error.
