fix: endless matching loop #199
Conversation
This happens for example with the proxy_protocol matcher and handler. If we receive a proxy proto header from a disallowed source, the matcher matches, but the handler does not consume any of the prefetched bytes. Since the handler is not terminal, we jump back to the beginning of the matching loop and match the same matcher/handler combo again. It never stops because we never call `cx.prefetch` again: `isFirstPrefetch = false` is unreachable in this situation (the statement is at the wrong place). It must always be set to false after the first iteration. This bug was caused by the fix for #194.
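To make the failure mode concrete, here is a heavily simplified sketch of the loop shape. This is not the actual routes.go code: `prefetch` and `matchAndHandle` are stand-ins, and the assumption that the first chunk of data is prefetched before the loop is part of the sketch, not a statement about the real implementation.

```go
package sketch

// matchLoop is a heavily simplified illustration of the compiled route's
// matching loop; prefetch and matchAndHandle are stand-ins, not the real
// caddy-l4 API.
func matchLoop(prefetch func() error, matchAndHandle func() (matched, terminal bool, err error)) error {
	const maxPrefetchCalls = 10000 // upper bound added by this PR as a safety net

	isFirstPrefetch := true
	for i := 0; i < maxPrefetchCalls; i++ {
		// assumption in this sketch: the first chunk of data was already
		// prefetched before the loop, so later iterations must prefetch more
		if !isFirstPrefetch {
			if err := prefetch(); err != nil {
				return err
			}
		}
		// the fix: clear the flag unconditionally after the first iteration,
		// instead of on a path that the "matched, consumed nothing,
		// non-terminal" case jumps over
		isFirstPrefetch = false

		matched, terminal, err := matchAndHandle()
		if err != nil || !matched || terminal {
			return err
		}
		// matched but non-terminal: loop back and run the matchers again
	}
	// limit exhausted; the real code logs "number of prefetch calls exhausted"
	return nil
}
```

With the flag stuck at true, the prefetch call in this sketch is skipped forever, which is the endless loop described above; the bounded for loop turns a future bug of this kind into a logged error instead of 100% CPU.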
I also switched from the endless loop to a limited for loop, in the hope that this prevents future bugs.
Excellent. I was thinking as I read your post that this would be a good idea. 👍
The 10k iteration limit was chosen by gut feeling; I have no data on how many prefetch calls are realistically needed in practice.
I'm guessing that's needlessly high, but maybe let's leave it for now, and if things are working well, we can try reducing it.
Thanks so much for the patch. It's quite elegant!
@@ -184,5 +183,8 @@ func (routes RouteList) Compile(logger *zap.Logger, matchingTimeout time.Duratio
			}
		}
	}

	logger.Error("matching connection", zap.String("remote", cx.RemoteAddr().String()), zap.Error(errors.New("number of prefetch calls exhausted")))
	return nil
I wonder if we should return the error here. But this is also probably fine. Any thoughts?
It's that way because I wanted to differentiate errors during matching vs. errors from the handlers: "matching connection" vs. "handling connection". But I agree it looks bad/wrong.
We could swap the positions: log "matching connection" in server.go#L168 and "handling connection" in routes.go#L173. Then the `HandlerFunc` in `Compile` could return errors directly everywhere.
But the logic for logging only at warning level for `ErrDeadlineExceeded` should probably be kept in the compiled function. Otherwise it must be duplicated to all other places that call `Handle` on the compiled route, like the listener wrapper or the subroute handler.
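As a rough sketch of that idea (not the actual caddy-l4 code; the function name and the simplified types here are made up for illustration), the `ErrDeadlineExceeded` special case could stay next to the compiled route like this:

```go
package sketch

import (
	"errors"
	"net"
	"os"

	"go.uber.org/zap"
)

// handleCompiledRoute keeps the ErrDeadlineExceeded special case in one place,
// so callers such as the listener wrapper or the subroute handler do not have
// to duplicate the warn-vs-error distinction.
func handleCompiledRoute(logger *zap.Logger, conn net.Conn, handle func() error) error {
	err := handle()
	if err == nil {
		return nil
	}
	if errors.Is(err, os.ErrDeadlineExceeded) {
		// slow or silent clients that hit the matching timeout are expected noise
		logger.Warn("matching connection", zap.String("remote", conn.RemoteAddr().String()), zap.Error(err))
		return nil
	}
	logger.Error("handling connection", zap.String("remote", conn.RemoteAddr().String()), zap.Error(err))
	return err
}
```

Callers like the listener wrapper or the subroute handler would then just call this and propagate the returned error.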
Yeah, so it's in a function that acts as a handler representing the entire route, which is both matching and handling. That's a good question, I guess, whether we should treat their errors separately or differently. A matcher should only return an error if a decision (true/false) couldn't be reached; that should be rare, and notable, since it indicates a problem with the connection. Maybe we should treat their errors equally?
Sorry for the delayed response. For now, I would leave it as it is. I think the different messages are useful to differentiate between errors during the routing/matching phase and errors once Caddy actually knows what to do with the connection. Errors during matching are likely caused by clients sending garbage, disconnecting early, or configuration problems, but errors during handling likely indicate real problems for real/valid users.
The arbitrary loop limit is the wrong approach to the bug, see here. The reason your example blocks is the weird handler loop you set. When provisioning, you use the … The reason this `if` is not executed is that the buffer is not fully consumed in the first place: the proxy protocol matcher only needs a little data to determine the type, and in the handler this data is not used at all.
The limit is not the fix for this bug. The …
In your example, this connection is handled by the handler; the handler just didn't do anything about it (a no-op). What I'm trying to say is that your approach doesn't handle this type of case.
The config in the initial comment does not make much sense in general, since the proxy_protocol handler is not a terminal handler; it was just a minimal reproducer. @mholt Can we please merge this PR? Like I said in the initial comment, this is just a fix for a rather critical bug I accidentally introduced while fixing #194. This bug has been in master far too long already, and it has nothing to do with the changed behavior of the listener wrapper (let's discuss that in #207).
@ydylla Absolutely, sorry I lost track of things with summertime... will merge :) And we can iterate on this to improve it if needed, no problem. Thanks for your dedication to this!! It's super awesome! (Also, I appreciate your feedback @WeidiDeng.)
But you didn't fix the underlying problem. Adding an iteration counter is a hack. The proxy protocol handler is called in the example, but you don't acknowledge that fact and still retry the match loop. That's the underlying problem.
Sorry, I can't follow you. Can you please provide a config/example to reproduce the problem you still see with this?
@mholt Sadly it is not over yet. Now I have also seen a runaway Caddy with max CPU load under normal traffic. But timeline-wise it is impossible that it is the same as #193, since I could trace this one back to a bug in the fix for #194.

It happens for example with the proxy_protocol matcher and handler. If we receive a proxy proto header from a disallowed source, the matcher matches, but the handler does not consume any of the prefetched bytes. Since the handler is not terminal, we jump back to the beginning of the matching loop and match the same matcher/handler combo again. It never stops because we never call `cx.prefetch` again: `isFirstPrefetch = false` is unreachable in this situation (the statement is at the wrong place). It must always be set to false after the first iteration.

You can reproduce it with this config:

…

and `echo "PROXY TCP4 255.255.255.255 255.255.255.255 65535 65535" | nc 127.0.0.1 8080`. The connections opened by this command will never finish and will fully utilize one core.
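For illustration only (this is a reconstruction, not necessarily the exact config; field names follow caddy-l4's JSON config and the allow range is arbitrary), a minimal setup along these lines matches the description: a single route whose only handler is proxy_protocol with an allow list that excludes the spoofed source address.

```json
{
  "apps": {
    "layer4": {
      "servers": {
        "example": {
          "listen": ["127.0.0.1:8080"],
          "routes": [
            {
              "match": [
                { "proxy_protocol": {} }
              ],
              "handle": [
                { "handler": "proxy_protocol", "allow": ["192.168.0.0/16"] }
              ]
            }
          ]
        }
      }
    }
  }
}
```

With the allow list excluding 255.255.255.255, the matcher still matches the PROXY header, but the handler refuses the source and consumes none of the prefetched bytes, which is exactly the situation described above.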
I also switched from the endless loop to a limited for loop, in the hope that this prevents future bugs. The 10k iteration limit was chosen by gut feeling; I have no data on how many prefetch calls are realistically needed in practice. If you don't like it, we can also keep the endless for loop.