
fix: endless matching loop #199

Merged: 1 commit merged into master on Jun 28, 2024

Conversation

@ydylla (Collaborator) commented Jun 6, 2024

@mholt Sadly it is not over yet. I have now also seen a runaway caddy with max CPU load under normal traffic. Timeline-wise it cannot be the same issue as #193, though, since I could trace this one back to a bug in the fix for #194.

It happens, for example, with the proxy_protocol matcher and handler. If we receive a proxy protocol header from a disallowed source, the matcher matches, but the handler does not consume any of the prefetched bytes. Since the handler is not terminal, we jump back to the beginning of the matching loop and match the same matcher/handler combo again. It never stops, because we never call cx.prefetch again: the isFirstPrefetch = false statement is unreachable in this situation (it is in the wrong place). It must always be set to false after the first iteration.
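
For illustration, here is a minimal Go sketch of the loop shape described above. It is a hedged reconstruction, not the actual routes.go code: only the names cx.prefetch and isFirstPrefetch come from this PR; the types and the exact position of the flag reset are assumptions.

// Hypothetical stand-ins for the layer4 types, for illustration only.
type conn interface{ prefetch() error }

type route interface {
	Match(cx conn) (bool, error)
	Handle(cx conn) error
	Terminal() bool
}

// matchLoop sketches the buggy (pre-fix) loop shape: prefetch is skipped on
// the first pass because the server already prefetched before routing.
func matchLoop(cx conn, routes []route) error {
	isFirstPrefetch := true
loop:
	for {
		if !isFirstPrefetch {
			if err := cx.prefetch(); err != nil {
				return err
			}
		}
		for _, r := range routes {
			matched, err := r.Match(cx)
			if err != nil {
				return err
			}
			if !matched {
				continue
			}
			if err := r.Handle(cx); err != nil {
				return err
			}
			if r.Terminal() {
				return nil
			}
			// Matched and handled, but not terminal: jump back to the start
			// of the matching loop. The flag reset below is skipped, so
			// cx.prefetch is never called again and the loop spins forever
			// when the handler consumed none of the prefetched bytes.
			continue loop
		}
		// Only reached when no route matched; in the scenario above this
		// line is unreachable, which is the misplacement this PR fixes.
		isFirstPrefetch = false
	}
}

Moving the flag reset so it runs unconditionally after the first pass (or dropping the flag entirely, as this patch does) breaks the cycle.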

You can reproduce it with this config:

{
  "admin": {
    "disabled": true
  },
  "logging": {
    "logs": {
      "default": {"level":"DEBUG", "encoder": {"format":"console"}}
    }
  },
  "apps": {
    "layer4": {
      "servers": {
        "loop": {
          "listen": ["0.0.0.0:8080"],
          "routes": [
            {
              "match": [
                {"proxy_protocol": {}}
              ],
              "handle": [
                {
                  "handler": "proxy_protocol",
                  "allow": ["1.1.1.1/32"]
                }
              ]
            }
          ]
        }
      }
    }
  }
}

and echo "PROXY TCP4 255.255.255.255 255.255.255.255 65535 65535" | nc 127.0.0.1 8080.
The connections opened by this command will never finish and will fully utilize one core.

I also switched from an endless loop to a limited for loop, in the hope that this prevents future bugs similar to this one. The 10k iteration limit was chosen by gut feeling; I have no data on how many prefetch calls are realistically needed in practice. If you don't like it, we can also keep the endless for loop.
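
The bounded variant might look roughly like the sketch below (again hedged: it reuses the hypothetical types from the sketch above and assumes the standard errors import; only the 10000 limit and the exhaustion error message come from this PR and its diff, the constant name is made up).

// maxPrefetchCalls is a hypothetical name for the 10k safety limit.
const maxPrefetchCalls = 10000

func matchLoopBounded(cx conn, routes []route) error {
loop:
	for i := 0; i < maxPrefetchCalls; i++ {
		// Every iteration after the first reads more data, which makes the
		// easy-to-misplace isFirstPrefetch bool unnecessary.
		if i > 0 {
			if err := cx.prefetch(); err != nil {
				return err
			}
		}
		for _, r := range routes {
			matched, err := r.Match(cx)
			if err != nil {
				return err
			}
			if !matched {
				continue
			}
			if err := r.Handle(cx); err != nil {
				return err
			}
			if r.Terminal() {
				return nil
			}
			continue loop
		}
	}
	// A future bug that keeps matching without consuming or reading data now
	// ends here instead of spinning at full CPU forever.
	return errors.New("number of prefetch calls exhausted")
}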

@ydylla requested a review from @mholt on June 6, 2024 at 22:13

@mholt (Owner) left a comment

I also switched from an endless loop to a limited for loop, in the hope that this prevents future bugs similar to this one.

Excellent. I was thinking as I read your post that this would be a good idea. 👍

The 10k iteration limit was chosen by gut feeling; I have no data on how many prefetch calls are realistically needed in practice.

I'm guessing that's needlessly high, but maybe let's leave it for now, and if things are working well, we can try reducing it.

Thanks so much for the patch. It's quite elegant!

Review thread on layer4/routes.go (resolved):
@@ -184,5 +183,8 @@ func (routes RouteList) Compile(logger *zap.Logger, matchingTimeout time.Duration
}
}
}

logger.Error("matching connection", zap.String("remote", cx.RemoteAddr().String()), zap.Error(errors.New("number of prefetch calls exhausted")))
return nil
@mholt (Owner):

I wonder if we should return the error here. But this is also probably fine. Any thoughts?

@ydylla (Collaborator, Author):

It's that way because I wanted to differentiate errors during matching from errors from the handlers: "matching connection" vs. "handling connection". But I agree it looks bad/wrong.
We could swap the positions: log "matching connection" in server.go#L168 and "handling connection" in routes.go#L173. Then the HandlerFunc in Compile could return errors directly everywhere.
But the logic for logging ErrDeadlineExceeded only at warning level should probably be kept in the compiled function. Otherwise it would have to be duplicated in all other places that call Handle on the compiled route, like the listener wrapper or the subroute handler.
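
As a side note, here is a hedged sketch of that warning-level special case kept in one place (errors.Is, os.ErrDeadlineExceeded, and the zap calls are real APIs; the helper name, messages, and placement are assumptions for illustration).

import (
	"errors"
	"os"

	"go.uber.org/zap"
)

// logMatchingError keeps the deadline special case next to the compiled
// route, so the server, the listener wrapper, and the subroute handler do
// not each have to duplicate it.
func logMatchingError(logger *zap.Logger, remote string, err error) {
	if errors.Is(err, os.ErrDeadlineExceeded) {
		// Hitting the matching timeout usually just means a slow or idle
		// client, so it is only logged as a warning.
		logger.Warn("matching connection", zap.String("remote", remote), zap.Error(err))
		return
	}
	logger.Error("matching connection", zap.String("remote", remote), zap.Error(err))
}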

@mholt (Owner):

Yeah, so it's in a function that acts as a handler representing the entire route, which is both matching and handling. That's a good question, I guess, whether we should treat their errors separately or differently. A matcher should only return an error if a decision (true/false) couldn't be reached; that should be rare, and notable, since it indicates a problem with the connection. Maybe we should treat their errors equally?

@ydylla (Collaborator, Author):

Sorry for the delayed response. For now, I would leave it as it is. I think the different messages are useful to differentiate errors during the routing/matching phase from caddy actually knowing what to do with the connection. Errors during matching are likely caused by clients sending garbage, disconnecting early, or configuration problems, whereas errors during handling likely indicate real problems for real/valid users.

@WeidiDeng (Contributor):

The arbitrary loop limit is the wrong approach to this bug, see here.

The reason your example loops is the weird handler chain that gets set up. When provisioning, you use the nopNextHandler, which just hands the connection to the next provided handler, which is this thing here, so it just loops and loops with no way to progress. Unlike nopHandler, this type handles the connection in a hacky way. Ideally we should use the correct fallback handler in this case.

The reason this if is not executed is that the buffer is not fully consumed in the first place: the proxy_protocol matcher only needs a few bytes to determine the type, and in the handler this data is not used at all.

@ydylla (Collaborator, Author) commented Jun 27, 2024

The arbitrary loop limit is the wrong approach to this bug

The limit is not the fix for this bug. The isFirstPrefetch = false statement was in the wrong place, so a fix would also have been possible by just moving that line. The limited for loop is only there to prevent future bugs similar to this one (and it looks nicer without the bool flag).

@WeidiDeng (Contributor) commented Jun 28, 2024

In your example, the connection is handled by the handler; the handler just didn't do anything with it (a no-op). What I'm trying to say is that your approach doesn't handle this type of case. Did the matcher return false, nil because it requires more data, or had the match already failed by that point? The second pass is unnecessary because the connection was already handled by the handler, but your code can't distinguish that case.

@ydylla (Collaborator, Author) commented Jun 28, 2024

In your example, the connection is handled by the handler; the handler just didn't do anything with it (a no-op)

The config in the initial comment does not make much sense in general, since the proxy_protocol handler is not a terminal handler. It was just a minimal reproducer.

@mholt Can we please merge this PR? Like I said in the initial comment, this is just a fix for a rather critical bug I accidentally introduced while fixing #194. This bug has been in master far too long already, and it has nothing to do with the changed behavior of the listener wrapper (let's discuss that in #207).

@mholt (Owner) commented Jun 28, 2024

@ydylla Absolutely, sorry I lost track of things with summertime... will merge :) And we can iterate on this to improve it if needed, no problem. Thanks for your dedication to this!! It's super awesome! (Also I appreciate your feedback @WeidiDeng )

@mholt merged commit ca3e2f3 into master on Jun 28, 2024
6 checks passed
@mholt deleted the fix-endless-loop branch on June 28, 2024 at 16:36
@WeidiDeng (Contributor) commented Jun 28, 2024

In your example, the connection is handled by the handler; the handler just didn't do anything with it (a no-op)

The config in the initial comment does not make much sense in general, since the proxy_protocol handler is not a terminal handler. It was just a minimal reproducer.

But you didn't fix the underlying problem; adding an iteration counter is a hack. The proxy protocol handler is called in the example, but you don't acknowledge that fact and still retry the match loop. That's the underlying problem.

@ydylla (Collaborator, Author) commented Jun 28, 2024

But you didn't fix the underlying problem; adding an iteration counter is a hack

Sorry, I can't follow you. Can you please provide a config/example that reproduces the problem you still see with this change?
