
Non blocking matchers & matching timeout #72

Closed · wants to merge 5 commits

Conversation

@ydylla (Collaborator) commented Sep 14, 2022

Hi,
this is my best attempt at solving the problem of matchers blocking when clients don't send enough data (also discussed in #68).
In the end it is a full rewrite of the layer4 connection. It now has a prefetch function which tries to load all data a client sends during connection setup. It reads in chunks of 1024 bytes, up to 8 KiB, and stops on the first short read.
During matching, an ErrConsumedAllPrefetchedBytes error is returned if a matcher requests more data than is currently available. If the routing code sees this error, it ignores it and tries the next matcher.

The only matcher that does not play well with this is the http matcher, because http.ReadRequest forces us to use a bufio.Reader. That's why I also added a MatchingBytes function which allows matchers to get a view of all available bytes. The http matcher uses this to pre-check whether the data looks like HTTP before calling http.ReadRequest, and also to configure the buffer size of the bufio.Reader so it does not produce the ErrConsumedAllPrefetchedBytes error.
It's a bit hacky and can probably be improved.

The matching timeout is implemented by calling SetReadDeadline before each matcher runs, and it can be configured per route.

There are still many things to do; this is just so you can take a peek.

  1. I am not sure if the nested timeout config is necessary. Maybe move it up and rename it to matcher_timeout?
  2. Should the max prefetch/matching byte size be configurable?
  3. Fix the tests. net.Pipe behaves very differently from a real connection: it does not return short reads, so the prefetching breaks.
  4. And probably write some more tests 😄

@ydylla ydylla mentioned this pull request Sep 14, 2022
@mholt (Owner) commented Sep 19, 2022

Can't wait to review this!

Please hold tight while I get Caddy 2.6 released. Should be this week if all goes well.

@mholt (Owner) commented Oct 4, 2022

Ok, now I'm working through my backlog from the 2.6 release, so, hang tight 😁

@ydylla (Collaborator, Author) commented Nov 8, 2022

Just played a bit more with this. The tests are now fixed, and simple configs should already work.
Sadly, http2 matching is broken. The first prefetch only receives the http1 upgrade request but none of the http2 frames it needs for matching (tested with curl; browsers may behave differently).
I think it is fixable, but it gets uglier and uglier 😢

@mholt (Owner) commented Nov 10, 2022

@ydylla Thanks for working on this! I'm still backlogged and have been taking a little time for my mental health lately but I will be back to this as soon as I can :)

I sympathize with the complexity of this... I hope we can eventually solve these in an efficient way. Really appreciate your contributions 💯

@mholt (Owner) left a comment

I finally had a chance to look at this! This is really impressive, and overall I think I like where this is going. Bear with me, as I haven't looked at this code in quite a while! Don't mind my questions -- they are not criticisms -- I am just trying to understand this as well as you do.

It now has a prefetch function which tries to load all data a client sends during connection setup. It reads in chunks of 1024 bytes, up to 8 KiB, and stops on the first short read.

What's the advantage of this over simply reading bytes as the matchers need them (as we currently do)? ... [several minutes later] ... thinking on it, is it because we might as well read all the bytes (with a cap) the client sends to establish the connection, since reading all the bytes is what the server has to do anyways? Might as well do it all at once, I guess?

During matching, an ErrConsumedAllPrefetchedBytes error is returned if a matcher requests more data than is currently available. If the routing code sees this error, it ignores it and tries the next matcher.

Ok, I think I get this part. The client only sends so many bytes at the beginning of a connection (within a deadline) and those are what the matcher has to work with, period.

The only matcher that does not play well with this is the http matcher because http.ReadRequest forces us to use a bufio.Reader.

It's a bit hacky and can probably be improved.

That does sound a bit complicated. Let's collaborate on this and see if we can come up with something simpler.

The matching timeout is implemented by calling SetReadDeadline before each matcher runs, and it can be configured per route.

I wonder if this should be set between each Read()? I think usually it is conventional to enforce read timeouts by extending the deadline after each read.

I am not sure if the nested timeout config is necessary. Maybe move it up and rename it to matcher_timeout?

I like where it is for now, let's see how people use it.

Should the max prefetch/matching byte size be configurable?

Almost certainly, yes. Not a showstopper but I think that will be a good idea.

Thanks @ydylla -- hopefully you haven't given up on me yet with how long I'm taking 🙃

(Review thread on layer4/connection.go, resolved.)
@ydylla (Collaborator, Author) commented Nov 21, 2022

Thanks for the feedback, I will try to answer your questions.

What's the advantage of this over simply reading bytes as the matchers need them

The main reason for the chunking is the detection of short reads. If we ask Read for x bytes but it returns fewer than x bytes, we can be pretty sure that the next Read call will block. This would allow us to stop reading without needing a timeout (ReadDeadline), which would always delay connection matching (even if only by milliseconds).
At least, I thought this was always the case until I noticed it is not quite true for http2, for example. In reality, it also depends on how the client sends the data. curl, for example, seems to send the http2 upgrade request first and the http2 frames after that, instead of both together in the same "flush". This means we get a short read, but the next read could be a full read again.
Also, yes, the server has to read the bytes anyway.

Regarding http.ReadRequest and bufio.Reader: yes, a way to parse HTTP requests without bufio would be nice. We already have all the bytes in memory, so it's unnecessary.
But right now I really do not see a way, except maybe with another library.

I wonder if this should be set between each Read()?

The current timeout was only intended as a total matching timeout, not a read timeout. If no route could be matched within x seconds, the client is very likely not legitimate, so the server should close the connection. After a route is selected, Reads are allowed to take longer (indefinitely), like before. But I agree a general read timeout would also be a good idea. Then the deadline has to be refreshed after each Read.

Regarding the max prefetch size: I was mostly unsure how to access server-level config from the connection.
Either the config value has to be duplicated into the connection, or connections need a reference to their server.

@ydylla ydylla mentioned this pull request Nov 21, 2022
@mholt (Owner) commented Dec 5, 2022

This is looking good I think. Did you mean to leave it in draft state?

@ydylla (Collaborator, Author) commented Dec 5, 2022

This is looking good I think. Did you mean to leave it in draft state?

Thanks. Yes, this is still a draft, because http2 matching is not working in all cases. It depends on how the client sends the data. See my last comment:

I noticed it is not quite true for http2, for example. In reality, it also depends on how the client sends the data. curl, for example, seems to send the http2 upgrade request first and the http2 frames after that, instead of both together in the same "flush". This means we get a short read, but the next read could be a full read again.

I also did not find the time to test it with real web browser traffic, which I planned to do before undrafting it.

@mholt (Owner) commented Dec 6, 2022

Gotcha. That is tricky. I will let you know if I have any ideas!

@mholt (Owner) commented Dec 13, 2023

@ydylla Any interest in finishing this up? Let me know if I can help.

@mholt (Owner) commented May 2, 2024

Hi @ydylla -- I might actually merge this sooner rather than later, and see if we can figure out the HTTP/2 part in a separate PR, or if you want we can finish up the HTTP/2 tweaks in this one. Up to you; I know you're very busy. (I am too!) If I don't hear from you next time I circle back to this I'll probably just go forward with the merge. 😃

@ydylla (Collaborator, Author) commented May 3, 2024

@mholt Sorry, apparently I forgot to answer your previous message 😅
I don't think this can be merged as it is. It may look good, but it will probably not work in many cases. I think a better approach would be a global matching timeout (not per route like now) and a loop that keeps trying all routes/matchers until the timeout is reached. So the flow would be:

  • prefetch until first short read
  • try matchers only on prefetched data
  • if nothing matched, prefetch again, which will either add more data (up to the next short read) or block until the timeout is reached

I will experiment with this within the next couple of days and report back to you.

@ydylla (Collaborator, Author) commented May 6, 2024

@mholt I completely rewrote it and opened a new PR, see #192
