patch/optimize(bpf): improve wan tcp hijack datapath performance #481
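This optimization is very exciting; it may well be the best-performing approach currently available on Linux (when proxying WAN traffic). Using socket redirection to shorten the path to its minimum is an extreme optimization! Does this optimization require a newer kernel version? If so, we may need to add some checks and hints (as in the earlier code) and update the documentation.
CI tested 5.10 and it seems fine. dae currently requires >= 5.8; I'll compile a 5.8 kernel myself and try it.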
Brilliant code!
LGTM. I've tested the changes on my end, all works fine. Thanks for proposing the solution, it indeed optimizes the throughput.
🧪 Since the PR has been fully tested, please consider merging it.
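I compiled 5.8 (damn, this version is EOL; I had to hand-patch objtool/elf.c to get it to build, and it filled up my disk). It does not run and fails with an error. Given that it will probably be hard for me to test 5.8 in the future, it would be better if we could raise the kernel requirement slightly to 5.10. 5.10 is an LTS release that is supported until 31 Dec 2026 ( https://endoflife.date/linux ), and the current CI kernel test also covers it.
I ran into a problem using this build of dae. Abstracted, the situation is as follows:
In this setup, accessing http://172.17.0.1:b/ from container A should reach the web service in container B, but with the build from this PR the request gets no response.
@amtoaer Does dae have lan_interface: docker0 set?
@jschwinger233 Yes, my configuration is:
@amtoaer OK, I forgot about this scenario. It can be handled.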
6f73108
The previous check `if (!bpf_map_lookup_elem(&routing_tuples_map, &rev_tuple))` could also add local LAN connections made via docker0; this patch excludes that traffic by checking `!routing_result->pid`.
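The decision described in this commit message can be modelled in plain userspace C. This is only a sketch of the logic, not the BPF code from the PR: the `struct routing_result` layout, the meaning of `pid == 0`, and the function names `should_add_to_sockmap` and `check_cases` are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-in for the value stored in routing_tuples_map.
 * Only the pid field is named in the commit message; the layout is assumed. */
struct routing_result {
    uint32_t pid; /* assumed: 0 when no local process owns the connection */
};

/* Models the decision made after looking up rev_tuple in routing_tuples_map.
 * `found` is the lookup result, NULL on a miss. */
static bool should_add_to_sockmap(const struct routing_result *found)
{
    if (found == NULL)
        return false; /* original check: connection is not tracked by dae */
    if (!found->pid)
        return false; /* added check: e.g. local LAN traffic via docker0 */
    return true;
}

/* Walks through the three cases: map miss, LAN entry, WAN entry. */
static void check_cases(void)
{
    struct routing_result lan = { .pid = 0 }, wan = { .pid = 1234 };

    assert(!should_add_to_sockmap(NULL));
    assert(!should_add_to_sockmap(&lan));
    assert(should_add_to_sockmap(&wan));
}
```

Under these assumptions, a connection between two containers behind docker0 has an entry in the map but no owning process ID, so it is left on the normal kernel path.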
6f73108 to c15733d
@jschwinger233 It works correctly now, thanks!
@jschwinger233 Sure, raising it to 5.10 is no problem.
@jschwinger233 Please raise the requirement to 5.10 in the relevant code and documentation, thanks.
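A minimum-kernel check of the kind discussed here could look roughly like the following C sketch. It is not dae's actual check; the helper name `kernel_at_least` is hypothetical, and it assumes the release string starts with `major.minor`, as in `5.10.0-28-amd64`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: returns true when a kernel release string such as
 * "5.10.0-28-amd64" is at least want_major.want_minor. */
static bool kernel_at_least(const char *release, int want_major, int want_minor)
{
    int major = 0, minor = 0;

    assert(release != NULL);
    /* Only the leading "major.minor" is parsed; any suffix is ignored. */
    if (sscanf(release, "%d.%d", &major, &minor) != 2)
        return false; /* unparsable string: treat as unsupported */
    if (major != want_major)
        return major > want_major;
    return minor >= want_minor;
}
```

On Linux the release string can be obtained from the `release` field filled in by `uname(2)`; a caller would then print a hint and refuse to start when `kernel_at_least(release, 5, 10)` is false.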
LGTM!
Background
This PR introduces two new BPF programs to accelerate WAN TCP.
Overall, the data plane of the original WAN TCP hijack path is shown in the following diagram:
This PR optimizes the above path to:
See Benchmark for the results of the optimization.
Implementation Details
Two BPF programs are used together:
- One uses routing_tuples_map to determine whether a socket is a WAN proxy socket. If it is, it calls bpf_sock_hash_update to add the socket to the sockmap.
- The other uses bpf_msg_redirect_hash to deliver TCP segments directly.
Note that the TCP handshake and teardown still go through the kernel stack and are not accelerated. Acceleration applies only once the connection is established.
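How the two helpers cooperate can be illustrated with a small userspace model in plain C. This is a sketch under stated assumptions, not the PR's BPF code: `sock_hash_add` stands in for bpf_sock_hash_update, `msg_redirect` stands in for a program calling bpf_msg_redirect_hash, and the key scheme (each socket stored under its own tuple, the peer found through the reversed tuple) is assumed rather than taken from the source. All type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct tuple { uint32_t sip, dip; uint16_t sport, dport; };
struct sock_model { size_t rx_bytes; }; /* bytes delivered to this socket */

enum verdict { VERDICT_PASS, VERDICT_REDIRECT };

#define SOCK_HASH_CAP 8
struct sock_hash {
    struct { struct tuple key; struct sock_model *sk; } slot[SOCK_HASH_CAP];
    size_t len;
};

static bool tuple_eq(struct tuple a, struct tuple b)
{
    return a.sip == b.sip && a.dip == b.dip &&
           a.sport == b.sport && a.dport == b.dport;
}

/* Swaps source and destination to obtain the peer's view of the connection. */
static struct tuple tuple_reverse(struct tuple t)
{
    struct tuple r = { t.dip, t.sip, t.dport, t.sport };
    return r;
}

/* Stand-in for bpf_sock_hash_update: register an established socket. */
static bool sock_hash_add(struct sock_hash *h, struct tuple key,
                          struct sock_model *sk)
{
    assert(h != NULL && sk != NULL);
    if (h->len == SOCK_HASH_CAP)
        return false; /* model map is full */
    h->slot[h->len].key = key;
    h->slot[h->len].sk = sk;
    h->len++;
    return true;
}

/* Stand-in for the redirect step: if the peer socket is registered, hand the
 * payload to it directly; otherwise let the normal kernel stack carry it. */
static enum verdict msg_redirect(struct sock_hash *h, struct tuple sender,
                                 size_t len)
{
    struct tuple peer = tuple_reverse(sender);

    for (size_t i = 0; i < h->len; i++) {
        if (tuple_eq(h->slot[i].key, peer)) {
            h->slot[i].sk->rx_bytes += len;
            return VERDICT_REDIRECT;
        }
    }
    return VERDICT_PASS;
}
```

In this model a message is redirected only after both ends of the connection have been registered, which mirrors the note that handshake and teardown stay on the regular kernel path.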
Benchmark
Latency was measured with sockperf:
Results with dae-0.4.0:
Results with this PR:
TCP latency improves by 6%.
However, latency is only one aspect of performance. Running an iperf TCP RR (round-trip) test on my virtual machine simply exhausts its memory.
In real-world scenarios, such as redis-server with redis-benchmark, the p99 improvement often reaches 10% or more.