fix(bpf): Monitor use of splice to avoid kernel bug on fast WAN redirecting #507
Background
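Due to a kernel bug, `bpf_msg_redirect` and splice cannot be used together (https://github.com/jschwinger233/bpf_msg_redirect_bug_reproducer). As a result, #481 prevents dae-0.6 and glider from working together, because glider uses splice to receive TCP.
(Thanks to the community for the bug report.)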
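This PR addresses that problem. The overall idea is to maintain an allowlist, `fastsock_allowlist_map`: processes on the allowlist may use the direct sockmap + msg_redirect path, and all other processes go through the traditional kernel network stack. The allowlist is a bpf map whose key is the process name, `task->comm`. A process must first complete one TCP session. When that session ends, we check whether the process has called the splice syscall; if it has not, we mark allow=1 and its next TCP session can take the fast path. Concretely, consider the scenario where `curl 1.1.1.2:80` is hijacked on WAN and handed to dae: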
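On the first TCP session, `bpf_sock_ops_cb_flags_set(skops, BPF_SOCK_OPS_STATE_CB_FLAG);` is called, so that we get a callback whenever this TCP session changes state. Only a process's first TCP session goes through this probing flow. The second one does not, because it looks up the allowlist directly, and a record exists whether the verdict is allow or !allow.
On a later session from the same process (only `task->comm` needs to match), the allowlist is checked when TCP is established and a record is found. If it says allow, the session takes the fast TCP path; if it says !allow, the session continues through the kernel network stack.
If a glider process goes through the flow above, the check at the end of its first TCP session finds that splice was called, so the allowlist records allow=0 and glider never gets pulled into the kernel bug.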
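The probing flow above can be sketched as a small userspace simulation. This is not the PR's BPF code: the fixed-size table stands in for `fastsock_allowlist_map`, and the names `on_established`, `on_splice`, `on_close`, `MAX_ENTRIES` and the three-state verdict are hypothetical, chosen only for illustration. The sketch assumes a verdict is final once the first session has closed.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define COMM_LEN 16     /* same size as the kernel's TASK_COMM_LEN */
#define MAX_ENTRIES 64  /* hypothetical capacity for this sketch */

enum verdict { UNKNOWN, DENY, ALLOW };

struct entry {
    char comm[COMM_LEN];  /* key: process name, like task->comm */
    bool used;
    bool splice_seen;     /* set by the simulated splice tracepoint */
    enum verdict verdict; /* UNKNOWN until the first session closes */
};

static struct entry table[MAX_ENTRIES];

/* Find the entry for comm, optionally creating it. */
static struct entry *lookup(const char *comm, bool create)
{
    struct entry *free_slot = NULL;

    assert(comm != NULL);
    for (int i = 0; i < MAX_ENTRIES; i++) {
        if (table[i].used &&
            strncmp(table[i].comm, comm, COMM_LEN - 1) == 0)
            return &table[i];
        if (!table[i].used && !free_slot)
            free_slot = &table[i];
    }
    if (!create || !free_slot)
        return NULL;
    memset(free_slot, 0, sizeof(*free_slot));
    strncpy(free_slot->comm, comm, COMM_LEN - 1);
    free_slot->used = true;
    return free_slot;
}

/* TCP established: true means the fast path may be used. */
bool on_established(const char *comm)
{
    struct entry *e = lookup(comm, true);

    /* No verdict yet (or table full): stay on the kernel stack. */
    return e != NULL && e->verdict == ALLOW;
}

/* Simulated splice syscall tracepoint. */
void on_splice(const char *comm)
{
    struct entry *e = lookup(comm, true);

    if (e)
        e->splice_seen = true;
}

/* TCP session closed: record the verdict once, after the first session. */
void on_close(const char *comm)
{
    struct entry *e = lookup(comm, false);

    if (e && e->verdict == UNKNOWN)
        e->verdict = e->splice_seen ? DENY : ALLOW;
}
```

With this model, a process that calls `on_established`, then `on_close`, without any `on_splice` in between is allowed on its second `on_established`. A process that calls `on_splice` during its first session is denied from then on.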
Requirements
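Detecting whether splice was called requires a syscall tracepoint, so the kernel must be built with `CONFIG_HAVE_SYSCALL_TRACEPOINTS=y`. (This should not be much of a problem, hopefully.) This PR is essentially a temporary workaround for a kernel bug. I have reported the bug to the kernel community and it has gotten attention. Once the bug is fixed and the fix is backported to the LTS kernels (6.6, 6.1, 5.15, 5.10), we can consider removing these temporary measures.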
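One way to check the kernel config requirement ahead of time is to scan the kernel config text for the exact option line. The helper below is a minimal sketch and is not part of this PR: the name `config_has_line` is hypothetical, and it assumes the config has already been read into memory as plain text (for example from `/boot/config-$(uname -r)`, or from `/proc/config.gz` after decompression, where those files exist).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Return true if config_text contains a line that is exactly `option`,
 * e.g. "CONFIG_HAVE_SYSCALL_TRACEPOINTS=y". Lines are separated by '\n'.
 * A commented-out line such as "# CONFIG_X is not set" does not match.
 */
bool config_has_line(const char *config_text, const char *option)
{
    assert(config_text != NULL && option != NULL);

    size_t want = strlen(option);
    const char *line = config_text;

    while (*line) {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);

        /* Match only when the whole line equals the option. */
        if (len == want && memcmp(line, option, want) == 0)
            return true;
        if (!end)
            break;
        line = end + 1;
    }
    return false;
}
```

Matching whole lines rather than substrings avoids false positives from comments or from longer option names that share a prefix.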
Checklist
Full Changelogs
Issue Reference
Closes #[issue number]
Test Result