wip: direct packet access #18

Draft
wants to merge 1 commit into
base: main

jschwinger233
Owner

@jschwinger233 jschwinger233 commented Jun 23, 2024

Background

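Previously we used bpf_skb_load_bytes to read the L3/L4 headers from the skb. This PR switches to direct packet access, which is more efficient because the headers no longer have to be copied onto the BPF function stack. That saves roughly 200 instructions (so the BPF verifier is happier, and future extensions are less likely to run into verifier limits) and gives a small performance improvement.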
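To make the difference concrete, here is a minimal userspace sketch, not the code in this PR. It assumes a plain byte buffer in place of the skb; `load_bytes`, `hdr_at`, and the `demo_*` functions are hypothetical names. `load_bytes` mimics the copy-to-stack style of bpf_skb_load_bytes, while `hdr_at` mimics direct packet access: one bounds check against `data_end`, then a pointer into the packet itself.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_HDR_LEN    14
#define IPV4_HDR_LEN   20
#define IPV4_PROTO_OFF 9   /* offset of the protocol field in an IPv4 header */

/* Copy style (hypothetical stand-in for bpf_skb_load_bytes):
 * the header bytes are copied into a caller-provided buffer. */
int load_bytes(const uint8_t *data, const uint8_t *data_end,
               size_t off, void *to, size_t len)
{
    size_t avail = (size_t)(data_end - data);

    if (off > avail || len > avail - off)
        return -1;
    memcpy(to, data + off, len);
    return 0;
}

/* Direct style (hypothetical stand-in for direct packet access):
 * after one bounds check, return a pointer into the packet, no copy. */
const uint8_t *hdr_at(const uint8_t *data, const uint8_t *data_end,
                      size_t off, size_t len)
{
    size_t avail = (size_t)(data_end - data);

    if (off > avail || len > avail - off)
        return NULL;
    return data + off;
}

/* Usage: read the IPv4 protocol field with the copy style. */
int demo_protocol_copy(void)
{
    uint8_t frame[ETH_HDR_LEN + IPV4_HDR_LEN] = {0};
    uint8_t iph[IPV4_HDR_LEN];   /* extra stack space needed for the copy */

    frame[ETH_HDR_LEN + IPV4_PROTO_OFF] = 6;   /* TCP */
    if (load_bytes(frame, frame + sizeof(frame), ETH_HDR_LEN, iph, sizeof(iph)) != 0)
        return -1;
    return iph[IPV4_PROTO_OFF];
}

/* Usage: read the same field through a pointer into the frame. */
int demo_protocol_direct(void)
{
    uint8_t frame[ETH_HDR_LEN + IPV4_HDR_LEN] = {0};
    const uint8_t *iph;

    frame[ETH_HDR_LEN + IPV4_PROTO_OFF] = 6;   /* TCP */
    iph = hdr_at(frame, frame + sizeof(frame), ETH_HDR_LEN, IPV4_HDR_LEN);
    if (!iph)
        return -1;
    return iph[IPV4_PROTO_OFF];
}
```

Both demos read the same field; the direct variant simply skips the intermediate copy, which is where the stack and instruction savings described above would come from in a real BPF program.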

Implementation FAQ

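1. Why not keep the previous iph, ipv6h, icmph, tcph, udph variables instead of switching to l3hdr, l4hdr?

Because the bytecode clang generates for that pattern is very hard to get past the BPF verifier.

Consider the following code: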

SEC("tc/ingress")
int tc_ingress(struct __sk_buff *skb)
{
    struct iphdr *ip;
    struct ipv6hdr *ip6;
[...]
    // tag1
    if (eth->h_proto == bpf_htons(ETH_P_IP)) {
        ip = (struct iphdr *)(data + offset);
        [...]
    } else if (eth->h_proto == bpf_htons(ETH_P_IPV6)) {
        ip6 = (struct ipv6hdr *)(data + offset);
        [...]
    }
[...]
    // tag2
    if (eth->h_proto == bpf_htons(ETH_P_IP)) {
        x = ip->daddr;
    } else if (eth->h_proto == bpf_htons(ETH_P_IPV6)) {
        __builtin_memcpy(&x, &ip6->daddr, 16);
    }
}

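clang emits a branch that jumps straight from the tag1 IPv4 arm to tag2. On that path the tag1 IPv6 arm never runs, so the ip6 pointer is uninitialized, and clang constant-folds every later use of ip6 on that path. The BPF verifier, however, blindly walks every branch combination. When it checks the tag1 IPv4 + tag2 IPv6 path, it rejects the program because the ip6 pointer has been optimized away.

There are several ways to work around this class of problem. I found that using unified abstract headers, l3hdr and l4hdr, is one of the simpler ones.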
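As a rough illustration of the idea, here is a userspace sketch under stated assumptions. It is not the PR's actual definition of l3hdr: the `parsed` struct, `parse_l3`, `dst_addr`, the simplified header layouts, and the demo function are all hypothetical. The point is that a single pointer is assigned on every successful path, so no path reaches the later use with an uninitialized per-protocol pointer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_HDR_LEN  14
#define ETHERTYPE_V4 0x0800
#define ETHERTYPE_V6 0x86DD

/* Simplified header layouts, for illustration only. */
struct v4_hdr { uint8_t other[12]; uint8_t saddr[4];  uint8_t daddr[4];  };
struct v6_hdr { uint8_t other[8];  uint8_t saddr[16]; uint8_t daddr[16]; };

/* Hypothetical parse result: one L3 pointer instead of ip / ip6. */
struct parsed {
    uint16_t ethertype;
    const void *l3hdr;   /* set on every path that returns 0 */
};

int parse_l3(const uint8_t *data, const uint8_t *data_end, struct parsed *out)
{
    size_t len = (size_t)(data_end - data);
    size_t need;

    if (len < ETH_HDR_LEN)
        return -1;
    out->ethertype = (uint16_t)((data[12] << 8) | data[13]);
    if (out->ethertype == ETHERTYPE_V4)
        need = sizeof(struct v4_hdr);
    else if (out->ethertype == ETHERTYPE_V6)
        need = sizeof(struct v6_hdr);
    else
        return -1;
    if (len - ETH_HDR_LEN < need)
        return -1;
    out->l3hdr = data + ETH_HDR_LEN;   /* same assignment for both protocols */
    return 0;
}

/* Later use: the cast happens at the point of use, on an always-valid pointer. */
void dst_addr(const struct parsed *p, uint8_t out[16])
{
    memset(out, 0, 16);
    if (p->ethertype == ETHERTYPE_V4)
        memcpy(out, ((const struct v4_hdr *)p->l3hdr)->daddr, 4);
    else
        memcpy(out, ((const struct v6_hdr *)p->l3hdr)->daddr, 16);
}

/* Usage: parse an IPv4 frame and return the last byte of its destination. */
int demo_v4_last_dst_byte(void)
{
    uint8_t frame[ETH_HDR_LEN + sizeof(struct v4_hdr)] = {0};
    struct parsed p;
    uint8_t dst[16];

    frame[12] = 0x08;               /* ethertype 0x0800, IPv4 */
    frame[13] = 0x00;
    frame[ETH_HDR_LEN + 19] = 7;    /* last byte of the IPv4 daddr */
    if (parse_l3(frame, frame + sizeof(frame), &p) != 0)
        return -1;
    dst_addr(&p, dst);
    return dst[3];
}
```

Whether clang then produces verifier-friendly bytecode still depends on the surrounding program, so treat this only as the shape of the workaround.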

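2. Why was the bpf_skb_store_bytes() call in prep_redirect_to_control_plane() moved to the end?

Because bpf_skb_store_bytes and bpf_skb_change_head may change skb->data, which would leave the previously parsed l3hdr and l4hdr pointers pointing at the wrong location. With the call moved to the end, a change to skb->data no longer matters, since nothing reads those pointers afterwards.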
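The ordering issue can be shown with a small userspace analogy, offered as a sketch rather than the PR's code. `fake_skb`, `fake_change_head`, and the demo function are hypothetical; the fake helper deliberately moves the buffer, standing in for a helper that may change skb->data. Everything read through the parsed pointer is consumed before the mutating call, and any pointer needed afterwards is re-derived.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an skb: an owned buffer and its length. */
struct fake_skb {
    uint8_t *data;
    size_t len;
};

/* Grow the headroom by n bytes. The buffer moves, so every pointer
 * previously derived from skb->data becomes stale. */
int fake_change_head(struct fake_skb *skb, size_t n)
{
    uint8_t *buf = calloc(1, skb->len + n);

    if (!buf)
        return -1;
    memcpy(buf + n, skb->data, skb->len);
    free(skb->data);
    skb->data = buf;
    skb->len += n;
    return 0;
}

/* Usage: consume the parsed header first, mutate the packet last. */
int demo_read_then_mutate(void)
{
    struct fake_skb skb = { malloc(8), 8 };
    const uint8_t *l4hdr;
    int proto, same;

    if (!skb.data)
        return -1;
    memset(skb.data, 0, skb.len);
    skb.data[3] = 17;             /* pretend this byte is a protocol field */

    l4hdr = skb.data + 3;         /* parsed pointer into the packet */
    proto = *l4hdr;               /* read everything needed before mutating */

    if (fake_change_head(&skb, 4) != 0) {
        free(skb.data);
        return -1;
    }

    /* The old l4hdr must not be dereferenced here; re-derive it instead. */
    l4hdr = skb.data + 4 + 3;
    same = (*l4hdr == proto);

    free(skb.data);
    return same ? proto : -1;
}
```

In a real BPF program the verifier enforces this by invalidating packet pointers after such helper calls, so doing the mutation last avoids having to re-validate data and data_end.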

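3. Benchmark?

In my simple test, direct packet access is about twice as fast as bpf_skb_load_bytes: https://github.com/jschwinger233/skb_access_bench . Hooks that do not involve route see a clear improvement. For example, lan_egress is roughly twice as fast (the time for 999999 runs drops from 25.010155ms to 13.489028ms). The improvement on wan_egress / lan_ingress is small because their bottleneck is route(); I will go after that aggressively once I have written BPF unit tests for route().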

Checklist

Full Changelogs

  • [Implement ...]

Issue Reference

Closes #[issue number]

Test Result

@jschwinger233 jschwinger233 force-pushed the gray/direct-packet-access branch 5 times, most recently from e12d482 to daff2d2 Compare June 24, 2024 07:33
@jschwinger233 jschwinger233 force-pushed the gray/direct-packet-access branch from daff2d2 to 24fe0db Compare June 24, 2024 08:06