Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: use bpf_sk_assign at tproxy_wan_ingress #383

Merged
merged 8 commits into from
Jan 1, 2024

Conversation

jschwinger233
Copy link
Member

@jschwinger233 jschwinger233 commented Dec 27, 2023

Closes: #375

废弃 tcp_dst_map,废弃 NAT 模式,全面使用 routing_tuple_map,删除兼容代码。

废弃 udp_encap, 部分解决 IPv6 UDP 的问题.

当前状态:

cmd/run.go Outdated Show resolved Hide resolved
@mzz2017
Copy link
Contributor

mzz2017 commented Dec 30, 2023

Thanks for your high-quality PR.

@jschwinger233
Copy link
Member Author

由于 main 还没合并 #386 , 我在自己的 fork 上运行了全套测试,居然全部通过了!

https://github.com/jschwinger233/dae/actions/runs/7369675519/job/20055254128?pr=5

接下来我整理一下就可以正式 review 了,最迟明天,现在要去博德之门启动了!

@mzz2017

This comment was marked as off-topic.

@jschwinger233 jschwinger233 force-pushed the gray/sk_assign branch 2 times, most recently from f016441 to ac3642f Compare January 1, 2024 05:19
@jschwinger233
Copy link
Member Author

Label ready for review. Recommend rebasing main after merging #386 to double check.

@jschwinger233 jschwinger233 marked this pull request as ready for review January 1, 2024 05:26
@jschwinger233 jschwinger233 requested review from a team as code owners January 1, 2024 05:26
@jschwinger233
Copy link
Member Author

IPv6 UDP 现在状态是被内核在 __udp6_lib_rcv kfree_skb_reason(SKB_DROP_REASON_UDP_CSUM),这个很好修,因为 l4_checksum_disable 似乎也不再需要,所以只要

diff --git a/control/kern/tproxy.c b/control/kern/tproxy.c
index 688fd3c..9dc71e4 100644
--- a/control/kern/tproxy.c
+++ b/control/kern/tproxy.c
@@ -1810,8 +1810,6 @@ int tproxy_wan_egress(struct __sk_buff *skb) {
   //   }
   // }
 
-  disable_l4_checksum(skb, link_h_len, l4proto, ihl);
-
   // Redirect from egress to ingress.
   if ((ret = bpf_redirect(skb->ifindex, BPF_F_INGRESS)) == TC_ACT_SHOT) {
     bpf_printk("Shot bpf_redirect: %d", ret);

就能修好。

但是如 PR description 里所述,由于 ci 测试是 assert failure,所以等这个合并后单独提交一个 pr,避免破坏 ci。

建议顺序: #386 -> this pr -> fix IPv6 UDP

As we are going to implement tproxy hijack via bpf_sk_assign, tproxy
response won't reach wan iface at all, unless wan iface == lan iface.

The only remaining "tproxy_response" is the place returning TC_ACT_PIPE
to hand packets over from tproxy_wan_egress to tproxy_lan_egress.

This commit also deletes rev-NAT logic for tproxy response.

This commit tries to make a minimum change, otherwise file diff is
too confusing to reviewers. I'll clean it up in the next patch.
This commit merely removes the `if (false)` branch at:

```
if (false) {
  // Comments
} else {
  ...
}
```

The file diff becomes completely messed up, so I split it into a
separate patch without any functional change.
Note the necessity of separation of `assign_socket_tcp` and
`assign_socket_udp`:

As `struct bpf_sock *` has different verifier types for tcp and udp, the
code below can't pass verifier:

```
static __always_inline int
assign_socket(struct __sk_buff *skb, struct bpf_sock_tuple *tuple, __u32 len,
	      __u8 nexthdr) {
	struct bpf_sock *sk;
	switch (nexthdr) {
	case IPPROTO_TCP:
		sk = bpf_sk_lookup_tcp(skb, tuple, len, BPF_F_CURRENT_NETNS, 0);
	case IPPROTO_UDP:
		sk = bpf_sk_lookup_udp(skb, tuple, len, BPF_F_CURRENT_NETNS, 0);
	}
	if (!sk) {
		return -1;
	}

	int res = bpf_sk_assign(skb, sk, 0);
	bpf_sk_release(sk);
	return res;
}
```
We no longer need tcp_dst_map for NAT. Relevant Golang logic is also
removed.

One thing need to mention is "dst_routing_result" struct. Although
tcp_dst_map is gone, dst_routing_result struct is still in use under
userspace at https://github.com/daeuniverse/dae/blob/cab1e4290967340923d7d5ca52b80f781711c18e/control/udp.go#L69C17-L69C17.
Therefore, this commit remains this struct and make some efforts to
ensure bpf objects are compiled with it.
Previously, wan_egress has to encap UDP packets with routing info, but
it's no more necessary as we are in favor of bpf_sk_assign without NAT.
This is a must-have, otherwise packets being bpf_sk_assigned and routed
to local on wan will be dropped by kernel during fib_lookup:

```
// https://github.com/torvalds/linux/blob/v6.5/net/ipv4/fib_frontend.c#L381
static int __fib_validate_source()
...
	if (res.type != RTN_UNICAST &&
	    (res.type != RTN_LOCAL || !IN_DEV_ACCEPT_LOCAL(idev)))
		goto e_inval;
...
```
Copy link
Contributor

@sumire88 sumire88 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good stuff. Thanks for your contribution! Appreciate the efforts.

@mzz2017
Copy link
Contributor

mzz2017 commented Jan 1, 2024

Please feel free to merge. @jschwinger233

@jschwinger233 jschwinger233 merged commit 3a8f2d6 into daeuniverse:main Jan 1, 2024
24 checks passed
@sumire88 sumire88 changed the title Use bpf_sk_assign at tproxy_wan_ingress feat: use bpf_sk_assign at tproxy_wan_ingress Jan 4, 2024
jschwinger233 added a commit to jschwinger233/dae that referenced this pull request Jan 4, 2024
With daeuniverse#383, dae no longer encap
UDP, leaving this decap UDP logic obsolete.

Removal of lan_egress also ends up with several helper functions unused,
this commit deletes them as well.

As for scenario of LAN==WAN, after this commit only wan_egress will be
attached on LAN. Wan_egress therefore will pass all traffic sent from
dae because pid_is_control_plane() gives true, so it looks like no
concerns.
jschwinger233 added a commit to jschwinger233/dae that referenced this pull request Jan 22, 2024
With daeuniverse#383, dae no longer encap
UDP, leaving this decap UDP logic obsolete.

Removal of lan_egress also ends up with several helper functions unused,
this commit deletes them as well.

As for scenario of LAN==WAN, after this commit only wan_egress will be
attached on LAN. Wan_egress therefore will pass all traffic sent from
dae because pid_is_control_plane() gives true, so it looks like no
concerns.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Proposal] Hijack traffic using bpf_sk_assign at tproxy_wan_ingress
3 participants