fix: Workaround UDP port conflicts when another local process binds 53 #414

Merged

Conversation

jschwinger233
Member

@jschwinger233 jschwinger233 commented Jan 5, 2024

Background

Set up an independent netns for the transparent UDP socket when necessary (that is, when EADDRINUSE occurs).
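
As an illustration of the condition mentioned above, here is a minimal sketch of how EADDRINUSE can be detected in Go. It is not the code from this PR: `bindTwice` is a hypothetical helper that binds a loopback UDP port twice to provoke the conflict, and the fallback is only indicated by a message.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"syscall"
)

// bindTwice binds a UDP socket on loopback, then tries to bind the same
// address again. It reports whether the second attempt failed with EADDRINUSE.
// Hypothetical helper for illustration only.
func bindTwice() (bool, error) {
	first, err := net.ListenUDP("udp4", &net.UDPAddr{IP: net.IPv4(127, 0, 0, 1), Port: 0})
	if err != nil {
		return false, err
	}
	defer first.Close()

	// Reuse the exact address (including the kernel-chosen port) of the first socket.
	addr := first.LocalAddr().(*net.UDPAddr)
	second, err := net.ListenUDP("udp4", addr)
	if err == nil {
		second.Close()
		return false, nil
	}
	// errors.Is walks the wrapped error chain down to the syscall errno.
	return errors.Is(err, syscall.EADDRINUSE), nil
}

func main() {
	conflict, err := bindTwice()
	if err != nil {
		fmt.Println("setup failed:", err)
		return
	}
	if conflict {
		fmt.Println("EADDRINUSE: a fallback path (in this PR, a separate netns) would be taken here")
	} else {
		fmt.Println("no conflict detected")
	}
}
```

Checking with `errors.Is` rather than matching the error string keeps the check independent of how the error was wrapped along the way.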

Checklist

Full Changelogs

  • [Implement ...]

Issue Reference

Test Result

@jschwinger233 jschwinger233 changed the title fix: Workaround UDP port conflicts when another local process binds :53 fix: Workaround UDP port conflicts when another local process binds 53 Jan 5, 2024
@jschwinger233 jschwinger233 force-pushed the gray/fix-udp-port-conflict branch 3 times, most recently from 282c5f9 to 5e71385 Compare January 5, 2024 12:36
@jschwinger233 jschwinger233 marked this pull request as ready for review January 5, 2024 12:40
@jschwinger233 jschwinger233 requested a review from a team as a code owner January 5, 2024 12:40
@jschwinger233 jschwinger233 requested a review from mzz2017 January 5, 2024 12:40
control/netns_utils.go Outdated Show resolved Hide resolved
@mzz2017
Contributor

mzz2017 commented Jan 5, 2024

Quite concise and powerful. Thanks for your contribution!

@jschwinger233 jschwinger233 force-pushed the gray/fix-udp-port-conflict branch from 5e71385 to 98a13a8 Compare January 5, 2024 16:11
@jschwinger233 jschwinger233 requested a review from mzz2017 January 5, 2024 16:13
@jschwinger233 jschwinger233 force-pushed the gray/fix-udp-port-conflict branch from 2ed8edd to a9fc8f9 Compare January 5, 2024 17:20
@jschwinger233
Member Author

jschwinger233 commented Jan 6, 2024

Update:

  1. IPv6 support added; it was not as hard as I thought
  2. Errors returned from setupNetns have more details to help triage
  3. Checked more sysctl parameters to make sure ARP works on veth

control/netns_utils.go Outdated Show resolved Hide resolved
@jschwinger233 jschwinger233 force-pushed the gray/fix-udp-port-conflict branch 2 times, most recently from ee9bb59 to 75e9e12 Compare January 8, 2024 07:36
@sumire88 sumire88 requested a review from a team January 8, 2024 14:02
@jschwinger233
Member Author

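I added a link monitor that updates the neigh entry when the lladdr of dae0 changes.

I also rewrote netns as a struct and split the huge function into several smaller methods with clearer semantics.

However, when setting up the netns fails I simply call fatal, because after some testing I found I don't know how to implement a "once.Do() that retries on error" simply and correctly...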

@mzz2017
Contributor

mzz2017 commented Jan 10, 2024

@jschwinger233 Using a mutex directly is better; it is easier to write that way.
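
For readers following along, a minimal sketch of the mutex approach suggested here might look like the following. The type `retryableOnce` and the variable `errSetup` are hypothetical names for illustration, not identifiers from the PR. Unlike `sync.Once`, the function is marked done only after it returns without error, so a failed setup is retried on the next call.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errSetup is a placeholder error used by the demo below.
var errSetup = errors.New("setup failed")

// retryableOnce runs a function until it has succeeded exactly once.
// Hypothetical type for illustration.
type retryableOnce struct {
	mu   sync.Mutex
	done bool
}

// Do calls f unless a previous call already succeeded.
// If f returns an error, the error is returned and a later call retries f.
func (o *retryableOnce) Do(f func() error) error {
	o.mu.Lock()
	defer o.mu.Unlock()
	if o.done {
		return nil
	}
	if err := f(); err != nil {
		return err
	}
	o.done = true
	return nil
}

func main() {
	var once retryableOnce
	attempts := 0
	setup := func() error {
		attempts++
		if attempts == 1 {
			return errSetup // first attempt fails, so Do will retry later
		}
		return nil
	}
	for i := 0; i < 3; i++ {
		err := once.Do(setup)
		fmt.Printf("call %d: err=%v, attempts so far=%d\n", i+1, err, attempts)
	}
}
```

The trade-off is that every call takes the lock, including calls made after a successful setup; for a setup path that runs rarely this is usually acceptable.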

dae-prow[bot]
dae-prow bot previously approved these changes Jan 10, 2024
Contributor

@dae-prow dae-prow bot left a comment


🧪 Since the PR has been fully tested, please consider merging it.

@sumire88
Contributor

@daeuniverse/qa please give it a shot. Afterward, we may proceed to merge this PR.

@jschwinger233
Member Author

With this PR applied, I hit the same problem as with PR #411; the error log differs slightly.

Log

time="Jan 11 00:53:14" level=info msg="192.168.1.248:50421 <-> 8.8.8.8:53" _qname=mobile.events.data.microsoft.com. dialer="LAX_Pro" dscp=0 mac="30:9c:23:d4:60:59" network="udp4(DNS)" outbound=proxy pid=0 pname= policy=min_moving_avg qtype=A
time="Jan 11 00:53:14" level=warning msg="handlePkt: failed to setup dae netns: failed to add veth pair: operation not supported"
time="Jan 11 00:53:14" level=warning msg="handlePkt: failed to write cached DNS resp: failed to setup dae netns: failed to add veth pair: operation not supported"
time="Jan 11 00:53:14" level=warning msg="handlePkt: failed to write cached DNS resp: failed to setup dae netns: failed to add veth pair: operation not supported"
time="Jan 11 00:53:15" level=warning msg="handlePkt: failed to setup dae netns: failed to add veth pair: operation not supported"
time="Jan 11 00:53:15" level=warning msg="handlePkt: failed to write cached DNS resp: failed to setup dae netns: failed to add veth pair: operation not supported"
time="Jan 11 00:53:15" level=warning msg="handlePkt: failed to write cached DNS resp: failed to setup dae netns: failed to add veth pair: operation not supported"

Environment

* **OS**:
NAME="OpenWrt"
VERSION="23.05.2"
ID="openwrt"
ID_LIKE="lede openwrt"
PRETTY_NAME="OpenWrt 23.05.2"
VERSION_ID="23.05.2"
HOME_URL="https://openwrt.org/"
BUG_URL="https://bugs.openwrt.org/"
SUPPORT_URL="https://forum.openwrt.org/"
BUILD_ID="r23630-842932a63d"
OPENWRT_BOARD="x86/64"
OPENWRT_ARCH="x86_64"
OPENWRT_TAINTS=""
OPENWRT_DEVICE_MANUFACTURER="OpenWrt"
OPENWRT_DEVICE_MANUFACTURER_URL="https://openwrt.org/"
OPENWRT_DEVICE_PRODUCT="Generic"
OPENWRT_DEVICE_REVISION="v0"
OPENWRT_RELEASE="OpenWrt 23.05.2 r23630-842932a63d"
* **Kernel**:
Linux OpenWrt 5.15.137 #0 SMP Tue Nov 14 13:38:11 2023 x86_64 GNU/Linux
* **Others**:
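  `dae` is installed on the router (`OpenWrt`), which is bridged to the optical modem and dials up via `PPPoE`. According to issue [Question about proxying the router itself #75](https://github.com/daeuniverse/dae/issues/75), `dae` cannot proxy the router itself, and `DNS` requests sent by the router cannot be hijacked by `dae` either. If I use the workaround I mentioned in PR [chore: Remove dead code #411](https://github.com/daeuniverse/dae/pull/411) (`after changing the dnsmasq listening port to a non-53 port, all DNS requests from the router itself fail outright because neither dae nor dnsmasq hijacks them`)

Your kernel does not support veth: containers/podman#12246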

@umlka

umlka commented Jan 10, 2024

With this PR applied, I hit the same problem as with PR #411; the error log differs slightly.

Log

time="Jan 11 00:53:14" level=info msg="192.168.1.248:50421 <-> 8.8.8.8:53" _qname=mobile.events.data.microsoft.com. dialer="LAX_Pro" dscp=0 mac="30:9c:23:d4:60:59" network="udp4(DNS)" outbound=proxy pid=0 pname= policy=min_moving_avg qtype=A
time="Jan 11 00:53:14" level=warning msg="handlePkt: failed to setup dae netns: failed to add veth pair: operation not supported"
time="Jan 11 00:53:14" level=warning msg="handlePkt: failed to write cached DNS resp: failed to setup dae netns: failed to add veth pair: operation not supported"
time="Jan 11 00:53:14" level=warning msg="handlePkt: failed to write cached DNS resp: failed to setup dae netns: failed to add veth pair: operation not supported"
time="Jan 11 00:53:15" level=warning msg="handlePkt: failed to setup dae netns: failed to add veth pair: operation not supported"
time="Jan 11 00:53:15" level=warning msg="handlePkt: failed to write cached DNS resp: failed to setup dae netns: failed to add veth pair: operation not supported"
time="Jan 11 00:53:15" level=warning msg="handlePkt: failed to write cached DNS resp: failed to setup dae netns: failed to add veth pair: operation not supported"

Environment

* **OS**:
NAME="OpenWrt"
VERSION="23.05.2"
ID="openwrt"
ID_LIKE="lede openwrt"
PRETTY_NAME="OpenWrt 23.05.2"
VERSION_ID="23.05.2"
HOME_URL="https://openwrt.org/"
BUG_URL="https://bugs.openwrt.org/"
SUPPORT_URL="https://forum.openwrt.org/"
BUILD_ID="r23630-842932a63d"
OPENWRT_BOARD="x86/64"
OPENWRT_ARCH="x86_64"
OPENWRT_TAINTS=""
OPENWRT_DEVICE_MANUFACTURER="OpenWrt"
OPENWRT_DEVICE_MANUFACTURER_URL="https://openwrt.org/"
OPENWRT_DEVICE_PRODUCT="Generic"
OPENWRT_DEVICE_REVISION="v0"
OPENWRT_RELEASE="OpenWrt 23.05.2 r23630-842932a63d"
* **Kernel**:
Linux OpenWrt 5.15.137 #0 SMP Tue Nov 14 13:38:11 2023 x86_64 GNU/Linux
* **Others**:
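  `dae` is installed on the router (`OpenWrt`), which is bridged to the optical modem and dials up via `PPPoE`. According to issue [Question about proxying the router itself #75](https://github.com/daeuniverse/dae/issues/75), `dae` cannot proxy the router itself, and `DNS` requests sent by the router cannot be hijacked by `dae` either. If I use the workaround I mentioned in PR [chore: Remove dead code #411](https://github.com/daeuniverse/dae/pull/411) (`after changing the dnsmasq listening port to a non-53 port, all DNS requests from the router itself fail outright because neither dae nor dnsmasq hijacks them`)

Your kernel does not support veth: containers/podman#12246

After changing the dnsmasq listening port to a non-53 port, the veth-related errors are gone, LAN clients are proxied normally, and the log shows dae correctly hijacking and answering DNS requests from LAN clients.

dae log after changing the dnsmasq listening port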

time="Jan 11 03:55:26" level=info msg="192.168.1.248:63326 <-> 8.8.8.8:53" _qname=upload.wikimedia.org. dialer="LAX_Pro" dscp=0 mac="LAN client MAC address" network="tcp4(DNS)" outbound=proxy pid=0 pname= policy=min_moving_avg qtype=AAAA
time="Jan 11 03:55:27" level=info msg="192.168.1.248:52409 <-> 8.8.8.8:53" _qname=upload.wikimedia.org. dialer="LAX_Pro" dscp=0 mac="LAN client MAC address" network="tcp4(DNS)" outbound=proxy pid=0 pname= policy=min_moving_avg qtype=A
time="Jan 11 03:55:27" level=info msg="LAN client IPv6 address:51888 <-> upload.wikimedia.org:443" dialer="LAX_Pro" dscp=0 ip="[2620:0:861:ed1a::2:b]:443" mac="LAN client MAC address" network=tcp6 outbound=proxy pid=0 pname= policy=min_moving_avg sniffed=upload.wikimedia.org
time="Jan 11 03:55:27" level=info msg="LAN client IPv6 address:51890 <-> upload.wikimedia.org:443" dialer="LAX_Pro" dscp=0 ip="[2620:0:861:ed1a::2:b]:443" mac="LAN client MAC address" network=tcp6 outbound=proxy pid=0 pname= policy=min_moving_avg sniffed=upload.wikimedia.org
time="Jan 11 03:55:27" level=info msg="LAN client IPv6 address:51889 <-> upload.wikimedia.org:443" dialer="LAX_Pro" dscp=0 ip="[2620:0:861:ed1a::2:b]:443" mac="LAN client MAC address" network=tcp6 outbound=proxy pid=0 pname= policy=min_moving_avg sniffed=upload.wikimedia.org
time="Jan 11 03:55:27" level=info msg="LAN client IPv6 address:51891 <-> upload.wikimedia.org:443" dialer="LAX_Pro" dscp=0 ip="[2620:0:861:ed1a::2:b]:443" mac="LAN client MAC address" network=tcp6 outbound=proxy pid=0 pname= policy=min_moving_avg sniffed=upload.wikimedia.org
time="Jan 11 03:55:27" level=info msg="LAN client IPv6 address:51892 <-> upload.wikimedia.org:443" dialer="LAX_Pro" dscp=0 ip="[2620:0:861:ed1a::2:b]:443" mac="LAN client MAC address" network=tcp6 outbound=proxy pid=0 pname= policy=min_moving_avg sniffed=upload.wikimedia.org

@mzz2017
Contributor

mzz2017 commented Jan 11, 2024

@jschwinger233 Are any kernel parameters required? We could add them to the error message, and there is also a place in the docs where we can add a note.

@jschwinger233
Member Author

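@jschwinger233 Are any kernel parameters required? We could add them to the error message, and there is also a place in the docs where we can add a note.

Judging from the 5.15 source, veth.ko should be loaded as long as CONFIG_VETH=y is set. Run lsmod | grep veth and see whether it prints anything.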
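
The same check can be sketched in Go by reading `/proc/modules`, which is the file `lsmod` formats. This is an illustrative sketch only, and `moduleListed` is a hypothetical helper. A negative result is not conclusive: drivers built into the kernel do not appear in this list, and a module may not be loaded until it is first needed.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// moduleListed reports whether name appears as a loaded module in text
// formatted like /proc/modules, where the module name is the first field
// of each line. Hypothetical helper for illustration.
func moduleListed(procModules, name string) bool {
	sc := bufio.NewScanner(strings.NewReader(procModules))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		// Compare the whole first field so that "vethfoo" does not match "veth".
		if len(fields) > 0 && fields[0] == name {
			return true
		}
	}
	return false
}

func main() {
	data, err := os.ReadFile("/proc/modules")
	if err != nil {
		fmt.Println("cannot read /proc/modules:", err)
		return
	}
	fmt.Println("veth listed as a loaded module:", moduleListed(string(data), "veth"))
}
```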

@jschwinger233
Member Author

The OpenWrt site has an article about containers: https://openwrt.org/docs/guide-user/virtualization/docker_host?s[]=veth#create_veth_pair_for_container
It includes opkg install kmod-veth uxc procd-ujail procd-ujail-console; perhaps installing just the first package is enough to make veth usable.

@umlka

umlka commented Jan 11, 2024

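The OpenWrt site has an article about containers: https://openwrt.org/docs/guide-user/virtualization/docker_host?s[]=veth#create_veth_pair_for_container It includes opkg install kmod-veth uxc procd-ujail procd-ujail-console; perhaps installing just the first package is enough to make veth usable.

Following the advice from 404 in the group chat, I installed kmod-veth and the error is indeed gone. Thank you for the hard work!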

@pchpub

pchpub commented Jan 11, 2024

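With this PR applied the port conflict no longer occurs, but DNS requests do not work properly; switching back to 0.4.0 fixes it. After dae completes the upstream request, the DNS response never reaches the device that issued the DNS request.
(dae is bound to the LAN interfaces only)

Environment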

Linux debian12net 6.1.0-16-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.67-1 (2023-12-12) x86_64

Log

https://appp.me/KjUSaf
(The log is too large, so I used a pastebin.)

Relevant configuration

nftables

table inet filter {
        chain input {
                type filter hook input priority filter; policy accept;
                iifname "ppp0" tcp dport { 22, 80, 443, 2701-2714, 9090, 10000, 10808-10809, 12345 } drop
                iifname "ppp0" udp dport 53 drop
        }

        chain forward {
                type filter hook forward priority filter; policy accept;
        }

        chain output {
                type filter hook output priority filter; policy accept;
        }
}
table ip6 filter {
        chain input {
                type filter hook input priority filter; policy accept;
        }

        chain output {
                type filter hook output priority filter; policy accept;
        }
}
table ip ipv4-nat {
        chain postrouting {
                type nat hook postrouting priority srcnat; policy accept;
                oifname { "wan", "ppp0" } masquerade
        }

        chain prerouting {
                type nat hook prerouting priority dstnat; policy accept;
                tcp dport 80 ip daddr 192.168.2.2 dnat to 192.168.168.185:8096
                tcp dport 80 ip daddr 192.168.2.3 dnat to 192.168.168.185:7892
                tcp dport 80 ip daddr 192.168.2.4 dnat to 192.168.168.185:8989
                tcp dport 80 ip daddr 192.168.2.5 dnat to 192.168.168.185:9123
                ip protocol icmp ip daddr 192.168.2.2 dnat to 192.168.168.185
                ip protocol icmp ip daddr 192.168.2.3 dnat to 192.168.168.185
                ip protocol icmp ip daddr 192.168.2.4 dnat to 192.168.168.185
                ip protocol icmp ip daddr 192.168.2.5 dnat to 192.168.168.185
        }
}
table ip6 ipv6-nat {
        chain postrouting {
                type nat hook postrouting priority srcnat; policy accept;
                oifname { "wan", "ppp0" } ip6 saddr fc00::/7 masquerade
        }

        chain prerouting {
                type nat hook prerouting priority dstnat; policy accept;
        }
}

dae

global {
    tproxy_port: 12345
    tproxy_port_protect: true
    so_mark_from_dae: 0
    log_level: info
    disable_waiting_network: false
    lan_interface: ens33,ens36,ens37,br-lan
    #wan_interface: auto
    auto_config_kernel_parameter: true
    tcp_check_url: 'http://cp.cloudflare.com,1.1.1.1,2606:4700:4700::1111'
    tcp_check_http_method: HEAD
    udp_check_dns: 'dns.google.com:53,8.8.8.8,2001:4860:4860::8888'
    check_interval: 32767s
    check_tolerance: 50ms
    dial_mode: domain++
    allow_insecure: false
    sniffing_timeout: 100ms
    tls_implementation: tls
    utls_imitate: chrome_auto
}

dns {
    upstream {
        localdns: 'udp://127.0.0.1:53'
    }
    routing {
        request {
            fallback: localdns
        }
        response {
            fallback: accept
        }
    }
}

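Screenshot of a DNS request from a client device

image
(I have verified that requests to the local DNS server from the device running dae work normally.)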

@umlka

umlka commented Jan 11, 2024

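@jschwinger233 Are any kernel parameters required? We could add them to the error message, and there is also a place in the docs where we can add a note.

On OpenWrt, kmod-veth needs to be installed.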
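
One way the suggestion of putting this into the error message could look is sketched below. This is an assumption-laden illustration, not the project's code: `vethHint` and `exampleVethErr` are hypothetical names, and the sketch assumes the error returned when adding the veth pair wraps the errno `EOPNOTSUPP`, which is what the "operation not supported" text in the log above corresponds to on Linux.

```go
package main

import (
	"errors"
	"fmt"
	"syscall"
)

// exampleVethErr builds an error shaped like the one in the log above.
// Hypothetical stand-in for the error a netlink call might return.
func exampleVethErr() error {
	return fmt.Errorf("failed to add veth pair: %w", syscall.EOPNOTSUPP)
}

// vethHint returns a troubleshooting hint when err indicates that the
// operation is not supported by the kernel. Hypothetical helper.
func vethHint(err error) (string, bool) {
	if errors.Is(err, syscall.EOPNOTSUPP) {
		return "the kernel may lack veth support; on OpenWrt, try installing kmod-veth", true
	}
	return "", false
}

func main() {
	err := exampleVethErr()
	if hint, ok := vethHint(err); ok {
		fmt.Printf("%v (hint: %s)\n", err, hint)
	} else {
		fmt.Println(err)
	}
}
```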

@sumire88
Contributor

sumire88 commented Jan 11, 2024

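@jschwinger233 Are any kernel parameters required? We could add them to the error message, and there is also a place in the docs where we can add a note.

On OpenWrt, kmod-veth needs to be installed.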

@umlka Would you like to raise a discussion to explicitly document the steps/actions needed for this workaround? Also cc @jschwinger233

Your contribution means a lot to us. Thanks in advance.

@jschwinger233 jschwinger233 force-pushed the gray/fix-udp-port-conflict branch from a8e57a6 to 52252eb Compare January 11, 2024 12:27
Contributor

@mzz2017 mzz2017 left a comment


Amazing job. Thank you!

@sumire88 sumire88 requested a review from mzz2017 January 11, 2024 13:32
Contributor

@sumire88 sumire88 left a comment


It's been a long journey. Thanks for your hard work!

@jschwinger233 jschwinger233 merged commit 0f8277b into daeuniverse:main Jan 11, 2024
26 checks passed
5 participants