Clarification on ip mptcp endpoint cleanup #281

Open
majek opened this issue Jul 21, 2023 · 17 comments
Labels
question Further information is requested

Comments

@majek

majek commented Jul 21, 2023

I'm playing with mptcpd.

I don't quite get what the plugins do, and I don't quite understand whether mptcpd supports the "userspace path manager" (the net.mptcp.pm_type sysctl).

Having said that, I don't think MPTCP is interesting (from a client's point of view) without fullmesh. Or maybe I'm missing something. Without it, the handover doesn't seem to be aggressive enough.

With fullmesh I can see mptcpd adding ip mptcp endpoint paths:

$ /home/marek/src/iproute2/ip/ip mptcp endpoint show
192.168.1.71 id 1 subflow fullmesh dev enx644bf033040b 
192.168.1.83 id 5 subflow fullmesh dev wlp0s20f3 

Which is great. However, I would expect these entries to disappear when I yank out the Ethernet cable or disable Wi-Fi, and that doesn't seem to happen. Is mptcpd cleaning up the old endpoints correctly? Is there a way to configure it somehow?

I'm on Ubuntu Jammy with a 6.4 kernel and mptcpd custom-compiled from git master.

@pabeni
Contributor

pabeni commented Jul 25, 2023

Having said that, I don't think MPTCP is interesting (from client point of view) without fullmesh. Or maybe I'm missing something. Without it the handover doesn't seem to be aggressive enough.

Interesting. Do you have any data points you can share, e.g. the problematic configuration and a paired pcap capture?

With fullmesh I can see mptcpd adding ip mptcp endpoint paths:

$ /home/marek/src/iproute2/ip/ip mptcp endpoint show
192.168.1.71 id 1 subflow fullmesh dev enx644bf033040b 
192.168.1.83 id 5 subflow fullmesh dev wlp0s20f3 

Which is great. However I would expect these things to disappear when I yank out ethernet cable or disable wifi. And that doesn't seem to happen. Is mptcpd cleaning the old endpoints correctly?

It should, when the paired IP addresses are deleted from the kernel.

That in turn could happen when the cable is actually unplugged, or not, depending on other factors, e.g. whether the system is running NM (NetworkManager).

Note that the MPTCP protocol, for existing/already-established connections, could (and should) decide not to use the path(s) on the broken link(s) well before the endpoint is actually deleted.

@matttbe
Member

matttbe commented Jul 26, 2023

Hi Marek,

I don't quite get what plugins do. I don't quite understand if mptcpd supports "userspace path manager" sysctl net.mptcp.pm_type thing.

mptcpd's initial goal is to control MPTCP's path manager. In the kernel, there are two types:

  • in-kernel: the default one -- the configuration is used to have the same behaviour for all connections of the same netns
  • userspace: if net.mptcp.pm_type is set to 1 -- the idea is to listen to Netlink events (connections, subflows, addresses, priority → you can see them with ip mptcp monitor) and react to them by sending commands via Netlink. So it is possible to have a different behaviour per connection.
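
For reference, a minimal way to try the userspace PM and watch those events (assuming a recent kernel and an iproute2 built with MPTCP support):

$ sudo sysctl -w net.mptcp.pm_type=1   # new MPTCP connections in this netns use the userspace PM
$ ip mptcp monitor                     # prints the connection/subflow/address/priority events mentioned above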

With mptcpd, you can create plugins to control path-manager, either the in-kernel one (like addr_adv does) or the userspace one (like sspi does).

In most cases, the in-kernel path-manager can do the job. It can be configured manually with ip mptcp, but also automatically with Network-Manager (from v1.40, example here) or mptcpd. But when it is automatic, it might not set the right flag (typically subflow for a client or signal for a server) or it might add the wrong IPs. (With mptcpd's addr_adv plugin, you can modify the config file /etc/mptcpd/mptcpd.conf, but I guess you saw that.)
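
For example, a manual setup could look something like this (addresses and device names below are only placeholders):

# client side: make sure the limits allow extra subflows, then add a subflow endpoint
$ sudo ip mptcp limits set subflow 2 add_addr_accepted 2
$ sudo ip mptcp endpoint add 192.168.1.83 dev wlp0s20f3 subflow

# server side: advertise an extra address with ADD_ADDR
$ sudo ip mptcp endpoint add 203.0.113.10 signal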

I don't think MPTCP is interesting (from client point of view) without fullmesh.

It depends: a typical deployment is a client with 2 interfaces and a server with 1, and you just want two subflows. The in-kernel PM supports that with or without the fullmesh mode. Now, if the server has two interfaces, what is best here? Some might want to have 2 subflows, some might prefer to use all possible paths with "fullmesh" and have 4. And if you have 2 interfaces, each with one IPv4 and one IPv6, do you want to have 8 paths? :)
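
For the 2x2 fullmesh case, the client-side configuration would look something like this (again with placeholder addresses/devices), with the limits sized for the expected number of paths:

$ sudo ip mptcp limits set subflow 4 add_addr_accepted 2
$ sudo ip mptcp endpoint add 192.168.1.71 dev eth0 subflow fullmesh
$ sudo ip mptcp endpoint add 192.168.1.83 dev wlan0 subflow fullmesh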

I'm playing with mptcpd.

Do not hesitate if you have questions or ideas. Especially if you would like to write a nice blog post about that :-D
We know that good documentation for users is lacking and some concepts might be unclear or confusing. We would like to improve that in the near future.

@majek
Author

majek commented Jul 26, 2023

Thanks for the answer. That clarifies a lot. Indeed, I have a situation where the server is doing ADD_ADDR, which is why fullmesh is interesting. Can I send RM_ADDR from the server to discourage the use of the primary server IP/port? My server is load-balanced / ECMP'ed, so a secondary subflow against the original server IP/port has a low chance of working.

Going back to my original question about ip mptcp endpoint cleanup, I think something is broken in mptcpd. I'm running Ubuntu Jammy on a recent kernel, with a couple of network interfaces (Wi-Fi, wired, USB wired), and after the initial greeting from the daemon, I see loads of these in systemctl status mptcp:

Jul 26 14:58:28 mrprec mptcpd[215569]: add_addr: too many addresses or duplicate one: -34: Numerical result out of range

My LAN has both IPv4 (192.168 range) and native IPv6.

mptcpd is fresh from GitHub. I think maybe it gets confused by something, and that's why it refuses to clean up the old ip mptcp endpoint entries.

@matttbe
Member

matttbe commented Jul 26, 2023

Can I send RM_ADDR from server to discourage the use of primary server ip/port? My server is loadbalanced / ECMP'ed, so secondary subflow against the original server ip/port has low chances of working.

I guess what you need is to have the C flag set in the MP_CAPABLE. To do that, you can change this sysctl on your server in the appropriate netns: sysctl -w net.mptcp.allow_join_initial_addr_port=0

I think something is broken in mptcpd. (...) I see loads of these in systemctl status mptcp

add_addr: too many addresses or duplicate one: -34: Numerical result out of range

Mmh, yes, it looks like a bug. After a quick look, I don't even see the error message in the code :-/
What's the error level according to journald? (sudo journalctl -p [1-7] -u mptcp ; "emerg" (0), "alert" (1), "crit" (2), "err" (3), "warning" (4), "notice" (5), "info" (6), "debug" (7))
What are the other messages just before/after?

@majek
Author

majek commented Aug 7, 2023

$ sudo /home/marek/src/iproute2/ip/ip mptcp endpoint show
192.168.1.156 id 1 subflow fullmesh dev wlp0s
192.168.1.25 id 2 subflow fullmesh dev enx381428a7
2a01:110f:xxxw id 3 subflow fullmesh dev enx381428a7
192.168.1.11 id 4 subflow fullmesh dev enx5855ca23
2a01:110f:xxxz id 5 subflow fullmesh dev enx381428a7 
2a01:110f:xxxy id 6 subflow fullmesh dev enx5855ca23
192.168.1.83 id 7 subflow fullmesh dev wlp0s
2a01:110f:xxxx id 8 subflow fullmesh dev enx5855ca23 
marek@mrprec:~/Downloads/mptcpd-0.12$ sudo ~/bin/strace -f ./src/mptcpd 
...
write(2, "MPTCP address advertiser path ma"..., 51MPTCP address advertiser path manager initialized.
) = 51
write(2, "MPTCP single-subflow-per-interfa"..., 61MPTCP single-subflow-per-interface path manager initialized.
) = 61
setsockopt(4, SOL_NETLINK, NETLINK_ADD_MEMBERSHIP, [9], 4) = 0
recvmsg(6, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\24\0\0\0\3\0\2\0\1\0\0\0001|\0\0\0\0\0\0", iov_len=4096}], msg_iovlen=1, msg_control=[{cmsg_len=20, cmsg_level=SOL_NETLINK, cmsg_type=0x3}], msg_controllen=24, msg_flags=0}, 0) = 20
epoll_ctl(3, EPOLL_CTL_MOD, 6, {events=EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP, data={u32=3819575232, u64=94321301395392}}) = 0
epoll_wait(3, [{events=EPOLLOUT, data={u32=3819570752, u64=94321301390912}}, {events=EPOLLOUT, data={u32=3819575232, u64=94321301395392}}], 10, -1) = 2
sendto(4, [{nlmsg_len=36, nlmsg_type=mptcp_pm, nlmsg_flags=NLM_F_REQUEST|NLM_F_ACK, nlmsg_seq=3, nlmsg_pid=31793}, "\x05\x00\x00\x00\x08\x00\x03\x00\x02\x00\x00\x00\x08\x00\x02\x00\x02\x00\x00\x00"], 36, 0, NULL, 0) = 36
epoll_ctl(3, EPOLL_CTL_MOD, 4, {events=EPOLLIN|EPOLLERR|EPOLLHUP, data={u32=3819570752, u64=94321301390912}}) = 0
sendto(6, [{nlmsg_len=24, nlmsg_type=RTM_GETADDR, nlmsg_flags=NLM_F_REQUEST|NLM_F_ACK|NLM_F_DUMP, nlmsg_seq=2, nlmsg_pid=31793}, {ifa_family=AF_UNSPEC, ifa_prefixlen=0, ifa_flags=0, ifa_scope=RT_SCOPE_UNIVERSE, ifa_index=0}], 24, 0, {sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, 12) = 24
epoll_ctl(3, EPOLL_CTL_MOD, 6, {events=EPOLLIN|EPOLLERR|EPOLLHUP, data={u32=3819575232, u64=94321301395392}}) = 0
epoll_wait(3, [{events=EPOLLIN, data={u32=3819570752, u64=94321301390912}}, {events=EPOLLIN, data={u32=3819575232, u64=94321301395392}}], 10, -1) = 2
recvmsg(4, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="$\0\0\0\2\0\0\1\3\0\0\0001|\0\0\0\0\0\0$\0\0\0\37\0\5\0\3\0\0\0"..., iov_len=8192}], msg_iovlen=1, msg_control=[{cmsg_len=20, cmsg_level=SOL_NETLINK, cmsg_type=0x3}], msg_controllen=24, msg_flags=0}, 0) = 36
recvmsg(6, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="L\0\0\0\24\0\2\0\2\0\0\0001|\0\0\2\10\200\376\1\0\0\0\10\0\1\0\177\0\0\1"..., iov_len=4096}], msg_iovlen=1, msg_control=[{cmsg_len=20, cmsg_level=SOL_NETLINK, cmsg_type=0x3}], msg_controllen=24, msg_flags=0}, 0) = 448
epoll_ctl(3, EPOLL_CTL_MOD, 4, {events=EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP, data={u32=3819570752, u64=94321301390912}}) = 0
epoll_wait(3, [{events=EPOLLOUT, data={u32=3819570752, u64=94321301390912}}, {events=EPOLLIN, data={u32=3819575232, u64=94321301395392}}], 10, -1) = 2
sendto(4, [{nlmsg_len=36, nlmsg_type=mptcp_pm, nlmsg_flags=NLM_F_REQUEST|NLM_F_ACK, nlmsg_seq=4, nlmsg_pid=31793}, "\x05\x00\x00\x00\x08\x00\x03\x00\x03\x00\x00\x00\x08\x00\x02\x00\x03\x00\x00\x00"], 36, 0, NULL, 0) = 36
epoll_ctl(3, EPOLL_CTL_MOD, 4, {events=EPOLLIN|EPOLLERR|EPOLLHUP, data={u32=3819570752, u64=94321301390912}}) = 0
recvmsg(6, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="H\0\0\0\24\0\2\0\2\0\0\0001|\0\0\n\200\200\376\1\0\0\0\24\0\1\0\0\0\0\0"..., iov_len=4096}], msg_iovlen=1, msg_control=[{cmsg_len=20, cmsg_level=SOL_NETLINK, cmsg_type=0x3}], msg_controllen=24, msg_flags=0}, 0) = 720
epoll_wait(3, [{events=EPOLLIN, data={u32=3819570752, u64=94321301390912}}, {events=EPOLLIN, data={u32=3819575232, u64=94321301395392}}], 10, -1) = 2
recvmsg(4, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="$\0\0\0\2\0\0\1\4\0\0\0001|\0\0\0\0\0\0$\0\0\0\37\0\5\0\4\0\0\0"..., iov_len=8192}], msg_iovlen=1, msg_control=[{cmsg_len=20, cmsg_level=SOL_NETLINK, cmsg_type=0x3}], msg_controllen=24, msg_flags=0}, 0) = 36
epoll_ctl(3, EPOLL_CTL_MOD, 4, {events=EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP, data={u32=3819570752, u64=94321301390912}}) = 0
recvmsg(6, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\24\0\0\0\3\0\2\0\2\0\0\0001|\0\0\0\0\0\0", iov_len=4096}], msg_iovlen=1, msg_control=[{cmsg_len=20, cmsg_level=SOL_NETLINK, cmsg_type=0x3}], msg_controllen=24, msg_flags=0}, 0) = 20
epoll_wait(3, [{events=EPOLLOUT, data={u32=3819570752, u64=94321301390912}}], 10, -1) = 1
sendto(4, [{nlmsg_len=64, nlmsg_type=mptcp_pm, nlmsg_flags=NLM_F_REQUEST|NLM_F_ACK, nlmsg_seq=5, nlmsg_pid=31793}, "\x01\x00\x00\x00\x2c\x00\x01\x80\x06\x00\x01\x00\x02\x00\x00\x00\x08\x00\x03\x00\xc0\xa8\x01\x53\x05\x00\x02\x00\x07\x00\x00\x00"...], 64, 0, NULL, 0) = 64
epoll_ctl(3, EPOLL_CTL_MOD, 4, {events=EPOLLIN|EPOLLERR|EPOLLHUP, data={u32=3819570752, u64=94321301390912}}) = 0
epoll_wait(3, [{events=EPOLLIN, data={u32=3819570752, u64=94321301390912}}], 10, -1) = 1
recvmsg(4, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\204\0\0\0\2\0\0\2\5\0\0\0001|\0\0\336\377\377\377@\0\0\0\37\0\5\0\5\0\0\0"..., iov_len=8192}], msg_iovlen=1, msg_control=[{cmsg_len=20, cmsg_level=SOL_NETLINK, cmsg_type=0x3}], msg_controllen=24, msg_flags=0}, 0) = 132
write(2, "add_addr: too many addresses or "..., 82add_addr: too many addresses or duplicate one: -34: Numerical result out of range
) = 82

I can't quite get mptcpd from master to build, due to, I guess, old autotools:

marek@mrprec:~/src/mptcpd$ git pull
Already up to date.
marek@mrprec:~/src/mptcpd$ ./bootstrap 
libtoolize: putting auxiliary files in '.'.
libtoolize: linking file './ltmain.sh'
libtoolize: putting macros in AC_CONFIG_MACRO_DIRS, 'm4'.
libtoolize: linking file 'm4/libtool.m4'
libtoolize: linking file 'm4/ltoptions.m4'
libtoolize: linking file 'm4/ltsugar.m4'
libtoolize: linking file 'm4/ltversion.m4'
libtoolize: linking file 'm4/lt~obsolete.m4'
configure.ac:103: error: possibly undefined macro: AC_MSG_ERROR
      If this token and others are legitimate, please use m4_pattern_allow.
      See the Autoconf documentation.
configure.ac:175: error: possibly undefined macro: AC_DEFINE
configure.ac:303: error: possibly undefined macro: AC_CHECK_DEFINE
configure:18521: error: possibly undefined macro: AC_MSG_WARN
autoreconf: error: /usr/bin/autoconf failed with exit status: 1

(Ubuntu Jammy and a custom 6.4 kernel)

@matttbe
Member

matttbe commented Aug 7, 2023

I can't quite get the mptcpd from master to build due to I guess old autotools

Did you install the autoconf-archive package? I remember I had to add that as a build dependency for the .deb packages.
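
On Ubuntu/Debian, something like this should be enough:

$ sudo apt install autoconf-archive
$ ./bootstrap
$ ./configure
$ make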

@majek
Author

majek commented Aug 7, 2023

autoconf-archive does not exist in my distro. I have autoconf, which is version 2.71-2 apparently.

@matttbe
Member

matttbe commented Aug 7, 2023

Oh, strange, you should have it, no? It is in the universe repo (version 20210219-2.1) in Ubuntu Jammy; maybe you don't have that repo enabled?

https://packages.ubuntu.com/jammy/autoconf-archive

@majek
Author

majek commented Aug 7, 2023

Thanks, yeah, the autoconf-archive package helped with the AC_MSG_ERROR red herring.

OK, the config ends up under /usr/local; is that location configurable? /usr/local/etc/mptcpd/mptcpd.conf:

plugin-dir=/home/marek/src/mptcpd/plugins/path_managers/.libs/
path-manager=addr_adv
addr-flags=subflow,fullmesh
notify-flags=existing,skip_loopback,skip_link_local

shows

marek@mrprec:~/src/mptcpd$ sudo  ./src/mptcpd -d
configuration.c:mptcpd_config_create() path manager plugin directory: /home/marek/src/mptcpd/plugins/path_managers/.libs/
configuration.c:mptcpd_config_create() default path manager plugin: addr_adv
configuration.c:mptcpd_config_create() address flags: subflow,fullmesh
configuration.c:mptcpd_config_create() notify flags: existing,skip_link_local,skip_loopback
path_manager.c:family_appeared() "mptcp_pm" generic netlink family appeared
path_manager.c:dump_addrs_callback() ID sync: 1 | fe80::383:6d07:93e6:e484
path_manager.c:dump_addrs_callback() ID sync: 2 | 192.168.1.11
path_manager.c:dump_addrs_callback() ID sync: 3 | 2a01:110f::xxx
path_manager.c:dump_addrs_callback() ID sync: 4 | 2a01:110f::xxy
MPTCP address advertiser path manager initialized.
MPTCP single-subflow-per-interface path manager initialized.
add_addr: too many addresses or duplicate one: -34: Numerical result out of range
add_addr: too many addresses or duplicate one: -34: Numerical result out of range

@matttbe
Member

matttbe commented Aug 7, 2023

Thanks, yeah, autoconf-archive package helped with the AC_MSG_ERROR red herring.

Great!

Ok, config in /usr, is it configurable? /usr/local/etc/mptcpd/mptcpd.conf:

I think it is standard when you don't pass a prefix. I see we are using AutoConf's sysconfdir variable: https://www.gnu.org/prep/standards/html_node/Directory-Variables.html
It can be overridden to use /etc (but it would look strange to put the config there while installing the rest under /usr/local).
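
Concretely, standard Autoconf behaviour should allow something like:

$ ./configure --prefix=/usr/local --sysconfdir=/etc
$ make && sudo make install

after which the config should be read from /etc/mptcpd/mptcpd.conf instead of /usr/local/etc/mptcpd/mptcpd.conf.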

too many addresses or duplicate one

Arf, I don't know how I missed that: it's an error message coming from the kernel side (we can see in the strace output that it arrived via Netlink), because the endpoint with this IP already exists.

I don't know the mptcpd code well enough (sorry for that, but I can ask for help tomorrow at our weekly call), but I wonder if it is not due to an mptcpd restart: it tries to re-add endpoints that already exist. I guess it doesn't support a restart in this mode (yet). It should not be blocking, but you might have to run ip mptcp endpoint flush before a restart if you change the flags that are being set.
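
For example, something like this before restarting (assuming the unit is called mptcp, as in your journalctl output):

$ sudo ip mptcp endpoint flush
$ sudo systemctl restart mptcp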

@matttbe
Member

matttbe commented Aug 10, 2023

Hi Marek,

Sorry to come back to you only now (not easy with the holiday period and other changes).

I suggest a small recap, because the discussion went in different directions, just to make sure we are aligned:

  • your main issue is that mptcpd doesn't seem to delete endpoints when expected, right?

    • I don't think you explained what happens when you yank out the Ethernet cable or disable the Wi-Fi: is the interface removed as well, or just the IP? (One way to check is shown after this list.)
    • I'm asking because it is possible mptcpd doesn't delete endpoints linked to IPs that have just been removed: maybe it only reacts when interfaces are removed (which doesn't seem to be enough), or maybe there is a bug in the tracking of the different endpoints (or maybe it doesn't track "old endpoints" added by other tools or from a previous execution).
  • regarding the other issues:

    • compilation issues
    • location of the config file
    • too many addresses or duplicate one warning
    • → I think they have been fixed or understood and we can focus on the main issue, right?
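
One quick way to check the first point: watch the kernel's address/link events while unplugging the cable, e.g.

$ ip monitor address link

and, in another terminal, compare ip mptcp endpoint show before and after.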

@birdofprey

birdofprey commented Aug 16, 2023

os: Debian Gnu/Linux 12
Kernel: Linux localhost 6.4.0-0.deb12.2-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.4.4-3~bpo12+1 (2023-08-08) x86_64 GNU/Linux
iproute2: 6.1.0-3
mptcpd: 0.12-1
systemctl status mptcpd:

add_addr: too many addresses or duplicate one: -16: Device or resource busy

ip mptcp endpoint show:

192.x.43.103 id 1 subflow dev wlp2s0 
2408:8469:x:x:x:7fe4:3646:bfc7 id 2 subflow dev wlp2s0 
192.x.43.52 id 3 subflow dev wlx1cbfce8fa4ce 
2408:8469:x:x:x:5568:29ae:5ad id 4 subflow dev wlx1cbfce8fa4ce

Is it running? Is this normal?

@matttbe
Member

matttbe commented Aug 16, 2023

@birdofprey please do not comment here about topics not related to this issue (Clarification on ip mptcp endpoint cleanup). I think the add_addr: too many addresses or duplicate one error can happen if mptcpd is restarted while some endpoints have been added from another source. If you hit this error in other cases, please open a new issue so as not to mix this one with others.

@majek
Author

majek commented Aug 18, 2023

OK, so on Linux you can't have more than 8 subflow endpoint configs:

$ for i in `seq 1 10`; do sudo ip mptcp endpoint delete id $i; done

$ ip mptcp  endpoint

$ sudo ip mptcp endpoint add 1.1.1.1 id 1 subflow
$ sudo ip mptcp endpoint add 1.1.1.2 id 2 subflow
$ sudo ip mptcp endpoint add 1.1.1.3 id 3 subflow
$ sudo ip mptcp endpoint add 1.1.1.4 id 4 subflow
$ sudo ip mptcp endpoint add 1.1.1.5 id 5 subflow
$ sudo ip mptcp endpoint add 1.1.1.6 id 6 subflow
$ sudo ip mptcp endpoint add 1.1.1.7 id 7 subflow
$ sudo ip mptcp endpoint add 1.1.1.8 id 8 subflow
$ sudo ip mptcp endpoint add 1.1.1.9 id 9 subflow
Error: too many addresses or duplicate one: -34.

$ ip mptcp limits  set subflow 8 add_addr_accepted 4
$ ip mptcp limits  set subflow 9 add_addr_accepted 4
Error: limit greater than maximum.

@majek
Author

majek commented Aug 18, 2023

As a side note, here's a manage_mptcp_endpoint.sh bash script, which does what I would expect mptcpd to do. Example run:

$ sudo bash manage_mptcp_endpoint.sh 
[+] Clearling old mptcp endpoints
[.] Update
ip mptcp endpoint add 192.168.1.83 dev wlp0s20f3 id 1 subflow
ip mptcp endpoint add 2a01:xxxy dev wlp0s20f3 id 2 subflow
ip mptcp endpoint add 192.168.1.25 dev enx381428a758c2 id 3 subflow
ip mptcp endpoint add 2a01:xxx dev enx381428a758c2 id 4 subflow

[.] Update
[.] Update
ip mptcp endpoint add 192.168.1.11 dev enx5855ca236af3 id 5 subflow
[.] Update
[.] Update
...
[.] Update
ip mptcp endpoint add 2a01:xxxy dev enx5855ca236af3 id 6 subflow
[.] Update
[.] Update
[.] Update
ip mptcp endpoint delete id 6 # 2a01:xxxx dev enx5855ca236af3
ip mptcp endpoint delete id 5 # 192.168.1.11 dev enx5855ca236af3
[.] Update

(during the run I plugged in the enx5855ca236af3 interface and then unplugged it)
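
In case it helps, here is a rough sketch of the kind of reconciliation loop the script implements (simplified, not the actual script; it assumes bash 4+ for associative arrays and only considers global-scope, non-temporary addresses):

#!/bin/bash
# Sketch: keep `ip mptcp endpoint` in sync with the addresses currently assigned.
set -e

echo "[+] Clearing old mptcp endpoints"
ip mptcp endpoint flush

declare -A current      # address -> endpoint id
next_id=1               # note: IDs are never reused here; a real script should recycle them

while true; do
    echo "[.] Update"
    # desired state: one "<dev> <addr>" pair per line, privacy (temporary) addresses excluded
    desired=$(ip -o addr show scope global | grep -v temporary \
              | awk '{print $2, $4}' | sed 's,/[0-9]*$,,')

    # add an endpoint for every address that appeared
    while read -r dev addr; do
        [ -n "$addr" ] || continue
        if [ -z "${current[$addr]:-}" ]; then
            echo "ip mptcp endpoint add $addr dev $dev id $next_id subflow"
            ip mptcp endpoint add "$addr" dev "$dev" id "$next_id" subflow
            current[$addr]=$next_id
            next_id=$((next_id + 1))
        fi
    done <<< "$desired"

    # delete endpoints whose address disappeared
    for addr in "${!current[@]}"; do
        if ! awk '{print $2}' <<< "$desired" | grep -qxF "$addr"; then
            echo "ip mptcp endpoint delete id ${current[$addr]} # $addr"
            ip mptcp endpoint delete id "${current[$addr]}"
            unset "current[$addr]"
        fi
    done

    sleep 2
done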

@matttbe
Member

matttbe commented Aug 18, 2023

ok, on linux you can't have more than 8 subflow configs

With the in-kernel path manager, there is indeed a limit (which does not exist in the userspace PM, if I'm not mistaken). Technically, it was supposed to be a limit only on the endpoints if I remember well, mainly to save some bytes in kernel structures, and because we thought that 8 endpoints (8 public IPs) used at the same time already sounded like a lot. If you have a good use case, we are open to changing it; it is just that at the time, we didn't find any realistic one and nobody complained (and we thought that specific use cases might want to use the userspace PM anyway).

But regarding the limit on the subflows, it looks like we use the same software limit (8) for the number of subflows we want to establish. I don't see a technical reason for that, and if you have 8 endpoints you can easily want more than 8 subflows, e.g. in a fullmesh, which could give up to 64 subflows.

This simple kernel patch should be enough to try with more than 8 subflows (but still with a limit of 8 endpoints) if it is easy for you to modify the kernel:

diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
index 9661f3812682..b22dd35f94e1 100644
--- a/net/mptcp/pm_netlink.c
+++ b/net/mptcp/pm_netlink.c
@@ -1768,7 +1768,7 @@ static int mptcp_nl_cmd_dump_addrs(struct sk_buff *msg,
        return msg->len;
 }
 
-static int parse_limit(struct genl_info *info, int id, unsigned int *limit)
+static int parse_limit(struct genl_info *info, int id, unsigned int *limit, unsigned int max)
 {
        struct nlattr *attr = info->attrs[id];
 
@@ -1776,7 +1776,7 @@ static int parse_limit(struct genl_info *info, int id, unsigned int *limit)
                return 0;
 
        *limit = nla_get_u32(attr);
-       if (*limit > MPTCP_PM_ADDR_MAX) {
+       if (*limit > max) {
                GENL_SET_ERR_MSG(info, "limit greater than maximum");
                return -EINVAL;
        }
@@ -1792,12 +1792,12 @@ mptcp_nl_cmd_set_limits(struct sk_buff *skb, struct genl_info *info)
 
        spin_lock_bh(&pernet->lock);
        rcv_addrs = pernet->add_addr_accept_max;
-       ret = parse_limit(info, MPTCP_PM_ATTR_RCV_ADD_ADDRS, &rcv_addrs);
+       ret = parse_limit(info, MPTCP_PM_ATTR_RCV_ADD_ADDRS, &rcv_addrs, MPTCP_PM_ADDR_MAX);
        if (ret)
                goto unlock;
 
        subflows = pernet->subflows_max;
-       ret = parse_limit(info, MPTCP_PM_ATTR_SUBFLOWS, &subflows);
+       ret = parse_limit(info, MPTCP_PM_ATTR_SUBFLOWS, &subflows, 64);
        if (ret)
                goto unlock;
 

Both limits are software ones; they can be modified if needed.
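
With such a patch applied, raising the subflow limit beyond 8 should then be accepted, e.g.:

$ sudo ip mptcp limits set subflow 9 add_addr_accepted 4
$ ip mptcp limits show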

As a side note, here's a manage_mptcp_endpoint.sh bash script, which does what I would expect mptcpd to do

Nice script, thank you for sharing it. I agree mptcpd should do that (+ take into account the endpoints already present when starting). It doesn't look as simple when it has to be done in C with the Netlink API, etc. :-D

I guess this part of mptcpd has not been fully tested; the focus has been more on the userspace PM side. (I can ask.)

Note that NM 1.40+ should be able to do that too. (and hopefully better :) )

Because different issues have been discussed here, it might be clearer to open a new issue dedicated to that. I can quickly do that.

@majek
Author

majek commented Aug 28, 2023

Technically, it was supposed to be a limit only on the endpoints if I remember well, mainly to save some bytes in kernel structures and because we thought that 8 endpoints (8 public IPs) used at the same time sounded like a lot already. If you have a good use case,

OK, thanks for the explanation. It makes total sense. However, from another point of view: assuming both IPv4 and IPv6 stacks, 8 IPs is only 4 interfaces with two IPs each. And for IPv6 you often have multiple addresses per interface (mngtmpaddr noprefixroute vs temporary), so we need to be careful about which ones we add to MPTCP. In my script I choose non-temporary addresses with scope=global (i.e. no site-local). It's obvious in hindsight, but it wasn't obvious on the go.
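
For reference, the selection in my script boils down to something like this (grep-based, so it doesn't depend on iproute2 flag filters):

$ ip -o addr show scope global | grep -v temporary

This lists both IPv4 and IPv6, drops loopback and link-local via the scope filter, and drops the privacy (temporary) addresses.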

I think the 8-IP limit should be configurable via sysctl. Then in the docs we could write that 8 makes sense and that bumping it is more often counterproductive than not.

Another issue, which hopefully this GitHub discussion clarifies, was that we saw the error add_addr: too many addresses or duplicate one: -34: Numerical result out of range and did not recognize it as a "too many IPs added" message. The limit of 8 was just implicit and not obvious.

But anyway, I'm not arguing for bumping it. It's just that I wasn't aware of the limit before.
