-
Notifications
You must be signed in to change notification settings - Fork 865
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tcp btl: bind() on connect() side, interface conflict #7115
Comments
@bwbarrett, @jsquyres others? |
There was discussion on the webex yesterday: AWS has seen similar issues, and has some in-house fixes that they're still testing. @wckzhang has more information. |
It might be related, but geoffrey's description sounded more like that was about reachability of different hosts on different interfaces. Is there any work-in-progress I can browse for that to see what it does for my case? I'm concerned here about a mismatch between the interface a BTL is created to use vs what it actually uses. mca_btl_tcp_create() loops over all the interfaces creating a BTL for each and setting btl->tcp_ifaddr which is what the connect-side eventually binds to. But the endpoints it creates for use with this btl can pick between any of the local interfaces. It's not that it thinks ib0 can connect to ib1, it correctly identifies that they can't talk. It's just ignoring the information about what interface it's going to bind to and picking an incompatible address for the endpoint. |
Thanks, I've browsed and run 7134 and I'm satisfied that it does fix this issue. I still have two concerns about 7134 but those are separate from the bind() mismatch here. So I guess I'm okay with closing this |
On my machines if I run one rank on each of two hosts and use
the ordering in which the interfaces are used results in rank1 trying to connect to rank0 using a btl that's set up with a connect-side bind() to ib0 but the endpoint it's trying to connect to for the peer is on ib1.
We seem to create one BTL for each network interface and then we have the connect-side bind to
btl->tcp_ifaddr
which makes me think the BTL is intended to be exclusive to that interface. But over inmca_btl_tcp_proc_insert()
when it's filling in information for a btl_endpoint where the btl is already selected, it loops over all the local interfaces in making its selection.If I add some code in the
i,j
loop to skip over any local interfacei
such that its addresslocal_interface->ipv4_address
doesn't match the BTL's selectionbtl_endpoint->endpoint_btl->tcp_ifaddr
then my above ib1,ib0 run results in consistent use of ib0. Eg the endpoint ignores the ib1 entries and picks an IP on ib0.I can make a PR for this, but I want to get some discussion first about what's the right solution. Are BTLs supposed to be exclusive to the
btl->tcp_ifaddr
they're set up with? That seems reasonable to me, but it might be significant change in behavior, eg in my example above a BTL that would have been running over ib1 now picks ib0 for its peer. And even though I don't think theweights[]
code inmca_btl_tcp_proc_insert()
is doing anything super fancy like trying to pick the fastest interface, I'm concerned that maybe it is and my change is potentially is picking a slower interface for the endpoint.An alternate 'fix' is to let the endpoint get setup with whatever interface it wants and ignore the
btl->tcp_ifaddr
setting if it's going to conflict. Eg don't bind on the connect-side ifopal_net_samenetwork()
shows us that the upcoming connect is doomed to hang.Both of the above fixes worked for me.
Can anyone advise on what fits best with the intended design? Is there meant to be one TCP btl for each local interface and each btl is exclusive to that interface?
The text was updated successfully, but these errors were encountered: