Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better interface selection heuristic #3546

Merged
merged 4 commits into from
Nov 15, 2017
Merged

Better interface selection heuristic #3546

merged 4 commits into from
Nov 15, 2017

Conversation

dadgar
Copy link
Contributor

@dadgar dadgar commented Nov 13, 2017

This PR introduces a better interface selection heuristic such that we
select interfaces with globally routable unicast addresses over link
local addresses.

Fixes #3487

This PR introduces a better interface selection heuristic such that we
select interfaces with globally routable unicast addresses over link
local addresses.

Fixes #3487
Copy link
Member

@schmichael schmichael left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this heuristic will work for the linked issue, but it will break on machine which worries me about its portability. When I go run ... this script I get:

1 lo
  0 127.0.0.1/8 (*net.IPNet)
    IsGlobalUnicast          = false
    IsInterfaceLocalMulticast= false
    IsLinkLocalMulticast     = false
    IsLinkLocalUnicast       = false
    IsLoopback               = true
    IsMulticast              = false
  1 ::1/128 (*net.IPNet)
    IsGlobalUnicast          = false
    IsInterfaceLocalMulticast= false
    IsLinkLocalMulticast     = false
    IsLinkLocalUnicast       = false
    IsLoopback               = true
    IsMulticast              = false

2 enp0s31f6

3 wlp4s0
  0 192.168.86.26/24 (*net.IPNet)
    IsGlobalUnicast          = true
    IsInterfaceLocalMulticast= false
    IsLinkLocalMulticast     = false
    IsLinkLocalUnicast       = false
    IsLoopback               = false
    IsMulticast              = false
  1 fe80::d896:c428:a4f2:e480/64 (*net.IPNet)
    IsGlobalUnicast          = false
    IsInterfaceLocalMulticast= false
    IsLinkLocalMulticast     = false
    IsLinkLocalUnicast       = true
    IsLoopback               = false
    IsMulticast              = false

4 lxcbr0
  0 10.0.3.1/24 (*net.IPNet)
    IsGlobalUnicast          = true
    IsInterfaceLocalMulticast= false
    IsLinkLocalMulticast     = false
    IsLinkLocalUnicast       = false
    IsLoopback               = false
    IsMulticast              = false

5 docker0
  0 172.18.0.1/16 (*net.IPNet)
    IsGlobalUnicast          = true
    IsInterfaceLocalMulticast= false
    IsLinkLocalMulticast     = false
    IsLinkLocalUnicast       = false
    IsLoopback               = false
    IsMulticast              = false
  1 fe80::42:a4ff:fed1:1f7d/64 (*net.IPNet)
    IsGlobalUnicast          = false
    IsInterfaceLocalMulticast= false
    IsLinkLocalMulticast     = false
    IsLinkLocalUnicast       = true
    IsLoopback               = false
    IsMulticast              = false

6 vmnet1
  0 172.16.203.1/24 (*net.IPNet)
    IsGlobalUnicast          = true
    IsInterfaceLocalMulticast= false
    IsLinkLocalMulticast     = false
    IsLinkLocalUnicast       = false
    IsLoopback               = false
    IsMulticast              = false
  1 fe80::250:56ff:fec0:1/64 (*net.IPNet)
    IsGlobalUnicast          = false
    IsInterfaceLocalMulticast= false
    IsLinkLocalMulticast     = false
    IsLinkLocalUnicast       = true
    IsLoopback               = false
    IsMulticast              = false

...two more vmnet's...

wlp4s0 is my wifi which I would expect to be my "best" interface. However because it has a linklocal unicast address (due to ipv6 being enabled locally but not available on my LAN) it gets a score of 1 while lxc's bridge (lxcbr0) gets a score of 2 for having a global unicast address but no link local.

I'm not sure what we can do other than blacklist or score interfaces based on well known names or attempt to use cloud-specific metadata before scraping the system's interfaces...


addrs, err := f.interfaceDetector.Addrs(&intf)
if err != nil {
return nil, err
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 We could WARN and continue here, but after peeking at the Addrs(...) code it seems like all of the errors it returns indicate pretty bad syscall failures. Seems like just bailing is probably safest as users can manually set this anyway.

func TestNetworkFingerPrint_excludelo_down_interfaces(t *testing.T) {
// This tests that we prefer skipping link local and down interfaces as well as
// those without global unicast addresses
func TestNetworkFingerPrint_preferential_interfaces(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see how any of these tests are exercising the new code path with the scoring introduced above.
I realize its not possible to test in a portable way across machines, but I would try to refactor the code so the scoring/selection logic is testable

}
}

if intfScore > bestScore {
Copy link
Contributor

@preetapan preetapan Nov 14, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If infscore stays at zero and bestScore at -1000 this will pick the first intf from the intfs list, is that acceptable?

@@ -200,14 +199,57 @@ func (f *NetworkFingerprint) findInterface(deviceName string) (*net.Interface, e
return nil, err
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the comments on the PR indicated that the returned list here is not stable (can return things out of order). What happens if there is more than one interface with a global unicast address, thus resulting in them getting the same score?
Reading the code below, it seems like it will choose the first one, but if the order changes when calling interfaceDetector.Interfaces() that will change the results.

@dadgar
Copy link
Contributor Author

dadgar commented Nov 15, 2017

@preetapan @schmichael Would appreciate a new review. Code has changed.

to force network fingerprinting on. When run in dev mode, this is defaulted
to the machine's loopback interface. If not in dev mode, the interface attached
to the default route is used. All addresses on the interface are fingerprinted
except those that are scoped local for IPv6. When allocating ports for tasks,
Copy link
Contributor

@preetapan preetapan Nov 15, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggesting a small rewrite for brevity and continuity.

When run in dev mode, this defaults to the loopback interface. 
When not in dev mode, the interface attached to the default route is used. All IP addresses except those scoped local for IPV6 on the chosen interface are fingerprinted. The scheduler chooses from those IP addresses when allocating ports for tasks.

if defaultIfName == "" {
return nil, fmt.Errorf("no network_interface given and failed to determine interface attached to default route")
}
deviceName = defaultIfName
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Much nicer now 👍

@github-actions
Copy link

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 17, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants