Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

backend/vxlan: simplify vxlan processing #785

Merged
merged 1 commit into from
Aug 14, 2017
Merged

Conversation

tomdee
Copy link
Contributor

@tomdee tomdee commented Jul 27, 2017

No need for any netlink monitoring anymore
This means vxlan should work with older kernels (Centos 6)
and it should perform betteron heavily loaded systems.

This change is compatible with older versions of flannel
so it should be OK to do rolling upgrades to it.

Some notes on how vxlan worked before and how this changes it can be found in the code https://github.com/coreos/flannel/pull/785/files#diff-2fcdad83a560522bb17d06e480b6ed76R17

Fixes #779
Fixes #592
Fixes #617
Fixes #654
Fixes #604
Fixes #414
Fixes #427

@ayaz
Copy link

ayaz commented Aug 7, 2017

@tomdee @gunjan5 When can we hope to have this available officially? I have just had this bug hit me on a perfectly running cluster.

@gunjan5
Copy link
Contributor

gunjan5 commented Aug 7, 2017

@ayaz I plan to review it in the next few days, then it will need some testing before it's merged into master

Copy link
Contributor

@gunjan5 gunjan5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking pretty good, just a couple of comments.

// 2) Create a static ARP entry for the remote flannel host IP address (and the VTEP MAC)
// 3) Create an FDB entry with the VTEP MAC and the public IP of the remote flannel daemon.
//
// In this scheme the scaling of table entries in linear to the number of remote hosts - 1 route, 1 arp entry and 1 FDB entry per host
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

entries is linear

return nil, err
// Ensure that the device has a /32 address so that no broadcast routes are created.
// This IP is just used as a source address for host to workload traffic (so
// the return path for the traffic has a decent address to use as the destination)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what do you mean by a "decent address" here?

case subnet.EventAdded:
log.V(2).Infof("Adding subnet: %s PublicIP: %s VtepMAC: %s", event.Lease.Subnet, event.Lease.Attrs.PublicIP, net.HardwareAddr(attrs.VtepMAC))
if err := nw.dev.AddArp(neighbor{IP: event.Lease.Subnet.IP, MAC: net.HardwareAddr(attrs.VtepMAC)}); err != nil {
log.Error("AddArp failed: ", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wonder if we should return here when this fails?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Returning would just mean that we don't try to process the rest of the events in the batch. I think it's still worth trying to process those events. What's your reasoning for why we shouldn't try?

if err != nil {
log.Error("Delete L2 failed: ", err)
if err := nw.dev.AddFdb(neighbor{IP: event.Lease.Attrs.PublicIP, MAC: net.HardwareAddr(attrs.VtepMAC)}); err != nil {
log.Error("AddFdb failed: ", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and return here with cleanup from whatever we did in the previous step line 110? wdyt?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gotcha, I thing maybe you're suggesting a "continue" rather than "return"? And advocating for cleaning up the failed FDB or arp entries? It sounds like a reasonable suggestion to me (though I don't think it really brings much benefit to the user and increases the code complexity).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I've addressed this now.

func (dev *vxlanDevice) AddL2(n neighbor) error {
log.V(4).Infof("calling NeighAdd: %v, %v", n.IP, n.MAC)
return netlink.NeighAdd(&netlink.Neigh{
func (dev *vxlanDevice) AddFdb(n neighbor) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a better name would be AddFDB since FDB is an acronym
See: https://github.com/golang/go/wiki/CodeReviewComments#initialisms (also applies to a bunch of other method names in this file.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@@ -177,8 +163,8 @@ func (dev *vxlanDevice) AddL2(n neighbor) error {
})
}

func (dev *vxlanDevice) DelL2(n neighbor) error {
log.V(4).Infof("calling NeighDel: %v, %v", n.IP, n.MAC)
func (dev *vxlanDevice) DelFdb(n neighbor) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, a better name would be DelFDB

@@ -188,77 +174,28 @@ func (dev *vxlanDevice) DelL2(n neighbor) error {
})
}

func (dev *vxlanDevice) AddL3(n neighbor) error {
log.V(4).Infof("calling NeighSet: %v, %v", n.IP, n.MAC)
func (dev *vxlanDevice) AddArp(n neighbor) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a better name would be AddARP

} else {
log.V(2).Infof("L3 miss: AddL3 for %s succeeded", miss.IP)
default:
log.Error("Internal error: unknown event type: ", int(event.Type))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any reason why you have some error logs with log.Error and uppercase error message vs log.Errorf and a lowercase message?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no reason 😄

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll update them all to lowercase

@tomdee
Copy link
Contributor Author

tomdee commented Aug 14, 2017

@gunjan5 I've made markups (ea2cb64) to address all your comments, PTAL.

@gunjan5
Copy link
Contributor

gunjan5 commented Aug 14, 2017

@tomdee LGTM! I haven't had a chance to manually test this. I'm assuming you have done some testing? lmk if you want me to do some manual testing before the merge. I will likely not have any time this week tho

@tomdee tomdee merged commit 4973e02 into flannel-io:master Aug 14, 2017
@tomdee tomdee deleted the vxlan branch August 14, 2017 23:45
@tomdee
Copy link
Contributor Author

tomdee commented Aug 14, 2017

I have done some testing, but I would love for people to do more testing before this gets released.

You can try it by using the image quay.io/coreos/flannel-git:v0.8.0-14-g4973e02e

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants