Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix xDS protocol mishandling #1013

Closed
wants to merge 1 commit into from
Closed

fix xDS protocol mishandling #1013

wants to merge 1 commit into from

Conversation

phylake
Copy link

@phylake phylake commented Apr 11, 2019

Signed-off-by: Brandon Cook [email protected]
fixes #499

@@ -40,10 +40,10 @@ func (xh *xdsHandler) fetch(req *v2.DiscoveryRequest) (*v2.DiscoveryResponse, er
}
resources, err := toAny(r, toFilter(req.ResourceNames))
return &v2.DiscoveryResponse{
VersionInfo: "0",
VersionInfo: req.VersionInfo,
Copy link

@mattalberts mattalberts Apr 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this fix the issue (I would love it if it does)?? I'll go test the branch ;) . Looking at the xDS protocol diagram, the management server is supposed to control version and nonce values, passing them back as part of DiscoveryResponse to be used in the NACK/ACK DiscoveryRequest.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the moment we always send version_info="0", which is not correct, it should change. Looking through the code on the envoy side there is no requirement for version_info to increment, I've not found any place where two versions are sorted, they are only compared for equality.

I'm not sure this fix is correct, in the initial discovery request version info will be "", so rather than replying with "0", we'd reply with "" which has some unexpected consequences as "" is treated as a marker for some conditions.

This is also why the tests pass with the versioninfo and nonce fields removed in all the internal/e2e test cases. I don't think this is correct.

It's easy enough to get the right version info, it's in last. We can use the string form of last for VersionInfo.

For the nonce, this is only used in the outer loop of GrpcMuxImpl::onDiscoveryResponse on the Envoy side. Its never compared to a previous nonce, or sorted for newer or olderness, despite what the xDS protocol document says. This is sort of backed up by the observation that if these were serial numbers, they would be sent on the wire as numbers, not opaque strings.

I think for the moment we can use the string form of last here, at least it will change on every response.

@davecheney davecheney self-requested a review April 12, 2019 01:15
Copy link
Contributor

@davecheney davecheney left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for working on this. I really appreciate you digging into the problem and proposing a fix.

I've spent today comparing the management server impl in envoyproxy/go-control-plane and the code on the Envoy side as well as some of the envoy test cases. I think your diagnosis is correct; version number and/or nonce should change on each request/response cycle -- however I don't think the solution you've proposed is correct. Please see my next comment for details.

@@ -40,10 +40,10 @@ func (xh *xdsHandler) fetch(req *v2.DiscoveryRequest) (*v2.DiscoveryResponse, er
}
resources, err := toAny(r, toFilter(req.ResourceNames))
return &v2.DiscoveryResponse{
VersionInfo: "0",
VersionInfo: req.VersionInfo,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the moment we always send version_info="0", which is not correct, it should change. Looking through the code on the envoy side there is no requirement for version_info to increment, I've not found any place where two versions are sorted, they are only compared for equality.

I'm not sure this fix is correct, in the initial discovery request version info will be "", so rather than replying with "0", we'd reply with "" which has some unexpected consequences as "" is treated as a marker for some conditions.

This is also why the tests pass with the versioninfo and nonce fields removed in all the internal/e2e test cases. I don't think this is correct.

It's easy enough to get the right version info, it's in last. We can use the string form of last for VersionInfo.

For the nonce, this is only used in the outer loop of GrpcMuxImpl::onDiscoveryResponse on the Envoy side. Its never compared to a previous nonce, or sorted for newer or olderness, despite what the xDS protocol document says. This is sort of backed up by the observation that if these were serial numbers, they would be sent on the wire as numbers, not opaque strings.

I think for the moment we can use the string form of last here, at least it will change on every response.

@davecheney davecheney added this to the 0.12.0 milestone Apr 12, 2019
@davecheney
Copy link
Contributor

@phylake can you please try this diff instead

% git diff
diff --git a/internal/grpc/xds.go b/internal/grpc/xds.go
index 9718d5d..693f837 100644
--- a/internal/grpc/xds.go
+++ b/internal/grpc/xds.go
@@ -16,6 +16,7 @@ package grpc
 import (
        "context"
        "fmt"
+       "strconv"
        "sync/atomic"
 
        "github.com/envoyproxy/go-control-plane/envoy/api/v2"
@@ -100,7 +101,7 @@ func (xh *xdsHandler) stream(st grpcStream) (err error) {
                        log.Info("stream_wait")
 
                        // now we wait for a notification, if this is the first time through the loop
-                       // then last will be zero and that will trigger a notification immediately.
+                       // then last will be less than zero and that will trigger a notification immediately.
                        r.Register(ch, last)
                        select {
                        case last = <-ch:
@@ -117,10 +118,10 @@ func (xh *xdsHandler) stream(st grpcStream) (err error) {
                                }
 
                                resp := &v2.DiscoveryResponse{
-                                       VersionInfo: "0",
+                                       VersionInfo: strconv.Itoa(last),
                                        Resources:   resources,
                                        TypeUrl:     r.TypeURL(),
-                                       Nonce:       "0",
+                                       Nonce:       strconv.Itoa(last),
                                }
                                if err := st.Send(resp); err != nil {
                                        return err

@phylake
Copy link
Author

phylake commented Apr 12, 2019

Closing in favor of #1016

@phylake phylake closed this Apr 12, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Observed envoy memory while adding|removing ingress
3 participants