Watch cache regression: changes behavior of "resource version too old" error #25151
Comments
@kubernetes/sig-api-machinery I think this is a candidate for backport to 1.2 to restore the previous behavior.

+1 for backport

hmm - we enabled watch cache in 1.1 - so this should also be broken in 1.1

It's possible that it was always nondeterministic and just depends on how fast the call to the storage layer executes. I think all of our in-tree clients will handle it either way.
I'm fairly certain this never used to behave this way, because OpenShift

I see what you mean about ordering - the etcd remote call would almost never be

Either way, for web sockets, if we don't fix this, clients can't get this
Well, it should be easy to just not wait at all, maybe? I'm OK with changing this behavior.

Although, are web clients XSS hardened? I'm not sure that we're optimizing for them currently.

If you mean CSRF, it isn't an issue with bearer token auth. Basic auth should already have a big warning around it saying "not safe with browsers"
Maybe I'm missing something, but it seems deterministic to me. This is the uncached watch impl:
once the resourceVersion parses, there is no path that returns a direct error, so all errors would have to be returned as an event via the ResultChan(). I don't see us reading from that until after we've already written a 200 and flushed: https://github.com/kubernetes/kubernetes/blob/master/pkg/apiserver/watch.go#L177
Automatic merge from submit-queue

Return 'too old' errors from watch cache via watch stream

Fixes #25151

This PR updates the API server to produce the same results when a watch is attempted with a resourceVersion that is too old, regardless of whether the etcd watch cache is enabled. The expected result is a `200` http status, with a single watch event of type `ERROR`. Previously, the watch cache would deliver a `410` http response.

This is the uncached watch impl:

```
// Implements storage.Interface.
func (h *etcdHelper) WatchList(ctx context.Context, key string, resourceVersion string, filter storage.FilterFunc) (watch.Interface, error) {
	if ctx == nil {
		glog.Errorf("Context is nil")
	}
	watchRV, err := storage.ParseWatchResourceVersion(resourceVersion)
	if err != nil {
		return nil, err
	}
	key = h.prefixEtcdKey(key)
	w := newEtcdWatcher(true, h.quorum, exceptKey(key), filter, h.codec, h.versioner, nil, h)
	go w.etcdWatch(ctx, h.etcdKeysAPI, key, watchRV)
	return w, nil
}
```

Once the resourceVersion parses, there is no path that returns a direct error, so all errors would have to be returned as an `ERROR` event via the ResultChan().
Having read the docs, I was very surprised that "type": "ERROR" notices are sent with HTTP 200 (even over plain http), with the semantic code (e.g. 410) only represented inside the event object. https://kubernetes.io/docs/reference/using-api/api-concepts/#efficient-detection-of-changes doesn't show at all how watch returns errors, and it even says "clients must handle the case by recognizing the status code 410 Gone", which really makes one expect an actual HTTP 410.
@cben I agree this should be called out in the docs. Can you file a bug there and/or send a PR? I don't think this ~4 year old issue is the best place to track that. (Good detective work finding this, though!)
References:
* ManageIQ/kubeclient#452 (comment)
* fabric8io/kubernetes-client#1800 (comment)
* kubernetes/kubernetes#25151 (comment)
* kubernetes/kubernetes#35068 (comment)
* https://github.com/kubernetes/kubernetes/blob/dde6e8e7465468c32642659cb708a5cc922add64/test/e2e/apimachinery/protocol.go#L68-L75
* https://kubernetes.io/docs/reference/using-api/api-concepts/#410-gone-responses
* https://kubernetes.io/docs/reference/using-api/api-concepts/#efficient-detection-of-changes
* https://www.baeldung.com/java-kubernetes-watch#1-resource-versions
When attempting to watch from a resource version that is too old, the behavior of watch (for both HTTP and web sockets) changed when the watch cache was enabled. The watch cache returns a 410 - without the watch cache, we return 200 and then write an "error" entry to the watch stream. This broke a client that depended on the watch behavior, and moreover is inconsistent with other errors we set in the registry. Also, for watch over web sockets, most JavaScript clients will be unable to get at a 410 error from the connection, because browsers don't give scripts access to that data.
We should restore the watch cache behavior to be consistent with the previous behavior as this is a regression in API semantics.
Resource without watch cache, watching too old:
Resource with watch cache, watching too old: