Allow the same user to edit an instance #1872
LGTM
@@ -256,9 +257,23 @@ func validateServiceInstanceUpdate(instance *sc.ServiceInstance) field.ErrorList | |||
// internalValidateServiceInstanceUpdateAllowed ensures there is not a |
You may want to update this comment to reflect how changes by the same user are treated.
good catch- fixed
I am wondering how we deal with the situation where a user updates the spec while the broker is committing the last update from that user. Do we have to worry about a race condition where two updates come in at the same time? I am surprised there are no required changes to the reconciliation loops. I wonder if the update should reset the backoff for the current queued request if there is one so they do not have to wait the max timeout for an instance in a bad state. We just don't have any test coverage of this case in the current test framework... |
Won't the Generation stuff cover this? |
I honestly do not know at the moment, I would have to look at the code again. @kibbles-n-bytes just refactored a bunch of this so I would like his input here. I don't know if the lock was fundamental to know which spec/generation was the thing we sent to the broker. Especially for async? I am just not sure... |
One of life's mysteries :-) |
@n3wscott On the reconciliation-loop side of things, there are a couple of things we should do.
As far as unlocking the spec for updates, though, same-user or not we have a bit of an issue... If an update request is in-flight, and someone updates the spec, the next status update will fail, so we'll lose all information of what the broker responded with. With a locked spec we can resend the request and (as long as the secret parameters haven't changed) we'll get the same response back. The odds of this response being different from the last one are small (if it was a retry), but it does hurt us in that we have no way of verifying what the true state of the world is, so our |
} | ||
|
||
if ctx != nil { | ||
if user, _ := genericapirequest.UserFrom(ctx); user != nil { |
Do we have any idea if this works at all with our apiserver? Does this code only work when originating-identity is turned on? Is that the only case we're concerned with? I think so.
I assume the lock is on all the time. This does seem to work with our apiserver since the test appears to pass :-)
|
||
if ctx != nil { | ||
if user, _ := genericapirequest.UserFrom(ctx); user != nil { | ||
newUID = user.GetUID() |
We should be getting the UID of the new user directly from the ServiceInstance rather than from the context
The new user is set in the ServiceInstance as part of preparing the update, which happens before validation.
that would certainly be easier! :-) trying it now....
@staebler thanks for the comment. I've updated it and it's a much smaller change now. Thanks!
Adding |
@kibbles-n-bytes IIRC you have merged this change recently?
When a generation bump has occurred, the updated spec will be picked up only after the current operation has cleaned up, i.e. either succeeded, or failed with orphan mitigation finished. This code is part of #1765, so I think we have already fixed these 2 issues in master (unless there is some bug in the implementation).
newUID = new.Spec.UserInfo.UID | ||
} | ||
|
||
disableLocking := utilfeature.DefaultFeatureGate.Enabled(scfeatures.DisableInstanceLocking) |
If disableInstanceLocking is true, we can just return true at the beginning of this func.
Signed-off-by: Doug Davis <[email protected]>
Signed-off-by: Doug Davis <[email protected]>
Added a feature gate to disable all locks, but if we are locking we allow edits from the same user. |
I'm liking this as a way forward on this subject |
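For readers following along, the rule described in that comment (a gate that turns locking off entirely, and otherwise a lock that only blocks a different user) can be sketched roughly as below. This is a minimal illustration with simplified stand-in types, not the PR's actual code: `instance`, `userInfo`, `uidOf`, and `updateBlocked` are hypothetical names, and the `lockingDisabled` argument stands in for the feature gate lookup.

```go
package main

import "fmt"

// userInfo and instance are simplified stand-ins (assumptions) for the
// ServiceInstance fields the validation looks at.
type userInfo struct{ UID string }

type instance struct {
	Generation       int64     // bumped on every spec change
	UserInfo         *userInfo // nil when no originating identity was recorded
	CurrentOperation string    // empty when no operation is in progress
}

// uidOf returns the UID recorded on the instance, or "" if there is none.
func uidOf(i *instance) string {
	if i.UserInfo == nil {
		return ""
	}
	return i.UserInfo.UID
}

// updateBlocked reports whether a spec update should be rejected.
// Note that two empty UIDs compare equal here, so the update is allowed
// when no identity was recorded on either side.
func updateBlocked(old, updated *instance, lockingDisabled bool) bool {
	if lockingDisabled {
		return false // gate switched on: never lock
	}
	specChanged := old.Generation != updated.Generation
	operationInProgress := old.CurrentOperation != ""
	return specChanged && operationInProgress && uidOf(old) != uidOf(updated)
}

func main() {
	old := &instance{Generation: 1, UserInfo: &userInfo{UID: "alice"}, CurrentOperation: "Update"}
	sameUser := &instance{Generation: 2, UserInfo: &userInfo{UID: "alice"}}
	otherUser := &instance{Generation: 2, UserInfo: &userInfo{UID: "bob"}}

	fmt.Println(updateBlocked(old, sameUser, false))  // false
	fmt.Println(updateBlocked(old, otherUser, false)) // true
	fmt.Println(updateBlocked(old, otherUser, true))  // false
}
```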
pkg/features/features.go
Outdated
@@ -73,4 +79,5 @@ var defaultServiceCatalogFeatureGates = map[utilfeature.Feature]utilfeature.Feat | |||
AsyncBindingOperations: {Default: false, PreRelease: utilfeature.Alpha}, | |||
NamespacedServiceBroker: {Default: false, PreRelease: utilfeature.Alpha}, | |||
ResponseSchema: {Default: false, PreRelease: utilfeature.Alpha}, | |||
DisableInstanceLocking: {Default: false, PreRelease: utilfeature.Alpha}, |
I would recommend changing this feature flag to be called InstanceLocking (or something else) and default it to true.
The reason is that there are so many bugs in the history of feature flags that come from people not understanding what it means to set a disableFoo flag to false. People are super bad at parsing the negative in the flag name. It is cleaner to name the flag after the feature, so that enabled (true) means on and disabled (false) means off. The feature is Instance Locking, and we default the feature to on (true).
Good point @n3wscott, how about OriginatingIdentityLocking, since this will eventually apply to bindings too?
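To make the naming argument concrete, here is a small sketch of a positively named flag that defaults to on. It uses a hypothetical map-based gate (`defaults`, `enabled`) purely for illustration; it is not the `utilfeature` API that the diff uses.

```go
package main

import "fmt"

// feature is a hypothetical stand-in for a feature-gate key.
type feature string

// OriginatingIdentityLocking names the feature itself, so true means
// "locking is on" and false means "locking is off".
const OriginatingIdentityLocking feature = "OriginatingIdentityLocking"

// defaults holds the built-in value for each known feature.
var defaults = map[feature]bool{
	OriginatingIdentityLocking: true, // the feature is on unless overridden
}

// enabled returns the operator's override if one was given,
// otherwise the built-in default.
func enabled(overrides map[feature]bool, f feature) bool {
	if v, ok := overrides[f]; ok {
		return v
	}
	return defaults[f]
}

func main() {
	fmt.Println(enabled(nil, OriginatingIdentityLocking)) // true
	off := map[feature]bool{OriginatingIdentityLocking: false}
	fmt.Println(enabled(off, OriginatingIdentityLocking)) // false
}
```

With this shape, a call site reads as "if locking is enabled, enforce the lock", with no double negative to untangle.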
Signed-off-by: Doug Davis <[email protected]>
if old.Generation != new.Generation && old.Status.CurrentOperation != "" { | ||
errors = append(errors, field.Forbidden(field.NewPath("spec"), "Another update for this service instance is in progress")) | ||
|
||
// If the OriginatingIdentityLocking feature is not set then only allow the |
Wrong comment, the condition is reversed: "is not set" -> "is set"
newUID = new.Spec.UserInfo.UID | ||
} | ||
|
||
if old.Generation != new.Generation && old.Status.CurrentOperation != "" && oldUID != newUID { |
@duglin I'm not sure this condition is sufficient for you. old.Status.CurrentOperation is set at the beginning of reconciling the new generation. This means that:
- User B could slip their changes through before User A's changes are picked up by controller-manager
- If the controller-manager is down (while apiserver is not), a lot of spec changes from different users could pile up in the meantime.
The condition for locking should not be based on whether there is an operation in progress. It should be based on whether the controller has finished processing the current generation (i.e. either succeeded, or failed and won't retry).
So I would suggest changing the condition to something stricter.
I suggest copying the isServiceInstanceProcessedAlready method from the controller (you will also need to change the v1beta1 package to sc) and changing the condition to
if old.Generation != new.Generation && oldUID != newUID && !isServiceInstanceProcessedAlready(old) { ...
https://github.com/kubernetes-incubator/service-catalog/blob/e15b73719911853d3755b71f3d8b26b21296d0a3/pkg/controller/controller_instance.go#L818-L826
P.S. I know that the condition was written this way before, but given that we decided to change it, it's worth fixing it as well.
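The difference between the two conditions can be sketched as follows. This is an illustration under stated assumptions, not the controller's code: the `instance` fields (`ObservedGeneration`, `Ready`, `Failed`, `OrphanMitigationInProgress`) and the body of `processedAlready` are guesses at what "finished processing the current generation" means, based only on the description in this thread; the real helper at the linked commit may check different fields.

```go
package main

import "fmt"

// instance is a simplified stand-in (assumption) for a ServiceInstance.
type instance struct {
	Generation                 int64  // bumped by the apiserver on every spec change
	ObservedGeneration         int64  // last generation the controller started working on
	Ready, Failed              bool   // terminal outcomes for the observed generation
	OrphanMitigationInProgress bool   // cleanup after a failed operation is still running
	CurrentOperation           string // set only once reconciliation has begun
	UID                        string // originating user of the latest spec
}

// processedAlready is a hypothetical version of the controller helper:
// the current generation was picked up and reached a terminal state.
func processedAlready(i *instance) bool {
	return i.ObservedGeneration >= i.Generation &&
		(i.Ready || i.Failed) &&
		!i.OrphanMitigationInProgress
}

// blockedByOperation is the condition under review in the diff.
func blockedByOperation(old, updated *instance) bool {
	return old.Generation != updated.Generation &&
		old.CurrentOperation != "" &&
		old.UID != updated.UID
}

// blockedByGeneration is the stricter condition suggested above.
func blockedByGeneration(old, updated *instance) bool {
	return old.Generation != updated.Generation &&
		old.UID != updated.UID &&
		!processedAlready(old)
}

func main() {
	// User A's change (generation 2) is stored, but the controller has not
	// picked it up yet, so CurrentOperation is still empty.
	pending := &instance{Generation: 2, ObservedGeneration: 1, Ready: true, UID: "user-a"}
	fromB := &instance{Generation: 3, UID: "user-b"}

	fmt.Println(blockedByOperation(pending, fromB))  // false: user B slips through
	fmt.Println(blockedByGeneration(pending, fromB)) // true: user B is held back
}
```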
@duglin see my comments above. once you fix them, I am happy with merging this change and rejecting mine. |
@@ -58,6 +58,17 @@ const ( | |||
// owner: @luksa | |||
// alpha: v0.1.12 | |||
ResponseSchema utilfeature.Feature = "ResponseSchema" | |||
|
|||
// OriginatingIdentityLocking controls whether we lock OSB API resources | |||
// that are being updated while we are still processing the update request. |
"processing the update request" -> "haven't finished processing the current spec" (and rewrite the whole comment to reflect that)
@duglin what about |
That was for the instance polling queue, but the regular instance reconciliation queue also has a similar issue here.
As far as I can tell, not if there's a spec change while a request is in flight. In that case, the API server update with the response from that request would fail, and the reconciler would reattempt |
Well, actually, if we merge this with current master we may not actually have the issue with the instance reconciliation queue; I haven't tested it, but I'm pretty sure the rate limiter for regular reconciliation is actually broken today lol. The worker will call |
(The other issues I believe are still valid, though.) For the instance update issue, I've been talking with Doug offline, and it seems like changing our status updates to do a GET -> edit -> PUT for the resources should soothe my concerns in 99.9% of situations. This way, failures to update resource statuses would be minimized, and our surface area for getting into a wonky state is vastly reduced. For provisioning, though, I think we should still be defensive, because we have the ability to fix ourselves with orphan mitigation. |
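The GET -> edit -> PUT idea can be sketched with a toy in-memory store that rejects writes carrying a stale resource version, which is the optimistic-concurrency behaviour described in this thread. Everything here (`store`, `instance`, `updateStatus`, `errConflict`) is a hypothetical stand-in, not the service-catalog client.

```go
package main

import (
	"errors"
	"fmt"
)

// errConflict mimics the apiserver rejecting a write based on a stale copy.
var errConflict = errors.New("conflict: resource version is stale")

// instance is a simplified stand-in for a stored resource.
type instance struct {
	ResourceVersion int
	Spec            string
	Status          string
}

// store is a toy stand-in for the API server holding one object.
type store struct{ obj instance }

func (s *store) get() instance { return s.obj }

// put accepts the write only if it was based on the latest version.
func (s *store) put(in instance) error {
	if in.ResourceVersion != s.obj.ResourceVersion {
		return errConflict
	}
	in.ResourceVersion++
	s.obj = in
	return nil
}

// updateStatus re-reads the object before every attempt, so a concurrent
// spec change leads to a retry instead of a lost broker response.
func updateStatus(s *store, status string, maxAttempts int) error {
	for attempt := 0; attempt < maxAttempts; attempt++ {
		latest := s.get()      // GET
		latest.Status = status // edit
		err := s.put(latest)   // PUT
		if err == nil {
			return nil
		}
		if !errors.Is(err, errConflict) {
			return err
		}
	}
	return errConflict
}

func main() {
	s := &store{obj: instance{ResourceVersion: 1, Spec: "plan-a"}}
	cached := s.get() // the reconciler's copy, taken before calling the broker

	// A user edits the spec while the broker request is in flight.
	edited := s.get()
	edited.Spec = "plan-b"
	_ = s.put(edited)

	cached.Status = "Provisioned"
	fmt.Println(s.put(cached))                      // conflict: resource version is stale
	fmt.Println(updateStatus(s, "Provisioned", 3)) // <nil>
}
```

In a real controller, the `retry.RetryOnConflict` helper from `k8s.io/client-go/util/retry` covers the same loop. Note that the retry only records the broker's response; it does not by itself decide what to do about the spec that changed in the meantime.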
@kibbles-n-bytes I will need to look at the code again, but my understanding is that we store all inline parameters in the So you're right of course that we can fail to record I don't see how unlocking makes things worse there, and will need to read the code again next week. |
@kibbles-n-bytes also even if this issue is valid, we should have a test for reproducing it. |
@nilebox It makes it worse because the number of circumstances that would cause us to fail to record With the spec unlocked, updating the spec during a reconciliation in which we're communicating with the broker would cause this behavior every time. Unless something has changed from what I remember, spec updates will cause the resource version of the instance to change, which will cause the API server to reject the status update request when we go to record the result of the operation. And then the new reconciliation would pick up and have no idea as to whether we had or hadn't actually sent a request for the resource. I see two options out of this:
AND/OR
Do you mean we should have a test for it today? it's not an issue currently, but will be once the unlock is introduced. The integration test would involve blocking the broker's response until the test controller makes an update to the instance spec, which currently couldn't be implemented due to the lock. |
@kibbles-n-bytes after examining the code, I see the problem, and I think it's a bug in the existing code. The current behavior is the following
if instance.Status.CurrentOperation == "" {
instance, err = c.recordStartOfServiceInstanceOperation(instance, v1beta1.ServiceInstanceOperationProvision, inProgressProperties)
...
} which means that My proposal is to make both OSB request preparation and updating While I agree that this bug becomes more critical with removing locks, I still think it's independent and needs fixing ASAP (probably as a separate PR).
@duglin: PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Due to lack of activity, we're closing this. If you still have an interest in this, please reopen with these changes. |
Unblocks #1755
Signed-off-by: Doug Davis [email protected]