Delete field for instance and binding #2037
Conversation
@carolynvs @jberkhahn how do I fix the mismatch in svcat tests due to the new field?
The solution is not terribly discoverable: it's unclear what that means or how to fix it. Needs a better error message.
Not sure about that bit, Carolyn's done most of the work in the cmd libraries. I do know she was working on rewriting the integration tests as proper e2e tests, though. I'm pairing with her tomorrow on some test work for the pkg svcat libs, I can ask her.
I am fixing that in the dev guide today to document how the golden files work. I'm not certain that the switch to e2e tests will remove the golden files entirely, but we'll find out more when I work with Jonathan on it later today.
pkg/apis/servicecatalog/types.go
Outdated
@@ -774,6 +774,10 @@ type ServiceInstanceSpec struct {
	// allows for parameters to be updated with any out-of-band changes that have
	// been made to the secrets from which the parameters are sourced.
	UpdateRequests int64

	// DeletionRequested provides an imperative means to request
The comment doesn't really make it clear why we are adding this way to delete or when someone would want to use it. Maybe something like this?
// Deletion requested provides a mechanism for requesting a delete without performing a direct
// delete (which cannot be undone in cases where the delete is rejected by the broker). This is the
// recommended way to delete, to avoid resources becoming stuck in the deletion phase.
Ya, you betcha.
At this point I want to establish the scaffolding to build everything on. Since openapi uses these comments, they do become documentation.
pkg/apis/servicecatalog/types.go
Outdated
	// DeletionRequested provides an imperative means to request
	// deletion by doing an update.
	DeletionRequested bool
I do not think this matches with the rest of kubernetes nor does it mesh with the norms of OSB. I would suggest:
unbindTimestamp time.Time
Then you can set the time the unbind was requested. That is declarative.
And then roll back by blanking it?
it would work the same way as your bool, but the date matches with how k8s does deletes already
I can see a different benefit in that we can time out the request at some point. @duglin was worried about that case.
	// DeletionRequested provides an imperative means to request
	// deletion by doing an update.
	DeletionRequested bool `json:"deletionRequested"`
I do not think this matches with the rest of kubernetes nor does it mesh with the norms of OSB. I would suggest:
deprovisionTimestamp time.Time
Then you can set the time the deprovision was requested. That is declarative.
	// performing a direct delete (which cannot be undone in cases where the
	// delete is rejected by the broker). This is the recommended way to
	// delete, to avoid resources becoming stuck in the deletion phase.
	DeletionRequested metav1.Time
I still think this should be deprovisionTimestamp
with this being a timestamp, do we expect people to actually set the time? That seems odd to me. And what does its value mean? What if it's Jan 1, 1970? What if it's Dec 31, 2020? Are we really supporting scheduled deletes? Is that even a thing in kube?
People should not set the time; svcat would, and if we can hijack the k8s delete command, that would.
A time before now means do a delete. So if it is Jan 1, 1970 and the resource is not deleted, try to delete it. k8s does not support a future date; see deletionTimestamp.
Did we agree to hijack delete? And we need it to work with vanilla kubectl, so I do not think asking people to set a time is a good UX, when 99+% of the time I bet they mean "do it now".
I think it is bad UX to not have a time tagged to when you attempted that action. How do you know if the status changed before or after a delete was requested? How do you know when service catalog should use a bigger hammer to unbind or deprovision the thing?
The validator could hijack the setting of this field so that it is set to now: the user sets it to anything they want, including true or now, and then the validator replaces that with time.Now().
I'm not against there being a timestamp someplace. What I'm against is making the user set a timestamp as their UX for delete(). That's a weird design. The normal path for Kube is the user does an HTTP DELETE and as a side effect a timestamp is set. That's very different.
We are outside the area of calling DELETE directly on any of our objects.
Personally I am fine with keeping the name as is, with a single field that's just a timestamp, and relying on svcat to handle calling it appropriately.
If we stay with an api server, the clear choice is to make a sub-resource for each type to do the deprovision or unbind, and those methods modify this field. then the user flow is kubectl unbind ServiceInstance foobar
but in the direction service catalog should go using CRDs that is not an option, so I am saying the existence of the field, even an empty string, allows the admission controller to set it to a correct timestamp.
	// performing a direct delete (which cannot be undone in cases where the
	// delete is rejected by the broker). This is the recommended way to
	// delete, to avoid resources becoming stuck in the deletion phase.
	DeletionRequested metav1.Time
I still think this should be unbindTimestamp
	// performing a direct delete (which cannot be undone in cases where the
	// delete is rejected by the broker). This is the recommended way to
	// delete, to avoid resources becoming stuck in the deletion phase.
	DeletionRequested metav1.Time `json:"deletionRequested,omitempty"`
Is this still the right name now that it's not a bool?
nope
I do not understand why the names have to be distinct per resource. I do not understand why the type should be encoded into the resource name. It is the
I am trying to avoid adding complications to this. I can split this up into two fields: a bool in spec for a user to tweak and a timestamp in status for the controller to manage. I do not expect users to be directly creating timestamps to delete things. I expect it to be a programmatic client of some sort.
svcat 4 lyfe
LGTM, though I'm staying out of the naming dogfight. 😀
Adding do-not-merge until we can discuss the use of a timestamp as a boolean. Sorry, but that's just a weird UX.
I think this needs a design doc, rather than going straight to code, to work those details out.
+1 - I'm still not convinced this is something we should really do, personally. It's going to be a much better use of time at this point to hash the details out 'on paper'.
I don't care too terribly much if this is added, but from a preliminary look-over if we do add it I think we should do it in a declarative manner as I designed in #1942 (comment) |
took existsExternally from Michael Kibbe comment
Added a proposal and flipped the delete flag per @kibbles-n-bytes comment.
Add a condition for ServiceInstanceConditionType, ServiceBindingConditionType.
```go
ServiceInstanceConditionDeleted ServiceBindingConditionType = "Deleted"
```
copy/pasta error here
broker resource. If the broker delete succeeds, it sets the deleted
status condition.

If the field is set to false, no updates are allowed besides
Why this limitation? Perhaps you want to stage a bunch of updates to a non-existent resource.
It is a simpler case.
What would it mean to update something that is going away? If there is some update required for the deletion to go through, I would make the update first and make sure it is committed before deleting.
What would staging updates to a non-existent resource mean? What would it do? Once the resource is gone, it is not going to come back.
ExistsExternally=false means the broker holds no resource for you and you are staging changes or you shut down the external resource. Flipping to true means provision it again.

> What would it mean to update something that is going away?

I am staging changes, or I am getting a whole set of resources prepped for a deploy. Or I want to shut the services down for a bit and then intend to bring them back later.

> before deleting.

Stop thinking of it as deleting; it is deprovisioning, and it can be provisioned again in the same way.

> Once the resource is gone, it is not going to come back.

Let it come back if ExistsExternally = true.
https://github.com/kubernetes-incubator/service-catalog/pull/2037/files#r189993692
"flipping to true means provision it again." That is not the workflow I was suggesting to implement or support.
"getting a whole set of resources preped for a deploy. Or I want to shut the services down for a bit and then intend to bring them back later." not the workflow suggested or intended to implement or support.
" it is deprovisiong, and it can be provisioned again in the same way." I'd prefer to have only one way to do things.
"Let it come back if ExistsExternally = true." not the workflow suggested or intended to implement or support.
### Controller-Manager Changes

An additional condition to determine the delete flow is added. This condition is
What safeguards need to be added to prevent setting ExistsExternally=false and then immediately deleting the k8s resource? Can you think about the status fields that need to be controlled?
Also this will change how DeprovisionRequired and UnbindRequired are managed; this should be outlined.
I don't understand the case. If someone calls delete directly after setting that, it's not an update. We can't block deletes anyway, so it'll add the finalizer as usual and go through the existing flow.
Part of this work should be coming up with rbac rules to prevent access to DELETE. I think this is an extension of rbac that does not currently exist. Created #2061
I think you are making the assumption that marking ExistsExternally=false means service catalog will garbage collect that resource at some point in the future, acting as a delayed delete. This is not the usecase we are suggesting at all. We are saying the resource continues to exist in k8s indefinitely until a real DELETE is called, as normal.
RBAC could be applied to prevent normal people from removing resources and an elevated robot account could garbage collect, but I think this should happen externally to service catalog.
yea, that's not my assumption. A workflow of:
- set some "delete on broker" flag
- wait
- if successful, then delete it for real
is not a very good UX. I'd prefer to not force them to do steps 2 and 3 when we can do it for them.
I understand the feature you want, and I do not believe it is a feature anyone else wants.
A feature that would do what you want and still be useful to the community would be the described ExistsExternally functionality + a robot script on your side gives you the workflow you want.
Sure, it's a svc-cat feature, but it's being done in an attempt to be OSBAPI compliant. From past discussions no one else seems to care about this aspect of the spec and they are comfortable with the idea of a broker saying "no" after the Kube delete process has started and can't be stopped. So, I'm looking for a nice UX solution for people who will actually use this feature, not for people who won't.
#2037 (comment)
"at some point in the future" yes, but as instant as it currently exists
"as a delayed delete" yes, but no more than the existing delay of calling DELETE and waiting for the finalizer to run.
"the resource continues to exist in k8s indefinitely until a real DELETE is called, as normal." no, that does not match my expectations.
#2037 (comment)
"a robot script on your side gives you the workflow you want." I call that the controller. As part of the existing controller.
The summary seems to be that we agree that we need a field to indicate that we want this process to start.
I don't think we agree at all yet.
I don't see this as a OSBAPI feature, I see it more as a service catalog feature.
@n3wscott it is a confusing feature for Kubernetes users, who are the target users of Service Catalog, not the ones who know OSB API. Normal users won't set the flag, they will just `kubectl delete serviceinstance ...`; they never need to update the Pod spec to shut down the application, they just delete it.
So I would argue that this delete flag is only useful for emulating OSBAPI deprovisioning semantics to address a corner case of a rejected delete request. If there were no such corner case, users would never ask for deprovisioning without Kubernetes delete.
Exactly, a corner case caused by the interaction between the differing behaviors of kubernetes and osbapi.
@duglin proposal supersedes code for now.
I disagree. I think @kibbles-n-bytes's idea to call it ExistsExternally was better: it is completely declarative, while deletionRequested is imperative and the recovery case is really hard to understand. What happens when this flag is removed after setting it? After that, do you need to have deletionRequested=false in the spec? What does deletionRequested=false mean for that resource long-term? What happens when I create a resource with deletionRequested=true to start with? I understand what all those cases mean with ExistsExternally.
I don't actually care about the name - its type matters more. But since we're in the weeds... "ExistsExternally" sounds more like a Status field than a Spec field. I would never have guessed that setting it to "false" would actually delete something. If I set it to "false" and then set it back to "true", will it recreate it? It shouldn't, yet the way it's being proposed, it would.
#2037 (comment) I am proposing a one-way flag. Existence -> Non-Existence. Non-Deleted -> Deleted. No going back in the state of the resource once the level-change is achieved.
Deleted is a terminal state that cannot be exited once it has been entered.
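The one-way semantics could be enforced in update validation. A minimal sketch with illustrative names (not the merged API): requesting deletion is allowed, but once requested the flag can never be unset.

```go
package main

import (
	"errors"
	"fmt"
)

// validateUpdate models the proposed one-way transition: the resource can
// only move from Existence toward Non-Existence, never back.
func validateUpdate(oldRequested, newRequested bool) error {
	if oldRequested && !newRequested {
		return errors.New("deletionRequested cannot be unset once set")
	}
	return nil
}

func main() {
	fmt.Println(validateUpdate(false, true)) // <nil>: requesting deletion is fine
	fmt.Println(validateUpdate(true, true))  // <nil>: staying requested is fine
	fmt.Println(validateUpdate(true, false)) // error: no going back
}
```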
- Add a condition to indicate permanent end state of deletion success.

Once the field is activated the controller deprovisions the broker
resource. When successfully deleted the controller then deletes the
What if deprovision via OSB fails? There are various cases, e.g.:
- initial request is rejected with `400 BadRequest`, `403 Forbidden`, or some other 4xx client-side error that should have no side effects
- initial request ends with a `5xx` server error
- initial request processing times out
- initial request is accepted, triggering async deprovisioning, and `last_operation` returns failure
Isn't this the normal cases that are already handled? We're doing a deprovision outside of calling delete. Everything else about the broker operation and responses is the same.
I don't understand. What's the point then? What do you achieve by doing exactly the same as with normal delete?
As far as I understand, the only reason for adding a new flag is because you can't "undelete" a Kubernetes resource: once "deletionTimestamp" is added, you can't remove it and can't change the spec. And the only reason for wanting to "undelete" is when the broker rejects a deprovisioning request.
Do you unset the flag or allow updating the spec in some cases? If not then I don't see the point of having it.
To clarify, in case you don't know: currently, when a user marks a ServiceInstance for deletion, it doesn't get deleted by GC immediately, because we have a custom "Service Catalog finalizer" added to it. Once deprovisioning succeeds, the svc-cat controller removes the finalizer and GC proceeds with deletion. If not, we retry until we exceed limits, and then get stuck in the "failed to deprovision" state (and the only way to delete this ServiceInstance resource is to manually delete the finalizer).
It looks like you try to emulate exactly the same mechanics in your design. Why???
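For readers unfamiliar with those finalizer mechanics, a rough sketch (the finalizer name here is a placeholder, not the real one): GC will not collect a resource while any finalizer remains, and the controller strips its finalizer only after a successful deprovision.

```go
package main

import "fmt"

// scFinalizer stands in for the service-catalog finalizer that blocks
// garbage collection until deprovisioning has succeeded.
const scFinalizer = "servicecatalog/finalizer"

// removeFinalizer returns the finalizer list without the given entry; once
// the list is empty, Kubernetes GC is free to delete the resource.
func removeFinalizer(finalizers []string, name string) []string {
	out := make([]string, 0, len(finalizers))
	for _, f := range finalizers {
		if f != name {
			out = append(out, f)
		}
	}
	return out
}

func main() {
	finalizers := []string{scFinalizer}
	// deletionTimestamp is set, but the finalizer is still present: GC waits.
	fmt.Println(len(finalizers)) // 1

	// Deprovision succeeded: the controller strips the finalizer, GC proceeds.
	finalizers = removeFinalizer(finalizers, scFinalizer)
	fmt.Println(len(finalizers)) // 0
}
```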
Because, once the deletion timestamp is added, it cannot be removed. The bool can.
What happens if you remove it in the middle of deprovisioning?
Depends on when you mean in the middle. If it's after the deprovision call, that is synchronous and is a yes or no. If we catch it before the deprovision, it wouldn't do it. And if we catch it after, it's already done. We should prevent it from being unset, but the key thing is that the status changes to some form of "gone, permanently". I never proposed and do not intend to implement or support flipping back and causing a (re)provision.
Right, so then you'll need to add some extra flag like `permanentDeletion bool` to the status.
This field is set to true by default. The user indicates that it wants
to delete the backing resource by setting this field to false. The
controller will see this update and attempt to delete the backing
broker resource. If the broker delete succeeds, it sets the deleted
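A sketch of the reconcile decision the quoted text describes, with illustrative names (not actual service-catalog code): the field defaults to true, flipping it to false asks for a deprovision, and a Deleted condition is a permanent end state.

```go
package main

import "fmt"

// nextAction models the controller's decision for the proposed field.
func nextAction(existsExternally, deleted bool) string {
	switch {
	case deleted:
		return "none" // already deprovisioned: permanent end state
	case existsExternally:
		return "none" // default: resource should keep existing at the broker
	default:
		return "deprovision" // flag flipped to false: call the broker
	}
}

func main() {
	fmt.Println(nextAction(true, false))  // none
	fmt.Println(nextAction(false, false)) // deprovision
	fmt.Println(nextAction(false, true))  // none
}
```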
The happy path is obvious; it works without this new flag (via normal Kubernetes deletion).
The non-happy paths are the ones that are supposed to be improved with this change, thus before we cover them this flag is useless.
The happy-path is the currently broken path.
@MHBauer for example? Currently, if a resource is successfully deprovisioned, we just remove the finalizer and proceed with deletion. Am I missing something?
This isn't about successful vs unsuccessful.
This is about the delete not being allowed by the broker, yet kube will allow it. Once kube has allowed it, it cannot be unallowed. This denies the appropriate user the ability to do updates or deletes at all. The broker resource is stuck existing.
dependent on the state of the new field. Reuse all applicable status
objects as appropriate.

Proceed through the normal deletion flow and if the broker allows the delete,
and if not?
I am strongly opposed to this change until we reach consensus on non-happy-path.
As far as I understand, the new "deletionRequested" (or "existsExternally", whatever you call it) flag is only needed for handling a rejected OSB deprovision request. But I only see the words "if successfully deprovisioned, we'll proceed with deletion of the Kubernetes resource". It's a trivial case; you don't even need a new flag for that, you can just `kubectl delete serviceinstance` and achieve the same result.
Once you try covering the non-happy path though, I'm sure you'll see a lot of extra complexity in the controller logic, which is already very complicated and has issues like #2025. So I would ask for a fully detailed spec, or the implemented proof of concept in controller (whatever is preferred) before merging any change in this area.
I'm also strongly opposed to this change. I don't think adding a second way to deprovision is going to make things simpler for anyone: our users, or ourselves as developers for this project. I second @nilebox's suggestion that we make progress on #2025 before even considering proceeding on this change.
@MHBauer: PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
#2037 (comment)
/approve
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: MHBauer The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
/unapprove
/close
Trying to start solving #1942 by adding a delete field to the spec of the necessary resources.