Dont wait for delete operations to be completed by default #770

Merged
merged 2 commits into knative:master from fixDelTime
Apr 8, 2020

Conversation

duglin
Contributor

@duglin duglin commented Apr 1, 2020

In its current state it now takes me about 25 seconds for the kn delete
to complete. Before #682 it used to be
almost immediate. This is because we now pass in the
DeletePropagationBackground flag. I believe this is a mistake, not only
because of the 20+ seconds of additional time to delete things, but IMO
the CLI should talk to the server in the same way regardless of the --wait
flag. That flag should just be a CLI thing to indicate if the user wants the CLI
to wait for the server to complete but not HOW the server should do the delete.

Signed-off-by: Doug Davis [email protected]

@googlebot googlebot added the cla: yes Indicates the PR's author has signed the CLA. label Apr 1, 2020
Contributor

@knative-prow-robot knative-prow-robot left a comment

@duglin: 0 warnings.

@knative-prow-robot knative-prow-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Apr 1, 2020
@dsimansk
Contributor

dsimansk commented Apr 2, 2020

It was introduced as part of the synchronous delete operation, to ensure all dependent resources of the service are deleted and that we wait in sync for the service resource to be gone.

  • DeletePropagationBackground is added for --no-wait and shouldn't cause any blocking. Afaik, it should be the cluster default behaviour, but I do agree we could pass an empty value to ensure the server-side default is honoured.
  • DeletePropagationForeground is the current default and is configured by the --wait flag. It's part of the synchronous delete operation; otherwise there would have to be some other mechanism to ensure removal of everything related to the service.

In my local test, delete --no-wait is immediate and delete --wait takes some time, as expected from a synchronous operation.
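
As a rough illustration of the coupling described in this comment, the sketch below maps the wait setting to a deletion propagation policy. The `DeletionPropagation` type and its two values mirror the Kubernetes `metav1` constants, but they are redeclared locally so the example has no dependencies; `propagationFor` is a hypothetical helper, not kn's actual code.

```go
package main

import "fmt"

// DeletionPropagation mirrors the string values of the Kubernetes
// metav1.DeletionPropagation type so this sketch needs no dependencies.
type DeletionPropagation string

const (
	// Background: the owner object is deleted right away and the garbage
	// collector removes its dependents afterwards.
	PropagationBackground DeletionPropagation = "Background"
	// Foreground: the owner object stays visible until its blocking
	// dependents have been deleted.
	PropagationForeground DeletionPropagation = "Foreground"
)

// propagationFor is a hypothetical helper showing the coupling discussed
// in the thread: --no-wait selects Background, --wait selects Foreground.
func propagationFor(noWait bool) DeletionPropagation {
	if noWait {
		return PropagationBackground
	}
	return PropagationForeground
}

func main() {
	for _, noWait := range []bool{false, true} {
		fmt.Printf("no-wait=%t -> propagationPolicy=%s\n", noWait, propagationFor(noWait))
	}
}
```

With this coupling the wait flag changes the request sent to the server, which is the behaviour the PR author objects to.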

@duglin
Contributor Author

duglin commented Apr 2, 2020

I think the biggest issue is that --no-wait is not the default for kn, so the new 20+ second delay I was seeing was very noticeable/unexpected and gave me a bad UX. Perhaps just changing the default would be sufficient.

However, I still wonder if --no-wait should change what we send to the server. I suspect many folks will think of it the way I did... it's just CLI thingy that only changes how long it takes for my cmd prompt to come back to me - it doesn't change what we tell the server to do - which I believe is true for other --no-wait flags on our other commands. If we want the user to be able to influence the DeletePropagation flag then I think we should introduce a flag to specifically allow the user to control that knob w/o overloading the --no-wait flag.

But I actually think that's a bit too low level for our CLI. I think the biggest concern for people is "when can I create a new ksvc with the same name w/o getting an error about that name being used?". And the default/DeletePropagationForeground does that. How long it takes to delete other resources that are hidden from the user, and would not influence any subsequent cmd they execute, isn't really that interesting except to a pretty advanced user.

@dsimansk
Contributor

dsimansk commented Apr 2, 2020

> I think the biggest issue is that --no-wait is not the default for kn so the new 20+second delay I was seeing was very noticeable/unexpected and gave me a bad UX. Perhaps just changing the default would be sufficient

The motivation to have --wait as default is alignment to create or update. Personally, I don't have a strong preference here.

> However, I still wonder if --no-wait should change what we send to the server. I suspect many folks will think of it the way I did... it's just CLI thingy that only changes how long it takes for my cmd prompt to come back to me - it doesn't change what we tell the server to do - which I believe is true for other --no-wait flags on our other commands. If we want the user to be able to influence the DeletePropagation flag then I think we should introduce a flag to specifically allow the user to control that knob w/o overloading the --no-wait flag.
>
> But I actually think that's a bit too low level for our CLI. I think the biggest concern for people is "when can I create a new ksvc with the same name w/o getting an error about that name being used?". And the default/DeletePropagationForeground does that. How long it takes to delete other resources that are hidden from the user, and would not influence any subsequent cmd they execute, isn't really that interesting except to a pretty advanced user.

The overall goal of having a synchronous delete was to ensure there aren't unexpected race conditions when calling CRUD commands in rapid succession, e.g. in scripts, E2E tests etc. The case of "service name collision" is on point here as well. I understand the Service resource as an umbrella that should ensure removal of everything related to it, to prevent unwanted dangling resources.

Wrt the delete policy flag, I lean strongly toward the "too low level" view. The policy should be an opinionated default for synchronous delete. In this case it's part of the delete-everything strategy.

@duglin
Contributor Author

duglin commented Apr 2, 2020

> The motivation to have --wait as default is alignment to create or update. Personally, I don't have a strong preference here.

Totally agree, but in this case I think we only need to wait for the ksvc to vanish, not the hidden resources.

@duglin
Contributor Author

duglin commented Apr 2, 2020

Net: 20+ seconds for an out-of-the-box kn service delete is not good :-)

@rhuss
Contributor

rhuss commented Apr 2, 2020

Kubectl uses cascading delete by default, and we should do that, too. Because if you don't do that, who is going to delete the revisions of a service afterwards? (Afaik, if you don't do a cascading delete, those revisions will just stay around forever, even when they have an ownerReference. tbv)

However, I think for delete we should really switch to async by default. When you delete something, you usually no longer care about it, which is different from creating something that you might want to use immediately.

So my suggestion is:

  • Do cascading delete all the time
  • Switch to async delete by default; synchronous delete can be switched on with --wait. We will probably have to adapt the E2E tests, too, to add --wait.

@rhuss
Contributor

rhuss commented Apr 2, 2020

My fear is if we don't do a cascade delete we are left with orphaned objects.

@duglin
Contributor Author

duglin commented Apr 2, 2020

to be clear... I don't think cascading or not is the issue, either way we'll delete stuff. The question is how we ask the server to do the delete and whether we wait on the CLI side or not.

Switching to --no-wait by default is fine, as long as it will at least wait until the ksvc is gone - and that's always been quick for me and what kn used to do. I think the issue here is whether kn's --wait flag is just a CLI thing or not because we're mixing two concepts into one with the current code base.

I'll repeat what I said above:

> However, I still wonder if --no-wait should change what we send to the server. I suspect many folks will think of it the way I did... it's just CLI thingy that only changes how long it takes for my cmd prompt to come back to me - it doesn't change what we tell the server to do - which I believe is true for other --no-wait flags on our other commands. If we want the user to be able to influence the DeletePropagation flag then I think we should introduce a flag to specifically allow the user to control that knob w/o overloading the --no-wait flag.

I'd recommend that we keep --wait controlling how long the CLI waits for the server to complete its task - strictly a CLI-side flag. We can then discuss a second flag to indicate what that "task" is - meaning is it what it used to do (background delete propagation), or is it a foreground delete.
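
The separation proposed here can be sketched with two independent flags, using Go's standard `flag` package rather than the flag library kn uses. The `--propagation` flag, the `deleteOptions` struct and the chosen defaults are assumptions made for illustration only.

```go
package main

import (
	"flag"
	"fmt"
)

// deleteOptions keeps the two concerns in separate fields.
type deleteOptions struct {
	NoWait      bool   // client side only: return before the service is gone
	Propagation string // sent to the server: "Background" or "Foreground"
}

// parseDeleteFlags parses args with two independent flags. The
// --propagation flag is hypothetical; it only illustrates the proposal.
func parseDeleteFlags(args []string) (deleteOptions, error) {
	var opts deleteOptions
	fs := flag.NewFlagSet("delete", flag.ContinueOnError)
	fs.BoolVar(&opts.NoWait, "no-wait", false, "do not wait for the service to be deleted")
	fs.StringVar(&opts.Propagation, "propagation", "Background", "deletion propagation policy: Background or Foreground")
	if err := fs.Parse(args); err != nil {
		return opts, err
	}
	switch opts.Propagation {
	case "Background", "Foreground":
		return opts, nil
	default:
		return opts, fmt.Errorf("invalid --propagation %q", opts.Propagation)
	}
}

func main() {
	// All four combinations are reachable because neither flag implies the other.
	for _, args := range [][]string{
		{},
		{"--no-wait"},
		{"--propagation=Foreground"},
		{"--no-wait", "--propagation=Foreground"},
	} {
		opts, err := parseDeleteFlags(args)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%v -> noWait=%t propagation=%s\n", args, opts.NoWait, opts.Propagation)
	}
}
```

Keeping the fields separate means the request sent to the server never depends on whether the CLI blocks.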

Contributor

@maximilien maximilien left a comment

Wouldn't it make more sense to have a --cascade-delete or similar flag that does not default to true? Seems like removing that feature completely is not a step forward...

@maximilien
Contributor

Also @duglin consider adding tests :)

@rhuss
Contributor

rhuss commented Apr 2, 2020

For me, whether we wait for all objects to be deleted (foreground) or only for the main object to be deleted (background) is also a client concern; there is no different background operation. Server-side, the eventual result will be the same whether it's foreground or background deletion (with the foreground delete policy relying on the Kubernetes garbage collector, thanks @dsimansk for the link).

So the end result will be the same: the object and all its dependent objects are deleted (the option here is actually not related to whether to cascade or not, I misunderstood that).

It's just the way the client instructs the deletion, so I'm totally fine that this differs for --wait vs --no-wait, to make --no-wait as fast as possible and --wait as safe as possible (important for automated use cases like our e2e tests):

@rhuss
Contributor

rhuss commented Apr 2, 2020

> Wouldn't it make more sense to have a --cascade-delete or similar flag that does not default to true? Seems like removing that feature completely is not a step forward...

I would always do a cascade delete, as I don't see the use case for a non-cascade delete at the abstraction level of kn. The question is whether we do the cascade delete explicitly (foreground) or rely on the k8s garbage collector (background).

@duglin
Contributor Author

duglin commented Apr 2, 2020

I'm not thrilled with the idea that --wait has different semantics based on the CLI cmd. If it means "CLI returns immediately but the server action is the same" for everything except delete then I think that inconsistency is a problem and not an ideal UX.

If you believe that the client might care about how the delete happens, then linking it with whether the CLI returns immediately or not would be bad, because it means that someone can't choose 2 of the 4 possible combinations of semantics.

In the end, I strongly believe that kn service delete foo needs to return as quickly as possible and it means that I can reuse that Ksvc name immediately, like it used to. Forcing a flag to get that semantics would be bad IMO.

So I'd like to suggest that we deal with that first and then have a discussion around whether, and how, to change the semantics of the delete on the server. I still like a new flag to control that because I do believe linking it with --wait is mixing topics/concerns.

@knative-prow-robot knative-prow-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Apr 2, 2020
@knative-metrics-robot

The following is the coverage report on the affected files.
Say /test pull-knative-client-go-coverage to re-run this coverage report

| File | Old Coverage | New Coverage | Delta |
| --- | --- | --- | --- |
| pkg/kn/commands/wait_flags.go | 100.0% | 87.5% | -12.5 |

@duglin
Contributor Author

duglin commented Apr 3, 2020

/test pull-knative-client-go-coverage

@duglin
Contributor Author

duglin commented Apr 3, 2020

Updated PR to just set the default to be --no-wait per our slack chat


// Special-case 'delete' command so it comes back to the user immediately
noWaitDefault := false
if action == "Delete" {
Contributor

I'd rather see the default value provided through a function argument. The current approach will change the default for all delete operations, e.g. also for Revisions. What about a new config struct that for now can have 2 fields, default value and timeout?

Contributor Author

oh I didn't realize you could be more specific... I'll fix....

Contributor

btw, I would be happy to switch to no-wait as default for all delete operations (this would then be also easy to reason about).

Contributor Author

I do like consistency :-) We just need to figure out which way it spans. Making all deletes the same (as long as it doesn't have negative side-effects) seems ok to me. @dsimansk ?

Contributor

@dsimansk dsimansk Apr 3, 2020

I wasn't sure about all the usages. However, wait flags are used in Service (delete) and Revision (delete), so no objections to going with --no-wait for all deletes then.

Contributor Author

ok - if we do that, then I think the PR is ready for review

@duglin
Contributor Author

duglin commented Apr 3, 2020

Weird - the CLA bot was happy yesterday

@dsimansk
Contributor

dsimansk commented Apr 3, 2020

/check-cla

@maximilien
Contributor

@duglin this error: "cla/google — CLAs are signed, but unable to verify author consent" seems odd? You clearly have signed the CLA... not sure what's going on. First time seeing this.

@duglin
Contributor Author

duglin commented Apr 3, 2020

@rgregg any ideas on the CLA issue?

@duglin
Contributor Author

duglin commented Apr 4, 2020

@googlebot rescan

@duglin
Contributor Author

duglin commented Apr 4, 2020

OK CLA check is now happy - ready for review

@rhuss
Contributor

rhuss commented Apr 8, 2020

/lgtm
/approve

@knative-prow-robot knative-prow-robot added the lgtm Indicates that a PR is ready to be merged. label Apr 8, 2020
@knative-prow-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: duglin, rhuss

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@knative-prow-robot knative-prow-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 8, 2020
@knative-prow-robot knative-prow-robot merged commit de12484 into knative:master Apr 8, 2020
@navidshaikh navidshaikh changed the title from "Remove the delete propagation flag" to "Dont wait for delete operations to be completed by default" Apr 15, 2020
@navidshaikh navidshaikh added the backport/candidate Consider this PR to be backported to the release branch label Apr 15, 2020
rhuss pushed a commit to rhuss/knative-client that referenced this pull request Apr 15, 2020
* Remove the delete propagation flag

Signed-off-by: Doug Davis <[email protected]>

* try just tweaking the --no-wait flag

Signed-off-by: Doug Davis <[email protected]>
knative-prow-robot pushed a commit that referenced this pull request Apr 15, 2020
* (refactor) address the e2e extract / refactor of issue #763 (#765)

* (refactor) address the e2e extract / refactor of issue #763

* various updates to address reviewers feedback

* renamed lib/test/integration to lib/test and package to test

Signed-off-by: Roland Huß <[email protected]>
# Conflicts:
#	CHANGELOG.adoc
#	test/e2e/service_export_import_apply_test.go
#	test/e2e/trigger_test.go

* fix(plugin): Fix plugin lookup with file ext on Windows (#774)

* fix(plugin): Fix plugin lookup with file ext on Windows

* chore: Update changelog

* fix: Reflect review feedback

* fix: Reflect review feedback and add future todo

Signed-off-by: Roland Huß <[email protected]>
# Conflicts:
#	CHANGELOG.adoc

* fix(issue #762): correct error message when updating service (#778)

* fix(issue #762): correct error message when updating service

* correct message when updating service and passing many names
* fix issue with TestServiceUpdateWithMultipleImages running create vs update

* * added TestServiceDescribeWithMultipleNames
* added TestServiceCreateWithMultipleNames
* fix error message for service delete since many names can be passed

* Use vendored deps while running e2e locally (#783)

Also set GO111MODULE=on unconditionally

* Update sink binding create usage string (#785)

* Add "--target-utilization" to manage "autoscaling.knative.dev/targetUtilizationPercentage" annotation (#788)

* Support setting "autoscaling.knative.dev/targetUtilizationPercentage" annotation.

Signed-off-by: Roland Huß <[email protected]>
# Conflicts:
#	test/e2e/service_options_test.go

* Remove the delete propagation flag (#770)

* Remove the delete propagation flag

Signed-off-by: Doug Davis <[email protected]>

* try just tweaking the --no-wait flag

Signed-off-by: Doug Davis <[email protected]>

* Fix error when output is set to name (#775)

* fix error when output is set to name

* add e2e test

* change to flags/listprint.go

Signed-off-by: Roland Huß <[email protected]>
# Conflicts:
#	test/e2e/basic_workflow_test.go

* Show all revisions when run `service describe -v` (#790)

* The `kn service describe -v` command shows repetitive revisions, because
  the revision would be covered by next one.

* Fix resource listing with -oname flag (#799)

* Fix resource listing with -oname flag

* add e2e tests

Signed-off-by: Roland Huß <[email protected]>
# Conflicts:
#	test/e2e/ping_test.go
#	test/e2e/revision_test.go
#	test/e2e/route_test.go
#	test/e2e/source_apiserver_test.go
#	test/e2e/source_binding_test.go
#	test/e2e/trigger_test.go

* Make wait, no-wait and async flags per bool var CLI convention (#802)

* Make wait, no-wait and async flags per bool var CLI convention

 Fixes #800

 - Deprecated bool vars can be supported for CLI convention
 - Bind --async flag value to --no-wait
 - Only one flag among [wait, no-wait, async] can be provided, else raise an error
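
A hedged sketch of the rule listed above (at most one of the three flags, with --async acting as an alias for --no-wait), written against the standard `flag` package. `resolveNoWait` is a hypothetical helper and does not reproduce the actual change in #802.

```go
package main

import (
	"flag"
	"fmt"
)

// resolveNoWait parses args and returns the effective no-wait value.
// --async is treated as a deprecated alias of --no-wait, and at most one
// of the three flags may be given. defaultNoWait applies when none is set.
func resolveNoWait(args []string, defaultNoWait bool) (bool, error) {
	var wait, noWait, async bool
	fs := flag.NewFlagSet("delete", flag.ContinueOnError)
	fs.BoolVar(&wait, "wait", !defaultNoWait, "wait for the operation to finish")
	fs.BoolVar(&noWait, "no-wait", defaultNoWait, "do not wait for the operation to finish")
	fs.BoolVar(&async, "async", defaultNoWait, "DEPRECATED: use --no-wait")
	if err := fs.Parse(args); err != nil {
		return false, err
	}

	// Visit only walks flags that were explicitly set on the command line.
	var set []string
	fs.Visit(func(f *flag.Flag) {
		set = append(set, "--"+f.Name)
	})
	if len(set) > 1 {
		return false, fmt.Errorf("only one of --wait, --no-wait, --async may be set, got %v", set)
	}

	switch {
	case len(set) == 0:
		return defaultNoWait, nil
	case set[0] == "--wait":
		return !wait, nil
	case set[0] == "--async":
		return async, nil
	default:
		return noWait, nil
	}
}

func main() {
	for _, args := range [][]string{{}, {"--wait"}, {"--async"}, {"--wait", "--no-wait"}} {
		noWait, err := resolveNoWait(args, true)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%v -> noWait=%t\n", args, noWait)
	}
}
```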

* Simplify conditionals

* Add unit tests for deprecated flag async

* Fix a typo

* e2e: Foreground delete for revisions and services in e2e (#794)

* e2e: Foreground delete for revisions and services in e2e

 to avoid any race conditions and flakes

* Use --wait instead of --no-wait=false

Signed-off-by: Roland Huß <[email protected]>
# Conflicts:
#	test/e2e/basic_workflow_test.go
#	test/e2e/revision_test.go

* e2e: Run tekton e2e against pipeline v0.11.1 (#803)

* Use buildah task from master branch and parameterize FORMAT

* Configure pipeline v0.11.1

* DNM: Run tekton e2e in this PR

* Revert "DNM: Run tekton e2e in this PR"

This reverts commit 903f5be.

* Update CHANGELOG for v0.13.2 (#804)

* Pin serving to v0.13.2 and update version command (#797)

* Pin serving v0.13.2 dep to v0.13.2

* Update version command

 now points to serving v0.13.2 and eventing v0.13.6

* Copy go.sum as generated in CI

Signed-off-by: Roland Huß <[email protected]>
# Conflicts:
#	go.mod
#	go.sum
#	vendor/modules.txt

* add missing vendored files

* fixed error reporting for traffics tests

* Updated test

* fix formatting

* e2e for service export (#739)

* e2e for service export

* e2e for service export

* e2e for service export

* e2e for service export

* e2e for service export

Signed-off-by: Roland Huß <[email protected]>
# Conflicts:
#	test/e2e/service_export_import_apply_test.go

Co-authored-by: dr.max <[email protected]>
Co-authored-by: David Simansky <[email protected]>
Co-authored-by: Navid Shaikh <[email protected]>
Co-authored-by: Lv Jiawei <[email protected]>
Co-authored-by: Doug Davis <[email protected]>
Co-authored-by: Ying Chun Guo <[email protected]>
Co-authored-by: Murugappan Chetty <[email protected]>
@navidshaikh navidshaikh added backport/pr A backport PR which is target to a release branch. and removed backport/candidate Consider this PR to be backported to the release branch labels Apr 20, 2020
@rhuss rhuss added backported-to/0.13 and removed backport/pr A backport PR which is target to a release branch. labels Apr 20, 2020
@duglin duglin deleted the fixDelTime branch August 31, 2020 02:56
dsimansk added a commit to dsimansk/client that referenced this pull request Aug 9, 2021