chore(service): Improvements for waiting on service readiness #431

Merged
merged 1 commit into from
Oct 10, 2019

Conversation

rhuss
Contributor

@rhuss rhuss commented Oct 7, 2019

  • Increased the default timeout to 600s. This timeout is triggered
    only when the Ready condition stays in UNKNOWN for that long. If it's
    True or False, the command returns sooner anyway,
    so it makes sense to use a much longer timeout than 60s.
  • Enhanced the output to indicate progress.

This change needs some updates to the API and introduces a 'MessageCallback'
type, which is called for each intermediate event with the "Ready" condition message.

Example of the UX:

asciicast

@googlebot googlebot added the cla: yes Indicates the PR's author has signed the CLA. label Oct 7, 2019
@knative-prow-robot knative-prow-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 7, 2019
Contributor

@maximilien maximilien left a comment

Overall nice. Definitely better UX. Left some feedback to consider and a couple of minor nits.

@@ -59,7 +59,7 @@ kn service create NAME --image IMAGE [flags]
--requests-memory string The requested memory (e.g., 64Mi).
--revision-name string The revision name to set. Must start with the service name and a dash as a prefix. Empty revision name will result in the server generating a name for the revision. Accepts golang templates, allowing {{.Service}} for the service name, {{.Generation}} for the generation, and {{.Random [n]}} for n random consonants. (default "{{.Service}}-{{.Random 5}}-{{.Generation}}")
--service-account string Service account name to set. Empty service account name will result to clear the service account.
- --wait-timeout int Seconds to wait before giving up on waiting for service to be ready. (default 60)
+ --wait-timeout int Seconds to wait before giving up on waiting for service to be ready. (default 600)
Contributor

Does it make sense to also have a --no-wait?

Contributor

600 seconds == 10 minutes. Is that too much for a default? I feel like one minute (60 seconds) should be good enough. Thoughts?

Contributor Author

We had 60s before, but that was often too short, especially when the created Pods still have to pull the image, which often takes longer than one minute. In that case a "timeout error" occurs as in #423, which is confusing because the service itself eventually becomes ready.

My assumption is that as long as the "Ready" condition is in the state "UNKNOWN", Knative will eventually reconcile it to either "TRUE" or "FALSE", at which point the command returns. It only runs into a timeout when the condition stays in UNKNOWN. So in my experience we should rely on Knative Serving to do that reconciliation: if it can't get the service up within a certain amount of time, it will set the "Ready" condition to "FALSE". Hence this large timeout, which could maybe be even longer (I'm not sure whether there are images so large that pulling them takes longer than 10 minutes, but then a synchronous wait on the CLI without a real progress meter is tedious, too).

I added the detailed feedback on the events as a poor man's progress meter, so that the user sees that something is going on even when the creation/update takes a bit longer.

That output is also helpful for scripting use cases, as it helps in debugging timing issues.

Still, with -q we should certainly not show these messages.

One can argue that one might move these events to --verbose, but then one would lose the simple progress indicator.

Contributor Author

--no-wait is achieved with --async (that was chosen over no-wait at the time when the feature was introduced)

Contributor

OK I can live with that. Thanks.

@@ -38,8 +39,8 @@ type waitForReadyConfig struct {
type WaitForReady interface {

// Wait on resource the resource with this name until a given timeout
// and write status out on writer
Wait(name string, timeout time.Duration) error
// and write event messages for uknown event to the status writer
Contributor

*unknown

@maximilien
Contributor

/ok-to-test

@knative-prow-robot knative-prow-robot added the ok-to-test Indicates a non-member PR verified by an org member that is safe to test. label Oct 7, 2019
@navidshaikh
Collaborator

From the e2e tests: the ContainsAll comparisons need an update.

@rhuss rhuss force-pushed the pr/wait-improvements branch from fefb84e to 657e103 Compare October 8, 2019 09:36
@navidshaikh
Collaborator

@rhuss Since we're updating the text printed after service create and update, how about considering #290 part of this?

@rhuss
Contributor Author

rhuss commented Oct 8, 2019

@rhuss Since we're updating the text printed after service create and update, how about considering #290 part of this?

Yes, that makes sense for the sync case as the revision would be available at that point, too (we could just add it to the 'showUrl()' method). Or we could add it to the event list, so that the final entry could look like:

       12.342s Ready to serve (Revision: myservice-2-3bmz3)

Another idea would be to have a kn service create ... -o yaml which could output a stream of JSON objects, which would then be even easier to parse in a shell loop (like this kind of shell-based controller).

This would also allow us to stop worrying about how to format the human-readable output so that it can be easily parsed (which I think is dangerous in the long run anyway, as we can easily break things if we make the concrete output format part of a kind of "API").

@navidshaikh
Collaborator

   12.342s Ready to serve (Revision: myservice-2-3bmz3)

A related note on this: not all service update operations result in a new revision, for example when updating only traffic portions and/or their tags.

The "Ready to serve" line would be there irrespective of whether the update operation generates a new revision. How about printing the revision name on a separate line (wherever it fits best), so that it appears in the output only if the operation generates a revision?

@rhuss
Contributor Author

rhuss commented Oct 9, 2019

BTW, one reason why I would love to have the URL on a separate line, also for human-readable output, is highlighted in https://youtu.be/9rnrOK0Ifqs?t=1876: as you can see, it's not as easy to select a URL in flowing text as it is to double (or triple) click on a single line.

Not sure whether we should do that for a revision, too.

Maybe we can rephrase the last URL line as:

Service 'demo' with revision 'demo-1-abced' can be reached at
https://demo.example.com

@navidshaikh
Collaborator

@rhuss : The URL on a separate (last) line is good, +1 for that.

For,

Service 'demo' with revision 'demo-1-abced' can be reached at

the service might not necessarily be serving from this single revision.

WDYT about this:

For service create:

Service 'demo-1' created in namespace 'default'.
Revision 'demo-1-abced' created in namespace 'default'.
[..]

URL:
http://demo.default.example.com

For update creating a new revision:

Service 'demo-1' updated in namespace 'default'.
Revision 'demo-1-wxyz' created in namespace 'default'.
[..]

URL:
http://demo.default.example.com

For update not creating a new revision:

Service 'demo-1' updated in namespace 'default'.
[..]

URL:
http://demo.default.example.com

@rhuss
Contributor Author

rhuss commented Oct 9, 2019

the service might not necessarily be serving from this single revision.

I thought that for kn service create we will only have one revision? (Not sure about kn service create --force, though.)

@rhuss
Contributor Author

rhuss commented Oct 9, 2019

The revision can be inferred best only after the service is ready, not when the Service CR is created.

The output for sync create looks like (with a revision added after the service reconciled):

Creating service 'svc' in namespace 'default':

    0.123 bla
    1.245 bla
   10.345 Ready to serve.

Revision: 'svc-1-sada'
URL:
https://demo.com

I don't think we need to repeat the namespace for the revision name.

@rhuss rhuss force-pushed the pr/wait-improvements branch from 3d65623 to a0cb9d8 Compare October 10, 2019 08:08
@rhuss
Contributor Author

rhuss commented Oct 10, 2019

@navidshaikh I have now added the revision name (unconditionally, but for update I could try to find out whether the revision has changed and add an "(unchanged)" if not).

I also updated the asciinema video in the issue comment to reflect this change.

@rhuss
Contributor Author

rhuss commented Oct 10, 2019

@navidshaikh Updated now with "(unchanged)" as a suffix to the revision name if the revision has not changed (I also updated the video).

@rhuss rhuss force-pushed the pr/wait-improvements branch from d650365 to 6017c23 Compare October 10, 2019 08:25
@knative-metrics-robot

The following is the coverage report on the affected files.
Say /test pull-knative-client-go-coverage to re-run this coverage report

File                                  Old Coverage  New Coverage  Delta
pkg/kn/commands/service/create.go     78.2%         76.2%         -2.0
pkg/kn/commands/service/service.go    86.7%         88.0%         1.3
pkg/kn/commands/service/update.go     74.1%         75.4%         1.4
pkg/serving/v1alpha1/client.go        82.8%         82.2%         -0.6
pkg/serving/v1alpha1/client_mock.go   95.8%         93.9%         -1.9
pkg/wait/wait_for_ready.go            80.7%         71.6%         -9.1

Collaborator

@navidshaikh navidshaikh left a comment

/lgtm

@knative-prow-robot knative-prow-robot added the lgtm Indicates that a PR is ready to be merged. label Oct 10, 2019
@knative-prow-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: navidshaikh, rhuss

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

dsimansk added a commit to dsimansk/client that referenced this pull request Sep 9, 2024
* [release-v1.14] Add kn-plugin-source-kafka v1.14

* [release-v1.14] Add kn-plugin-event v1.14