Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add pods stability for health check. #2374

Closed
wants to merge 14 commits into from
Closed

Conversation

tejal29
Copy link
Contributor

@tejal29 tejal29 commented Jun 28, 2019

This PR checks if Pods are either running or succeed whichever event is seen first within 5 minutes.
More context #2359

Add pods Stability

An example output will look like this for a number of pods:


tejaldesai:unstable-deployment (add_pods_stability)$ skaffold deploy
Starting deploy...
kubectl client version: 1.11+
kubectl version 1.12.0 or greater is recommended for use with Skaffold
deployment.apps/unstable-deployment unchanged
Deploy complete in 791.09378ms
Waiting for deployments to stabilize
Waiting on 2 of 2 deployments
Waiting on 5 of 5 pods
Waiting on 4 of 5 pods
Waiting on 3 of 5 pods
Waiting on 2 of 5 pods
Waiting on 1 of 5 pods
Waiting on 0 of 5 pods
Waiting on 1 of 2 deployments
Waiting on 0 of 2 deployments
following resources are not stable:
deployment/leeroy-web failed due to Running [kubectl --context gke_tejal-test_us-central1-a_dump rollout status deployment leeroy-web --watch=false]: stdout , stderr: Error from server (NotFound): deployments.extensions "leeroy-web" not found
, err: exit status 1: exit status 1
deployment/unstable-deployment failed due to Running [kubectl --context gke_tejal-test_us-central1-a_dump rollout status deployment unstable-deployment --watch=false]: stdout , stderr: error: deployment "unstable-deployment" exceeded its progress deadline
, err: exit status 1: exit status 1
FATA[0001] following resources are not stable:
deployment/leeroy-web failed due to Running [kubectl --context gke_tejal-test_us-central1-a_dump rollout status deployment leeroy-web --watch=false]: stdout , stderr: Error from server (NotFound): deployments.extensions "leeroy-web" not found
, err: exit status 1: exit status 1
deployment/unstable-deployment failed due to Running [kubectl --context gke_tejal-test_us-central1-a_dump rollout status deployment unstable-deployment --watch=false]: stdout , stderr: error: deployment "unstable-deployment" exceeded its progress deadline
, err: exit status 1: exit status 1 
tejaldesai:unstable-deployment (add_pods_stability)$ 

@tejal29 tejal29 changed the title Add pods stability Add pods stability for health check. Jun 28, 2019
@priyawadhwa
Copy link
Contributor

Looks like a failure in the deploy unit test

--- FAIL: TestDeploy (0.01s)
    --- FAIL: TestDeploy/deploy_shd_perform_status_check (0.00s)
        util.go:68: unexpected error: getting client config for kubernetes client: error creating kubeConfig: invalid configuration: no configuration has been provided

pkg/skaffold/deploy/status_check.go Outdated Show resolved Hide resolved
pkg/skaffold/runner/runner.go Outdated Show resolved Hide resolved
@tejal29
Copy link
Contributor Author

tejal29 commented Jul 10, 2019

Looks like a failure in the deploy unit test

--- FAIL: TestDeploy (0.01s)
    --- FAIL: TestDeploy/deploy_shd_perform_status_check (0.00s)
        util.go:68: unexpected error: getting client config for kubernetes client: error creating kubeConfig: invalid configuration: no configuration has been provided

Looks like this is a flake. it passes locally on my branch. I will try running it several times.

@codecov
Copy link

codecov bot commented Jul 17, 2019

Codecov Report

Merging #2374 into master will decrease coverage by 0.43%.
The diff coverage is 42.36%.

Impacted Files Coverage Δ
pkg/skaffold/runner/deploy.go 70.58% <100%> (ø) ⬆️
pkg/skaffold/deploy/status_check.go 34.55% <17.39%> (-33.14%) ⬇️
pkg/skaffold/kubernetes/wait.go 55.2% <83.33%> (+9.37%) ⬆️
pkg/skaffold/kubernetes/status_check.go 88.88% <88.88%> (ø)


fmt.Fprintln(out, fmt.Sprintf("Waiting on %d of %d deployments", atomic.LoadInt32(&ops), numDeps))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use color.Default.Frpintf instead

@balopat
Copy link
Contributor

balopat commented Jul 18, 2019

As discussed

  1. The unstable example should have scratch instead of SCRATCH - it is easier to test manually then with skaffold dev
  2. change back the image to gcr.io/k8s-skaffold/incorrect-example in the yaml
  3. Instead of waiting for 0 of X deployment/pod, we could have no more pods/deployments pending

@balopat
Copy link
Contributor

balopat commented Jul 18, 2019

  1. It's good to have the number of pods that we are waiting still, but when we know that a pod/deployment is errored/succeeded print that out as well. pod foobar is Ready (4/13 pods still pending) or pod foobar Failed due to 'PodUnschedulable blablabla' (4/13 pods still pending)

@tejal29 tejal29 force-pushed the add_pods_stability branch from 710c7f6 to a50b54f Compare July 31, 2019 00:04
@tejal29
Copy link
Contributor Author

tejal29 commented Jul 31, 2019

@balopat Please take a look at the new changes.
I tested this on microservices/demo here with few changes.
https://github.com/tejal29/microservices-demo/tree/healthcheck_demo

@tejal29
Copy link
Contributor Author

tejal29 commented Jul 31, 2019

@balopat please take a look at the final output
https://gist.github.com/tejal29/02bab3f291d7fdc80066a0d825b4b1a2

@tejal29
Copy link
Contributor Author

tejal29 commented Jul 31, 2019

Will wait for #2591

@nkubala
Copy link
Contributor

nkubala commented Aug 22, 2019

@tejal29 since #2591 is now in can you get this one rebased?

@tejal29
Copy link
Contributor Author

tejal29 commented Sep 3, 2019

Discard in favour of #2750

@tejal29 tejal29 closed this Sep 3, 2019
@dgageot dgageot deleted the add_pods_stability branch September 9, 2019 12:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants