Add failing health check if autoscaler loop consistently returns error #90

aleksandra-malinowska · 2017-05-26T11:08:24Z

Extend HealthCheck to verify that a successful run of the autoscaler loop occurred within a given time (configurable via the --max-failing-time flag).
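For illustration, the check described above could look roughly like the sketch below. It is not the code from this PR: the `maxFailingTime` field, the `probe` and `demo` helpers, the request path, and the 15-minute value are assumptions chosen for the example. Only `mutex`, `lastSuccessfulRun`, and the `timestamp.After` guard appear in the diff under review.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
	"time"
)

// HealthCheck is a simplified, hypothetical liveness handler.
// maxFailingTime is an assumed field that would be set from --max-failing-time.
type HealthCheck struct {
	mutex             sync.Mutex
	lastSuccessfulRun time.Time
	maxFailingTime    time.Duration
}

// UpdateLastSuccessfulRun records the time of a loop iteration that returned no error.
func (hc *HealthCheck) UpdateLastSuccessfulRun(timestamp time.Time) {
	hc.mutex.Lock()
	defer hc.mutex.Unlock()
	if timestamp.After(hc.lastSuccessfulRun) {
		hc.lastSuccessfulRun = timestamp
	}
}

// ServeHTTP answers 200 while the last successful run is recent enough, 500 otherwise.
func (hc *HealthCheck) ServeHTTP(w http.ResponseWriter, _ *http.Request) {
	hc.mutex.Lock()
	sinceSuccess := time.Since(hc.lastSuccessfulRun)
	hc.mutex.Unlock()

	if sinceSuccess > hc.maxFailingTime {
		msg := fmt.Sprintf("no successful run for %v", sinceSuccess)
		http.Error(w, msg, http.StatusInternalServerError)
		return
	}
	w.WriteHeader(http.StatusOK)
}

// probe sends one request to the handler and returns the status code.
// The path is a placeholder; the handler ignores it.
func probe(hc *HealthCheck) int {
	rec := httptest.NewRecorder()
	hc.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/health-check", nil))
	return rec.Code
}

// demo returns the probe status for a recent and for a stale successful run.
func demo() (recent, stale int) {
	fresh := &HealthCheck{maxFailingTime: 15 * time.Minute}
	fresh.UpdateLastSuccessfulRun(time.Now())
	recent = probe(fresh)

	old := &HealthCheck{maxFailingTime: 15 * time.Minute}
	old.UpdateLastSuccessfulRun(time.Now().Add(-time.Hour))
	stale = probe(old)
	return recent, stale
}

func main() {
	recent, stale := demo()
	fmt.Println(recent, stale)
}
```

In this sketch a handler that has never seen a successful run also fails the probe, because the zero `time.Time` is far in the past. Whether that is desirable at startup is a design choice the sketch does not settle.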

MaciekPytel · 2017-05-26T11:14:04Z

cluster-autoscaler/metrics/liveness_test.go

+}
+
+func TestNoTimeoutFailingServeHTTP(t *testing.T) {
+	w := getTestResponse(time.Now().Add(time.Second*-2), time.Second, time.Second, true)


The two tests above seem to be exactly identical.

MaciekPytel · 2017-05-26T12:40:23Z

cluster-autoscaler/metrics/liveness.go

+	hc.mutex.Lock()
+	defer hc.mutex.Unlock()
+	if timestamp.After(hc.lastSuccessfulRun) {
+		hc.lastSuccessfulRun = timestamp


It probably doesn't matter much if we have reasonable timeouts, but maybe we should update lastActivity here as well? My reasoning is that successfully finishing a loop means it is running and we're about to do a 10s sleep. So, theoretically, if a loop took 9:55, we would fail in the middle of the sleep.
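A minimal sketch of this suggestion, assuming a 10-minute activity timeout (inferred from the 9:55 example, not stated in the diff). `UpdateLastActivity`, `activityStale`, and `demo` are hypothetical names; the actual handler may be structured differently.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// HealthCheck tracks both timestamps discussed in the review.
type HealthCheck struct {
	mutex             sync.Mutex
	lastActivity      time.Time
	lastSuccessfulRun time.Time
}

// UpdateLastActivity records that the loop showed signs of life.
func (hc *HealthCheck) UpdateLastActivity(timestamp time.Time) {
	hc.mutex.Lock()
	defer hc.mutex.Unlock()
	if timestamp.After(hc.lastActivity) {
		hc.lastActivity = timestamp
	}
}

// UpdateLastSuccessfulRun records a successful iteration and, as suggested
// in the comment above, also counts it as activity.
func (hc *HealthCheck) UpdateLastSuccessfulRun(timestamp time.Time) {
	hc.mutex.Lock()
	defer hc.mutex.Unlock()
	if timestamp.After(hc.lastSuccessfulRun) {
		hc.lastSuccessfulRun = timestamp
	}
	if timestamp.After(hc.lastActivity) {
		hc.lastActivity = timestamp
	}
}

// activityStale reports whether the last activity is older than timeout at time now.
func (hc *HealthCheck) activityStale(now time.Time, timeout time.Duration) bool {
	hc.mutex.Lock()
	defer hc.mutex.Unlock()
	return now.Sub(hc.lastActivity) > timeout
}

// demo replays the scenario: a loop takes 9m55s and the probe arrives 7s
// into the following sleep, with an assumed 10-minute activity timeout.
func demo() (staleBefore, staleAfter bool) {
	start := time.Date(2017, 5, 26, 12, 0, 0, 0, time.UTC)
	finish := start.Add(9*time.Minute + 55*time.Second)
	probeAt := finish.Add(7 * time.Second)
	timeout := 10 * time.Minute

	hc := &HealthCheck{}
	hc.UpdateLastActivity(start)
	staleBefore = hc.activityStale(probeAt, timeout) // only the loop start was recorded

	hc.UpdateLastSuccessfulRun(finish)
	staleAfter = hc.activityStale(probeAt, timeout) // the successful finish now counts as activity
	return staleBefore, staleAfter
}

func main() {
	before, after := demo()
	fmt.Println(before, after)
}
```

With only the loop start recorded, the probe sees 10m02s of inactivity and fails; once the successful finish also refreshes `lastActivity`, the same probe sees 7s.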

MaciekPytel · 2017-05-26T12:44:53Z

I wonder how long it takes for an HTTP request to time out. I'm guessing we don't override the default timeouts, so how long will a single loop take if we lose connectivity to the API server or cloud provider? It may be worth checking, to make sure we get at least 2 or 3 failed loops before failing the probe.
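One way to reason about this is a back-of-the-envelope bound: the failing-time threshold should be at least the number of failed loops you want to observe, multiplied by the worst-case loop duration plus the sleep between loops. The sketch below uses a hypothetical `minFailingTime` helper and an assumed 2-minute worst-case loop; only the 10s sleep comes from this thread. Real timeouts depend on the clients in use. For reference, a zero-value Go `http.Client` has no overall timeout at all.

```go
package main

import (
	"fmt"
	"time"
)

// minFailingTime returns the shortest threshold that still lets failedLoops
// complete iterations fail before the probe itself fails.
func minFailingTime(worstLoop, sleep time.Duration, failedLoops int) time.Duration {
	return time.Duration(failedLoops) * (worstLoop + sleep)
}

// demo evaluates the bound for an assumed 2-minute worst-case loop,
// a 10s sleep, and 3 failed loops.
func demo() string {
	worstLoop := 2 * time.Minute // assumption: every request in the loop times out
	sleep := 10 * time.Second    // sleep between loops, as mentioned in the review
	return minFailingTime(worstLoop, sleep, 3).String()
}

func main() {
	fmt.Println(demo()) // prints "6m30s"
}
```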

MaciekPytel · 2017-05-29T11:32:27Z

/lgtm

k8s-ci-robot added the cncf-cla: yes label (indicates the PR's author has signed the CNCF CLA) May 26, 2017

MaciekPytel reviewed May 26, 2017

Add failing health check if autoscaler loop consistently returns error

9727724

aleksandra-malinowska force-pushed the health-check-errors branch from cbb180e to 9727724 May 29, 2017 09:32

k8s-ci-robot assigned MaciekPytel May 29, 2017

k8s-ci-robot added the lgtm label ("Looks good to me", indicates that a PR is ready to be merged) May 29, 2017

MaciekPytel merged commit 8b1dfac into kubernetes:master May 29, 2017

joelsmith pushed a commit to joelsmith/autoscaler that referenced this pull request May 17, 2019

Merge pull request kubernetes#90 from frobware/add-cluster-size-reach…

9b2cf49

…ed-event UPSTREAM: <carry>: openshift: record max-nodes-total event

frobware added a commit to frobware/autoscaler that referenced this pull request Jun 5, 2019

Revert "Merge pull request kubernetes#90 from frobware/add-cluster-si…

fd90a4c

…ze-reached-event" This reverts commit 9b2cf49, reversing changes made to 291cdcd.

frobware added a commit to frobware/autoscaler that referenced this pull request Jun 5, 2019

UPSTREAM: <carry>: openshift: Revert "Merge pull request kubernetes#90 …

c2f1516

…from frobware/add-cluster-size-reached-event" This reverts commit 9b2cf49, reversing changes made to 291cdcd.

frobware added a commit to frobware/autoscaler that referenced this pull request Jun 19, 2019

UPSTREAM: <carry>: openshift: Revert "Merge pull request kubernetes#90 …

97f1da1

…from frobware/add-cluster-size-reached-event" This reverts commit 9b2cf49, reversing changes made to 291cdcd.

frobware added a commit to frobware/autoscaler that referenced this pull request Jun 20, 2019

UPSTREAM: <carry>: openshift: Revert "Merge pull request kubernetes#90 …

f5d5d66

…from frobware/add-cluster-size-reached-event" This reverts commit 9b2cf49, reversing changes made to 291cdcd.

enxebre pushed a commit to enxebre/autoscaler that referenced this pull request Oct 28, 2019

UPSTREAM: <carry>: openshift: Revert "Merge pull request kubernetes#90 …

469b32c

…from frobware/add-cluster-size-reached-event" This reverts commit 9b2cf49, reversing changes made to 291cdcd.

enxebre pushed a commit to enxebre/autoscaler that referenced this pull request Jan 13, 2020

UPSTREAM: <carry>: openshift: Revert "Merge pull request kubernetes#90 …

005b471

…from frobware/add-cluster-size-reached-event" This reverts commit 9b2cf49, reversing changes made to 291cdcd.

yaroslava-serdiuk pushed a commit to yaroslava-serdiuk/autoscaler that referenced this pull request Feb 22, 2024

Merge pull request kubernetes#90 from ahg-g/ahg-ns

0efe0da

NamespaceSelector for ClusterQueue
