
Add failing health check if autoscaler loop consistently returns error #90

Merged

Conversation

aleksandra-malinowska
Contributor

Extend HealthCheck to verify that a successful run of the autoscaler loop occurred within a given time, configurable via the --max-failing-time flag.
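
A minimal sketch of the idea (hypothetical names such as HealthCheck, lastSuccessfulRun and maxFailingTime; not this PR's actual code): the probe handler compares the time since the last successful loop against the configured --max-failing-time and starts returning HTTP 500 once it is exceeded.

package health

import (
	"fmt"
	"net/http"
	"sync"
	"time"
)

// HealthCheck fails the liveness probe if no autoscaler loop has
// succeeded within maxFailingTime. Names are illustrative only.
type HealthCheck struct {
	mutex             sync.Mutex
	lastSuccessfulRun time.Time
	maxFailingTime    time.Duration // value of the --max-failing-time flag
}

// ServeHTTP implements the probe endpoint.
func (hc *HealthCheck) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	hc.mutex.Lock()
	last := hc.lastSuccessfulRun
	hc.mutex.Unlock()

	if time.Since(last) > hc.maxFailingTime {
		http.Error(w, fmt.Sprintf("no successful autoscaler run since %v", last), http.StatusInternalServerError)
		return
	}
	w.WriteHeader(http.StatusOK)
	w.Write([]byte("OK"))
}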

@k8s-ci-robot added the cncf-cla: yes label (indicates the PR's author has signed the CNCF CLA) on May 26, 2017
}

func TestNoTimeoutFailingServeHTTP(t *testing.T) {
	w := getTestResponse(time.Now().Add(time.Second*-2), time.Second, time.Second, true)

Contributor

The two tests above seem to be exactly identical.

hc.mutex.Lock()
defer hc.mutex.Unlock()
if timestamp.After(hc.lastSuccessfulRun) {
	hc.lastSuccessfulRun = timestamp

Contributor

It probably doesn't matter all that much if we have reasonable timeouts, but maybe we should update lastActivity here as well? My reasoning is that successfully finishing a loop means that it is running and we're about to do a 10s sleep, so theoretically, if a loop took 9:55, we would fail in the middle of the sleep.
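
For illustration, the update could refresh both timestamps in one place (field names hypothetical, mirroring the snippet above):

// UpdateLastSuccessfulRun marks a finished loop. Refreshing lastActivity here
// as well, as suggested above, avoids failing the activity check during the
// sleep that follows a long but otherwise successful loop.
func (hc *HealthCheck) UpdateLastSuccessfulRun(timestamp time.Time) {
	hc.mutex.Lock()
	defer hc.mutex.Unlock()
	if timestamp.After(hc.lastSuccessfulRun) {
		hc.lastSuccessfulRun = timestamp
	}
	if timestamp.After(hc.lastActivity) { // lastActivity is a hypothetical field
		hc.lastActivity = timestamp
	}
}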

@MaciekPytel
Contributor

I wonder how long it takes for an HTTP request to time out? I'm guessing we don't override the default timeouts, and I'm wondering how long a single loop will take if we lose connectivity to the API server / cloud provider. Maybe it would be worth checking, to make sure we get at least 2 or 3 failed loops before the probe fails?
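
One way to reason about it (purely illustrative, not this PR's code): if outbound calls are bounded by an explicit client timeout, the worst-case duration of a failed loop is known, and --max-failing-time can be chosen to cover at least two or three such loops.

package main

import (
	"fmt"
	"net/http"
	"time"
)

func main() {
	// Assumed, illustrative numbers: a 30s client timeout and the 10s
	// sleep between autoscaler loops.
	clientTimeout := 30 * time.Second
	loopSleep := 10 * time.Second

	// Bounding requests so a single failing loop cannot hang indefinitely.
	client := &http.Client{Timeout: clientTimeout}
	_ = client

	// Requiring roughly three consecutive failed loops before the probe fails.
	maxFailingTime := 3 * (clientTimeout + loopSleep)
	fmt.Println("suggested --max-failing-time >=", maxFailingTime)
}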

@MaciekPytel
Contributor

/lgtm

@k8s-ci-robot added the lgtm label ("Looks good to me", indicates that the PR is ready to be merged) on May 29, 2017
@MaciekPytel merged commit 8b1dfac into kubernetes:master on May 29, 2017
joelsmith pushed a commit to joelsmith/autoscaler that referenced this pull request May 17, 2019
…ed-event

UPSTREAM: <carry>: openshift: record max-nodes-total event
frobware added a commit to frobware/autoscaler that referenced this pull request Jun 5, 2019
…ze-reached-event"

This reverts commit 9b2cf49, reversing
changes made to 291cdcd.
yaroslava-serdiuk pushed a commit to yaroslava-serdiuk/autoscaler that referenced this pull request Feb 22, 2024
NamespaceSelector for ClusterQueue
Labels: cncf-cla: yes, lgtm
3 participants