Add failing health check if autoscaler loop consistently returns error #90
}

func TestNoTimeoutFailingServeHTTP(t *testing.T) {
	w := getTestResponse(time.Now().Add(time.Second*-2), time.Second, time.Second, true)
The two tests above seem to be exactly identical.
hc.mutex.Lock()
defer hc.mutex.Unlock()
if timestamp.After(hc.lastSuccessfulRun) {
	hc.lastSuccessfulRun = timestamp
It probably doesn't matter much if we have reasonable timeouts, but maybe we should update lastActivity here as well? My reasoning is that successfully finishing a loop means the autoscaler is running and is about to sleep for 10s. So theoretically, if a loop took 9:55, we would fail in the middle of the sleep.
I wonder how long it takes for an HTTP request to time out. I'm guessing we don't override the default timeouts, so how long will a single loop take if we lose connectivity to the API server or the cloud provider? It may be worth checking, to make sure we get at least 2 or 3 failed loops before failing the probe.
cbb180e to 9727724
/lgtm
Extend HealthCheck to verify that a successful run of the autoscaler loop occurred within a given time (configurable via the --max-failing-time flag).