Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Timeout and unit tests for agent healthcheck #2437

Merged
merged 3 commits into from
Apr 27, 2020

Conversation

sparrc
Copy link
Contributor

@sparrc sparrc commented Apr 23, 2020

Summary

The docker healthcheck default timeout is 30s. We don't currently set any http timeouts on our internal healthcheck so if the request hangs then we don't log anything in agent or explicitly return a failed healthcheck to docker.

Adding a timeout of 25s means we can timeout in the agent codebase, log the error, and then explicitly return the failed healthcheck error code.

Also did a little refactor to allow adding unit tests.

New tests cover the changes: yes

test output:

% go test ./agent/app/. -count=1 -tags unit -v -run=Healthcheck
=== RUN   TestHealthcheck_Sunny
--- PASS: TestHealthcheck_Sunny (0.00s)
=== RUN   TestHealthcheck_InvalidURL2
--- PASS: TestHealthcheck_InvalidURL2 (0.00s)
=== RUN   TestHealthcheck_Timeout
level=error time=2020-04-23T21:51:23Z msg="error creating healthcheck request: parse   http://foobar: first path segment in URL cannot contain colon" module=healthcheck.go
level=error time=2020-04-23T21:51:25Z msg="health check [GET http://127.0.0.1:57227] failed with error: Get http://127.0.0.1:57227: net/http: request canceled (Client.Timeout exceeded while awaiting headers)" module=healthcheck.go
--- PASS: TestHealthcheck_Timeout (2.00s)
=== RUN   TestHealthcheck_404
--- PASS: TestHealthcheck_404 (0.00s)
PASS
ok  	github.com/aws/amazon-ecs-agent/agent/app	2.147s

Description for the changelog

Log agent healthcheck timeouts.

Licensing

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Copy link
Member

@fierlion fierlion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice to have the more granular view here.

)

func TestHealthcheck_Sunny(t *testing.T) {
ts := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neat!

}

func TestHealthcheck_InvalidURL2(t *testing.T) {
// leading space in url is invalid
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know it’s supposed to be there but it still really bothers me.

@sparrc sparrc force-pushed the healthcheck-timeout branch 2 times, most recently from 96b9674 to 435b1f1 Compare April 24, 2020 00:09
@@ -87,6 +87,11 @@ func StartSession(params *TelemetrySessionParams, statsEngine stats.Engine) erro
seelog.Errorf("Error: lost websocket connection with ECS Telemetry service (TCS): %v", tcsError)
params.time().Sleep(backoff.Duration())
}
select {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated to this change, but this is a fix for when TestDoStartCgroupInitHappyPath has a failure after the test goroutine has already exited.

@@ -172,6 +172,7 @@ func createVolumeTask(scope, arn, volume string, autoprovision bool) (*apitask.T
DriverOpts: map[string]string{
"device": tmpDirectory,
"o": "bind",
"type": "tmpfs",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was added for compatibility with newer docker versions, see docker-archive/engine@d5b271c

@sparrc sparrc merged commit 4369eec into aws:dev Apr 27, 2020
@sparrc sparrc deleted the healthcheck-timeout branch April 27, 2020 17:53
@sparrc sparrc added this to the 1.40.0 milestone May 5, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants