-
Notifications
You must be signed in to change notification settings - Fork 613
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
V1.40.0 stage #2469
Closed
Closed
V1.40.0 stage #2469
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
* Timeout and unit tests for agent healthcheck * integ test cleanup wait duration and expand terminal reason log msg * add 'type' to DriverOpts
* backoff container stats collection when fails * make gogenerate * fix unit tests and tweak timeout/backoff Extend inactivity timeout fix missing return statement in metric polling * address PR review comments
1. normalize the container name/id logging format when handling container state change 2. log the full container event as a debug message logs will go from this: level=info time=2020-05-02T15:39:55Z msg="Managed task [arn:aws:ecs:us-east-1:039193556298:task/3371e240-6285-4d09-b4f5-500587e9d0cf]: waiting for event for task" module=task_manager.go level=info time=2020-05-02T15:39:55Z msg="Managed task [arn:aws:ecs:us-east-1:039193556298:task/3371e240-6285-4d09-b4f5-500587e9d0cf]: got container [dd (Runtime ID: 35a25c7e69d3e6a4b1833d70157ba666032b3294045abe2c993dd3744c72149d)] event: [RUNNING]" module=task_manager.go level=info time=2020-05-02T15:39:55Z msg="Managed task [arn:aws:ecs:us-east-1:039193556298:task/3371e240-6285-4d09-b4f5-500587e9d0cf]: handling container change [{RUNNING {35a25c7e69d3e6a4b1833d70157ba666032b3294045abe2c993dd3744c72149d <nil> [] <nil> [] map[com.amazonaws.ecs.cluster:auto-qjkn com.amazonaws.ecs.container-name:dd com.amazonaws.ecs.task-arn:arn:aws:ecs:us-east-1:039193556298:task/3371e240-6285-4d09-b4f5-500587e9d0cf com.amazonaws.ecs.task-definition-family:dd com.amazonaws.ecs.task-definition-version:9] 2020-05-02 14:59:33.688910738 +0000 UTC 2020-05-02 14:59:55.013436083 +0000 UTC 0001-01-01 00:00:00 +0000 UTC {UNKNOWN <nil> 0 } default 0xc000aca300} ContainerStatusChangeEvent}] for container [dd (Runtime ID: 35a25c7e69d3e6a4b1833d70157ba666032b3294045abe2c993dd3744c72149d)]" module=task_manager.go level=info time=2020-05-02T15:39:55Z msg="Managed task [arn:aws:ecs:us-east-1:039193556298:task/3371e240-6285-4d09-b4f5-500587e9d0cf]: redundant container state change. dd to RUNNING, but already RUNNING" module=task_manager.go level=info time=2020-05-02T15:39:55Z msg="Managed task [arn:aws:ecs:us-east-1:039193556298:task/3371e240-6285-4d09-b4f5-500587e9d0cf]: task at steady state: RUNNING" module=task_manager.go level=info time=2020-05-02T15:39:55Z msg="Managed task [arn:aws:ecs:us-east-1:039193556298:task/3371e240-6285-4d09-b4f5-500587e9d0cf]: waiting for event for task" module=task_manager.go to this: level=info time=2020-05-02T16:07:58Z msg="Managed task [arn:aws:ecs:us-east-1:039193556298:task/d46739f3-4b08-4660-ab50-97ee9d0ccdc6]: waiting for event for task" module=task_manager.go level=info time=2020-05-02T16:07:59Z msg="Managed task [arn:aws:ecs:us-east-1:039193556298:task/d46739f3-4b08-4660-ab50-97ee9d0ccdc6]: Container [name=dd runtimeID=0e955bcf4c289034005f4b4032045030b80a0e570152158ccd29c7eb095556ea]: handling container change event [RUNNING]" module=task_manager.go level=info time=2020-05-02T16:07:59Z msg="Managed task [arn:aws:ecs:us-east-1:039193556298:task/d46739f3-4b08-4660-ab50-97ee9d0ccdc6]: Container [name=dd runtimeID=0e955bcf4c289034005f4b4032045030b80a0e570152158ccd29c7eb095556ea]: container change RUNNING->RUNNING is redundant" module=task_manager.go level=info time=2020-05-02T16:07:59Z msg="Managed task [arn:aws:ecs:us-east-1:039193556298:task/d46739f3-4b08-4660-ab50-97ee9d0ccdc6]: task at steady state: RUNNING" module=task_manager.go level=info time=2020-05-02T16:07:59Z msg="Managed task [arn:aws:ecs:us-east-1:039193556298:task/d46739f3-4b08-4660-ab50-97ee9d0ccdc6]: waiting for event for task" module=task_manager.go
Implications of this change: 1. During one TACS publishing interval, we previously would only send metrics to TACS if we received two separate stats from docker. Going forward, if we only received a single stat we will compare it to a stat from a previous interval in order to get the usage. 2. This also allows the user to configure the stats collection interval to be once per TACS publish interval, further improving CPU usage. Fix unit tests
always verify that the task is stopped at the end of every test, in case unrelated state change events are received that may trigger a test to exit before it has been cleaned up. also get a new port for each test to avoid potential collisions.
* Change default docker metric gathering behavior 1. Change default docker metric gathering behavior from streaming metrics to polling. 2. Change the default polling interval to half of the TACS publishing interval (currently 20s), so that every publish interval we have two docker metrics. 3. Change the minimum polling interval to 5s to prevent customers from configuring polling to be just as resource-intensive as streaming metrics. These changes are being made because we have found that docker streaming stats consumes considerable resources from the agent, dockerd daemon, and containerd daemon. * Task stats endpoint needs to be populated immediately to avoid changing behavior from streaming stats, we need to populate the stats endpoint immediately when the stats engine starts. So instead of jittering the first stats gather we need to just do it immediately. * Pinning windows images (ltsc201*) to known working version 5/12 release broke our tests
These response bodies need to be closed in order to reuse the underlying tcp connections in Go. I have also changed the healthcheck GET request to a HEAD request because it doesn't have any use for the data contained in the GET, so might as well not get it in the first place. wrote a small benchmark test to show the improvement in time and allocations with this change: dev branch: $ go test ./agent/app/. -bench=Healthcheck -benchmem goos: darwin goarch: amd64 pkg: github.com/aws/amazon-ecs-agent/agent/app BenchmarkHealthcheck-4 3280 782405 ns/op 16996 B/op 119 allocs/op PASS ok github.com/aws/amazon-ecs-agent/agent/app 5.393s this branch: $ go test ./agent/app/. -bench=Healthcheck -benchmem goos: darwin goarch: amd64 pkg: github.com/aws/amazon-ecs-agent/agent/app BenchmarkHealthcheck-4 15004 75400 ns/op 4303 B/op 55 allocs/op PASS ok github.com/aws/amazon-ecs-agent/agent/app 3.980s
* Change asserts to require in integ tests * Copy all test tasks in integ tests to avoid data races * image manager integ test logging fix * skipping windows image cleanup test because of hcsshim bug
* Remove unused code found via staticcheck tool * Add staticcheck unused code check to CI
fenxiong
reviewed
Jun 2, 2020
CHANGELOG.md
Outdated
@@ -1,5 +1,9 @@ | |||
# Changelog | |||
|
|||
## 1.40.0 | |||
* Bug - Register custom logger before it gets used to ensure that the formatter is initiated before it is loaded [#2438](https://github.com/aws/amazon-ecs-agent/pull/2438) | |||
* Enhancement - adds a jitter to this so that we don't query docker for every container's state all at the same time [#2444] (https://github.com/aws/amazon-ecs-agent/pull/2444) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extra space after [#2444]
causing link not rendered;
nit: put enhancement item above bug
@@ -1,5 +1,9 @@ | |||
# Changelog | |||
|
|||
## 1.40.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should have a changelog for #2452
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Summary
Implementation details
Testing
New tests cover the changes:
Description for the changelog
Licensing
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.