Expedited image manifest digest reporting #4177

Merged
merged 4 commits into from
May 22, 2024

amogh09 (Contributor) commented May 14, 2024

Summary

This PR contains all the changes for the expedited image digest reporting enhancement. The functional changes are listed below.

  • For a task container that does not have a digest specified in its Image field in the task payload, Agent will resolve the manifest digest for the container image during the container's transition to MANIFEST_PULLED state. The digest is resolved by calling the image repository if an image pull is required for the container, or by inspecting the locally available image if no pull is required.
  • If at least one container of the task had its digest resolved during the transition to MANIFEST_PULLED state, Agent will make an STSC (SubmitTaskStateChange) call to ECS backend to report all the resolved digests for the task.
  • If a container had its image digest resolved during transition to MANIFEST_PULLED state, then a canonical image reference, prepared using the image repository name and the resolved digest, will be used for pulling the container image. For example, instead of docker image pull public.ecr.aws/library/busybox:latest, Agent will perform the equivalent of docker image pull public.ecr.aws/library/busybox@sha256:<resolved-digest>. After pulling the image, Agent will tag the pulled image with the value of container's Image field so that the image is discoverable on the host using the container's Image field.
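The canonical-reference step in the last bullet can be sketched in Go, the Agent's language. This is a minimal sketch, not the Agent's code. `canonicalReference` is a hypothetical helper: it drops any tag or existing digest from the Image value and appends the resolved digest. It only accepts `sha256:` digests and does not implement the full Docker reference grammar (for example, implicit `docker.io/library/` expansion). Production code would more likely rely on a reference-parsing library.

```go
package main

import (
	"fmt"
	"strings"
)

// canonicalReference turns an image reference such as
// "public.ecr.aws/library/busybox:latest" into "<repository>@<digest>".
// Hypothetical helper: it only accepts sha256 digests and does not implement
// the full Docker reference grammar.
func canonicalReference(image, digest string) (string, error) {
	if !strings.HasPrefix(digest, "sha256:") {
		return "", fmt.Errorf("unsupported digest %q", digest)
	}
	repo := image
	// Drop an existing "@<digest>" suffix, if any.
	if i := strings.Index(repo, "@"); i >= 0 {
		repo = repo[:i]
	}
	// Drop the tag. A tag's ":" always follows the last "/", so a registry
	// port such as "localhost:5000/app" is left intact.
	if i := strings.LastIndex(repo, ":"); i > strings.LastIndex(repo, "/") {
		repo = repo[:i]
	}
	if repo == "" {
		return "", fmt.Errorf("invalid image reference %q", image)
	}
	return repo + "@" + digest, nil
}

func main() {
	digest := "sha256:" + strings.Repeat("a", 64) // placeholder digest
	ref, err := canonicalReference("public.ecr.aws/library/busybox:latest", digest)
	if err != nil {
		panic(err)
	}
	// The Agent would pull ref, then tag it back to the original Image value.
	fmt.Println(ref)
}
```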

Impact

Users may notice these changes in the following ways.

  • Task start times are expected to increase somewhat due to the overhead of expedited digest resolution. The exact delay depends on the registry hosting the container images. In our testing with public and ECR images, and with Docker Hub images with unlimited pulls allowed, the average additional task start delay was under 500 ms.
  • Users' image repositories will see one additional call to their v2 manifest endpoints per task launch.
  • For each task requiring digest resolution, Agent will make one additional SubmitTaskStateChange API call on the customer's behalf. This call belongs to the "Agent modify actions" API action category.

Approved PRs included

This PR merges the expedited digest reporting feature branch into the dev branch. The individual PRs included in it were reviewed previously and are listed below.

Testing

New unit, integration, and functional tests have been added.

Comprehensive manual testing was performed. Stress testing was also performed to measure the additional task start delay (results noted above), and that delay was judged acceptable.

New tests cover the changes: yes

Description for the changelog

Feature: Expedited reporting of container image manifest digests to ECS backend. Agent now resolves container image manifest digests before pulling images, either by calling image registries or by inspecting local images, depending on the host state and Agent configuration. Resolved digests are reported to ECS backend using an additional SubmitTaskStateChange API call.
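The decision in this entry, together with the "at least one container" rule from the summary, can be sketched as follows. All names (`ContainerSpec`, `chooseSource`, `shouldSubmitStateChange`) are hypothetical. `PullRequired` stands in for whatever combination of host state and Agent configuration (for example, the image pull behavior setting) Agent actually evaluates, and the sketch assumes resolution succeeds whenever it is attempted.

```go
package main

import (
	"fmt"
	"strings"
)

// ContainerSpec is a hypothetical, trimmed-down view of a task container.
type ContainerSpec struct {
	Name         string
	Image        string // value of the container's Image field
	PullRequired bool   // assumed precomputed from host state and Agent config
}

// resolutionSource says where a container's manifest digest would come from.
type resolutionSource int

const (
	sourceNone     resolutionSource = iota // Image already pins a digest
	sourceRegistry                         // ask the image registry
	sourceLocal                            // inspect the locally available image
)

func (s resolutionSource) String() string {
	switch s {
	case sourceRegistry:
		return "registry"
	case sourceLocal:
		return "local-inspect"
	default:
		return "not-required"
	}
}

// hasDigest reports whether an image reference already contains "@<digest>".
func hasDigest(image string) bool {
	return strings.Contains(image, "@")
}

// chooseSource skips pinned images, calls the registry when a pull is
// required, and otherwise inspects the local image.
func chooseSource(c ContainerSpec) resolutionSource {
	switch {
	case hasDigest(c.Image):
		return sourceNone
	case c.PullRequired:
		return sourceRegistry
	default:
		return sourceLocal
	}
}

// shouldSubmitStateChange is true when at least one container had its digest
// resolved; resolved maps container name to resolved digest.
func shouldSubmitStateChange(resolved map[string]string) bool {
	return len(resolved) > 0
}

func main() {
	containers := []ContainerSpec{
		{Name: "app", Image: "public.ecr.aws/library/busybox:latest", PullRequired: true},
		{Name: "sidecar", Image: "my-sidecar:1.0", PullRequired: false},
		{Name: "pinned", Image: "busybox@sha256:" + strings.Repeat("c", 64), PullRequired: true},
	}
	for _, c := range containers {
		fmt.Printf("%s: %s\n", c.Name, chooseSource(c))
	}
	// Placeholder digests for the two containers that needed resolution.
	resolved := map[string]string{
		"app":     "sha256:" + strings.Repeat("a", 64),
		"sidecar": "sha256:" + strings.Repeat("d", 64),
	}
	fmt.Println(shouldSubmitStateChange(resolved))
}
```

Running it prints `app: registry`, `sidecar: local-inspect`, `pinned: not-required`, then `true`.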

Does this PR include breaking model changes? If so, have you added transformation functions?

No breaking model changes included.

Licensing

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

amogh09 force-pushed the feature/digest-resolution branch from 499474c to c0ced29 May 21, 2024 19:21
amogh09 marked this pull request as ready for review May 21, 2024 20:51
amogh09 requested a review from a team as a code owner May 21, 2024 20:51
amogh09 changed the title from Feature/digest resolution to Expedited image manifest digest reporting May 21, 2024
amogh09 merged commit c763e4d into dev May 22, 2024
45 checks passed
Yiyuanzzz mentioned this pull request May 28, 2024
jiuchoe4 pushed a commit to saurabhc123/amazon-ecs-agent that referenced this pull request Jun 3, 2024
saurabhc123 pushed a commit to saurabhc123/amazon-ecs-agent that referenced this pull request Jun 4, 2024
mvanholsteijn commented:

Unfortunately, this change introduces a breaking change for tagged images that are updated and whose old image layer is deleted.

clarkohw commented Jul 8, 2024

+1 to the breaking-change comment. This also happens with Fargate tasks: intentional deployments pull the correct image, but unexpected restarts use the old image digest.

timdarbydotnet commented Jul 8, 2024

This has broken my test cluster and I don't know how to work around it. The agent absolutely will not pull an updated image that has been tagged latest and insists on pulling the older untagged image. I'm seeing this message now in the ecs-agent log:

level=info time=2024-07-08T13:08:38Z msg="Digest resolution not required" taskARN="arn:aws:ecs:us-west-2:awsaccount:task/eds-ldap-test/49a62df1c2294870b6124a69babe4f5e" containerName="ldap-container" image="awsaccount.dkr.ecr.us-west-2.amazonaws.com/eds-ldap-test/389ds@sha256:2ce6935804572d1133a069b6a12b2df560599f36c113d11b46356256ed4b6ab0"

How do I force it to perform digest resolution against a newly pushed and tagged image?

7 participants