Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(aws-cdk/lib/api/garbage-collection): (Garbage collection for ECR prints incorrect number of assets/images deleted and runs indefinitely) #32498

Closed
1 task
tainsworth102 opened this issue Dec 12, 2024 · 4 comments · Fixed by #32679
Assignees
Labels
bug This issue is a bug. effort/small Small work item – less than a day of effort p1 package/tools Related to AWS CDK Tools or CLI

Comments

@tainsworth102
Copy link

tainsworth102 commented Dec 12, 2024

Describe the bug

When running cdk gc for ECR, the number of assets is misprinted and the files scanned exceeds the number in the bootstrap repository. This results in the percentage of files scanned exceeding 100.00% and therefore command runs indefinitely. Additionally, images are not being tagged in any attempted run, it is jumping straight to deleting the a random number of unused images which is not reflected in the print statement.

Regression Issue

  • Select this option if this issue appears to be a regression.

Last Known Working CDK Version

No response

Expected Behavior

The printed output should have stated:
[100.00%] 136 files scanned: 3 assets (0.56 GiB) tagged, 0 assets (0.00 GiB) deleted.

Current Behavior

[735.29%] 1000 files scanned: 0 assets (0.00 GiB) tagged, 30 assets (5.63 GiB) deleted.
image

The printed output was incorrect and rather than tagging it began deleting straight away.

Reproduction Steps

cdk gc aws://<my-account-id>/<my-only-used-region> --type ecr --unstable=gc --created-buffer-days 0 --action full --confirm=true

Possible Solution

No response

Additional Information/Context

When running this for an account which has an ECR repo with ~8000 images the progression printing is again displayed incorrectly. As the progression cycled and increased with the loop, the number progression progression prints for each progression iteration double and the number of images deleted cumulatively decreases until it deletes no images for each iteration.

CDK CLI Version

2.172.0

Framework Version

No response

Node.js Version

v20.11.1

OS

Ubuntu 22.04.4 LTS

Language

Python

Language Version

No response

Other information

No response

@tainsworth102 tainsworth102 added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Dec 12, 2024
@github-actions github-actions bot added the @aws-cdk/aws-ecr Related to Amazon Elastic Container Registry label Dec 12, 2024
@khushail khushail added investigating This issue is being investigated and/or work is in progress to resolve the issue. p2 @aws-cdk/core Related to core CDK functionality and removed needs-triage This issue or PR still needs to be triaged. labels Dec 13, 2024
@khushail khushail self-assigned this Dec 13, 2024
@khushail
Copy link
Contributor

khushail commented Dec 13, 2024

Hi @tainsworth102 , thanks for reaching out.

Although I did not have those many images, i tried with a small set.Here is a snippet (since I had only 2 images, it asked for deletion and then deleted)-

Screenshot 2024-12-13 at 2 37 49 PM

I assume it should have tagged the image as well as it was not used since long but the Asking part as well as the precentage looks good to me

After going through the CDK Doc for garbage collection, it seems that -

  • create-buffer-days as 0 which means 0 number of days an asset must exist before it is eligible for garbage collection actions.
  • tag – Tags any newly identified unused assets, but doesn’t delete any assets within the range of buffer days that you provide.

So considering these , AFAIU, in your command -
cdk gc aws://<my-account-id>/<my-only-used-region> --type ecr --unstable=gc --created-buffer-days 0 --action full --confirm=true

CDK GC should tag the image which does not seem to be the case.

Will try to repro this scenario with more images and share my findings.

@khushail
Copy link
Contributor

khushail commented Dec 20, 2024

@tainsworth102 , I tried deleting images in other region and got this -

⏳  Garbage Collecting environment aws://<<ACCOUNT-ID>>/us-west-2...
Found 1 assets to delete based off of the following criteria:
- assets have been isolated for > 0 days
- assets were created > 0 days ago

Delete this batch (yes/no/delete-all)? yes
[100.00%] 1 files scanned: 0 assets (0.00 MiB) tagged, 1 assets (103.23 MiB) deleted.

Seems like there is an issue with tagging , for the images which are getting deleted. However I am not sure of whether in situation where many 1000 images exist, how the percentage would be displayed. For smaller number, percentage seems correct .

I am marking this issue as P2 which means it would be on team's radar , won't be immediately addressed by the team. However contributions reg the resolution, are welcome from the community as well team.

Hope that is helpful.

@khushail khushail added effort/small Small work item – less than a day of effort and removed investigating This issue is being investigated and/or work is in progress to resolve the issue. labels Dec 20, 2024
@khushail khushail removed their assignment Dec 20, 2024
@sakurai-ryo
Copy link
Contributor

I was able to reproduce the bug, and I'll open a PR to fix it.

@kaizencc kaizencc added p1 package/tools Related to AWS CDK Tools or CLI and removed p2 @aws-cdk/aws-ecr Related to Amazon Elastic Container Registry @aws-cdk/core Related to core CDK functionality labels Dec 29, 2024
@kaizencc kaizencc self-assigned this Dec 29, 2024
@mergify mergify bot closed this as completed in #32679 Dec 29, 2024
mergify bot pushed a commit that referenced this issue Dec 29, 2024
…e collector for ECR (#32679)

### Issue # (if applicable)

Closes #32498

### Reason for this change
When `listImagesCommand` returns nextToken in the `readRepoInBatches` function, nextToken is not passed as an argument for the subsequent `listImagesCommand` execution, causing `listImagesCommand` to continue executing.
https://github.com/aws/aws-cdk/blob/v2.173.4/packages/aws-cdk/lib/api/garbage-collection/garbage-collector.ts#L621

According to the `listImagesCommand` documentation, if maxResults is not specified, a maximum of 100 images will be returned, so this bug requires at least 100 images in the asset repository.
https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/Package/-aws-sdk-client-ecr/Interface/ListImagesCommandInput/

#### Reproduction Steps
The following bash script and Dockerfile saved locally and executed, will push 120 container images to the asset repository.

```bash
#!/usr/bin/env bash

set -eu

ACCOUNT_ID="your account id"
REGION="your region"
REPO_NAME="cdk-hnb659fds-container-assets-${ACCOUNT_ID}-${REGION}"
IMAGE_NAME="test-image"
AWS_PROFILE="your AWS profile"

echo "Logging in to ECR..."
aws ecr get-login-password --region "${REGION}" --profile "${AWS_PROFILE}" \
| docker login --username AWS --password-stdin "${ACCOUNT_ID}.dkr.ecr.${REGION}.amazonaws.com"

for i in $(seq 1 120); do
  hash=$(head -c 32 /dev/urandom | xxd -p -c 64)
  echo "Building and pushing image with tag: ${hash}"
  touch "${i}.txt"

  docker build \
    --build-arg BUILD_NO="${i}" \
    -t "${IMAGE_NAME}:${i}" \
    .

  docker tag "${IMAGE_NAME}:${i}" \
    "${ACCOUNT_ID}.dkr.ecr.${REGION}.amazonaws.com/${REPO_NAME}:${hash}"

  docker push \
    "${ACCOUNT_ID}.dkr.ecr.${REGION}.amazonaws.com/${REPO_NAME}:${hash}"

  rm "${i}.txt"

  sleep 0.01
done

echo "Done!"
```

```dockerfile
FROM scratch

ARG BUILD_NO
ENV BUILD_NO=${BUILD_NO}

COPY ${BUILD_NO}.txt /
```

You can reproduce this bug by running the following command after the images have been pushed.
```bash
$ cdk gc aws://{account id}/{region} --type ecr --unstable=gc --created-buffer-days 0 --action full --confirm=true
```

### Description of changes
Fix the problem of correctly handling nextToken when executing `listImagesCommand` in the `readRepoInBatches` function.

### Describe any new or updated permissions being added
Nothing.

### Description of how you validated changes
Verifying that this bug has been fixed using the CLI integration tests is difficult, so only unit tests are added.

### Checklist
- [x] My code adheres to the [CONTRIBUTING GUIDE](https://github.com/aws/aws-cdk/blob/main/CONTRIBUTING.md) and [DESIGN GUIDELINES](https://github.com/aws/aws-cdk/blob/main/docs/DESIGN_GUIDELINES.md)

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
Copy link

Comments on closed issues and PRs are hard for our team to see.
If you need help, please open a new issue that references this one.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 29, 2024
iankhou pushed a commit that referenced this issue Jan 13, 2025
…e collector for ECR (#32679)

### Issue # (if applicable)

Closes #32498

### Reason for this change
When `listImagesCommand` returns nextToken in the `readRepoInBatches` function, nextToken is not passed as an argument for the subsequent `listImagesCommand` execution, causing `listImagesCommand` to continue executing.
https://github.com/aws/aws-cdk/blob/v2.173.4/packages/aws-cdk/lib/api/garbage-collection/garbage-collector.ts#L621

According to the `listImagesCommand` documentation, if maxResults is not specified, a maximum of 100 images will be returned, so this bug requires at least 100 images in the asset repository.
https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/Package/-aws-sdk-client-ecr/Interface/ListImagesCommandInput/

#### Reproduction Steps
The following bash script and Dockerfile saved locally and executed, will push 120 container images to the asset repository.

```bash
#!/usr/bin/env bash

set -eu

ACCOUNT_ID="your account id"
REGION="your region"
REPO_NAME="cdk-hnb659fds-container-assets-${ACCOUNT_ID}-${REGION}"
IMAGE_NAME="test-image"
AWS_PROFILE="your AWS profile"

echo "Logging in to ECR..."
aws ecr get-login-password --region "${REGION}" --profile "${AWS_PROFILE}" \
| docker login --username AWS --password-stdin "${ACCOUNT_ID}.dkr.ecr.${REGION}.amazonaws.com"

for i in $(seq 1 120); do
  hash=$(head -c 32 /dev/urandom | xxd -p -c 64)
  echo "Building and pushing image with tag: ${hash}"
  touch "${i}.txt"

  docker build \
    --build-arg BUILD_NO="${i}" \
    -t "${IMAGE_NAME}:${i}" \
    .

  docker tag "${IMAGE_NAME}:${i}" \
    "${ACCOUNT_ID}.dkr.ecr.${REGION}.amazonaws.com/${REPO_NAME}:${hash}"

  docker push \
    "${ACCOUNT_ID}.dkr.ecr.${REGION}.amazonaws.com/${REPO_NAME}:${hash}"

  rm "${i}.txt"

  sleep 0.01
done

echo "Done!"
```

```dockerfile
FROM scratch

ARG BUILD_NO
ENV BUILD_NO=${BUILD_NO}

COPY ${BUILD_NO}.txt /
```

You can reproduce this bug by running the following command after the images have been pushed.
```bash
$ cdk gc aws://{account id}/{region} --type ecr --unstable=gc --created-buffer-days 0 --action full --confirm=true
```

### Description of changes
Fix the problem of correctly handling nextToken when executing `listImagesCommand` in the `readRepoInBatches` function.

### Describe any new or updated permissions being added
Nothing.

### Description of how you validated changes
Verifying that this bug has been fixed using the CLI integration tests is difficult, so only unit tests are added.

### Checklist
- [x] My code adheres to the [CONTRIBUTING GUIDE](https://github.com/aws/aws-cdk/blob/main/CONTRIBUTING.md) and [DESIGN GUIDELINES](https://github.com/aws/aws-cdk/blob/main/docs/DESIGN_GUIDELINES.md)

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug This issue is a bug. effort/small Small work item – less than a day of effort p1 package/tools Related to AWS CDK Tools or CLI
Projects
None yet
4 participants