
fix(aws): handle ECR repositories in different regions #6217

Merged (1 commit) on Sep 2, 2024


knrc
Contributor

@knrc knrc commented Feb 28, 2024

Description

This PR modifies the ECR integration so that it obtains authorization tokens from the region hosting the ECR repository. The current behaviour is to use the default region, which results in authentication errors such as:

2024-02-26T18:56:03.738Z	ERROR	Error during vulnerabilities or misconfiguration scan: scan error: unable to initialize a scanner: unable to initialize an image scanner: 4 errors occurred:
	* docker error: unable to inspect the image (127647282379.dkr.ecr.us-east-1.amazonaws.com/undistro-test-image:1.25.3): Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
	* containerd error: containerd socket not found: /run/containerd/containerd.sock
	* podman error: unable to initialize Podman client: no podman socket found: stat podman/podman.sock: no such file or directory
	* remote error: GET https://127647282379.dkr.ecr.us-east-1.amazonaws.com/v2/undistro-test-image/manifests/1.25.3: unexpected status code 401 Unauthorized: Not Authorized

2024-02-26T18:56:03.738Z	ERROR	Error during vulnerabilities or misconfiguration scan: scan error: unable to initialize a scanner: unable to initialize an image scanner: 4 errors occurred:
	* docker error: unable to inspect the image (127647282379.dkr.ecr.sa-east-1.amazonaws.com/undistro-test-image:1.25.3): Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
	* containerd error: containerd socket not found: /run/containerd/containerd.sock
	* podman error: unable to initialize Podman client: no podman socket found: stat podman/podman.sock: no such file or directory
	* remote error: GET https://127647282379.dkr.ecr.sa-east-1.amazonaws.com/v2/undistro-test-image/manifests/1.25.3: unexpected status code 401 Unauthorized: Not Authorized

This was raised in #1026, which is now closed although the underlying issue doesn't appear to be addressed.
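
For illustration only, the region can be read from the registry hostname itself. This is a minimal sketch and not the code in this PR: the function name `regionFromDomain` and the regular expression are assumptions, based on the `<account>.dkr.ecr.<region>.amazonaws.com` shape visible in the errors above.

```go
package main

import (
	"fmt"
	"regexp"
)

// ecrDomain is a hypothetical pattern for standard ECR registry hostnames.
// It deliberately ignores other hostname variants that may exist.
var ecrDomain = regexp.MustCompile(`^\d{12}\.dkr\.ecr\.([a-z0-9-]+)\.amazonaws\.com$`)

// regionFromDomain returns the region embedded in an ECR registry hostname,
// or an error when the hostname does not look like an ECR registry.
func regionFromDomain(domain string) (string, error) {
	m := ecrDomain.FindStringSubmatch(domain)
	if m == nil {
		return "", fmt.Errorf("%q is not an ECR registry domain", domain)
	}
	return m[1], nil
}

func main() {
	domains := []string{
		"127647282379.dkr.ecr.us-east-1.amazonaws.com",
		"127647282379.dkr.ecr.sa-east-1.amazonaws.com",
		"ghcr.io",
	}
	for _, d := range domains {
		region, err := regionFromDomain(d)
		fmt.Printf("domain=%s region=%q err=%v\n", d, region, err)
	}
}
```

The extracted region would then be used when requesting the authorization token, instead of whatever default region the environment provides.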

Related issues

Checklist

  • I've read the guidelines for contributing to this repository.
  • I've followed the conventions in the PR title.
  • I've added tests that prove my fix is effective or that my feature works.
  • I've updated the documentation with the relevant information (if needed).
  • I've added usage information (if the PR introduces new options)
  • I've included a "before" and "after" example to the description (if the PR is a user interface change).

@CLAassistant

CLAassistant commented Feb 28, 2024

CLA assistant check
All committers have signed the CLA.

@knrc knrc changed the title bug(aws): handle ECR repositories in different regions fix(aws): handle ECR repositories in different regions Feb 28, 2024
Contributor

@DmitriyLewen DmitriyLewen left a comment

Hello @knrc
Thanks for your report!

I left some comments. Please take a look when you have time.

Regards, Dmitriy

pkg/fanal/image/registry/ecr/ecr.go (outdated, resolved)
pkg/fanal/image/registry/ecr/ecr.go (outdated, resolved)
pkg/fanal/image/registry/ecr/ecr_test.go (outdated, resolved)
pkg/fanal/image/registry/ecr/ecr.go (outdated, resolved)
@knrc knrc force-pushed the ecr_multi_region branch 4 times, most recently from 1d8902b to ed18edb Compare February 29, 2024 13:48
Contributor

@DmitriyLewen DmitriyLewen left a comment

@knrc Thanks for your work!

@@ -46,11 +46,34 @@ func (e *ECR) CheckOptions(domain string, option types.RegistryOptions) error {
		return err
	}

	// override region with the value from the repository domain
	cfg.Region = region
Contributor

@DmitriyLewen DmitriyLewen Mar 1, 2024

@knrc I found 1 interesting case:
if the AWS_REGION env != the region from the domain:
Should we use AWS_REGION (we are overwriting this value now)?

IIUC, this case is a user mistake (wrong AWS_REGION). But perhaps it makes sense to show a warning log message about this.

Contributor Author

@DmitriyLewen The point of the PR is to override the AWS_REGION setting. If we don't do that, we end up with an authentication token for one region and have no visibility of containers hosted in other regions.

Our use case is multiple private repositories in multiple regions.

Contributor

@DmitriyLewen DmitriyLewen Mar 1, 2024

I meant that we need to tell the user that the region from AWS_REGION != the region from the domain.
Something like this:

func getSession(region string, option types.RegistryOptions) (aws.Config, error) {
	// create custom credential information if option is valid
	if option.AWSSecretKey != "" && option.AWSAccessKey != "" && option.AWSRegion != "" {
		if region != option.AWSRegion {
			log.Logger.Warnf("The region from AWS_REGION (%s) is incorrect. The region from domain (%s) was used.", option.AWSRegion, region)
		}
		return config.LoadDefaultConfig(
			context.TODO(),
			config.WithRegion(region),
			config.WithCredentialsProvider(credentials.NewStaticCredentialsProvider(option.AWSAccessKey, option.AWSSecretKey, option.AWSSessionToken)),
		)
	}
	return config.LoadDefaultConfig(context.TODO(), config.WithRegion(region))
}

Also, I am worried about the asff template. We use the AWS_REGION env for this template. Perhaps we need to set the AWS_REGION env when we have overwritten the region.

Contributor Author

@DmitriyLewen Ah gotcha. We can certainly add a message, although I'm not sure it would make much sense, as AWS_REGION is likely to have been set by the webhook to match the EKS installation. If you consider our use case, with multiple private repositories in different regions, it would be impossible for the user to set the region appropriately, so it would default to the webhook's view.

I can take a look at the template today, I didn't consider that, and can certainly pass the parameter through to getSession as that seems cleaner than rewriting it.

Contributor Author

I've pushed the changes for getSession and the warning; I'm now looking at the template.

Contributor

But it looks like when using asff template, users will have AWS_REGION set. In addition, we display a warning.
We can start with these changes.

If problems arise, we will think about fixing them (as another solution, we can add your regex to asff.tpl).

Contributor Author

@knrc knrc Mar 5, 2024

should Region match the image region?

AWS docs (https://docs.aws.amazon.com/securityhub/1.0/APIReference/API_AwsSecurityFinding.html#securityhub-Type-AwsSecurityFinding-ProductArn) say:

Region

    The Region from which the finding was generated.

+1, in my view Arn and Region are not related but the template assumes they are.

Contributor Author

But it looks like when using asff template, users will have AWS_REGION set. In addition, we display a warning. We can start with these changes.

If problems arise, we will think about fixing them (as another solution, we can add your regex to asff.tpl).

Yes, since the output doesn't change with this PR, we are no worse off. I do think a change is needed for the template, but that should be a separate issue.

Contributor

Contributor Author

@DmitriyLewen I don't think those other links change anything for this PR.

@knrc knrc force-pushed the ecr_multi_region branch from ed18edb to a8f52b1 Compare March 1, 2024 14:32
Contributor

@DmitriyLewen DmitriyLewen left a comment

Thanks for your work @knrc

@knqyf263 I approved this PR.
If you agree with #6217 (comment), we can merge it.

@knrc
Contributor Author

knrc commented Mar 21, 2024

@DmitriyLewen I ran into another problem: it turns out the registry code (at least in 0.49.1) is broken. The three registries (google, azure and ECR) are invoked concurrently, which means their state gets overwritten each time while still being used.

I have some fixes for that. I'll check with 0.50.0 and push up another version of this PR some time next week.

@knrc knrc force-pushed the ecr_multi_region branch from a8f52b1 to cf22fc8 Compare March 26, 2024 14:27
@knrc
Contributor Author

knrc commented Mar 26, 2024

@DmitriyLewen I pushed up the changes so you can see the difference, I'm just about to test them on 0.50.0. I'll rebase the PR on the latest once I've validated it.

@knrc knrc force-pushed the ecr_multi_region branch from cf22fc8 to 2c9ca4f Compare March 26, 2024 20:53
@knrc
Contributor Author

knrc commented Mar 26, 2024

@DmitriyLewen I've tested and rebased the PR, it's ready again

@DmitriyLewen
Contributor

Hello @knrc

The three registries (google, azure and ECR) are invoked concurrently, which means their state gets overwritten each time while still being used.

I'm a little confused

Trivy checks registries sequentially:

for _, registry := range registries {
	err := registry.CheckOptions(domain, opt)
	if err != nil {
		continue
	}
	username, password, err := registry.GetCredential(ctx)
	if err != nil {
		// only skip check registry if error occurred
		log.Logger.Debug(err)
		break
	}
	return authn.Basic{
		Username: username,
		Password: password,
	}
}
return authn.Basic{}

Which field is overwritten?

@knrc
Contributor Author

knrc commented Mar 27, 2024

I'm a little confused

Trivy checks registries sequentially:

It does within GetToken; however, GetToken itself is called concurrently.

Which field is overwritten?

Line 37 calls CheckOptions, which will create client resources in the singleton registry based on the domain. This client is then used later within GetCredential. Since the singletons are being accessed concurrently, the client is not guaranteed to be the one created within the previous call to CheckOptions in the loop.
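
To make the overwrite concrete, here is a small sketch that replays one possible schedule of two workers in a fixed order. It is not Trivy's registry code: `sharedRegistry`, its `region` field and the `interleaved` helper are hypothetical stand-ins for a singleton whose client is created in CheckOptions and consumed in GetCredential.

```go
package main

import "fmt"

// sharedRegistry mimics a singleton that keeps per-domain state between calls.
// The region field stands in for the client that CheckOptions would create.
type sharedRegistry struct {
	region string
}

// CheckOptions stores state derived from the domain on the shared value.
func (r *sharedRegistry) CheckOptions(region string) {
	r.region = region
}

// GetCredential uses whatever state the most recent CheckOptions call left behind.
func (r *sharedRegistry) GetCredential() string {
	return "token-for-" + r.region
}

// interleaved replays, sequentially, one schedule that two concurrent workers
// could produce when they share the same registry value.
func interleaved(r *sharedRegistry) (workerA, workerB string) {
	r.CheckOptions("us-east-1") // worker A prepares its client
	r.CheckOptions("sa-east-1") // worker B overwrites A's state
	workerA = r.GetCredential() // worker A reads B's state
	workerB = r.GetCredential()
	return workerA, workerB
}

func main() {
	a, b := interleaved(&sharedRegistry{})
	fmt.Println("worker A:", a)
	fmt.Println("worker B:", b)
}
```

In this replayed schedule worker A ends up with a credential for worker B's region, which would match the 401 responses reported in the description.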

@DmitriyLewen
Contributor

I think I understand your logic.

But I don't see any place where we call the GetToken function (or one of its callers) from goroutines.
We also use 1 image. Therefore, if we overwrite ECRClient, it will be a new ECRClient but with the same settings.

But I may be missing something. It would be great if you could show an example.

Anyway, I think these changes should be made in another PR.
Can you undo the last changes and create a new PR with those changes and the example in the new PR?

@knrc
Contributor Author

knrc commented Apr 3, 2024

But I don't see any place where we call the GetToken function (or one of its callers) from goroutines. We also use 1 image. Therefore, if we overwrite ECRClient, it will be a new ECRClient but with the same settings.

But I may be missing something. It would be great if you could show an example.

The artifacts are scanned by different workers in parallel, see

p := parallel.NewPipeline(s.opts.Parallel, !s.opts.Quiet, resourceArtifacts, onItem, onResult)
err = p.Do(ctx)
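
One way to keep parallel workers from sharing state is to have CheckOptions hand back a per-call client instead of storing it on the registry. The sketch below shows that pattern under stated assumptions; the thread does not show the actual fix, and `registry`, `client` and `scanAll` are hypothetical names rather than Trivy APIs.

```go
package main

import (
	"fmt"
	"sync"
)

// client holds the state for a single request (hypothetical type).
type client struct {
	region string
}

// GetCredential only depends on the client's own state.
func (c client) GetCredential() string {
	return "token-for-" + c.region
}

// registry is stateless: CheckOptions returns a fresh client instead of
// saving one on a shared value.
type registry struct{}

func (registry) CheckOptions(region string) client {
	return client{region: region}
}

// scanAll starts one goroutine per region and collects the tokens by region.
func scanAll(reg registry, regions []string) map[string]string {
	var (
		mu  sync.Mutex
		wg  sync.WaitGroup
		out = make(map[string]string, len(regions))
	)
	for _, region := range regions {
		wg.Add(1)
		go func(region string) {
			defer wg.Done()
			c := reg.CheckOptions(region) // state lives in c, not in reg
			token := c.GetCredential()
			mu.Lock() // the result map is the only shared state
			out[region] = token
			mu.Unlock()
		}(region)
	}
	wg.Wait()
	return out
}

func main() {
	tokens := scanAll(registry{}, []string{"us-east-1", "sa-east-1"})
	for region, token := range tokens {
		fmt.Println(region, token)
	}
}
```

Because each goroutine owns its client, the order in which workers run no longer affects which region a credential is requested for.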

Contributor

@DmitriyLewen DmitriyLewen left a comment

@knrc sorry for the wait for an answer.

The artifacts are scanned by different workers in parallel, see

Thank you for showing me this. I'm currently seeing this problem!

Your changes look correct for this case.

I left 1 comment about google test.

@knrc
Contributor Author

knrc commented Apr 18, 2024

@knrc sorry for the wait for an answer.

No worries, we all have our day jobs.

The artifacts are scanned by different workers in parallel, see

Thank you for showing me this. I'm currently seeing this problem!

:)

Your changes look correct for this case.

I left 1 comment about google test.

Sounds good, I'll take a look. I'm at Open Source Summit this week, but will try to get to this as quickly as I can.

@knrc knrc force-pushed the ecr_multi_region branch 2 times, most recently from d0f4e93 to ed0beb2 Compare April 18, 2024 13:33
@knrc
Contributor Author

knrc commented Apr 18, 2024

Sounds good, I'll take a look. I'm at Open Source Summit this week, but will try to get to this as quickly as I can.

@DmitriyLewen I rebased and updated the PR for your comment, can you take another look?

@DmitriyLewen
Contributor

@knrc can you fix the linter error?

@knrc
Contributor Author

knrc commented Apr 19, 2024

@knrc can you fix the linter error?

Yes, I can add that to the list since I'm in those files anyway.

@knrc knrc force-pushed the ecr_multi_region branch from ed0beb2 to dff6701 Compare April 19, 2024 17:50
@knrc
Contributor Author

knrc commented Apr 19, 2024

@DmitriyLewen try now

Contributor

@DmitriyLewen DmitriyLewen left a comment

@knrc Thanks for your work!

@knqyf263 take a look, when you have time, please.

@knqyf263
Collaborator

Thanks for the contribution. This PR needs to resolve conflicts now.


This PR is stale because it has been labeled with inactivity.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and will be auto-closed. label Aug 21, 2024
@knrc
Contributor Author

knrc commented Aug 21, 2024

I'll take a look at this over the next few days

@github-actions github-actions bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and will be auto-closed. label Aug 22, 2024
@knrc knrc force-pushed the ecr_multi_region branch from dff6701 to 9461d74 Compare August 29, 2024 17:05
@knqyf263 knqyf263 enabled auto-merge August 30, 2024 07:29
@knqyf263 knqyf263 added this pull request to the merge queue Aug 30, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Aug 30, 2024
@knrc
Contributor Author

knrc commented Aug 30, 2024

The integration tests appear to be hitting a rate limit:

    library_test.go:165: 
        	Error Trace:	/home/runner/work/trivy/trivy/pkg/fanal/test/integration/library_test.go:165
        	Error:      	Received unexpected error:
        	            	unable to find the specified image "ghcr.io/aquasecurity/trivy-test-images:opensuse-tumbleweed" in ["remote"]:
        	            	    github.com/aquasecurity/trivy/pkg/fanal/image.NewContainerImage
        	            	        /home/runner/work/trivy/trivy/pkg/fanal/image/image.go:58
        	            	  - 1 error occurred:
        	            		* remote error: GET https://ghcr.io/v2/aquasecurity/trivy-test-images/manifests/opensuse-tumbleweed: TOOMANYREQUESTS: retry-after: 750.362µs, allowed: 44000/minute

@knrc
Contributor Author

knrc commented Aug 30, 2024

The test does run successfully when executed locally

@DmitriyLewen DmitriyLewen added this pull request to the merge queue Sep 2, 2024
Merged via the queue into aquasecurity:main with commit feaef96 Sep 2, 2024
12 checks passed
@aqua-bot aqua-bot mentioned this pull request Sep 2, 2024
@DmitriyLewen
Contributor

Hello @knrc
Looks like it was a bug in CI/CD.

PR has been merged.

Development

Successfully merging this pull request may close these issues.

AWS ECR registry authentication only works in the same/default region as caller
4 participants