Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor endpoint resolver for AWS services #32921

Merged
merged 11 commits into from
Sep 8, 2022

Conversation

sayden
Copy link
Contributor

@sayden sayden commented Aug 30, 2022

What does this PR do?

Custom endpoint resolver in AWS services initialisation was broken after the aws sdk upgrade.
This PR remove the custom endpoint resolver, but for the case of non aws bucket in direct aws s3 input

@sayden sayden added Team:Cloud-Monitoring Label for the Cloud Monitoring team bugfix labels Aug 30, 2022
@sayden sayden requested review from aspacca and kaiyan-sheng August 30, 2022 06:41
@sayden sayden self-assigned this Aug 30, 2022
@sayden sayden requested a review from a team as a code owner August 30, 2022 06:41
@sayden sayden requested review from belimawr and fearful-symmetry and removed request for a team August 30, 2022 06:41
@botelastic botelastic bot added needs_team Indicates that the issue/PR needs a Team:* label and removed needs_team Indicates that the issue/PR needs a Team:* label labels Aug 30, 2022
@mergify
Copy link
Contributor

mergify bot commented Aug 30, 2022

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @sayden? 🙏.
For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-v8./d.0 is the label to automatically backport to the 8./d branch. /d is the digit

@mergify
Copy link
Contributor

mergify bot commented Aug 30, 2022

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b aws-fix-region-not-being-used upstream/aws-fix-region-not-being-used
git merge upstream/main
git push upstream aws-fix-region-not-being-used

Copy link
Contributor

@tommyers-elastic tommyers-elastic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice - does this resolve the reported issue in SQS ?

do we have an understanding of the difference between this and the built-in default resolver? i.e. can we switch over to that instead of maintaining this one?

@sayden
Copy link
Contributor Author

sayden commented Aug 30, 2022

This is the least invasive way to update the current library without proper testing. We can use the default in a follow up, once we understand how it works and why AWS introduced a breaking change there.

@elasticmachine
Copy link
Collaborator

elasticmachine commented Aug 30, 2022

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2022-09-06T00:24:25.954+0000

  • Duration: 136 min 46 sec

Test stats 🧪

Test Results
Failed 0
Passed 8717
Skipped 503
Total 9220

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@tommyers-elastic
Copy link
Contributor

without proper testing

could you expand on this a little? 😛

@sayden
Copy link
Contributor Author

sayden commented Aug 30, 2022

could you expand on this a little?

There are no tests, at all, that actually checks this across all Beats. I like to understand side effects of changing something that affects 5-6 packages.

Hey, but I disagree and commit. Seriously, no prob. 🙂

@ruflin
Copy link
Contributor

ruflin commented Aug 30, 2022

Is there a way that the user can change this manually in config or environment variables without having to change it in the code? I'm asking to see if for users that are hit by this issue, if there is also currently a manual workaround?

@sayden
Copy link
Contributor Author

sayden commented Aug 30, 2022

Is there a way that the user can change this manually in config or environment variables without having to change it in the code? I'm asking to see if for users that are hit by this issue, if there is also currently a manual workaround?

Good question, I have looked at the code and I don't see a workaround atm. The only way to avoid the issue will be to not use a customized endpoint as far as I can see.

This function has 22 references in other parts of the code: billing, cloudwatch, rds, sqs... so it's complicated to predict side effects.

Looking at the old and new code, this change looks like the missing piece. But I think we should consider to decouple those 22 functions to alleviate the side effects produced by small changes like this in the near future.

This issue comes from a breaking change in the AWS library where they removed the generic resolver and introduced specific resolvers for each service: ec2 resolver, sqs resolver, etc. I have some degree of confidence, after looking at alternatives and other solutions in the AWS SDK, that this should solve the issue. But as I said before, we should look at those specific resolvers because they probably do much more for users than the custom generic solution we have now.

@aspacca
Copy link

aspacca commented Aug 30, 2022

Hey, but disagree and commit. Seriously, no prob.

I disagreed and committed :)

Given other issues the problem does not seem to be limited to the missing SigningRegion, but to the overall behaviour of the endpoint resolver (see elastic/integrations#3990 for example).

My point to the change I made is that I expect the internal endpoint resolver of the packages in the aws sdk to be much more tested than any coverage and manual testing we can do for an hot fix.

@aspacca aspacca force-pushed the aws-fix-region-not-being-used branch from d91b80f to 9818f62 Compare August 30, 2022 12:26
@mergify
Copy link
Contributor

mergify bot commented Aug 30, 2022

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b aws-fix-region-not-being-used upstream/aws-fix-region-not-being-used
git merge upstream/main
git push upstream aws-fix-region-not-being-used

@aspacca
Copy link

aspacca commented Aug 30, 2022

after discussing with @sayden we agreed on reducing the surface of the change

we need proper manual testing of different scenarios

@aspacca aspacca requested review from a team as code owners August 31, 2022 02:30
@aspacca aspacca added the backport-v8.4.0 Automated backport with mergify label Aug 31, 2022
@aspacca aspacca changed the title Add signing region into aws credentials Refactor endpoint resolver for AWS services Aug 31, 2022
@aspacca aspacca assigned aspacca and unassigned sayden Aug 31, 2022
@tommyers-elastic tommyers-elastic self-requested a review August 31, 2022 06:50
@tommyers-elastic
Copy link
Contributor

thanks for getting deeper into this @sayden and @aspacca.

i think it would be have been best to merge the small fix as originally reviewed, then approach the larger refactor in a new PR. this way the short term fix for SQS could be merged, whilst the larger PR goes through review + test. i have removed my original ✅ for now.

CHANGELOG.next.asciidoc Outdated Show resolved Hide resolved
@mergify
Copy link
Contributor

mergify bot commented Sep 1, 2022

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b aws-fix-region-not-being-used upstream/aws-fix-region-not-being-used
git merge upstream/main
git push upstream aws-fix-region-not-being-used

@legoguy1000
Copy link
Contributor

legoguy1000 commented Sep 2, 2022

I'm looking at it now, but there may be an issue with removing the endpoint config option for metrics/sqs/... is for non regular AWS, to include the China AWS and US gov isolated (not govcloud) environments which all have different domains than amazonaws.com

Looks like that has been handled via https://github.com/aws/aws-sdk-go-v2/blob/main/service/s3/internal/endpoints/endpoints.go. It appears as long as the region is set it will autoconfigure the proper endpoint. I would say that the docs should be updated to enforce the importance of setting the default_region when using the non-regular AWS domain (amazonaws.com)

@tlee-elastic
Copy link

Quick question, will we expose a UI option for setting the default_region for fleet managed agents as I dont think there is a way to set this currently?

@aspacca aspacca merged commit d0bc413 into elastic:main Sep 8, 2022
mergify bot pushed a commit that referenced this pull request Sep 8, 2022
* Add signing region

* Add changelog entry

* remove awscommon.EnrichAWSConfigWithEndpoint

* add nonAWSBucketResolver when recreating the s3 client for new region

* remove unused functions

* changelog

* fix changelog

* fix docs

Co-authored-by: Andrea Spacca <[email protected]>
(cherry picked from commit d0bc413)

# Conflicts:
#	x-pack/libbeat/common/aws/credentials.go
#	x-pack/libbeat/common/aws/credentials_test.go
@aspacca
Copy link

aspacca commented Sep 8, 2022

@tlee-elastic

Quick question, will we expose a UI option for setting the default_region for fleet managed agents as I dont think there is a way to set this currently?

@legoguy1000 is taking care of it at elastic/integrations#4158 👍

aspacca pushed a commit that referenced this pull request Sep 8, 2022
…33014)

* Refactor endpoint resolver for AWS services (#32921)

* Add signing region

* Add changelog entry

* remove awscommon.EnrichAWSConfigWithEndpoint

* add nonAWSBucketResolver when recreating the s3 client for new region

* remove unused functions

* changelog

* fix changelog

* fix docs

Co-authored-by: Andrea Spacca <[email protected]>
(cherry picked from commit d0bc413)

# Conflicts:
#	x-pack/libbeat/common/aws/credentials.go
#	x-pack/libbeat/common/aws/credentials_test.go

* fix merge

Co-authored-by: Mario Castro <[email protected]>
Co-authored-by: Andrea Spacca <[email protected]>
chrisberkhout pushed a commit that referenced this pull request Jun 1, 2023
* Add signing region

* Add changelog entry

* remove awscommon.EnrichAWSConfigWithEndpoint

* add nonAWSBucketResolver when recreating the s3 client for new region

* remove unused functions

* changelog

* fix changelog

* fix docs

Co-authored-by: Andrea Spacca <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-v8.4.0 Automated backport with mergify bugfix Team:Cloud-Monitoring Label for the Cloud Monitoring team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[filebeat] Credentials error with S3+SQS input in 8.4.0
8 participants