aws-sdk-core > 3.78.0 slows down the fetching of credentials #2177

Closed
grdw opened this issue Nov 26, 2019 · 27 comments · Fixed by #2182
Labels: SECURITY (This is a security issue.), third-party (This issue is related to third-party libraries or applications.)

Comments

@grdw

grdw commented Nov 26, 2019

Issue description

Ever since moving from aws-sdk-core version 3.78.0 to 3.79.0, the way credentials are fetched has changed from:

curl -H "User-Agent aws-sdk-ruby3/3.78.0" "http://169.254.169.254/latest/meta-data/iam/security-credentials/"
curl -H "User-Agent aws-sdk-ruby3/3.78.0" "http://169.254.169.254/latest/meta-data/iam/security-credentials/<response-from-first-get>"

to:

curl -v -H "User-Agent aws-sdk-ruby3/3.79.0" -H "x-aws-ec2-metadata-token-ttl-seconds 21600" -X PUT http://169.254.169.254/latest/api/token

I guess that has its reasons, but this PUT request returns a 400 at my end, which in turn causes it to retry 5 times (?). After that unsuccessful chain of events it falls back to the old way of fetching credentials. This is of course painfully slow. Is there some special magic required to make it not return a 400?
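For readers following along, the two flows above can be sketched with Ruby's standard library. This is only an illustration, not the SDK's implementation: the helper names are hypothetical, the `x-aws-ec2-metadata-token` header name comes from the EC2 documentation rather than this thread, and the sketch falls back only on a non-200 token response (it does not model timeouts or retries).

```ruby
require "net/http"

TOKEN_PATH = "/latest/api/token"
CREDS_PATH = "/latest/meta-data/iam/security-credentials/"

# IMDSv2 step 1: PUT request asking for a session token (helper name is hypothetical).
def token_request(ttl_seconds = 21_600)
  req = Net::HTTP::Put.new(TOKEN_PATH)
  req["x-aws-ec2-metadata-token-ttl-seconds"] = ttl_seconds.to_s
  req
end

# GET for the role listing; sent with the token for IMDSv2, without it for IMDSv1.
def credentials_request(token = nil)
  req = Net::HTTP::Get.new(CREDS_PATH)
  req["x-aws-ec2-metadata-token"] = token if token
  req
end

# Try IMDSv2 first and fall back to a plain IMDSv1 GET when the token PUT
# is not answered with 200. `http` is anything that responds to #request.
def role_listing(http)
  token_response = http.request(token_request)
  token = token_response.code == "200" ? token_response.body : nil
  http.request(credentials_request(token)).body
end

# Usage on an EC2 host (commented out because it needs the metadata endpoint):
# Net::HTTP.start("169.254.169.254", 80, open_timeout: 1, read_timeout: 1) do |http|
#   puts role_listing(http)
# end
```

Passing the connection in as an argument keeps the fallback logic testable without a real metadata endpoint.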

Gem name ('aws-sdk', 'aws-sdk-resources' or service gems like 'aws-sdk-s3') and its version

aws-sdk-core 3.79.0 and up

Version of Ruby, OS environment

Ruby 2.6.0
Kubernetes/Docker

Code snippets / steps to reproduce

See curl examples

@mullermp
Contributor

duplicate of #2174

@mullermp
Contributor

@grdw How are you invoking that curl request? Is that directly from the SDK? If so that's broken behavior. For the header, you're missing a : between the key and value. On an EC2 host, the following command works:

curl -v -H "User-Agent aws-sdk-ruby3/3.79.0" -H "x-aws-ec2-metadata-token-ttl-seconds:21600" -X PUT http://169.254.169.254/latest/api/token

Regarding the issue, could you try increasing the hop limit? It fails on 1 hop by design, but if you're running kubernetes, the container will need to forward your request, adding an additional hop. See the SDK GO issue as well as the other Ruby SDK issue (#2174).

aws ec2 modify-instance-metadata-options --instance-id <instance ID> --http-put-response-hop-limit 3 --http-endpoint enabled --region <region>

https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-metadata.html
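The hop-limit reasoning can be illustrated with a toy model. This is a deliberately simplified sketch: it assumes each forwarding step (such as a container bridge) decrements the packet's TTL by one and that a packet whose TTL reaches zero is dropped. The function name is hypothetical.

```ruby
# Simplified model: the token response leaves IMDS with TTL == hop_limit,
# each forwarding hop decrements it, and it is dropped once it reaches zero.
def token_response_delivered?(hop_limit, forwarding_hops)
  hop_limit - forwarding_hops > 0
end

# Process directly on the EC2 host, default hop limit of 1: delivered.
puts token_response_delivered?(1, 0)
# Container behind one extra forwarding hop, hop limit 1: dropped.
puts token_response_delivered?(1, 1)
# Same container after raising the hop limit to 3: delivered.
puts token_response_delivered?(3, 1)
```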

@mullermp
Contributor

mullermp commented Nov 26, 2019

Hop limits should be increased for EKS/ECS users automatically, but that appears not to be the case. I'm working with other SDK language teams and forwarding that feedback to relevant EC2/ECS teams to see what the appropriate fix should be.

@btalbot

btalbot commented Nov 27, 2019

I'd like to point out again that the hop limit is just one of several anti-proxy protections added to the PUT token request. Tuning the number of TCP hops is meant to accommodate layer-3 proxies (like NAT) but does nothing to enable support for HTTP (layer 7) proxies.

In our case, we're running Kubernetes clusters on EC2 (which we have had deployed long before AWS offered any k8s support at all so please don't say we should have just used EKS in 2016 instead). For such clusters the jtblin/kube2iam#241 proxy is a very popular way to manage roles apps may use. However, the kube2iam runs as an HTTP proxy which adds the X-Forwarded-For header. These are rejected if the IMDSv2 PUT token call is made and currently any use of a current AWS SDK prevents the app from working (without long delays). The only work-around is to fall back to older SDK versions.

@grdw
Author

grdw commented Nov 27, 2019

curl -v -H "User-Agent aws-sdk-ruby3/3.79.0" -H "x-aws-ec2-metadata-token-ttl-seconds:21600" -X PUT http://169.254.169.254/latest/api/token

@mullermp if I change the curl request as suggested, by adding a :, I see that the response goes from a 400 Bad Request to a 403 Forbidden.

@mullermp
Contributor

@grdw Are you running that from within the container or on the ec2 host? What type of EC2 host is it?

@mullermp mullermp self-assigned this Nov 27, 2019
@mullermp mullermp added the service-api, work-in-progress, dependencies, and SECURITY labels and removed the service-api label Nov 27, 2019
@grdw
Author

grdw commented Nov 28, 2019

@mullermp Yes, I was indeed running that curl request from within a container. The EC2 host type looks like a c5.4xlarge. If you need more details I can include one of our platform engineers in this conversation.

@mullermp
Contributor

@grdw Are you unblocked for now? The curl request was malformed (400) and is now correct (403) if you're running inside the container. Did you increase the hop limit as per the documentation? IMDSv2 expects one hop by design, and because of the nature of a container, the request must be routed an additional time. I'm expecting to hear more details next week when relevant people are back from holidays.

@mullermp
Contributor

mullermp commented Dec 4, 2019

@grdw How slow is startup for you on the newest versions?

I created an EC2 instance and created a docker container. On the instance I set the hop limit to 1.

aws ec2 modify-instance-metadata-options --instance-id <instance ID> --http-put-response-hop-limit 1 --http-endpoint enabled --region us-west-2

My curl request will hang (expected)

curl -v -H "User-Agent aws-sdk-ruby3/3.79.0" -H "x-aws-ec2-metadata-token-ttl-seconds:21600" -X PUT http://169.254.169.254/latest/api/token

In the container, I installed git, ruby, etc, and cloned the repo / ran bundle install. I tried using the REPL console:

gems/aws-sdk-resources/bin/aws-v3.rb

The number of retries was set to 0 by default, the IMDSv2 token was not fetched, and the SDK correctly fell back to IMDSv1 within a couple of seconds. I then set the hop limit to 3 and the token was correctly fetched on the first try.

Are you by chance setting instance_profile_credentials_retries in your configuration? You could also try configuring instance_profile_credentials_timeout. Does my example best verify your case?
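As a rough sketch of what setting those two options could look like: the option names are the ones mentioned above, while the values and the `IMDS_OPTIONS` constant are only illustrative. The guard lets the snippet load even when aws-sdk-core has not been required yet.

```ruby
# Illustrative values: no retries, and a short timeout in seconds.
IMDS_OPTIONS = {
  instance_profile_credentials_retries: 0,
  instance_profile_credentials_timeout: 1
}.freeze

# Apply as global defaults only when the SDK has already been loaded.
Aws.config.update(IMDS_OPTIONS) if defined?(Aws)
```

The same keys can presumably also be passed to an individual client constructor instead of being set globally.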

@btalbot For your case, you mentioned startup is not long. I think that the kube2iam package will need to remove that X-forwarded-for header -- it sounds like a permanent design decision from EC2. I will be talking with some members of EKS/EC2 on Monday - I can provide a soft update.

@mullermp mullermp added the pending label and removed the dependencies and work-in-progress labels Dec 4, 2019
@grdw
Author

grdw commented Dec 5, 2019

Thanks for your answer. I'm not sure how and if that is configured:

Are you by chance setting instance_profile_credentials_retries in your configuration? You could also try configuring instance_profile_credentials_timeout. Does my example best verify your case?

I could give it a try and see if it fixes the speed problems. I'm okay if it does one extra request I guess 💭 . I'll get back to you.


Does my example best verify your case?

Yes. This does verify my case. However in my case the retries default is set to 5 (as it reads here)

@mullermp
Contributor

mullermp commented Dec 5, 2019

@grdw When looking at how InstanceProfileCredentials is initialized (only two cases I could find), a value should either be passed in for retries or default to 0, so it shouldn't be picking 5 unless you are initializing the class yourself, or perhaps there is a bug somewhere. When I did my testing (last comment), I printed @retries and it was 0; that was without any configuration.

https://github.com/aws/aws-sdk-ruby/blob/master/gems/aws-sdk-core/lib/aws-sdk-core/shared_config.rb#L357

https://github.com/aws/aws-sdk-ruby/blob/master/gems/aws-sdk-core/lib/aws-sdk-core/credential_provider_chain.rb#L28

@mullermp
Contributor

mullermp commented Dec 5, 2019

Are you initializing any clients with credentials: InstanceProfileCredentials.new in any way?

@grdw
Author

grdw commented Dec 6, 2019

Are you initializing any clients with credentials: InstanceProfileCredentials.new in any way?

No we don't do anything out of the ordinary over there. We only use the ENV variables (i.e. AWS_ACCESS_KEY_ID and the AWS_SECRET_ACCESS_KEY) inside of a container on a k8s cluster. It is made clear to me that for fetching the credentials we rely on kiam, so maybe it's something kiam related (including @thomasvnoort).

@mullermp mullermp added the third-party label and removed the pending label Dec 6, 2019
@btalbot

btalbot commented Dec 6, 2019

Since https://github.com/uswitch/kiam uses the same Go-lang reverse proxy that kube2iam does (https://github.com/uswitch/kiam/blob/master/pkg/aws/metadata/server.go#L60) it will also cause any PUT token request to be rejected with the 403 response for the same X-Forwarded-For header reason.
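To make the failure mode concrete, here is a toy model of the behavior described in this thread. It is not real IMDS code: the function names are hypothetical, and the status codes simply mirror what was reported above (400 for the malformed TTL header, 403 once a proxy adds X-Forwarded-For).

```ruby
# Toy stand-in for the token endpoint as observed in this thread.
def toy_token_status(headers)
  normalized = headers.transform_keys { |key| key.to_s.downcase }
  return 403 if normalized.key?("x-forwarded-for")
  return 400 unless normalized.key?("x-aws-ec2-metadata-token-ttl-seconds")

  200
end

# What an HTTP (layer 7) reverse proxy typically does to a forwarded request.
def via_reverse_proxy(headers, client_ip)
  headers.merge("X-Forwarded-For" => client_ip)
end

direct = { "x-aws-ec2-metadata-token-ttl-seconds" => "21600" }
puts toy_token_status(direct)
puts toy_token_status(via_reverse_proxy(direct, "10.0.0.12"))
```

Raising the hop limit does not change the second result, which is the point made earlier about layer-7 proxies.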

@btalbot

btalbot commented Dec 6, 2019

IMO, each of the language SDKs needs to support an env variable so that deployments can enable or disable IMDSv2 on a per-deployment basis without having app developers determine (via SDK version) which version must be used.

The default could be IMDSv2 -- though you are still likely to get a ton of "IMDS is broken" bugs like this one -- but as long as there is a way for deployers (not only developers) to use the still-supported IMDSv1, I think that will be acceptable.

@btalbot

btalbot commented Dec 6, 2019

It's pretty trivial to reproduce assuming you have a k8s cluster with kube2iam, kiam, or similar IMDS proxy enabled. Run a new container and start a shell which has enough ruby tools to run bundler and install the aws-sdk-core. I'm using
./kubectl run bryan --rm -i --tty --image=fingershock/ruby:2.5.7-builder --restart=Never sh
but there are many ways to do this.

IMDSv1 sdk 3.78.0 works fast .. just a few milliseconds

$> cat Gemfile
source "https://rubygems.org"
gem 'json'
gem 'aws-sdk-core', '3.78.0'

$> bundle install
... output elided ...

$> bundle exec ruby -e 'require "benchmark"; require "aws-sdk-core"; puts Benchmark.measure { pp Aws::InstanceProfileCredentials.new }'
#<Aws::InstanceProfileCredentials:0x000055c9f525a480
 @backoff=
  #<Proc:0x000055c9f525a3e0@/usr/lib/ruby/gems/2.5.0/gems/aws-sdk-core-3.78.0/lib/aws-sdk-core/instance_profile_credentials.rb:65 (lambda)>,
 @credentials=#<Aws::Credentials access_key_id="ASIA6D4HKVYTA5JSVVGB">,
 @expiration=2019-12-06 23:21:28 UTC,
 @http_debug_output=nil,
 @http_open_timeout=5,
 @http_read_timeout=5,
 @ip_address="169.254.169.254",
 @mutex=#<Thread::Mutex:0x000055c9f525a340>,
 @port=80,
 @retries=5>
  0.005531   0.000000   0.005531 (  0.034940)

Newer sdk which forces use of IMDSv2

$> cat Gemfile
source "https://rubygems.org"
gem 'json'
gem 'aws-sdk-core', '3.79.0'

$> bundle install
... output elided ...

$> bundle exec ruby -e 'require "benchmark"; require "aws-sdk-core"; puts Benchmark.measure { pp Aws::InstanceProfileCredentials.new }'
#<Aws::InstanceProfileCredentials:0x0000555ad9fa35c8
 @backoff=
  #<Proc:0x0000555ad9fa3488@/usr/lib/ruby/gems/2.5.0/gems/aws-sdk-core-3.79.0/lib/aws-sdk-core/instance_profile_credentials.rb:83 (lambda)>,
 @credentials=#<Aws::Credentials access_key_id="ASIA6D4HKVYTA5JSVVGB">,
 @expiration=2019-12-06 23:21:28 UTC,
 @http_debug_output=nil,
 @http_open_timeout=5,
 @http_read_timeout=5,
 @ip_address="169.254.169.254",
 @mutex=#<Thread::Mutex:0x0000555ad9fa33e8>,
 @port=80,
 @retries=5,
 @token=nil,
 @token_ttl=21600>
  0.008241   0.000839   0.009080 (  7.457265)
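The 7.46 s wall-clock time is close to what five exponentially spaced retries would add. The sketch below assumes the default backoff lambda sleeps 1.2**n seconds before retry n, counting from 0. That is an assumption about the SDK source, not something confirmed in this thread, and the function name is hypothetical.

```ruby
# Total sleep time across `retries` retries, assuming retry n (from 0)
# waits base**n seconds. Both the formula and the base are assumptions.
def total_backoff_seconds(retries, base: 1.2)
  (0...retries).sum { |n| base**n }
end

# 1 + 1.2 + 1.44 + 1.728 + 2.0736 seconds for @retries=5
puts total_backoff_seconds(5).round(4)
```

Under that assumption the sleeps alone account for roughly 7.44 s, which would leave only a few milliseconds for the requests themselves.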

@mullermp
Contributor

mullermp commented Dec 9, 2019

Soft update. Members of the SDK teams met with EC2 and EKS today.

Highlights are -

  1. We cannot implement an option to disable IMDSv2
  2. EKS teams will push 3rd party implementations (kube/kube2iam) to support IMDSv2
  3. SDKs will change retry strategies. For Ruby, I am thinking 3 retries at 1 second each for InstanceProfileCredentials to alleviate pain points. backoff may be a problem here
  4. EC2 will increase default hop limits for managed services.
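As an illustration of item 3, a fixed-delay retry loop could look roughly like the sketch below. The helper name and its keyword arguments are hypothetical, and this is not the change that was eventually made to the SDK. The sleeper is injectable so the loop can be exercised without actually waiting.

```ruby
# Runs the block, retrying up to max_retries times on StandardError and
# pausing for a fixed delay (via the injected sleeper) between attempts.
def with_retries(max_retries: 3, delay: 1, sleeper: ->(seconds) { sleep(seconds) })
  attempts = 0
  begin
    attempts += 1
    yield attempts
  rescue StandardError
    raise if attempts > max_retries

    sleeper.call(delay)
    retry
  end
end
```

With these defaults the worst case is three one-second pauses before the error is re-raised, and the caller can then fall back to IMDSv1.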

Blog posts for some of the security reasonings: https://aws.amazon.com/blogs/security/defense-in-depth-open-firewalls-reverse-proxies-ssrf-vulnerabilities-ec2-instance-metadata-service/

@grdw
Author

grdw commented Dec 12, 2019

I recently upgraded to 3.85.1. According to our performance monitoring tool, requests to AWS are still slow. Above all, it still retries 5 times:

Screenshot 2019-12-12 at 12 18 23

So I released a version 1.1.28 of our app which only includes the aws-sdk-core 3.85.1 update and the speed goes from 39ms to 7.5s

For the 39ms request, this is the number of requests to AWS (GET http://169.254.169.254):

Screenshot 2019-12-12 at 12 19 29

For the 7.5s request this is what happens:

Screenshot 2019-12-12 at 12 20 35

Again 6 PUT http://169.254.169.254 and 2 GET http://169.254.169.254

The main thing I can conclude from this is that this Ruby gem still retries 5 times and then proceeds to fall back to the old way. I'm not sure exactly how or why, but it's the way it is.

@mullermp
Contributor

mullermp commented Dec 12, 2019

@grdw I'm sorry that it's still happening. From our end, it doesn't seem like any of our defaults would be causing that. Have you tried configuring instance_profile_credentials_retries or instance_profile_credentials_timeout? If it's true that you're getting credentials via IAM using a 3rd party, it sounds like you don't need instance profile credentials at all? You could set ENV['AWS_EC2_METADATA_DISABLED'] = 'true'

@mrlevitas

mrlevitas commented Dec 12, 2019

I'm experiencing the same delays when using aws-sdk-core with aws-sdk-s3 to download files from s3.
When I use ENV['AWS_EC2_METADATA_DISABLED'] = "true", I get Aws::Sigv4::Errors::MissingCredentialsError when initializing an s3 client.

We're having to downgrade to 3.78 to use our api without significant latency but that breaks downloading from s3 in this way:
Undefined method 's3_use_arn_region' for #<Aws::SharedConfig:x0...>

3.85.1 did not help.

@btalbot

btalbot commented Dec 12, 2019

If you want to pin aws-sdk-core to 3.78.0 to avoid IMDSv2, then you'll also need to pin aws-sdk-s3 to 1.57.0 to avoid #s3_use_arn_region which was introduced in aws-sdk-core after 3.78.0

@grdw
Author

grdw commented Dec 13, 2019

When I use ENV['AWS_EC2_METADATA_DISABLED'] = true, I get Aws::Sigv4::Errors::MissingCredentialsError when initializing an s3 client.

Can confirm; the same behavior is present when I try this.

@mullermp
Contributor

Just FYI - 'true' must be a string. Edited my original post. https://github.com/aws/aws-sdk-ruby/blob/master/gems/aws-sdk-core/lib/aws-sdk-core/instance_profile_credentials.rb#L150. If you were using a string, that is still not working for you?
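For illustration, a check along the following lines would explain why only the string form matters. This is a sketch of the kind of comparison the linked line appears to perform, not a copy of it, and the method name and `env` parameter are hypothetical.

```ruby
# Treats the metadata service as disabled only when the variable holds the
# string "true" (case-insensitive); anything else, or no value, means enabled.
def metadata_disabled?(env = ENV)
  env.fetch("AWS_EC2_METADATA_DISABLED", "false").downcase == "true"
end

puts metadata_disabled?("AWS_EC2_METADATA_DISABLED" => "true")
puts metadata_disabled?({})
```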

@mrlevitas

yes, I used a string. edited my post.

@mullermp
Contributor

@mrlevitas It would make sense to see that error if you're relying on IMDS for credentials and haven't supplied your own. I suppose if you're all using kube2iam then it needs to actually hit the metadata service (which is proxied), so disabling isn't actually a workaround. Sorry that I was wrong on that one.

To be candid, I'm not sure what else I can really do. EC2 has mandated that we can't disable IMDSv2 only. EKS team will be owning the task of adding support to these 3rd party libraries. I think the best move might be to pin your version. I'm not going to ask you to move to EKS (that's not my intention and you should be free to use whatever implementations you want) but it would work more seamlessly. Since @btalbot seems to be the resident expert here, would removing the X-forwarded-for headers in kube2iam be a solution? It sounded like, ideally, that header should be present (and correct to use) but would that be a breaking change in any way?

@btalbot

btalbot commented Dec 13, 2019

The XFF header is added by the go-lang http proxy class and is typically added by every proxy. To remove it from the go-lang proxy would remove its use for every user of the go-lang proxy and make the IMDSv2 blocking of it pointless. If IMDSv2 is blocking XFF requests but then pushes proxy libraries to not include the header, then the blocking of requests with XFF is no longer effective.

@ghost

ghost commented Dec 16, 2021

I was able to get around this issue by setting my docker's network mode to host. Not sure if anyone else out there has tried it but if host networking is an option things seem to work out nicely.
