aws-sdk-core > 3.78.0 slows down the fetching of credentials #2177

grdw · 2019-11-26T10:52:07Z

Issue description

Ever since moving from aws-sdk-core version 3.78.0 to 3.79.0, the way credentials are fetched has changed from:

curl -H "User-Agent aws-sdk-ruby3/3.78.0" "http://169.254.169.254/latest/meta-data/iam/security-credentials/"
curl -H "User-Agent aws-sdk-ruby3/3.78.0" "http://169.254.169.254/latest/meta-data/iam/security-credentials/<response-from-first-get>"

to:

curl -v -H "User-Agent aws-sdk-ruby3/3.79.0" -H "x-aws-ec2-metadata-token-ttl-seconds 21600" -X PUT http://169.254.169.254/latest/api/token

I guess that has its reasons, but this PUT request returns a 400 at my end, which in turn causes it to retry 5 times (?). After that unsuccessful chain of events it falls back to the old way of fetching credentials. This is of course painfully slow. Is there some special magic required to not make it return a 400?
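For readers following along, the two request styles above can be sketched in Ruby with only the standard library. This is a minimal illustration and not the SDK's own code: the helper names `imds_v1_request`, `imds_v2_token_request` and `imds_v2_request` are hypothetical, and the `x-aws-ec2-metadata-token` header on the follow-up GET comes from the EC2 IMDSv2 documentation rather than from this thread. Nothing here touches the network; it only builds the request objects.

```ruby
require 'net/http'

CREDENTIALS_PATH = '/latest/meta-data/iam/security-credentials/'
TOKEN_PATH = '/latest/api/token'

# IMDSv1: a plain GET, optionally for a specific role name.
def imds_v1_request(role = '')
  Net::HTTP::Get.new(CREDENTIALS_PATH + role)
end

# IMDSv2, step 1: PUT for a session token. The TTL header is a proper
# name/value pair and the TTL is given in seconds.
def imds_v2_token_request(ttl_seconds = 21_600)
  request = Net::HTTP::Put.new(TOKEN_PATH)
  request['x-aws-ec2-metadata-token-ttl-seconds'] = ttl_seconds.to_s
  request
end

# IMDSv2, step 2: the same GET as IMDSv1, plus the token header.
def imds_v2_request(token, role = '')
  request = imds_v1_request(role)
  request['x-aws-ec2-metadata-token'] = token
  request
end
```

The only differences between the two flows in this sketch are the extra PUT and the token header on the later GETs, which is why a failing PUT adds latency before the old flow can take over.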

Gem name ('aws-sdk', 'aws-sdk-resources' or service gems like 'aws-sdk-s3') and its version

aws-sdk-core 3.79.0 and up

Version of Ruby, OS environment

Ruby 2.6.0
Kubernetes/Docker

Code snippets / steps to reproduce

See curl examples


mullermp · 2019-11-26T19:00:35Z

duplicate of #2174

mullermp · 2019-11-26T23:33:21Z

@grdw How are you invoking that curl request? Is that directly from the SDK? If so that's broken behavior. For the header, you're missing a : between the key and value. On an EC2 host, the following command works:

curl -v -H "User-Agent aws-sdk-ruby3/3.79.0" -H "x-aws-ec2-metadata-token-ttl-seconds:21600" -X PUT http://169.254.169.254/latest/api/token
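To illustrate why the missing colon matters, here is a small Ruby sketch of a strict header-line parser. `parse_header_line` is a hypothetical helper written for this example; it is not part of curl, IMDS, or the SDK. It only shows that without the `:` separator there is no name/value pair to send.

```ruby
# Split a raw header line into [name, value]. Returns nil when the
# colon separator is missing, as in the original curl command.
def parse_header_line(line)
  name, separator, value = line.partition(':')
  return nil if separator.empty?

  [name.strip, value.strip]
end

parse_header_line('x-aws-ec2-metadata-token-ttl-seconds 21600')
# => nil
parse_header_line('x-aws-ec2-metadata-token-ttl-seconds:21600')
# => ["x-aws-ec2-metadata-token-ttl-seconds", "21600"]
```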

Regarding the issue, could you try increasing the hop limit? It fails on 1 hop by design, but if you're running Kubernetes, the container will need to forward your request, adding an additional hop. See the Go SDK issue as well as the other Ruby SDK issue (#2174).

aws ec2 modify-instance-metadata-options --instance-id <instance ID> --http-put-response-hop-limit 3 --http-endpoint enabled --region <region>

https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-metadata.html
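The hop limit reasoning can be written down as a toy model. This is a simplification under stated assumptions: the token response is assumed to leave the metadata service with an IP TTL equal to the configured hop limit, and each forwarding step (for example a container bridge network) is assumed to decrement it by one. `token_response_delivered?` is a hypothetical name used only for this sketch.

```ruby
# Simplified model: the response survives only if at least one unit of
# TTL is left after every forwarding step between the instance and
# the caller.
def token_response_delivered?(hop_limit:, forwarding_steps:)
  hop_limit - forwarding_steps >= 1
end

# Process running directly on the EC2 host.
token_response_delivered?(hop_limit: 1, forwarding_steps: 0) # => true
# Container whose traffic is forwarded once by the host.
token_response_delivered?(hop_limit: 1, forwarding_steps: 1) # => false
# Same container after raising the hop limit as suggested above.
token_response_delivered?(hop_limit: 3, forwarding_steps: 1) # => true
```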

mullermp · 2019-11-26T23:49:07Z

Hop limits should be increased for EKS/ECS users automatically, but that appears not to be the case. I'm working with other SDK language teams and forwarding that feedback to relevant EC2/ECS teams to see what the appropriate fix should be.

btalbot · 2019-11-27T00:40:42Z

I'd like to point out again that the hop limit is just one of several anti-proxy protections added to the PUT token request. Tuning the number of TCP hops is meant to accommodate layer-3 proxies (like NAT) but does nothing to enable support for HTTP (layer 7) proxies.

In our case, we're running Kubernetes clusters on EC2 (which we have had deployed long before AWS offered any k8s support at all, so please don't say we should have just used EKS in 2016 instead). For such clusters the kube2iam proxy (see jtblin/kube2iam#241) is a very popular way to manage the roles apps may use. However, kube2iam runs as an HTTP proxy, which adds the X-Forwarded-For header. Requests carrying that header are rejected when the IMDSv2 PUT token call is made, so currently any use of a current AWS SDK prevents the app from working (without long delays). The only work-around is to fall back to older SDK versions.
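A rough model of the behaviour described in this comment, for illustration only. `token_request_status` is a hypothetical function, not IMDS code; the 403 matches the status reported later in this thread, and 200 is an assumed success status.

```ruby
# Model: a token PUT that carries an X-Forwarded-For header is refused,
# no matter how many network hops are allowed. Header names are
# compared case-insensitively, as HTTP requires.
def token_request_status(headers)
  forwarded = headers.keys.any? { |name| name.to_s.casecmp?('X-Forwarded-For') }
  forwarded ? 403 : 200
end

direct  = { 'x-aws-ec2-metadata-token-ttl-seconds' => '21600' }
proxied = direct.merge('X-Forwarded-For' => '10.0.0.12')

token_request_status(direct)  # => 200
token_request_status(proxied) # => 403
```

In this model, raising the hop limit changes nothing for the proxied request, which is the point being made about layer-7 proxies.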

grdw · 2019-11-27T08:40:32Z

curl -v -H "User-Agent aws-sdk-ruby3/3.79.0" -H "x-aws-ec2-metadata-token-ttl-seconds:21600" -X PUT http://169.254.169.254/latest/api/token

@mullermp if I make the change to the curl request as suggested, by adding a :, I see that the response goes from a 400 Bad Request to a 403 Forbidden.

mullermp · 2019-11-27T18:28:29Z

@grdw Are you running that from within the container or on the ec2 host? What type of EC2 host is it?

grdw · 2019-11-28T07:55:13Z

@mullermp Yes I was running that curl request from within a container indeed. The EC2 host type looks like a c5.4xlarge . If you need more details I can include one of our platform engineers in this conversation.

mullermp · 2019-11-29T19:42:06Z

@grdw Are you unblocked for now? The curl request was malformed (400) and is now correct (403) if you're running inside the container. Did you increase the hop limit as per the documentation? IMDSv2 expects one hop by design, and by the nature of a container, the request must be routed an additional time. I'm expecting to hear more details next week when the relevant people are back from holidays.

mullermp · 2019-12-04T22:26:46Z

@grdw How slow is startup for you on the newest versions?

I created an EC2 instance and created a docker container. On the instance I set the hop limit to 1.

aws ec2 modify-instance-metadata-options --instance-id <instance ID> --http-put-response-hop-limit 1 --http-endpoint enabled --region us-west-2

My curl request will hang (expected)

curl -v -H "User-Agent aws-sdk-ruby3/3.79.0" -H "x-aws-ec2-metadata-token-ttl-seconds:21600" -X PUT http://169.254.169.254/latest/api/token

In the container, I installed git, ruby, etc, and cloned the repo / ran bundle install. I tried using the REPL console:

gems/aws-sdk-resources/bin/aws-v3.rb

The number of retries was set to 0 by default, the IMDSv2 token was not fetched, and the SDK correctly fell back to IMDSv1 within a couple of seconds. I then set the hop limit to 3 and the token was correctly fetched on the first try.

Are you by chance setting instance_profile_credentials_retries in your configuration? You could also try configuring instance_profile_credentials_timeout. Does my example best verify your case?
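As a sketch of what setting those two options could look like: the option names are the ones mentioned above, the values are illustrative only, and applying them through the global `Aws.config` hash is an assumption about how your app is wired up (they can also be passed to an individual client). The `require` is guarded so the snippet still runs where the gem is not installed.

```ruby
# Illustrative values: no retries and a one second timeout for the
# instance metadata lookup.
IMDS_OPTIONS = {
  instance_profile_credentials_retries: 0,
  instance_profile_credentials_timeout: 1
}.freeze

begin
  require 'aws-sdk-core'
  # Merge the options into the SDK-wide defaults used by new clients.
  Aws.config.update(IMDS_OPTIONS)
rescue LoadError
  warn 'aws-sdk-core is not installed; options were not applied'
end
```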

@btalbot For your case, you mentioned startup is not long. I think that the kube2iam package will need to remove that X-forwarded-for header -- it sounds like a permanent design decision from EC2. I will be talking with some members of EKS/EC2 on Monday - I can provide a soft update.

grdw · 2019-12-05T09:58:11Z

Thanks for your answer. I'm not sure how and if that is configured:

> Are you by chance setting instance_profile_credentials_retries in your configuration? You could also try configuring instance_profile_credentials_timeout. Does my example best verify your case?

I could give it a try and see if it fixes the speed problems. I'm okay if it does one extra request I guess 💭 . I'll get back to you.

> Does my example best verify your case?

Yes. This does verify my case. However in my case the retries default is set to 5 (as it reads here)

mullermp · 2019-12-05T18:11:48Z

@grdw Looking at how InstanceProfileCredentials is initialized (I could find only two cases), a value for retries is either passed in or defaults to 0, so it shouldn't be picking 5 unless you are initializing the class yourself, or perhaps there is a bug somewhere. When I did my testing (last comment), I printed @retries and it was 0, without any configuration.

https://github.com/aws/aws-sdk-ruby/blob/master/gems/aws-sdk-core/lib/aws-sdk-core/shared_config.rb#L357

https://github.com/aws/aws-sdk-ruby/blob/master/gems/aws-sdk-core/lib/aws-sdk-core/credential_provider_chain.rb#L28

mullermp · 2019-12-05T19:35:08Z

Are you initializing any clients with credentials: InstanceProfileCredentials.new in any way?

grdw · 2019-12-06T09:28:28Z

> Are you initializing any clients with credentials: InstanceProfileCredentials.new in any way?

No, we don't do anything out of the ordinary there. We only use the ENV variables (i.e. AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY) inside a container on a k8s cluster. It has been made clear to me that we rely on kiam for fetching the credentials, so maybe it's something kiam related (including @thomasvnoort).

btalbot · 2019-12-06T21:07:51Z

Since https://github.com/uswitch/kiam uses the same Go-lang reverse proxy that kube2iam does (https://github.com/uswitch/kiam/blob/master/pkg/aws/metadata/server.go#L60) it will also cause any PUT token request to be rejected with the 403 response for the same X-Forwarded-For header reason.

btalbot · 2019-12-06T21:13:07Z

IMO, each of the language SDKs needs to support an env variable so that deployments can enable or disable IMDSv2 on a per-deployment basis, without app developers having to determine (via SDK version) which version must be used.

The default could be IMDSv2 -- though you are still likely to get a ton of "IMDS is broken" bugs like this one -- but as long as there is a way for deployers (not only developers) to use the still-supported IMDSv1, I think that will be acceptable.

btalbot · 2019-12-06T23:01:04Z

It's pretty trivial to reproduce assuming you have a k8s cluster with kube2iam, kiam, or similar IMDS proxy enabled. Run a new container and start a shell which has enough ruby tools to run bundler and install the aws-sdk-core. I'm using
./kubectl run bryan --rm -i --tty --image=fingershock/ruby:2.5.7-builder --restart=Never sh
but there are many ways to do this.

IMDSv1 sdk 3.78.0 works fast .. just a few milliseconds

$> cat Gemfile
source "https://rubygems.org"
gem 'json'
gem 'aws-sdk-core', '3.78.0'

$> bundle install
... output elided ...

$> bundle exec ruby -e 'require "benchmark"; require "aws-sdk-core"; puts Benchmark.measure { pp Aws::InstanceProfileCredentials.new }'
#<Aws::InstanceProfileCredentials:0x000055c9f525a480
 @backoff=
  #<Proc:0x000055c9f525a3e0@/usr/lib/ruby/gems/2.5.0/gems/aws-sdk-core-3.78.0/lib/aws-sdk-core/instance_profile_credentials.rb:65 (lambda)>,
 @credentials=#<Aws::Credentials access_key_id="ASIA6D4HKVYTA5JSVVGB">,
 @expiration=2019-12-06 23:21:28 UTC,
 @http_debug_output=nil,
 @http_open_timeout=5,
 @http_read_timeout=5,
 @ip_address="169.254.169.254",
 @mutex=#<Thread::Mutex:0x000055c9f525a340>,
 @port=80,
 @retries=5>
  0.005531   0.000000   0.005531 (  0.034940)

Newer sdk which forces use of IMDSv2

$> cat Gemfile
source "https://rubygems.org"
gem 'json'
gem 'aws-sdk-core', '3.79.0'

$> bundle install
... output elided ...

$> bundle exec ruby -e 'require "benchmark"; require "aws-sdk-core"; puts Benchmark.measure { pp Aws::InstanceProfileCredentials.new }'
#<Aws::InstanceProfileCredentials:0x0000555ad9fa35c8
 @backoff=
  #<Proc:0x0000555ad9fa3488@/usr/lib/ruby/gems/2.5.0/gems/aws-sdk-core-3.79.0/lib/aws-sdk-core/instance_profile_credentials.rb:83 (lambda)>,
 @credentials=#<Aws::Credentials access_key_id="ASIA6D4HKVYTA5JSVVGB">,
 @expiration=2019-12-06 23:21:28 UTC,
 @http_debug_output=nil,
 @http_open_timeout=5,
 @http_read_timeout=5,
 @ip_address="169.254.169.254",
 @mutex=#<Thread::Mutex:0x0000555ad9fa33e8>,
 @port=80,
 @retries=5,
 @token=nil,
 @token_ttl=21600>
  0.008241   0.000839   0.009080 (  7.457265)
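The roughly 7.5 second wall-clock time is about what exponential backoff alone would produce with @retries=5. The sketch below is back-of-the-envelope arithmetic and not the SDK's code: it assumes the @backoff lambda sleeps 1.2**n seconds after the n-th failed attempt (n starting at 0), which is an assumption and is not confirmed anywhere in this thread. `total_backoff_seconds` is a hypothetical helper.

```ruby
# Total time spent sleeping between attempts, assuming a sleep of
# base**n seconds after the n-th failure (assumed backoff formula).
def total_backoff_seconds(retries, base: 1.2)
  (0...retries).sum { |n| base**n }
end

total_backoff_seconds(5).round(4) # => 7.4416
total_backoff_seconds(0)          # => 0
```

Under that assumption, five retries account for about 7.44 of the 7.457 seconds measured above, and zero retries would add no sleep at all before the fallback.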

mullermp · 2019-12-09T21:55:59Z

Soft update. Members of the SDK teams met with EC2 and EKS today.

Highlights are -

We cannot implement an option to disable IMDSv2
EKS teams will push 3rd party implementations (kube/kube2iam) to support IMDSv2
SDKs will change retry strategies. ~~For Ruby, I am thinking 3 retries at 1 second each for InstanceProfileCredentials to alleviate pain points.~~ backoff may be a problem here
EC2 will increase default hop limits for managed services.

Blog posts for some of the security reasonings: https://aws.amazon.com/blogs/security/defense-in-depth-open-firewalls-reverse-proxies-ssrf-vulnerabilities-ec2-instance-metadata-service/

grdw · 2019-12-12T11:24:18Z

I recently upgraded to 3.85.1 . From our performance monitoring tool requests to AWS are still slow. Above all it still retries 5 times:

So I released a version 1.1.28 of our app which only includes the aws-sdk-core 3.85.1 update and the speed goes from 39ms to 7.5s

For the 39ms these are the amount of requests to AWS (GET http://169.254.169.254):

For the 7.5s request this is what happens:

Again 6 PUT http://169.254.169.254 and 2 GET http://169.254.169.254

The main thing I can conclude from this is that this Ruby gem still retries 5 times and then proceeds to fall back to the old way. I'm not sure exactly how or why, but it's the way it is.
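The request pattern reported here (6 PUTs, then 2 GETs) can be reproduced with a small simulation. This models the behaviour described in this comment and is not the SDK's implementation: `simulate_lookup` and the `transport` callable are hypothetical stand-ins for real network calls, and `ROLE` is a placeholder for the role name returned by the first GET.

```ruby
TOKEN_PATH = '/latest/api/token'
CREDENTIALS_PATH = '/latest/meta-data/iam/security-credentials/'

# `transport` is any callable taking (method, path) and returning an
# HTTP status code.
def simulate_lookup(transport, retries:)
  log = []
  token_ok = false
  # One initial attempt plus `retries` further attempts for the token.
  (retries + 1).times do
    log << "PUT #{TOKEN_PATH}"
    if transport.call(:put, TOKEN_PATH) == 200
      token_ok = true
      break
    end
  end
  # With or without a token, two GETs follow: one for the role name
  # and one for that role's credentials.
  [CREDENTIALS_PATH, "#{CREDENTIALS_PATH}ROLE"].each do |path|
    log << "GET #{path}"
    transport.call(:get, path)
  end
  { token: token_ok, requests: log }
end

# A proxy that refuses every token request but passes plain GETs.
proxy = ->(method, _path) { method == :put ? 403 : 200 }
result = simulate_lookup(proxy, retries: 5)
result[:requests].count { |r| r.start_with?('PUT') } # => 6
result[:requests].count { |r| r.start_with?('GET') } # => 2
```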

mullermp · 2019-12-12T18:03:05Z

@grdw I'm sorry that it's still happening. From our end, it doesn't seem like any of our defaults would be causing that. Have you tried configuring instance_profile_credentials_retries or instance_profile_credentials_timeout? If it's true that you're getting credentials via IAM using a 3rd party, it sounds like you don't need instance profile credentials at all? You could set ENV['AWS_EC2_METADATA_DISABLED'] = 'true'

mrlevitas · 2019-12-12T23:16:55Z

I'm experiencing the same delays when using aws-sdk-core with aws-sdk-s3 to download files from s3.
When I use ENV['AWS_EC2_METADATA_DISABLED'] = "true", I get Aws::Sigv4::Errors::MissingCredentialsError when initializing an s3 client.

We're having to downgrade to 3.78 to use our api without significant latency but that breaks downloading from s3 in this way:
Undefined method 's3_use_arn_region' for #<Aws::SharedConfig:x0...>

3.85.1 did not help.

btalbot · 2019-12-12T23:50:36Z

If you want to pin aws-sdk-core to 3.78.0 to avoid IMDSv2, then you'll also need to pin aws-sdk-s3 to 1.57.0 to avoid #s3_use_arn_region which was introduced in aws-sdk-core after 3.78.0

grdw · 2019-12-13T09:37:52Z

> When I use ENV['AWS_EC2_METADATA_DISABLED'] = true, I get Aws::Sigv4::Errors::MissingCredentialsError when initializing an s3 client.

Can confirm; the same behavior is present when I try this.

mullermp · 2019-12-13T17:51:09Z

Just FYI - 'true' must be a string. Edited my original post. https://github.com/aws/aws-sdk-ruby/blob/master/gems/aws-sdk-core/lib/aws-sdk-core/instance_profile_credentials.rb#L150. If you were using a string, that is still not working for you?
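A quick way to see the string requirement in plain Ruby: `ENV` only accepts String values, so assigning the boolean raises before the SDK is ever involved. The `metadata_disabled?` helper below is hypothetical and only illustrates a string comparison of the kind the SDK is assumed to perform at the linked line.

```ruby
# Assigning a boolean to ENV raises TypeError.
begin
  ENV['AWS_EC2_METADATA_DISABLED'] = true
rescue TypeError => e
  puts "rejected: #{e.class}"
end

# The string form is accepted.
ENV['AWS_EC2_METADATA_DISABLED'] = 'true'

# Hypothetical check: compare the variable against the string 'true'.
def metadata_disabled?(env = ENV)
  env.fetch('AWS_EC2_METADATA_DISABLED', 'false').downcase == 'true'
end

metadata_disabled? # => true
```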

mrlevitas · 2019-12-13T18:44:33Z

yes, I used a string. edited my post.

mullermp · 2019-12-13T21:49:52Z

@mrlevitas It would make sense to see that error if you're relying on IMDS for credentials and haven't supplied your own. I suppose if you're all using kube2iam then it needs to actually hit the metadata service (which is proxied), so disabling isn't actually a workaround. Sorry that I was wrong on that one.

To be candid, I'm not sure what else I can really do. EC2 has mandated that we can't disable IMDSv2 only. EKS team will be owning the task of adding support to these 3rd party libraries. I think the best move might be to pin your version. I'm not going to ask you to move to EKS (that's not my intention and you should be free to use whatever implementations you want) but it would work more seamlessly. Since @btalbot seems to be the resident expert here, would removing the X-forwarded-for headers in kube2iam be a solution? It sounded like, ideally, that header should be present (and correct to use) but would that be a breaking change in any way?

btalbot · 2019-12-13T22:19:13Z

The XFF header is added by the go-lang http proxy class and is typically added by every proxy. To remove it from the go-lang proxy would remove its use for every user of the go-lang proxy and make the IMDSv2 blocking of it pointless. If IMDSv2 is blocking XFF requests but then pushes proxy libraries to not include the header, then the blocking of requests with XFF is no longer effective.

ghost · 2021-12-16T20:23:28Z

I was able to get around this issue by setting my docker's network mode to host. Not sure if anyone else out there has tried it but if host networking is an option things seem to work out nicely.

mullermp closed this as completed Nov 26, 2019

mullermp mentioned this issue Nov 26, 2019

IMDSv2 use cannot be disabled #2174

Closed

mullermp reopened this Nov 26, 2019

mullermp self-assigned this Nov 27, 2019

mullermp added the service-api, work-in-progress, dependencies, and SECURITY labels and removed the service-api label Nov 27, 2019

simukappu mentioned this issue Nov 30, 2019

Support or ignore IMDSv2 in instance profile test awslabs/aws-fluent-plugin-kinesis#191

Closed

mullermp added the pending label and removed the dependencies and work-in-progress labels Dec 4, 2019

mullermp added the third-party label and removed the pending label Dec 6, 2019

joelittlejohn mentioned this issue Dec 9, 2019

InstanceProfileCredentialsProvider is slow to get credentials since 1.11.678 aws/aws-sdk-java#2171

Closed

mullermp mentioned this issue Dec 10, 2019

Instance profile credentials cleanup and fixes #2182

Merged

mkantzer mentioned this issue Dec 10, 2019

Provide an environment variable to disable IMDSv2 path aws/aws-sdk-go#2980

Closed

mullermp closed this as completed in #2182 Dec 11, 2019

barrywoolgar mentioned this issue Jan 7, 2020

Very long delay when enqueuing new jobs (~38 seconds) active-elastic-job/active-elastic-job#109

Open

rbvigilante mentioned this issue Feb 17, 2020

Unable to access new AWS metadata api uswitch/kiam#359

Closed

wdittmer-mp mentioned this issue Sep 10, 2020

Since Version 2.575.0 - CognitoIdentity.getOpenIdTokenForDeveloperIdentity is too slow aws/aws-sdk-js#3005

Closed
