aws-sdk-core > 3.78.0 slows down the fetching of credentials #2177
Comments
duplicate of #2174 |
@grdw How are you invoking that curl request? Is that directly from the SDK? If so that's broken behavior. For the header, you're missing a
Regarding the issue, could you try increasing the hop limit? It fails on 1 hop by design, but if you're running Kubernetes, the container will need to forward your request, adding an additional hop. See the Go SDK issue as well as the other Ruby SDK issue (#2174).
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-metadata.html |
Hop limits should be increased for EKS/ECS users automatically, but that appears not to be the case. I'm working with other SDK language teams and forwarding that feedback to relevant EC2/ECS teams to see what the appropriate fix should be. |
I'd like to point out again that the hop limit is just one of several anti-proxy protections added to the PUT token request. Tuning the number of TCP hops is meant to accommodate layer-3 proxies (like NAT) but does nothing to enable support for HTTP (layer 7) proxies. In our case, we're running Kubernetes clusters on EC2 (which we have had deployed long before AWS offered any k8s support at all, so please don't say we should have just used EKS in 2016 instead). For such clusters the jtblin/kube2iam#241 proxy is a very popular way to manage the roles apps may use. However, kube2iam runs as an HTTP proxy, which adds the X-Forwarded-For header. Requests carrying that header are rejected when the IMDSv2 PUT token call is made, so currently any app using a current AWS SDK either fails or works only after long delays. The only workaround is to fall back to older SDK versions. |
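To make the two protections described above concrete, here is a toy model in Ruby. It only paraphrases behavior reported in this thread (a hang when the hop limit is too low, a 403 when X-Forwarded-For is present); the function names, the hash layout, and the IP address are made up for illustration, and this is not how IMDS is actually implemented.

```ruby
# Toy model only: the rules below paraphrase what is reported in this thread.
# Every name here is hypothetical and nothing is taken from real IMDS code.

# A layer-7 (HTTP) proxy typically appends the client address to
# X-Forwarded-For before passing the request on.
def add_forwarded_for(request, client_ip)
  headers = request[:headers].dup
  existing = headers['X-Forwarded-For']
  headers['X-Forwarded-For'] = [existing, client_ip].compact.join(', ')
  request.merge(headers: headers)
end

# Hypothetical decision function for the PUT token request.
def token_endpoint_response(request, hop_limit:)
  # If the caller is further away than the hop limit, the response never
  # arrives, so the caller sees a hang rather than an HTTP status.
  return :timeout if request[:hops] > hop_limit
  # Requests that carry X-Forwarded-For are refused.
  return 403 if request[:headers].key?('X-Forwarded-For')

  200
end

on_host      = { headers: {}, hops: 1 }
in_container = { headers: {}, hops: 2 }
via_proxy    = add_forwarded_for({ headers: {}, hops: 1 }, '10.0.0.7')

puts token_endpoint_response(on_host, hop_limit: 1)      # directly on the EC2 host
puts token_endpoint_response(in_container, hop_limit: 1) # one extra hop, default limit
puts token_endpoint_response(in_container, hop_limit: 3) # hop limit raised
puts token_endpoint_response(via_proxy, hop_limit: 3)    # HTTP proxy in the path
```

In this model, raising the hop limit helps the plain container case but leaves the proxied case unchanged, which is the point made in the comment above.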
@mullermp if I make the change to the curl-request as suggested, by adding a |
@grdw Are you running that from within the container or on the ec2 host? What type of EC2 host is it? |
@mullermp Yes I was running that curl request from within a container indeed. The EC2 host type looks like a |
@grdw Are you unblocked for now? The curl request was malformed (400) and is now correct (403) if you're running inside the container. Did you increase the hop limit as per the documentation? IMDSv2 expects one hop by design, and by the nature of a container, the request must be routed an additional time. I'm expecting to hear more details next week when the relevant people are back from holidays. |
@grdw How slow is startup for you on the newest versions? I created an EC2 instance and created a docker container. On the instance I set the hop limit to 1.
My curl request will hang (expected)
In the container, I installed git, ruby, etc, and cloned the repo / ran bundle install. I tried using the REPL console:
The number of retries was set to 0 by default, the IMDSv2 token was not fetched, and the SDK correctly fell back to IMDSv1 within a couple of seconds. I then set the hop limit to 3 and the token was correctly fetched on the first try. Are you by chance setting
@btalbot For your case, you mentioned startup is not long. I think that the kube2iam package will need to remove that X-Forwarded-For header; it sounds like a permanent design decision from EC2. I will be talking with some members of EKS/EC2 on Monday, and I can provide a soft update. |
Thanks for your answer. I'm not sure how and if that is configured:
I could give it a try and see if it fixes the speed problems. I'm okay if it does one extra request I guess 💭 . I'll get back to you.
Yes. This does verify my case. However in my case the retries default is set to 5 (as it reads here) |
@grdw When looking at how InstanceProfileCredentials is initialized (only two cases I could find), a value should be passed in for retries, or it defaults to 0, so it shouldn't be picking 5 unless you are initializing the class yourself, or perhaps there is a bug somewhere. When I did my testing (last comment), I printed @retries and it was 0, and that was without any configuration. |
Are you initializing any clients with |
No we don't do anything out of the ordinary over there. We only use the ENV variables (i.e. |
Since https://github.com/uswitch/kiam uses the same Go-lang reverse proxy that kube2iam does (https://github.com/uswitch/kiam/blob/master/pkg/aws/metadata/server.go#L60) it will also cause any PUT token request to be rejected with the 403 response for the same X-Forwarded-For header reason. |
IMO, each of the language SDKs needs to support an env variable so that deployments can enable or disable IMDSv2 on a per-deployment basis, without app developers having to determine (via SDK version) which version must be used. The default could be IMDSv2, though you are still likely to get a ton of "IMDS is broken" bugs like this one, but as long as there is a way for deployers (not only developers) to use the still-supported IMDSv1, I think that will be acceptable. |
It's pretty trivial to reproduce, assuming you have a k8s cluster with kube2iam, kiam, or a similar IMDS proxy enabled. Run a new container and start a shell which has enough Ruby tools to run bundler and install aws-sdk-core. I'm using IMDSv1 sdk 3.78.0 works fast .. just a few milliseconds
Newer sdk which forces use of IMDSv2
|
Soft update. Members of the SDK teams met with EC2 and EKS today. Highlights are -
Blog posts for some of the security reasonings: https://aws.amazon.com/blogs/security/defense-in-depth-open-firewalls-reverse-proxies-ssrf-vulnerabilities-ec2-instance-metadata-service/ |
I recently upgraded to 3.85.1. According to our performance monitoring tool, requests to AWS are still slow. Above all, it still retries 5 times: So I released a version 1.1.28 of our app which only includes the For the 39ms this is the number of requests to AWS (GET http://169.254.169.254): For the 7.5s request this is what happens: Again 6 PUT http://169.254.169.254 and 2 GET http://169.254.169.254 The main thing I can conclude from this is that this Ruby gem still retries 5 times and then proceeds to fall back to the old way. I'm not sure exactly how or why, but that's the way it is. |
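As a rough illustration of why 6 PUT attempts (1 initial try plus 5 retries) can turn a request of a few milliseconds into one of several seconds, the arithmetic can be sketched as below. The function name, the per-attempt cost, and the `1.2**n` pause between attempts are assumptions chosen for illustration; none of them is confirmed by this thread.

```ruby
# Estimates the time spent on the token step before the fallback happens.
# All inputs are illustrative assumptions, not measured or documented values.
def token_step_delay(retries:, seconds_per_attempt:, backoff:)
  attempts = retries + 1 # the first try plus each retry
  # One pause is taken before each retry: n = 0, 1, ..., retries - 1.
  pauses = (0...retries).sum { |n| backoff.call(n) }
  attempts * seconds_per_attempt + pauses
end

# Hypothetical exponential pause, in seconds, before retry number n.
exponential = ->(n) { 1.2**n }

# 5 retries of a request that fails quickly (a fast 4xx response).
puts token_step_delay(retries: 5, seconds_per_attempt: 0.01, backoff: exponential)

# 0 retries: a single failed attempt and no pauses.
puts token_step_delay(retries: 0, seconds_per_attempt: 0.01, backoff: exponential)
```

Under these assumed numbers the pauses alone add up to about 7.4 seconds for 5 retries, the same order of magnitude as the 7.5s request reported above, while 0 retries costs only the single failed attempt.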
@grdw I'm sorry that it's still happening. From our end, it doesn't seem like any of our defaults would be causing that. Have you tried configuring |
I'm experiencing the same delays when using We're having to downgrade to 3.78 to use our api without significant latency but that breaks downloading from s3 in this way: 3.85.1 did not help. |
If you want to pin aws-sdk-core to 3.78.0 to avoid IMDSv2, then you'll also need to pin aws-sdk-s3 to 1.57.0 to avoid #s3_use_arn_region which was introduced in aws-sdk-core after 3.78.0 |
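For reference, the pinning described above could look like this in a Gemfile. The two version numbers come from the comment above; the `source` line is the usual default and is an assumption about your setup.

```ruby
# Gemfile (fragment)
source 'https://rubygems.org'

# Last aws-sdk-core release mentioned in this thread that does not attempt IMDSv2.
gem 'aws-sdk-core', '3.78.0'
# Pinned alongside it, per the comment above, to avoid #s3_use_arn_region.
gem 'aws-sdk-s3', '1.57.0'
```

Run `bundle install` afterwards so that Gemfile.lock picks up the pinned versions.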
Can confirm; the same behavior is present when I try this. |
Just FYI - 'true' must be a string. Edited my original post. https://github.com/aws/aws-sdk-ruby/blob/master/gems/aws-sdk-core/lib/aws-sdk-core/instance_profile_credentials.rb#L150. If you were using a string, that is still not working for you? |
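A small sketch of why the value has to be the string 'true': environment variables are always strings, and a check of this general shape compares against a string. `EXAMPLE_FLAG` and `flag_enabled?` are made-up names for illustration; they are not the variable or the method discussed above.

```ruby
# EXAMPLE_FLAG and flag_enabled? are hypothetical names for this sketch only.
def flag_enabled?(env, name)
  # Only the string "true" (in any letter case) counts as enabled.
  env.fetch(name, 'false').downcase == 'true'
end

puts flag_enabled?({ 'EXAMPLE_FLAG' => 'true' }, 'EXAMPLE_FLAG')
puts flag_enabled?({ 'EXAMPLE_FLAG' => 'TRUE' }, 'EXAMPLE_FLAG')
puts flag_enabled?({ 'EXAMPLE_FLAG' => '1' }, 'EXAMPLE_FLAG')
puts flag_enabled?({}, 'EXAMPLE_FLAG')

# Ruby's ENV stores strings only, so assigning a boolean raises a TypeError.
begin
  ENV['EXAMPLE_FLAG'] = true
rescue TypeError => e
  puts "TypeError: #{e.message}"
end
```

With a check like this, values such as '1' or 'yes' are treated the same as an unset variable.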
yes, I used a string. edited my post. |
@mrlevitas It would make sense to see that error if you're relying on IMDS for credentials and haven't supplied your own. I suppose if you're all using kube2iam then it needs to actually hit the metadata service (which is proxied), so disabling isn't actually a workaround. Sorry that I was wrong on that one. To be candid, I'm not sure what else I can really do. EC2 has mandated that we can't disable IMDSv2 only. The EKS team will own the task of adding support to these 3rd-party libraries. I think the best move might be to pin your version. I'm not going to ask you to move to EKS (that's not my intention, and you should be free to use whatever implementations you want), but it would work more seamlessly. Since @btalbot seems to be the resident expert here, would removing the X-Forwarded-For headers in kube2iam be a solution? It sounded like, ideally, that header should be present (and is correct to use), but would removing it be a breaking change in any way? |
The XFF header is added by the go-lang http proxy class and is typically added by every proxy. To remove it from the go-lang proxy would remove its use for every user of the go-lang proxy and make the IMDSv2 blocking of it pointless. If IMDSv2 is blocking XFF requests but then pushes proxy libraries to not include the header, then the blocking of requests with XFF is no longer effective. |
Issue description
Ever since moving from aws-sdk-core version 3.78.0 to 3.79.0, the way credentials are fetched has changed from:
to:
I guess that has its reasons, but this PUT request returns a 400 at my end, which in turn causes it to retry 5 times (?). After that unsuccessful chain of events it falls back to the old way of fetching credentials. This is of course painfully slow. Is there some special magic required to not make it return a 400?
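For readers without the original curl examples, the two-step flow described above can be sketched with Ruby's standard library. The token path and the two `X-aws-ec2-metadata-token*` header names are the ones given in the public EC2 instance metadata documentation; the helper names, the 1-second timeout, and the single-attempt fallback are assumptions for this sketch and are not the SDK's actual implementation.

```ruby
require 'net/http'
require 'uri'

IMDS_BASE = 'http://169.254.169.254'

# Step 1 (IMDSv2): request a session token with a PUT. The TTL header is
# required by the token endpoint.
def build_token_request(ttl_seconds = 21_600)
  request = Net::HTTP::Put.new(URI("#{IMDS_BASE}/latest/api/token"))
  request['X-aws-ec2-metadata-token-ttl-seconds'] = ttl_seconds.to_s
  request
end

# Step 2: read metadata. With a token this is an IMDSv2 request; without
# one it is a plain IMDSv1 GET (the "old way").
def build_metadata_request(path, token = nil)
  request = Net::HTTP::Get.new(URI("#{IMDS_BASE}#{path}"))
  request['X-aws-ec2-metadata-token'] = token if token
  request
end

# Sends a request with short timeouts (1 second is an arbitrary choice).
def send_request(request, timeout: 1)
  uri = request.uri
  Net::HTTP.start(uri.host, uri.port, open_timeout: timeout, read_timeout: timeout) do |http|
    http.request(request)
  end
end

# Hypothetical helper: try the token step once, then fall back to IMDSv1.
def fetch_role_names
  token = begin
    response = send_request(build_token_request)
    response.is_a?(Net::HTTPSuccess) ? response.body : nil
  rescue StandardError
    nil # timeout or connection error: continue without a token
  end
  send_request(build_metadata_request('/latest/meta-data/iam/security-credentials/', token)).body
end
```

Calling `fetch_role_names` only makes sense on an EC2 instance or in a container that can reach the metadata address. A 400 or 403 on the PUT makes this sketch skip the token and issue the plain GET immediately, with no retries in between.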
Gem name ('aws-sdk', 'aws-sdk-resources' or service gems like 'aws-sdk-s3') and its version
aws-sdk-core 3.79.0 and up
Version of Ruby, OS environment
Ruby 2.6.0
Kubernetes/Docker
Code snippets / steps to reproduce
See curl examples