Get Instance Meta Data Call Has Become Very Slow After Moving from IMDBv1 to IMDBv2 on RHEL/SELinux instance #3864
Comments
We have noticed this issue has not received attention in 1 year. We will close this issue for now. If you think this is in error, please feel free to comment and reopen the issue.
Hi, is this still persisting with the newest version of the SDK?
Closing this for no response. Please reopen if this problem is still persisting.
An interesting (relevant?) article describes how IMDBv2 changed the token API call to return its reply with TTL=1 in the IP header. This causes problems when an EC2 instance has an internal router (e.g., containers using NAT; maybe also SELinux), because the TTL=1 packet gets dropped. Timeouts ensue before the client falls back to IMDBv1, which makes the response much slower (e.g., >2 s instead of <3 ms).
This issue was experienced on a client m5.large instance with RHEL/SELinux configured via CloudFormation. The slowness introduced when we upgraded from IMDBv1 to IMDBv2 was observed in several working sessions focused on merging, validating, and finalizing a three-phase CloudFormation (CFN) deployment for this instance.
The CFN template that gets instance metadata went from running in under a second to approximately 50 seconds each time.
We had switched from IMDBv1 to IMDBv2 just a week before, and the slowness started right after the switch.
It is not specific to the Go SDK and also occurs with the AWS CLI.
I am posting here initially because I found the similar issue 2972 in this repo, which is also not specific to the Go SDK and can occur with the same CLI call to get instance metadata:
#2972
A sample CLI command to get instance metadata is:
[ec2-user ~]$ TOKEN=`curl -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600"` \
&& curl -H "X-aws-ec2-metadata-token: $TOKEN" -v http://169.254.169.254/latest/meta-data/
from this page describing the IMDBv1 vs IMDBv2:
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
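To see where the time goes, it can help to time the token request on its own. The following is a minimal sketch, assuming curl is available on the instance; the function name `imds_token_time` and the 10-second default timeout are illustrative choices, not something taken from the AWS documentation.

```shell
# Hypothetical helper: print how long the IMDBv2 token request takes, in seconds.
# Optional first argument: overall timeout in seconds (default 10, an assumption).
imds_token_time() {
  curl -s -o /dev/null -X PUT "http://169.254.169.254/latest/api/token" \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 21600" \
    --max-time "${1:-10}" \
    -w '%{time_total}\n'
}
```

Running `imds_token_time` before and after a configuration change (for example, a change of SELinux mode) would show whether the token request is the slow step.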
A similar CLI call to get instance metadata on the client EC2 instance eventually returns the requested info without triggering HTTP or other errors, apart from timeout warnings. This is very similar to the behavior described in issue 2972.
But there is another factor in the client environment: the AMI used for the instance is a client AMI (approved by their security team) that has SELinux installed, running RHEL on an m5.large instance.
The client is not sure whether any actual SELinux policies are configured in the AMI (I will attempt to get this info and post back here over the next few days). The SELinux mode of the AMI was "enforcing" on initial startup and for the first phase of the CFN configuration steps.
"Enforcing" mode means that any defined SELinux policies are enforced.
The CFN template runs in 3 phases (sets of steps), with a reboot between phases.
Prior to the reboot at the end of the first phase, the SELinux mode was set to "permissive" (which means no SELinux policies are enforced, even if some exist; violations are only logged).
Interestingly, upon reboot, and during the second phase of the CFN template configuration steps, the slowness went away and the curl calls to get instance metadata again ran in under a second. This is why it seems that the cause and resolution of issue 2972 do not apply to this case. The cause in 2972 was IMDBv2 setting the default hop limit to 1, and the fix was to increase it to a higher number such as 3.
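For reference, the hop-limit fix from issue 2972 can be applied with the AWS CLI. This is a sketch, assuming the CLI is configured with permission to modify the instance; the function name `set_imds_hop_limit` and the default of 3 are illustrative, and the instance ID in the usage comment is a placeholder.

```shell
# Hypothetical wrapper: raise the IMDBv2 PUT response hop limit for one instance.
# $1 = instance ID, $2 = hop limit (defaults to 3, the value mentioned for issue 2972).
set_imds_hop_limit() {
  local instance_id="$1" hops="${2:-3}"
  aws ec2 modify-instance-metadata-options \
    --instance-id "$instance_id" \
    --http-put-response-hop-limit "$hops" \
    --http-endpoint enabled
}

# Usage (placeholder instance ID):
#   set_imds_hop_limit i-0123456789abcdef0
```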
It seems that just disabling enforcement of any defined SELinux policies solved the problem.
It is unknown whether something else in the CFN phase 1 steps before the first reboot also increased the hop limit, or switched back from IMDBv2 to IMDBv1. Again, we will see if we can verify these other aspects.
I am wondering if anyone else has come across SELinux causing slowness in SDK/CLI calls to get instance metadata after an IMDBv2 upgrade, and I am hoping this issue can help identify any specific policies that may cause it.
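One way to gather evidence for or against SELinux involvement is to record the current mode and look for recent denials in the audit log. This is a sketch, assuming the standard `getenforce` and `ausearch` tools are installed on the RHEL instance; `selinux_report` is a hypothetical name, and `ausearch` generally needs root to read the audit log.

```shell
# Hypothetical helper: print the SELinux mode and any AVC denials from the last 10 minutes.
selinux_report() {
  echo "SELinux mode: $(getenforce)"
  # ausearch exits non-zero when nothing matches, so fall back to a short note.
  ausearch -m AVC -ts recent 2>/dev/null \
    || echo "no recent AVC denials reported (or ausearch unavailable / not root)"
}
```

If a slow metadata call is followed by no AVC denials at all, that would point away from SELinux policy and back toward something like the hop limit.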
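The open questions about the hop limit and IMDBv1/IMDBv2 settings could be checked by reading the instance's metadata options before and after phase 1. This is a sketch, assuming a configured AWS CLI with permission to describe the instance; `show_imds_options` is a hypothetical name and the instance ID in the usage comment is a placeholder.

```shell
# Hypothetical helper: print the metadata options (HttpTokens,
# HttpPutResponseHopLimit, HttpEndpoint, State) for one instance.
show_imds_options() {
  aws ec2 describe-instances --instance-ids "$1" \
    --query 'Reservations[].Instances[].MetadataOptions' \
    --output json
}

# Usage (placeholder instance ID):
#   show_imds_options i-0123456789abcdef0
```

A change in `HttpPutResponseHopLimit` or `HttpTokens` between the two runs would suggest that something other than the SELinux mode changed during phase 1.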
Describe the bug
The CLI call to get instance metadata slows way down (from about 1 second to about 50 seconds) after upgrading from IMDBv1 to IMDBv2 on a RHEL/SELinux m5.large instance.
Version of CLI:
We are installing and using the latest AWS CLI version on the EC2 instance where the behavior is observed.
To Reproduce (observed behavior)
Steps to reproduce the behavior (please share code or minimal repo)
m5.large instance with a RHEL and SELinux AMI for Oracle configuration.
Note: I will try to get more info on the policies configured for SELinux. I will collect it over the next week and update this issue.
I am posting in advance to see if someone has encountered something similar: SELinux slowing SDK/CLI calls to get instance metadata with IMDBv2 (and not with IMDBv1).
Expected behavior
Without yet knowing the exact nature of any SELinux policies that may be in place, and how they may interact with the changes introduced in IMDBv2, it is hard to say specifically what the expected behavior should be.
It would definitely be nice to get SDK/CLI detection messages or warnings about defined SELinux policies that may cause issues with SDK/CLI communications.
I will fill in more as I get more info.
Additional context
I am hoping that any info gathered via this issue helps others identify why things may have slowed down dramatically on an instance with SELinux that has just been upgraded to IMDBv2.
In our CFN case, the call to get instance metadata occurred over 60 times to get different properties for our CFN phase 1 scripts, so it slowed the first phase alone by about an hour, making it extremely difficult to debug the CFN scripts themselves and iterate in an agile way.
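When a script reads many properties, one possible mitigation is to request the IMDBv2 token once and reuse it for every read, so that any delay on the token request is paid a single time. This is a sketch under that assumption and not what our CFN scripts currently do; `imds_init`, `imds_get`, and the `IMDS_TOKEN` variable are hypothetical names.

```shell
# Fetch one session token (valid for up to 21600 seconds) and keep it in a variable.
# Call this directly, not inside $(...), so the variable survives in the current shell.
imds_init() {
  IMDS_TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
}

# Read one metadata path (relative to /latest/meta-data/) with the cached token.
imds_get() {
  curl -s -H "X-aws-ec2-metadata-token: ${IMDS_TOKEN}" \
    "http://169.254.169.254/latest/meta-data/$1"
}

# Usage:
#   imds_init
#   instance_id=$(imds_get instance-id)
#   az=$(imds_get placement/availability-zone)
```

This does not address the underlying cause; it only limits how often a slow token request can occur.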