SSM agent preinstalled #127

Closed
tecnobrat opened this issue Dec 20, 2018 · 24 comments

Comments

@tecnobrat

What would you like to be added:

Have the SSM agent preinstalled

Why is this needed:

All of the docs say that the SSM agent should come preinstalled on Amazon Linux 2, and since this is an Amazon Linux 2-based AMI, it should probably also have SSM installed.

@travcunn

travcunn commented Dec 26, 2018

@tecnobrat Thanks for submitting this.

It appears that because we are using the amzn2-ami-minimal-hvm as a base image for this AMI, it does not come preinstalled with this agent. I'm going to discuss this with my team and I'll get back to you soon.

@travcunn

At this time, the SSM agent isn't required for EKS functionality. If you need this installed, it may be best to create a custom AMI that installs and enables SSM. You may also be able to install this agent using user data scripts when the instances launch.

https://docs.aws.amazon.com/systems-manager/latest/userguide/sysman-manual-agent-install.html
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/user-data.html
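
For illustration, a minimal user-data sketch along those lines (a hedged example, not an official snippet; assumes an Amazon Linux 2 node — see the linked docs for the current install commands):

#!/bin/bash
# Install the SSM agent RPM, then enable and start the service (adjust for your setup).
yum install -y https://s3.amazonaws.com/ec2-downloads-windows/SSMAgent/latest/linux_amd64/amazon-ssm-agent.rpm
systemctl enable amazon-ssm-agent
systemctl start amazon-ssm-agent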

@toricls

toricls commented Jan 8, 2019

@tecnobrat I found a nice fully-working daemonset example that places the SSM agent on your EKS nodes.

https://github.com/mumoshu/kube-ssm-agent

I believe this approach is far better and more secure than installing the SSM agent directly into your node AMI, because you can apply the daemonset only when you need it (and it can also be removed!).
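
As a rough sketch of that on-demand workflow (the manifest filename here is a placeholder; check the repo for the actual file):

# Apply the SSM agent daemonset only when you need it...
kubectl apply -f daemonset.yaml
# ...and remove it again when you're done.
kubectl delete -f daemonset.yaml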

Thanks @mumoshu for sharing this :D

@max-rocket-internet
Contributor

fully-working daemonset example that places the SSM agent on your EKS nodes

Yes, a much more "kubernetes" type of solution. Nice!

@dlaidlaw

dlaidlaw commented Jan 8, 2019

Dangerous though: host networking, privileged access, running as root, access to /etc/sudoers.d/.

@max-rocket-internet
Contributor

Yes, but if you install ssm-agent in the AMI then all of that applies anyway. Probably more dangerous is the non-AWS-produced Docker image mumoshu/aws-ssm-agent:canary 😆

@toricls

toricls commented Jan 8, 2019

Agreed with @max-rocket-internet. So if I were to use the repo, I would build and run my own custom image using the Dockerfile and the other files 💯
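
Roughly something like this (a sketch only; the registry and tag are placeholders for your own repository, e.g. in ECR):

# Build the image from the repo's Dockerfile and push it to a registry you control.
docker build -t <account-id>.dkr.ecr.<region>.amazonaws.com/ssm-agent:custom .
docker push <account-id>.dkr.ecr.<region>.amazonaws.com/ssm-agent:custom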

@tecnobrat
Author

@toricls @mumoshu thanks for this! I agree that running random images off the internet is not a great idea. I'll probably take the same approach and build the image myself.

Thanks!

@h3adache

h3adache commented Jan 9, 2019

This is an extra step that requires building a custom AMI (with SSM) or remembering to apply the daemonset.
The docs already state that SSM is included in Amazon Linux 2.
Why make people go the extra step?

@mumoshu

mumoshu commented Jan 9, 2019

@h3adache I don't have a strong opinion on whether ssm-agent should be included in the EKS AMIs or not.

But my understanding is that AWS omits the SSM agent from the EKS AMIs for slightly higher security. The less the AMI contains, the more secure it is. Hope someone from AWS clarifies!

JFYI, I'm inclined to go with the daemonset approach for (1) easy updates of ssm-agent without write access to the node's root volume, (2) fine-grained control over what SSM commands and sessions are allowed to read/write, (3) the ability to disable it whenever it's unnecessary, which hardens your nodes and clusters, and (4) collecting logs from ssm-agent like any other pod.

@max-rocket-internet
Contributor

Why make people go the extra step?

Because most people don't want it in their AMI and because a daemonset is the proper way of doing this. Some other examples, similar to ssm-agent, would be things like fluentd (for sending logs), sysdig (for monitoring), node-problem-detector (fault detection) etc. All run as daemonsets 🙂

Another advantage to using a daemonset is that the resources used by the container are accounted for by k8s, just like any other container. If the process is run on the host, outside of k8s, then it is neither accounted for nor monitored.

@h3adache

h3adache commented Jan 9, 2019

Mm, good points @max-rocket-internet. Can you at least fix the EKS optimized AMI docs to say that it's NOT based on Amazon Linux 2? :)

@max-rocket-internet
Contributor

Can you at least fix the EKS optimized AMI docs to say that it's NOT based on Amazon Linux 2? :)

Feel free to make a PR for that 😆

@micahhausler
Member

From @mumoshu

But my understanding is that AWS omits the SSM agent from the EKS AMIs for slightly higher security. The less the AMI contains, the more secure it is. Hope someone from AWS clarifies!

This is exactly right. While we're not by any means implying that the SSM agent is insecure, all code has the potential for bugs and thus is a liability. Since EKS doesn't need the SSM agent to operate, we choose not to install it in our AMI.

From @h3adache

Mm, good points @max-rocket-internet. Can you at least fix the EKS optimized AMI docs to say that it's NOT based on Amazon Linux 2? :)

The EKS AMI is using Amazon Linux 2, but it's the AL2 minimal image, which doesn't include SSM. We'll update our documentation to indicate this.

We will be leaving the SSM agent out of the EKS AMI for now. You don't have to build a custom AMI to install SSM; you could do it in the user-data script or, as mentioned above, use a DaemonSet (and build/host your own container image for enhanced security).

@millermatt

Counterpoint to the arguments for using a daemonset on demand:

We're occasionally seeing nodes lose communication with the control plane and become NotReady. In that case there is no way to spin up a daemonset on the node to investigate issues. You'd want the SSM agent already running on the instance.

@max-rocket-internet
Contributor

In this case there is no way to spin up a daemonset on the node to investigate issues. You'd want the SSM already running on the instance.

Then pass your own user-data to install and run SSM agent. Simple!

@whereisaaron

You don't have to create the DaemonSet only when you need it, @millermatt; you can just leave the DaemonSet in place and nodes will always be running SSM. DaemonSets automatically tolerate NotReady, so they will keep running for your diagnostics. And it has all the advantages that @mumoshu points out.

BTW, there's a good discussion on #79 about intermittent NotReady, @millermatt, and some improvements to avoid it in more recent AMIs. If you haven't checked that out, or updated your nodes, it might be worth a look.

@millermatt

millermatt commented Mar 19, 2019

@max-rocket-internet You are correct! That's what we're doing. I was stating one reason not to do on-demand SSM agent deploys for others considering it as an option.

@whereisaaron Thanks for the link! We use Terraform, so I'll start specifically enabling EBS Optimized instances in the launch config and go from there.

@myoung34

myoung34 commented Jan 6, 2021

Our TAM's solution to EKS not having the ssm agent was to use this as a daemonset: https://github.com/mumoshu/kube-ssm-agent

The problem? It won't work for inventory, patch sets, or documents.

Why? Because those operations run inside the container:

So hardening via an SSM document? Hardens the container, not the host.
Inventory? An inventory of the container.
Patch manager? Patches the container.

My issue with user-data:

We have a mixture of worker groups and node groups, so that's double the user-data.
Node groups use launch templates, which isn't an append to the user-data but a replacement, so we'd have to maintain the upstream user-data ourselves and pull in any future changes by hoping we catch them.

Additionally, the CA is injected into the node groups' user-data.
To maintain this without hardcoding the cluster CA into every user-data, we'd have to use the AWS API, which means adding the aws-cli package and new IAM permissions to every single box in order to reflect on the cluster by tag and get the CA dynamically. My concern here is rate limiting on that API call at scale.

Next: node groups need additional kubelet args in order to go healthy as node groups. So having a single user-data isn't possible; it seems you'll need one user-data per node group unless (again) you do boot-time reflection and hope you don't blow up the rate limits.
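
For reference, a rough sketch of what that boot-time reflection could look like in user-data (hypothetical variable values; assumes the aws CLI is installed and the node role has eks:DescribeCluster, which is exactly the extra baggage described above):

#!/bin/bash
CLUSTER_NAME="my-cluster"   # placeholder
REGION="us-east-1"          # placeholder
# Look up the cluster CA and endpoint at boot instead of hardcoding them...
B64_CA=$(aws eks describe-cluster --name "$CLUSTER_NAME" --region "$REGION" \
  --query 'cluster.certificateAuthority.data' --output text)
ENDPOINT=$(aws eks describe-cluster --name "$CLUSTER_NAME" --region "$REGION" \
  --query 'cluster.endpoint' --output text)
# ...then hand them to the EKS AMI bootstrap script along with any extra kubelet args.
/etc/eks/bootstrap.sh "$CLUSTER_NAME" --b64-cluster-ca "$B64_CA" --apiserver-endpoint "$ENDPOINT"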

My issue with forking the AMI:

That requires a pipeline process we just don't have in place, and that we would have to build for a single RPM package.
That pipeline would have to publish the AMI per account, per region, which is costly and cumbersome.

If we're talking about keeping the AMI clean, let's discuss the vim-minimal, screen, sqlite, and other packages that are in the AMI but aren't needed for basic k8s operations.

@myoung34

Update: I was wrong about user-data. It's still not my preferred method, but if anyone needs to do this with the terraform-aws-eks module:

data "cloudinit_config" "userdata" {
  gzip          = false
  base64_encode = true
  # https://docs.aws.amazon.com/eks/latest/userguide/launch-templates.html
  boundary = "==BOUNDARY=="

  part {
    content_type = "text/x-shellscript"
    content      = "yum install -y https://s3.amazonaws.com/ec2-downloads-windows/SSMAgent/latest/linux_amd64/amazon-ssm-agent.rpm"
  }
}

resource "aws_launch_template" "vpn" {
... snip ...
user_data = data.cloudinit_config.userdata.rendered

Then use the LT:

locals {

  node_groups = {
    foo: {
      desired_capacity = 1
      max_capacity     = 3
      min_capacity     = 1

      iam_role_arn            = ...
      launch_template_id      = aws_launch_template.vpn.id
      launch_template_version = aws_launch_template.vpn.default_version

      subnets = [
        ...
      ]

      additional_tags = {
        ...
      }
    }
  }
}

module "eks" {
  source      = "terraform-aws-modules/eks/aws"
  node_groups = local.node_groups
  # ...snip...
}

I'd still prefer not doing this, though.

@nitrocode

nitrocode commented Jan 17, 2021

Due to this surprising limitation, we've successfully been using the insecure method of a key-based solution. After reading @myoung34's method of adding the SSM install to the user-data, that seems to be the best solution so far, and we'll convert to that in the meantime.

He makes a good point. If the idea is to keep the AMI "clean", then why are there preinstalled packages that aren't necessary? I don't fully understand the argument against preinstalling the SSM agent directly into the AMI.

Nvm, I see it on the roadmap now: aws/containers-roadmap#593

@tachang

tachang commented Apr 13, 2021

Does this user-data override the built-in user-data, meaning the node never joins the cluster?

@kajanth-tceu

@tachang I don't think so, as we are using boundary = "==BOUNDARY==" to keep the user-added content separate.

@nakamume

nakamume commented Jun 28, 2021

The SSM agent is installed by default in the latest AMIs (https://github.com/awslabs/amazon-eks-ami/releases/tag/v20210621)
aws/containers-roadmap#593 (comment)
