[DESIGN][Agent] Minimizing Elastic-Agent privileges #147

Closed
4 tasks
andrewvc opened this issue Sep 14, 2021 · 24 comments
Labels
discuss, enhancement, Team:Elastic-Agent, Team:Elastic-Agent-Control-Plane, V2-Architecture, v8.3.0

Comments

@andrewvc
Contributor

andrewvc commented Sep 14, 2021

Action plan after meeting today with @blakerouse @fntlnz and @justinkambic

There are three use cases for elastic-agent with different security requirements, where we can have three different behaviors.

For docker containers specifically, we need a clear path to running as non-root for two reasons:

  1. It will be flagged by many orgs as insecure,
  2. Some software (synthetics) cannot run as root, so we need consistent guidance; today we have to advise people to run as different users for different use cases.

New Behavior by Use Case

Install command on local machine

  1. Keep running as root
  2. Individual beats can downgrade privileges / setuid as needed (see [Heartbeat] Setuid to regular user / lower capabilities when possible beats#27878 which does this in just heartbeat as an example)
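The privilege downgrade mentioned in item 2 can be sketched generically as follows. This is a minimal POSIX illustration in Python, not the heartbeat implementation from beats#27878 (which is written in Go and also deals with Linux capabilities); `drop_privileges` and its parameters are hypothetical names chosen for the example.

```python
import os


def drop_privileges(uid: int, gid: int) -> bool:
    """Switch the current process from root to uid:gid.

    Returns False without changing anything if the process is not root,
    because an unprivileged process cannot change its identity.
    """
    if os.geteuid() != 0:
        return False
    # Order matters: each later step would fail if done after setuid().
    os.setgroups([])  # clear supplementary groups inherited from root
    os.setgid(gid)    # change the group while we are still privileged
    os.setuid(uid)    # change the user last; root cannot be regained afterwards
    return True
```

A process would call this once, early in startup, after it has finished any setup that needs root (for example opening a privileged socket).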

Run in docker with docker run

  1. No need to run as root because we don't run elastic endpoint security; we should recommend running as the elastic-agent user
  2. We will need to use setcap to add privileges to the elastic-agent binary
  3. Individual beats should downgrade privileges via setcap as needed
  4. If you want to run endpoint then you'll need to run a separate container with
docker run --network agent elastic-agent
docker run --network agent --privileged elastic-endpoint

Run in kubernetes

  1. Run a pod for agent that contains an unprivileged container for elastic-agent, and a privileged container for elastic-endpoint

Tasks:

  • Elastic-agent docs updated to recommend running as regular user
  • Use setcap in elastic-agent docker container to add all required capabilities as inheritable so subprocesses can use privs
  • Modify individual beats to setuid / setcap/ downgrade for the local machine use case
    • Use setcap in subprocesses in container to drop unneeded privileges
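One way to check that the capability tasks above had the intended effect is to read the capability sets the kernel reports for a process. The sketch below parses the `Cap*` lines of `/proc/<pid>/status` on Linux; the helper names are hypothetical, and `CAP_BITS` lists only a handful of bit positions from `linux/capability.h`. Whether a subprocess actually gains a capability after `exec` also depends on the file capabilities of the child binary, which this sketch does not inspect.

```python
# Bit positions from linux/capability.h; only a few are listed here.
CAP_BITS = {
    "cap_dac_read_search": 2,
    "cap_setgid": 6,
    "cap_setuid": 7,
    "cap_net_bind_service": 10,
    "cap_net_raw": 13,
    "cap_sys_ptrace": 19,
    "cap_sys_admin": 21,
}

CAP_SETS = ("CapInh", "CapPrm", "CapEff", "CapBnd", "CapAmb")


def parse_cap_sets(status_text: str) -> dict:
    """Extract the hexadecimal Cap* bitmasks from /proc/<pid>/status text."""
    sets = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key in CAP_SETS:
            sets[key] = int(value.strip(), 16)
    return sets


def cap_names(mask: int) -> set:
    """Translate a bitmask into the capability names known to CAP_BITS."""
    return {name for name, bit in CAP_BITS.items() if mask & (1 << bit)}


def read_own_caps() -> dict:
    """Read the capability sets of the current process (Linux only)."""
    with open("/proc/self/status") as f:
        return parse_cap_sets(f.read())
```

For example, `cap_names(read_own_caps()["CapInh"])` run inside a subprocess shows which of the listed capabilities are in its inheritable set.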
@elasticmachine
Contributor

Pinging @elastic/agent (Team:Agent)

@andrewvc
Contributor Author

I believe that in a k8s environment hostPath volumes still present a problem. See elastic/beats#19600 . @jsoriano can you add your thoughts here?

@jsoriano
Member

jsoriano commented Sep 15, 2021

  1. No need to run as root because we don't run elastic endpoint security

Is this issue focused on Uptime?

In any case, I think this is a risky assumption. A user of Elastic Agent for any use case may decide to install a different integration in the future that needs further privileges. If they do, they will probably hit weird failures and end up having to replace their installation of Elastic Agent, or run several of them, which may undermine the user experience intended with Agent/Fleet.

The default experience should assume that Agent can run any integration. As a process supervisor, it is understandable that its default is to run fully privileged. There can be options to run with fewer privileges, and we should document them, but we have to think of this as a unified user experience, considering what happens if a user associates a policy with an agent that doesn't have the privileges to run it.

4. If you want to run endpoint then you'll need to run a separate container with

docker run --network agent elastic-agent
docker run --network agent --privileged elastic-endpoint

This would also undermine the experience intended with Agent/Fleet. What is the benefit of this new experience if you still need to run agents individually?

Individual beats should downgrade privileges via setcap as needed

Modify individual beats to setuid / setcap/ downgrade for the local machine use case

I consider this a good practice for any application, but I think it'd be better not to rely on it to enforce the principle of least privilege. I would propose a security model where Elastic Agent controls the privileges of the processes it executes. The main reasons for that:

  • Elastic Agent may execute processes of different natures, which will require different implementations of capabilities management, and that is error-prone. Keep in mind that Agent already runs Beats and Endpoint, and may run other collectors in the future. Running all of them with full privileges and trusting that they will do the right thing afterwards is a risk.
  • This is a common practice (docker and other container runtimes run containers by default with a reduced set of capabilities, execution can be tuned to increase privileges, systemd and other service supervisors have features to control the capabilities of the services they run...).
  • In the future, this could allow deciding the capabilities required per enabled integration; for example, metricbeat with the system module enabled would be executed with more capabilities than metricbeat monitoring only a remote apache.
  • As well as controlling privileges, it could also run collectors as different users, solving the mentioned problem with synthetics.
  • When running with reduced privileges, Elastic Agent may inform Fleet of its capabilities so it can give feedback to the user about the available options to run more privileged integrations. Or it can reject the execution of a policy if it doesn't have enough privileges, providing meaningful guidance to the user at the moment of trying to associate the policy (instead of blindly running it till something fails, and then having to investigate through logs and so on).

This model would be based on:

  • Elastic Agent runs any collector by default with a reduced set of capabilities.
  • Any collector (or integration in the future?) may override these defaults with configuration in their spec.
  • As a good practice, collectors may still downgrade their privileges further if they want, but this is not required.
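The model in the list above can be sketched as a small resolution step: start from a reduced default set, add whatever each collector spec requests, and compare the result with what the agent itself holds. Everything here is hypothetical and for illustration only: the `DEFAULT_CAPS` value, the list-valued `linux_capabilities` spec key, the collector names, and the capabilities assigned to them are assumptions, not an existing Elastic Agent format.

```python
# Hypothetical reduced default applied to every collector.
DEFAULT_CAPS = frozenset({"cap_net_bind_service"})


def effective_caps(spec: dict) -> frozenset:
    """Defaults plus the capabilities the collector spec asks for."""
    return DEFAULT_CAPS | frozenset(spec.get("linux_capabilities", ()))


def check_policy(agent_caps, specs) -> dict:
    """Map each collector the agent cannot run to its missing capabilities.

    An empty result means the agent holds everything the policy needs.
    """
    problems = {}
    for spec in specs:
        missing = effective_caps(spec) - set(agent_caps)
        if missing:
            problems[spec["name"]] = sorted(missing)
    return problems


# Illustrative policy; names and capability lists are made up.
policy = [
    {"name": "collector-a", "linux_capabilities": ["cap_net_raw"]},
    {"name": "collector-b", "linux_capabilities": ["cap_sys_admin", "cap_sys_ptrace"]},
    {"name": "collector-c"},
]
agent = {"cap_net_bind_service", "cap_net_raw"}
report = check_policy(agent, policy)
```

A non-empty `report` is the kind of information the agent could send back to Fleet when a policy is associated, instead of running the policy until something fails.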

I believe that in a k8s environment hostPath volumes still present a problem. See elastic/beats#19600 . @jsoriano can you add your thoughts here?

In some restricted k8s environments hostPath cannot be used. This is a problem for use cases where you want to persist state between executions or after upgrades. This is especially important for filebeat, probably not so much for heartbeat. Solutions for this are not straightforward; they will depend on the volume providers available in the environment.

@andrewvc
Contributor Author

All good points @jsoriano, however, one concern @joshbressers has had is that users may be reluctant or unable to run the docker container as root, esp. in large environments with strict security policies. I'd argue that elastic-agent is less akin to systemd or another "process supervisor" in that context, it's simply the user app to be run.

WRT how the processes are invoked, I agree it'd be nice to have elastic-agent do it instead of the processes themselves. Another model could be just using the setcap command to set capabilities on the filesystem for the respective binaries, we could do that at build time if elastic/beats#27651 were implemented.

@jsoriano
Member

I'd argue that elastic-agent is less akin to systemd or another "process supervisor" in that context, it's simply the user app to be run.

Yes, you are right, Agent being a process supervisor is an implementation detail, nothing that a user can see as a reason to have more privileges.
Still, I think we have to account for users configuring integrations that require more privileges than the ones given to the Agents.

Another model could be just using the setcap command to set capabilities on the filesystem for the respective binaries, we could do that at build time if elastic/beats#27651 were implemented.

Yes, this could be a good idea in any case.

@andrewvc
Contributor Author

I think for now, given the valid concerns @jsoriano has raised, let's proceed with merging elastic/beats#27878 and postpone further work. That solves the use cases we need on our team, and we probably don't have the bandwidth for a larger-scale fix at this point.

@marclop
Contributor

marclop commented Nov 29, 2021

I'm taking a look at having the apm-server not run as root when the elastic-agent is run as root and what our options are. We seem to have decided to not manage the user/group for binaries that are run by the elastic-agent and have the beats themselves change their user/group and set capabilities.

I would like us to revisit that decision, ideally allowing beats to specify which user:group they would like to be run as, instead of requiring each individual beat to implement the logic that heartbeat currently has to change its user:group and optionally set specific capabilities.

Ideally, the elastic-agent should allow beats to specify the user:group that it should be run as, as well as any additional capabilities that the beat requires in order to run successfully:

name: APM-Server
cmd: apm-server
artifact: apm-server
...
user: elastic-agent
group: elastic-agent
# APM server doesn't require any additional capabilities, but they could be specified as:
# linux_capabilities: 'cap_net_raw+ep'
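How a supervisor might honor the `user` and `group` fields of such a spec can be sketched with the `user`, `group` and `extra_groups` parameters of Python's `subprocess.Popen` (available since Python 3.9 on POSIX). This is only an illustration of the idea, not how elastic-agent (written in Go) starts its subprocesses; `popen_kwargs` and `start_collector` are hypothetical helpers, and the spec is passed as an already-parsed dict.

```python
import subprocess


def popen_kwargs(spec: dict) -> dict:
    """Translate the optional user/group spec fields into Popen arguments."""
    kwargs = {}
    if spec.get("user"):
        kwargs["user"] = spec["user"]
    if spec.get("group"):
        kwargs["group"] = spec["group"]
    if kwargs:
        # Do not leak the supervisor's supplementary groups to the child.
        kwargs["extra_groups"] = []
    return kwargs


def start_collector(spec: dict, args=()) -> subprocess.Popen:
    """Start the collector binary under the identity requested by its spec.

    Switching identity only works when the supervisor itself is privileged;
    otherwise Popen raises an error in the parent.
    """
    return subprocess.Popen([spec["cmd"], *args], **popen_kwargs(spec))
```

With this shape, a collector whose spec has no `user` or `group` simply inherits the supervisor's identity, so existing specs keep working unchanged.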

Another option would be to recommend that the elastic-agent be run as an unprivileged user. I see the issue has a bullet point to update the documentation with that recommendation; are there any blockers to updating the docs / references to recommend using a regular user?

@jsoriano added the Team:Elastic-Agent-Control-Plane label Nov 29, 2021
@elasticmachine
Contributor

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

@jsoriano
Member

We seem to have decided to not manage the user/group for binaries that are run by the elastic-agent and have the beats themselves change their user/group and set capabilities.

I am not sure if there has been an active decision on this after this issue was opened. This is only the way it currently works.

I would like us to revisit that decision, ideally allowing beats to specify which user:group they would like to be run as, instead of requiring each individual beat to implement the logic that heartbeat currently has to change its user:group and optionally set specific capabilities.

+1 to this. It would be in line with my proposal in #147, where Elastic Agent controls the privileges based on info given in each collector spec. I don't think that an approach like this one has been discarded, only that it would need more work.

@jlind23
Contributor

jlind23 commented Dec 3, 2021

@ruflin this seems to be a requirement to consider for the V2 design you are working on.

@ruflin
Member

ruflin commented Dec 6, 2021

@jlind23 I added a note to the design doc to dig into it.

@eedugon
Contributor

eedugon commented Jan 27, 2022

@jsoriano, please keep in mind that the current elastic-agent docker image (7.16.2) adds the elastic-agent user to the root group, and the main directory (elastic-agent) is owned by root:root with no permissions for anyone else.

On platforms like Azure containers, our image doesn't work at all because of security restrictions (the elastic-agent user will NOT belong to the root group, so it won't have permission to see any of the content of the elastic-agent directory).

The following small change solves the problem:

FROM docker.elastic.co/beats/elastic-agent:7.16.2
USER root
RUN chown -R :elastic-agent /usr/share/elastic-agent
USER elastic-agent

This just changes the group ownership of the elastic-agent directory and all its content to the elastic-agent group. Then, in the hypothetical case where the elastic-agent user does not belong to the root group, it will at least have access to the content of the directory to run the agent.
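The failure and the fix can be illustrated with the classic Unix permission check: the kernel uses the owner bits if the UID matches, otherwise the group bits if any of the process's groups matches, otherwise the "other" bits. The sketch below models only that rule (no root override, ACLs or capabilities). The mode `0o770` and the numeric IDs `1000` are assumptions made for the example, not values read from the image.

```python
def allowed(mode: int, owner_uid: int, owner_gid: int,
            uid: int, gids: set, want: int = 0o5) -> bool:
    """Return True if uid/gids get the `want` bits (default r-x) on an entry.

    Exactly one permission triplet applies: owner, else group, else other.
    """
    if uid == owner_uid:
        shift = 6
    elif owner_gid in gids:
        shift = 3
    else:
        shift = 0
    return ((mode >> shift) & want) == want


DIR_MODE = 0o770  # assumed: rwx for owner and group, nothing for others

# Directory owned by root:root, user 1000 is a member of the root group.
in_root_group = allowed(DIR_MODE, 0, 0, uid=1000, gids={0})
# Same directory, but the platform does not put the user in the root group.
not_in_root_group = allowed(DIR_MODE, 0, 0, uid=1000, gids={1000})
# After `chown -R :elastic-agent`, assuming that group has GID 1000.
after_chown = allowed(DIR_MODE, 0, 1000, uid=1000, gids={1000})
```

Under these assumptions the second case is the one that breaks, and changing the group ownership restores access without relying on root-group membership.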

At the moment we are not running as root but adding the non-root user to the root group, which looks weird.

@jlind23
Contributor

jlind23 commented Jan 27, 2022

@ph This is something we may consider to avoid issues on cloud container solutions such as Azure containers.

@jsoriano
Member

@eedugon these changes to add files and users to the root user group were done in the context of supporting OpenShift guidelines, you can read more about this in elastic/beats#12905 (reverted and reapplied in elastic/beats#18873).

If we change this to support Azure, we have to check that we keep supporting these OpenShift guidelines.

@jlind23
Contributor

jlind23 commented Jan 27, 2022

@blakerouse @ruflin what is your opinion here? Any particular path we should take?

@ph
Contributor

ph commented Jan 27, 2022

If I understand the guideline correctly, making that change will be incompatible with OpenShift.

For an image to support running as an arbitrary user, directories and files that are written to by processes in the image must be owned by the root group and be read/writable by that group. Files to be executed must also have group execute permissions.

From: https://docs.openshift.com/container-platform/4.9/openshift_images/create-images.html
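The quoted guideline can be turned into a small audit over the paths an image writes to. This is a sketch under assumptions: `guideline_violation` and `audit` are hypothetical helpers, "files to be executed" is approximated by files that have the owner execute bit set, and directories are also required to have group execute so they can be traversed, which goes slightly beyond the quoted text.

```python
import os
import stat


def guideline_violation(mode: int, gid: int):
    """Return a reason string if an entry breaks the guideline, else None."""
    if gid != 0:
        return "not owned by the root group"
    if (mode & 0o060) != 0o060:
        return "not group read/writable"
    needs_exec = stat.S_ISDIR(mode) or bool(mode & stat.S_IXUSR)
    if needs_exec and not (mode & stat.S_IXGRP):
        return "missing group execute"
    return None


def audit(root: str):
    """Yield (path, reason) for every non-compliant entry below root."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if stat.S_ISLNK(st.st_mode):
                continue  # symlink permissions are not meaningful here
            reason = guideline_violation(st.st_mode, st.st_gid)
            if reason:
                yield path, reason
```

Running `list(audit("/usr/share/elastic-agent"))` inside an image built with the `chown -R :elastic-agent` change would be expected to flag every entry as not owned by the root group, which is the incompatibility described above.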

@eedugon
Contributor

eedugon commented Jan 27, 2022 via email

@eedugon
Contributor

eedugon commented Jan 27, 2022 via email

@jsoriano
Member

jsoriano commented Jan 28, 2022

Because the container user is always a member of the root group, the
container user can read and write these files.

I don’t know if that’s generic on Linux dockers or it’s just an openshift proposal, just looked weird from sysadmin and security point of view.

Yes, this seems to be the case for containers started with Docker with arbitrary uids:

$ docker run -it --rm -u 1000 ubuntu:20.04 id
uid=1000 gid=0(root) groups=0(root)
$ docker run -it --rm -u 1000 alpine id
uid=1000 gid=0(root)

And yes, this effectively allows access to (mounted) host files with permissions for the root (0) group.

What I think OpenShift additionally does is use user namespacing, so that id 0 in the container maps to a random unprivileged user and group on the host. (Update, more info about this: https://cloud.redhat.com/blog/a-guide-to-openshift-and-uids, https://cookbook.openshift.org/users-and-role-based-access-control/why-do-my-applications-run-as-a-random-user-id.html)

@ph
Contributor

ph commented Jan 28, 2022

@jsoriano Is that correct to believe that we might need to have a different docker images for the azure case?

@jsoriano
Member

@jsoriano Is that correct to believe that we might need to have a different docker images for the azure case?

Yes, we may need a specific image for Azure if their runtime is different enough. We would need to investigate a bit more.

@jlind23
Contributor

jlind23 commented Feb 4, 2022

@ph The first thing to do will be to get a single config running on both OpenShift and Azure containers; if that doesn't work, we should consider shipping a specific Azure image, which I will definitely try to avoid.
Something we should investigate in one of our coming releases.

@jlind23 transferred this issue from elastic/beats Mar 7, 2022
@jlind23 changed the title [Agent] Minimizing Elastic-Agent privileges → [Design][Agent] Minimizing Elastic-Agent privileges Mar 16, 2022
@jlind23 changed the title [Design][Agent] Minimizing Elastic-Agent privileges → [DESIGN][Agent] Minimizing Elastic-Agent privileges Mar 16, 2022
@nicpenning
Contributor

nicpenning commented May 22, 2023

Is this FR / issue still alive?

As a user, I would like to be able to set which user context each integration executes as.

For example, we can run Filebeat today as a service on Windows with a specific user to access files and folders that cannot be accessed by system. This is a slight blocker for us to migrate a few different integrations.

A workaround is deploying an agent locally on said systems, but we would prefer to use network-mapped drives (even though discouraged, this works very well) to reduce overhead on the servers themselves and have fewer agents to manage.

Also, it's best to have reduced permissions anyways, especially when you are simply reading log files and forwarding them on to another resource.

Please do let me know if this concept is worth considering here or a new issue/FR makes sense.

Thanks!

@jlind23
Contributor

jlind23 commented May 27, 2024

Elastic Agent can now be run as non-root on Linux, Mac and Windows, hence closing this as done.
cc @ycombinator @nimarezainia

@jlind23 closed this as completed May 27, 2024
10 participants