[Heartbeat] Changes to default beats directories prevent startup #2315

Closed
emilioalvap opened this issue Feb 23, 2023 · 5 comments · Fixed by #2330

@emilioalvap
Contributor

  • Version: > 8.4.3
  • Operating System: Docker

Summary

Changes to the default permissions on the default beat directories prevent heartbeat from starting in a Docker container when elastic-agent runs as root.

This is a regression of #202 and #368.

How to repro

  • Set up a synthetics private location and add a monitor.
  • Enroll elastic-agent on the private location's policy via docker/kubernetes, running as root.
  • Wait for heartbeat/synthetics-* component to report as unhealthy.
{"log.level":"error","@timestamp":"2023-02-23T17:07:53.260Z","message":"Exiting: could not start the HTTP server for the API: listen unix /usr/share/elastic-agent/state/data/tmp/synthetics-http-default.sock: bind: permission denied","component":{"binary":"heartbeat","dataset":"elastic_agent.heartbeat","id":"synthetics/http-default","type":"synthetics/http"},"log":{"source":"synthetics/http-default"},"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2023-02-23T17:07:53.366Z","log.origin":{"file.name":"coordinator/coordinator.go","file.line":857},"message":"Component state changed synthetics/http-default (STARTING->FAILED): Failed: pid '943' exited with code '1'","log":{"source":"elastic-agent"},"component":{"id":"synthetics/http-default","state":"FAILED","old_state":"STARTING"},"ecs.version":"1.6.0"}

Comparing the default directory permissions between 8.4.3 and 8.6.2, the mode changed from 770 to 750 on some of the directories:

//8.4.3
root@08b6fec352b0:/usr/share/elastic-agent# ls -al state/data/{run,logs,tmp}/ | grep default
drwxrwx--- 2 root root 4096 Feb 23 21:15 default
drwxrwx--- 7 root root 4096 Feb 23 21:15 default
drwxrwx--- 5 root root 4096 Feb 23 21:15 default
// 8.6.2
root@80c40e9af141:/usr/share/elastic-agent# ls -al state/data/{run,logs,tmp}/ | grep " \.$"
drwxr-x--- 2 root root    4096 Feb 23 20:28 .
drwxrwx--- 8 root root 4096 Feb 23 20:28 .
drwxr-x--- 2 root root 4096 Feb 23 20:28 .
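
To make the listings above easier to compare, here is a small sketch that maps octal modes to the strings `ls -l` prints for a directory. The helper name `describeDirMode` is made up for illustration and is not part of elastic-agent.

```go
package main

import (
	"fmt"
	"os"
)

// describeDirMode (hypothetical helper) renders permission bits the way
// `ls -l` shows them for a directory, e.g. 0o770 becomes "drwxrwx---".
func describeDirMode(perm os.FileMode) string {
	return (os.ModeDir | perm.Perm()).String()
}

func main() {
	// 0o770 matches the 8.4.3 listing, 0o750 the restricted 8.6.2 entries.
	for _, perm := range []os.FileMode{0o770, 0o750} {
		fmt.Printf("%04o %s\n", uint32(perm), describeDirMode(perm))
	}
}
```

The only difference between the two modes is the group-write bit (0o020), which is the bit heartbeat depends on in this setup.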
@emilioalvap
Contributor Author

cc @andrewvc

@cmacknz
Member

cmacknz commented Feb 24, 2023

I am trying to read through all the history here to figure out what is different about Heartbeat that it doesn't work in this configuration.

If I start a Docker container, I see that every other input can successfully bind to its HTTP Unix socket:

elastic-agent@befff0407eb1:~$ ls -l state/data/tmp/
total 0
srwxr----- 1 elastic-agent elastic-agent 0 Feb 24 14:43 beat-metrics-monitoring.sock
srwx------ 1 elastic-agent elastic-agent 0 Feb 24 14:43 elastic-agent-control.sock
srwxr----- 1 elastic-agent elastic-agent 0 Feb 24 14:43 filestream-monitoring.sock
srwxr----- 1 elastic-agent elastic-agent 0 Feb 24 14:43 http-metrics-monitoring.sock
srwxr----- 1 elastic-agent elastic-agent 0 Feb 24 14:43 log-default.sock
srwxr----- 1 elastic-agent elastic-agent 0 Feb 24 14:43 system-metrics-default.sock

We explicitly create the run directory for each process with mode 0755; this is the working directory for the executable:

// differentiate data paths
dataPath := filepath.Join(paths.Home(), "run", c.current.ID)
_ = os.MkdirAll(dataPath, 0755)
args = append(args, "-E", "path.data="+dataPath)
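
For comparison, a minimal sketch of creating such a directory with a group-writable mode. This is an illustration under assumptions, not the change that landed in the fix: `ensureDir`, `demo`, and the directory name are hypothetical. Note that `os.MkdirAll` is subject to the process umask, so the sketch follows it with `os.Chmod` on the leaf directory to make the resulting mode predictable.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// ensureDir (hypothetical helper) creates dir and any missing parents, then
// forces perm on dir itself. Parents created along the way keep whatever
// mode the umask leaves them with.
func ensureDir(dir string, perm os.FileMode) error {
	if err := os.MkdirAll(dir, perm); err != nil {
		return fmt.Errorf("create %s: %w", dir, err)
	}
	if err := os.Chmod(dir, perm); err != nil {
		return fmt.Errorf("chmod %s: %w", dir, err)
	}
	return nil
}

// demo creates a throwaway directory tree and returns the leaf's mode string.
func demo() (string, error) {
	base, err := os.MkdirTemp("", "agent-run-")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(base)

	// "synthetics-http-default" is only an example component ID.
	dataPath := filepath.Join(base, "run", "synthetics-http-default")
	if err := ensureDir(dataPath, 0o770); err != nil {
		return "", err
	}
	info, err := os.Stat(dataPath)
	if err != nil {
		return "", err
	}
	return info.Mode().String(), nil
}

func main() {
	mode, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(mode)
}
```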

@cmacknz
Member

cmacknz commented Feb 24, 2023

Looking at elastic/beats#30869 I think this is the explanation:

When elastic-agent is executed as root, all beats (filebeat, metricbeat, ...) run as root, and heartbeat runs as the user specified in the BEAT_SETUID_AS env variable, elastic-agent by default. This user needs permission to write to local directories, and we enable that by making the user belong to the root group. But some of the directories that the user needs access to, which are created at runtime, do not allow group write permission, meaning heartbeat won't be able to start and it will eventually report as degraded:
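
The explanation above can be sketched as a simplified model of the classic Unix permission check for creating an entry (such as a socket file) in a directory. `canCreateEntry` is a hypothetical helper; it ignores root privileges, ACLs, and capabilities, and only shows why a non-owner member of the root group fails on a 750 directory but succeeds on a 770 one.

```go
package main

import (
	"fmt"
	"os"
)

// canCreateEntry (hypothetical) models whether a non-privileged process may
// create a file in a directory with the given mode. Exactly one permission
// class applies: owner, else group, else other. Creating an entry requires
// both the write and the search (execute) bit of that class.
func canCreateEntry(perm os.FileMode, isOwner, inGroup bool) bool {
	var bits os.FileMode
	switch {
	case isOwner:
		bits = (perm >> 6) & 0o7
	case inGroup:
		bits = (perm >> 3) & 0o7
	default:
		bits = perm & 0o7
	}
	const writeAndSearch = 0o3
	return bits&writeAndSearch == writeAndSearch
}

func main() {
	// heartbeat runs as a non-root user that belongs to the root group,
	// while the directories are owned by root:root.
	fmt.Println("0770:", canCreateEntry(0o770, false, true))
	fmt.Println("0750:", canCreateEntry(0o750, false, true))
}
```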

@emilioalvap
Contributor Author

emilioalvap commented Feb 24, 2023

Looking at elastic/beats#30869 I think this is the explanation:

@cmacknz Yes, that's correct. These directories used to have /default created with 770, so heartbeat could make use of the group permissions.

@cmacknz
Member

cmacknz commented Feb 24, 2023

Thanks, I think the only fix is to change the permissions back to what they were then. In general we default to the most restrictive permissions that seem reasonable, but we didn't consider this case.

In the future we may want to allow indicating what permissions a binary needs directly in the spec file to avoid this bug and having to set less restrictive permissions for directories that don't actually need it.
