nomad: deep dive into permissions & networking #52

Open
3 tasks
noahehall opened this issue Jan 24, 2023 · 0 comments

noahehall commented Jan 24, 2023

C

  • see perm issues: nomad: core refactor and setup #48
  • we need to fully understand nomad permissions before moving on
  • i only see this getting more frustrating as we move into more complex stacks
  • this was a good (and necessary) dive into the docs, but in the end
    • the whole issue was around su-exec, which must be run as root
      • setting user = "root" on the nomad task will start as root, but then drop privs to the docker img user
      • this is a similar setup to haproxy, which requires starting as root but can run as anyone
      • i'm sure there are ways around this, but this seems to be the most straightforward
      • check ps logs to confirm that the consul agent is indeed running as the consul user
  • not quite sure wtf is going on, but it seems to be working now with user = consul
    • lol check ps logs below
    • i'm going to chalk this up as caching/config issue/just a long day: the first time we set user it failed, now it's working
    • an interesting idea is to remove all images/cache/etc and start from scratch
      • it could be that when we switched to root, it was able to execute the scripts and cache the image,
      • then when we switched back to consul the scripts didn't need to run
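
for reference, the "start as root, run as someone else" pattern described above can be sketched as a tiny entrypoint helper. this is a minimal sketch, not the actual consul or haproxy entrypoint: `run_as` is a made-up name, and it assumes `su-exec` is installed in the image.

```shell
#!/bin/sh
# Hypothetical helper: run a command as an unprivileged user.
# Assumption: su-exec is on PATH inside the image.
run_as() {
  run_user=$1
  shift
  if [ "$(id -u)" -eq 0 ]; then
    # Started as root (e.g. user = "root" on the task): drop privileges here.
    exec su-exec "$run_user" "$@"
  fi
  # Already unprivileged: su-exec cannot switch users, so run directly.
  exec "$@"
}

# Example call at the end of an entrypoint script:
#   run_as consul consul agent -config-dir=/consul/config
```

with this shape the same image works whether the task starts as root or as consul, which would match the ps output where pid 1 is root but the agent is consul.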

T

  • spike: ensure data/volumes are placed within the task working dir or one of the 3 NOMAD_*_DIR dirs
    • figure out wtf is the difference between the working dir and the three dirs below, it's not surfaced in the docs
    • lol the working dir is just the work dir like in docker, silly me
    • NOMAD_ALLOC_DIR
    • NOMAD_TASK_DIR
    • NOMAD_SECRETS_DIR
  • spike: lifecycle prestart task to set up user and chown runtime dirs
    • this is related to not being able to specify USER consul in the docker image
    • but that may be related to how we forced the gid/uid on the consul user to match the host
  • spike: review the csi_plugins for something appropriate for validation
    • most seem relevant for cloud stores, e.g. aws ebs

A


  • core-consul perm issues
# docker compose: everything good
/consul $ ps
PID   USER     TIME  COMMAND
    1 consul    0:00 /sbin/docker-init -- ./consul.compose.boots
    7 consul    0:00 {consul.compose.} /bin/sh ./consul.compose.
    9 consul    0:00 consul agent -config-dir=/consul/config -da
   33 consul    0:00 sh
   40 consul    0:00 ps

# nomad > task > user = "root", sans volumes
# runtime drops privs to user consul, but must be run as root cuz su-exec must be run as root
/consul # ps
PID   USER     TIME  COMMAND
    1 root      0:00 /sbin/docker-init -- docker-entrypoint.sh a
    7 root      0:00 {docker-entrypoi} /usr/bin/dumb-init /bin/s
    8 consul    0:00 consul agent -data-dir=/consul/data -config
   33 root      0:00 sh
   39 root      0:00 ps

# with user = consul: dunno maybe i fixed something in the configs
$ script.exec.cunt.sh consul
OCI runtime exec failed: exec failed: unable to start container process: exec: "bash": executable file not found in $PATH: unknown

/consul $ ps
PID   USER     TIME  COMMAND
    1 consul    0:00 {consul.compose.} /bin/sh ./consul.compos
    8 consul    0:00 consul agent -auto-reload-config -config-
   32 consul    0:00 sh
   38 consul    0:00 ps
/consul $ 

# with volumes
## ==> Failed to load cert/key pair: open /run/secrets/consul_server.pem: no such file or directory
# with secrets as volumes: w00p w00p
# ^ docker secrets need to be translated to nomad secrets 


#### networking
## issue 1:  cert is valid for localhost, not ...
# likely just need to set the extra_hosts in the container
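
one way to bridge the `/run/secrets/consul_server.pem` failure above while docker secrets are being translated: resolve the cert path at startup, preferring nomad's secrets dir and falling back to the compose location. a minimal sketch: `secret_path` and `DOCKER_SECRETS_DIR` are made-up names, and it assumes the secret keeps the same file name in both setups.

```shell
#!/bin/sh
# Hypothetical helper: print the path a secret file should be read from.
# Prefers $NOMAD_SECRETS_DIR/<name> when that file exists, otherwise falls
# back to the docker secrets location (default /run/secrets).
secret_path() {
  name=$1
  if [ -n "${NOMAD_SECRETS_DIR:-}" ] && [ -f "$NOMAD_SECRETS_DIR/$name" ]; then
    printf '%s\n' "$NOMAD_SECRETS_DIR/$name"
  else
    printf '%s\n' "${DOCKER_SECRETS_DIR:-/run/secrets}/$name"
  fi
}

# Example: cert_file=$(secret_path consul_server.pem)
```

this keeps the bootstrap script identical under compose and nomad; the alternative is to keep mounting secrets as volumes at `/run/secrets`, which is what worked above.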

  • core-proxy perm issues
# on initial execution when all env vars are transposed from docker > nomad
# nomad > task > user = haproxy
/consul/consul.compose.bootstrap.sh: 11: cannot create /consul/config/env.token.hcl: Permission denied
/consul/consul.compose.bootstrap.sh: 31: cannot create /consul/pid.envoy: Permission denied
su: only root can specify alternative groups
su: only root can specify alternative groups
[NOTICE]   (14) : haproxy version is 2.7.1-3e4af0e
[NOTICE]   (14) : path to executable is /usr/local/sbin/haproxy
[WARNING]  (14) : config : [/var/lib/haproxy/configs/002-001-vault.cfg:19] : 'server lb-vault/core-vault-c-dns1' : could not resolve address 'core-vault.service.search', disabling server.
[WARNING]  (14) : config : [/var/lib/haproxy/configs/002-001-vault.cfg:20] : 'server lb-vault/core-vault-d-dns1' : could not resolve address 'core-vault', disabling server.
[ALERT]    (14) : Binding [/var/lib/haproxy/configs/000-000-global.cfg:37] for frontend GLOBAL: cannot bind UNIX socket (Permission denied) [/var/run/api.sock]
[ALERT]    (14) : [haproxy.main()] Some protocols failed to start their listeners! Exiting.

# as with consul, switching user to "root" fixed it, which makes sense
# haproxy is different than consul anyway, as haproxy recommends starting as root, but running as X
root@9ffca265061c:/usr/local/etc/haproxy# ps -aux
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root           1  0.0  0.0   1136     4 ?        Ss   02:12   0:00 /sbin/docker-init -- ./haproxy.compose.boo
root           7  0.0  0.0   2616   524 ?        S    02:12   0:00 /bin/sh ./haproxy.compose.bootstrap.sh
root          11  0.0  0.0   2616    96 ?        S    02:12   0:00 /bin/sh /consul/consul.compose.bootstrap.s
root          12  0.0  0.0   4524  2672 ?        S    02:12   0:00 su -g consul - consul sh -c cd /consul/env
root          13  0.0  0.0   2616    96 ?        S    02:12   0:00 /bin/sh /consul/consul.compose.bootstrap.s
root          14  0.0  0.0   4524  2680 ?        S    02:12   0:00 su -g consul - consul sh -c consul agent -
root          15  0.0  0.0  90584  9876 ?        S    02:12   0:00 haproxy -W -db -f /var/lib/haproxy/configs
consul        17  0.0  0.0   2616   592 ?        Ss   02:12   0:00 -sh -c cd /consul/envoy && envoy -c envoy.
consul        18  0.0  0.0   2616   592 ?        Ss   02:12   0:00 -sh -c consul agent -node=core-proxy-9ffca
consul        23  0.4  0.2 811016 76212 ?        Sl   02:12   0:00 consul agent -node=core-proxy-9ffca265061c
consul        24  0.5  0.1 2420640 45916 ?       Sl   02:12   0:00 envoy -c envoy.yaml
haproxy       71  0.0  0.0 846364 13700 ?        Sl   02:12   0:00 haproxy -W -db -f /var/lib/haproxy/configs
root          93  0.0  0.0   4248  3404 pts/0    Ss   02:13   0:00 bash
root         103  0.0  0.0   5904  2792 pts/0    R+   02:14   0:00 ps -aux
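
to avoid eyeballing the ps output every time, the "who is this actually running as" check can be scripted. a minimal sketch: `users_running` is a made-up helper, and it assumes the `ps aux` column layout shown above (user in column 1, executable in column 11).

```shell
#!/bin/sh
# Hypothetical helper: read `ps aux`-style output on stdin and print the
# distinct users owning processes whose executable (11th column) equals
# the given name. The header row is skipped.
users_running() {
  awk -v name="$1" 'NR > 1 && $11 == name { print $1 }' | sort -u
}

# Example inside the container: ps aux | users_running haproxy
```

against the listing above this would report both root and haproxy for haproxy (master and worker), and only consul for consul and envoy.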
@noahehall noahehall added this to nirvai Jan 24, 2023
@noahehall noahehall converted this from a draft issue Jan 24, 2023
@noahehall noahehall changed the title nomad: deep dive into permissions nomad: deep dive into permissions & networking Jan 26, 2023