Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not run nova services as root #598

Merged

Conversation

gibizer
Copy link
Contributor

@gibizer gibizer commented Nov 16, 2023

@gibizer
Copy link
Contributor Author

gibizer commented Nov 16, 2023

/hold I need to test this

@gibizer gibizer changed the title Do not run nova services as root Do not run placement service as root Nov 16, 2023
@gibizer gibizer changed the title Do not run placement service as root Do not run nova services as root Nov 16, 2023
Copy link

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/0a07729177b944599c3cca2a74e37ebd

✔️ nova-operator-content-provider SUCCESS in 1h 32m 07s
nova-operator-kuttl FAILURE in 44m 27s
nova-operator-tempest-multinode FAILURE in 1h 11m 56s

@gibizer
Copy link
Contributor Author

gibizer commented Nov 16, 2023

db sync job fails but must gather does not collect the job's log :/ I need to try this with a local build

@gibizer
Copy link
Contributor Author

gibizer commented Nov 20, 2023

@SeanMooney you mentioned that kolla set config needs sudo and that the sudo is managed by rootwrap in the k8s nova images.

❯ oc exec -t nova-cell0-conductor-0 -- cat /etc/sudoers.d/nova
Defaults:nova !requiretty

nova ALL = (root) NOPASSWD: /usr/bin/nova-rootwrap /etc/nova/rootwrap.conf *
nova ALL = (root) NOPASSWD: /usr/bin/privsep-helper *
❯ oc exec -t nova-cell0-conductor-0 -- grep filters_path /etc/nova/rootwrap.conf
filters_path=/etc/nova/rootwrap.d,/usr/share/nova/rootwrap
❯ oc exec -t nova-cell0-conductor-0 -- ls /etc/nova/rootwrap.d /usr/share/nova/rootwrap
ls: cannot access '/etc/nova/rootwrap.d': No such file or directory
ls: cannot access '/usr/share/nova/rootwrap': No such file or directory
command terminated with exit code 2

It seems to me that the sudo config is incomplete as we have no filters defined. For dbsync we directly call kolla_set_configs at
https://github.com/openstack-k8s-operators/nova-operator/blob/main/pkg/novaconductor/dbsync.go#L33
For the service containers we call /usr/local/bin/kolla_set_configs && /usr/local/bin/kolla_start

If I understand correctly kolla_set_config needs sudo. How should we provide it in this env?

@gibizer
Copy link
Contributor Author

gibizer commented Nov 20, 2023

@SeanMooney you mentioned that kolla set config needs sudo and that the sudo is managed by rootwrap in the k8s nova images.

❯ oc exec -t nova-cell0-conductor-0 -- cat /etc/sudoers.d/nova
Defaults:nova !requiretty

nova ALL = (root) NOPASSWD: /usr/bin/nova-rootwrap /etc/nova/rootwrap.conf *
nova ALL = (root) NOPASSWD: /usr/bin/privsep-helper *
❯ oc exec -t nova-cell0-conductor-0 -- grep filters_path /etc/nova/rootwrap.conf
filters_path=/etc/nova/rootwrap.d,/usr/share/nova/rootwrap
❯ oc exec -t nova-cell0-conductor-0 -- ls /etc/nova/rootwrap.d /usr/share/nova/rootwrap
ls: cannot access '/etc/nova/rootwrap.d': No such file or directory
ls: cannot access '/usr/share/nova/rootwrap': No such file or directory
command terminated with exit code 2

It seems to me that the sudo config is incomplete as we have no filters defined. For dbsync we directly call kolla_set_configs at https://github.com/openstack-k8s-operators/nova-operator/blob/main/pkg/novaconductor/dbsync.go#L33 For the service containers we call /usr/local/bin/kolla_set_configs && /usr/local/bin/kolla_start

If I understand correctly kolla_set_config needs sudo. How should we provide it in this env?

Ahh there is an extra line the sudoers file:

# anyone in the kolla group may run /usr/local/bin/kolla_set_configs as the
# root user via sudo without password confirmation
%kolla ALL=(root) NOPASSWD: /usr/local/bin/kolla*

that suggest that we only need a user in the kolla group for the kolla commands to work.

And nova is in the kolla group:

sh-5.1# cat /etc/group | grep kolla
kolla:x:42400:nova

Copy link

Merge Failed.

This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset.
Warning:
Error merging github.com/openstack-k8s-operators/nova-operator for 598,c4dc4cc0493d5ce118bc80c37b6ca09edf71c65f

@gibizer
Copy link
Contributor Author

gibizer commented Nov 20, 2023

So lets try to switch from root(0) to nova(42436) in the pods

@gibizer
Copy link
Contributor Author

gibizer commented Nov 20, 2023

/test functional

@SeanMooney
Copy link
Contributor

so this is a bug we should never call kolla_set_configs directly

we have two options

cellDBSyncCommand = "sudo /usr/local/bin/kolla_set_configs && /bin/sh -c /var/lib/openstack/bin/dbsync.sh"

as you said

or

cellDBSyncCommand = "/usr/local/bin/kolla_start" and provide a kolla config.json file with
"/bin/sh -c /var/lib/openstack/bin/dbsync.sh" as the command for kolla start to run after it has set up the config.

that would be my preference.

i.e. instead of bypassing the kolla image API we should correctly provide the kolla config.json to the job container and specify the command as normal in that.

@gibizer
Copy link
Contributor Author

gibizer commented Nov 20, 2023

so this is a bug we should never call kolla_set_configs directly

we have two options

cellDBSyncCommand = "sudo /usr/local/bin/kolla_set_configs && /bin/sh -c /var/lib/openstack/bin/dbsync.sh"

as you said

or

cellDBSyncCommand = "/usr/local/bin/kolla_start" and provide a kolla config.json file with "/bin/sh -c /var/lib/openstack/bin/dbsync.sh" as the command for kolla start to run after it has set up the config.

that would be my preference.

i.e. instead of bypassing the kolla image API we should correctly provide the kolla config.json to the job container and specify the command as normal in that.

Thanks. I will rework this.

FYI, we have an even worse situation in placement-operator openstack-k8s-operators/placement-operator#107 (comment) as I think the placement image itself is broken (placement user is no in the kolla group)

Copy link

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/8ac4e9b28fff4df89503fdc8fb772371

✔️ nova-operator-content-provider SUCCESS in 57m 20s
nova-operator-kuttl FAILURE in 40m 28s
nova-operator-tempest-multinode RETRY_LIMIT in 5s

@gibizer
Copy link
Contributor Author

gibizer commented Nov 20, 2023

This is now using sudo for kolla_set_config just to see if that make the rest of the pieces work. If so I can add another commit to use kolla config files to execute the job commands.

Copy link

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/34595c044c1c4e929b1bead7576f47fb

nova-operator-content-provider NODE_FAILURE Node request 200-0006654352 failed in 0s
⚠️ nova-operator-kuttl SKIPPED Skipped due to failed job nova-operator-content-provider
⚠️ nova-operator-tempest-multinode SKIPPED Skipped due to failed job nova-operator-content-provider

@gibizer
Copy link
Contributor Author

gibizer commented Nov 20, 2023

recheck

Copy link

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/14de4d53311d4a6c90c60cea8e3a172e

nova-operator-content-provider NODE_FAILURE Node request 200-0006654921 failed in 0s
⚠️ nova-operator-kuttl SKIPPED Skipped due to failed job nova-operator-content-provider
⚠️ nova-operator-tempest-multinode SKIPPED Skipped due to failed job nova-operator-content-provider

@gibizer
Copy link
Contributor Author

gibizer commented Nov 21, 2023

recheck zuul nodepool should be back in business

Copy link

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/bc20599914454911845ad66d2485539e

✔️ nova-operator-content-provider SUCCESS in 1h 25m 42s
nova-operator-kuttl FAILURE in 40m 24s
nova-operator-tempest-multinode FAILURE in 1h 07m 10s

@gibizer
Copy link
Contributor Author

gibizer commented Nov 21, 2023

Still no success even with sudo. And must gather still not provide job logs (and the related pod is deleted) so I need to go back and deploy it locally to see the error.

@gibizer
Copy link
Contributor Author

gibizer commented Nov 21, 2023

/test functional

@gibizer
Copy link
Contributor Author

gibizer commented Nov 22, 2023

/unhold

Copy link
Contributor

@SeanMooney SeanMooney left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

im +1.5 on this im just wondering if we really need to change the script mode or not

pkg/nova/volumes.go Outdated Show resolved Hide resolved
templates/nova-manage/config/cell-mapping-config.json Outdated Show resolved Hide resolved
templates/novametadata/config/httpd.conf Show resolved Hide resolved
@bogdando
Copy link
Contributor

lgtm

Now each podified service runs as nova user. This needed
a bit of change how we run our jobs. Now the scripts of the job is
moved by kolla and started via kolla_start.

As apache is also running with the nova user we needed to set some
permissions for the api containers. At the same time we make sure that
apache logs everything to its stdout / stderr to make the logs visible.

Implements: https://issues.redhat.com/browse/OSPRH-1374
Copy link
Contributor

openshift-ci bot commented Nov 22, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: gibizer, SeanMooney

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot bot merged commit d644726 into openstack-k8s-operators:main Nov 22, 2023
2 checks passed
@gibizer
Copy link
Contributor Author

gibizer commented Nov 23, 2023

Setting scc to nonroot-v2 (or nonroot) does not work with kolla + sudo. It seem sudo needs uid 0 to exists

❯ oc logs nova-cell0-conductor-db-sync-bbtdl
+ sudo -E kolla_set_configs
sudo: effective uid is not 0, is /usr/bin/sudo on a file system with the 'nosuid' option set or an NFS file system without root privileges?

@bogdando
Copy link
Contributor

Setting scc to nonroot-v2 (or nonroot) does not work with kolla + sudo. It seem sudo needs uid 0 to exists

❯ oc logs nova-cell0-conductor-db-sync-bbtdl
+ sudo -E kolla_set_configs
sudo: effective uid is not 0, is /usr/bin/sudo on a file system with the 'nosuid' option set or an NFS file system without root privileges?

To have a user with uid 0 mapped to some (nova) id user in the host, usernsmappings should work for cri-o in openshift the similar thay it works for podman, see cri-o/cri-o#5294

@gibizer
Copy link
Contributor Author

gibizer commented Nov 24, 2023

Setting scc to nonroot-v2 (or nonroot) does not work with kolla + sudo. It seem sudo needs uid 0 to exists

❯ oc logs nova-cell0-conductor-db-sync-bbtdl
+ sudo -E kolla_set_configs
sudo: effective uid is not 0, is /usr/bin/sudo on a file system with the 'nosuid' option set or an NFS file system without root privileges?

To have a user with uid 0 mapped to some (nova) id user in the host, usernsmappings should work for cri-o in openshift the similar thay it works for podman, see cri-o/cri-o#5294

Tried the manual mapping in #605 it does not seem to work.

@bogdando
Copy link
Contributor

bogdando commented Nov 24, 2023

Tried the manual mapping in #605 it does not seem to work.

there is still something to research in that regard, see #605 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants