
pubsub_zmq aborts when running within a container #658

Closed
rlenferink opened this issue Sep 25, 2023 · 2 comments · Fixed by #703
Labels
component/pubsub Categorizes an issue or PR as relevant to pubsub. kind/bug Categorizes issue or PR as related to a bug.

Comments

@rlenferink
Member

The pubsub_zmq tests fail (SEGV) when running within a container. This is due to the user in the container possibly being the root user (uid = 0), which makes this check succeed:

//NOTE. ZMQ will abort when performing a sched_setscheduler without permission.
//As result permission has to be checked first.
//TODO update this to use cap_get_pid and cap_get_flag instead of check user is root (note adds dep to -lcap)
bool gotPermission = false;
if (getuid() == 0) {
    gotPermission = true;
}

The gotPermission flag is later used to determine whether the scheduling priority can be set:

zmq_ctx_set(receiver->zmqCtx, ZMQ_THREAD_PRIORITY, (int) prio);

When this is called as the root user within a container (uid 0), while the user outside the container is a rootless user, the tests segfault (the pthread_setschedparam call cannot succeed).

This is the line where libzmq ultimately crashes:

https://github.com/zeromq/libzmq/blob/4097855ddaaa65ed7b5e8cb86d143842a594eebd/src/thread.cpp#L345

libzmq doesn't handle this gracefully, and I am not sure whether this can be solved.

I tried the suggested libcap approach and afterwards fell back to simply using the capsh command, but there cap_sys_nice is reported as available:

root@fedora:/home/rlenferink/workspace/asf/celix/celix-container# capsh --print
Current: =ep
Bounding set =cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,cap_wake_alarm,cap_block_suspend,cap_audit_read,cap_perfmon,cap_bpf,cap_checkpoint_restore

Any suggestions to solve this?

@rlenferink rlenferink added kind/bug Categorizes issue or PR as related to a bug. component/pubsub Categorizes an issue or PR as relevant to pubsub. labels Sep 25, 2023
@pnoltes
Contributor

pnoltes commented Sep 25, 2023

I would like to drop support for the PubSub bundles in Apache Celix 3.0.0, and if we do that, IMO this does not need to be solved.

If we would like to keep the PubSub bundles, I think the best solution is to only set ZMQ_THREAD_PRIORITY or ZMQ_THREAD_SCHED_POLICY if this is explicitly enabled through a config property.

@PengZheng
Contributor

PengZheng commented Sep 26, 2023

The documentation says that the host machine's kernel should be configured properly (CONFIG_RT_GROUP_SCHED): https://docs.docker.com/config/containers/resource_constraints/#configure-the-realtime-scheduler
My local Ubuntu does not support this.

PubSub correctly provides configuration options for this. It seems to me a pure testing configuration issue: an additional CMake option like RUN_IN_CONTAINER (and a corresponding Conan option) should be enough to make these tests use another set of *.properties files.
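A sketch of what that could look like in the build configuration (RUN_IN_CONTAINER and the directory names are hypothetical, not existing Celix options):

```cmake
# Hypothetical option: tests run inside a container where real-time
# scheduling may be unavailable, so use a container-safe properties set.
option(RUN_IN_CONTAINER "Tests run inside a container" OFF)

if (RUN_IN_CONTAINER)
    set(PUBSUB_TEST_PROPERTIES_DIR "${CMAKE_CURRENT_SOURCE_DIR}/props/container")
else ()
    set(PUBSUB_TEST_PROPERTIES_DIR "${CMAKE_CURRENT_SOURCE_DIR}/props/default")
endif ()
```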
