This repository has been archived by the owner on Oct 16, 2020. It is now read-only.

pthread_create priority thread returns EPERM when run as root #410

Closed
onedsc opened this issue Jul 13, 2015 · 13 comments

@onedsc

onedsc commented Jul 13, 2015

Summary:

Setting a thread priority to anything other than zero using a scheduling policy of Round Robin fails with an EPERM when running as root.

Details:

After building the SDK and an image on Ubuntu 14.04 LTS, I tried to set the priority of a thread created as root and received an EPERM. To investigate further, I took the source code below directly from the man page for pthread_setschedparam; it too returns EPERM when run as root with the command-line options -ar20 -ie. To make sure my SDK or image were not at fault, I scp'd the program to a Rackspace machine running the Alpha release, and it also failed with EPERM.

To compile: gcc -Wall pthreads_sched_test.c -lpthread -o sched_test

Usage:

Usage: ./sched_test [options]
Options are:
    -a<policy><prio> Set scheduling policy and priority in
                     thread attributes object
                     <policy> can be
                         f  SCHED_FIFO
                         r  SCHED_RR
                         o  SCHED_OTHER
    -A               Use default thread attributes object
    -i {e|i}         Set inherit scheduler attribute to
                     'explicit' or 'inherit'
    -m<policy><prio> Set scheduling policy and priority on
                     main thread before pthread_create() call

Receive an EPERM executing as root:

Set the scheduling policy to SCHED_RR (r), the priority to 20, and the inherit-scheduler attribute to "explicit": ./sched_test -ar20 -ie

localhost core # ./sched_test -ar20 -ie
Scheduler settings of main thread
    policy=SCHED_OTHER, priority=0

Scheduler settings in 'attr'
    policy=SCHED_RR, priority=20
    inheritsched is EXPLICIT

pthread_create: Operation not permitted
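
An out-of-range priority would normally be reported as EINVAL rather than EPERM, but it is cheap to rule out. The following is a minimal sketch, assuming a Linux system; the helper name prio_in_range is hypothetical and is not part of the man page program.

```c
#include <sched.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if prio lies inside the static priority
   range the kernel reports for policy, 0 if it does not, and -1 if the
   range could not be queried. */
int
prio_in_range(int policy, int prio)
{
    int lo = sched_get_priority_min(policy);
    int hi = sched_get_priority_max(policy);

    if (lo == -1 || hi == -1)
        return -1;

    printf("policy %d allows priorities %d..%d\n", policy, lo, hi);
    return prio >= lo && prio <= hi;
}
```

Calling prio_in_range(SCHED_RR, 20) before pthread_create() shows whether the -ar20 combination is accepted by the running kernel at all.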

Ulimits are set to "unlimited":

localhost core # ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) unlimited
file size               (blocks, -f) unlimited
pending signals                 (-i) 3825
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) unlimited
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 3825
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
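
The same two limits can be checked from inside a process. This is a sketch only, assuming Linux, where "ulimit -r" corresponds to RLIMIT_RTPRIO and "ulimit -e" to RLIMIT_NICE; the helper name print_limit is hypothetical.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: print the soft value of one resource limit.
   Returns 0 on success, -1 if getrlimit() fails. */
int
print_limit(const char *name, int resource)
{
    struct rlimit rl;

    if (getrlimit(resource, &rl) != 0) {
        perror("getrlimit");
        return -1;
    }

    if (rl.rlim_cur == RLIM_INFINITY)
        printf("%s: unlimited\n", name);
    else
        printf("%s: %llu\n", name, (unsigned long long) rl.rlim_cur);
    return 0;
}
```

For example, print_limit("real-time priority", RLIMIT_RTPRIO) reports the limit that "ulimit -r" shows for the calling process.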

Contents of /etc/os-release

core@localhost ~ $ cat /etc/os-release
NAME=CoreOS
ID=coreos
VERSION=734.0.0+2015-07-05-1552
VERSION_ID=734.0.0
BUILD_ID=2015-07-05-1552
PRETTY_NAME="CoreOS 734.0.0+2015-07-05-1552"
ANSI_COLOR="1;32"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://github.com/coreos/bugs/issues"

Source code from pthread_setschedparam man page:

/* pthreads_sched_test.c */

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>

#define handle_error_en(en, msg) \
       do { errno = en; perror(msg); exit(EXIT_FAILURE); } while (0)

static void
usage(char *prog_name, char *msg)
{
   if (msg != NULL)
       fputs(msg, stderr);

   fprintf(stderr, "Usage: %s [options]\n", prog_name);
   fprintf(stderr, "Options are:\n");
#define fpe(msg) fprintf(stderr, "\t%s", msg);          /* Shorter */
   fpe("-a<policy><prio> Set scheduling policy and priority in\n");
   fpe("                 thread attributes object\n");
   fpe("                 <policy> can be\n");
   fpe("                     f  SCHED_FIFO\n");
   fpe("                     r  SCHED_RR\n");
   fpe("                     o  SCHED_OTHER\n");
   fpe("-A               Use default thread attributes object\n");
   fpe("-i {e|i}         Set inherit scheduler attribute to\n");
   fpe("                 'explicit' or 'inherit'\n");
   fpe("-m<policy><prio> Set scheduling policy and priority on\n");
   fpe("                 main thread before pthread_create() call\n");
   exit(EXIT_FAILURE);
}

static int
get_policy(char p, int *policy)
{
   switch (p) {
   case 'f': *policy = SCHED_FIFO;     return 1;
   case 'r': *policy = SCHED_RR;       return 1;
   case 'o': *policy = SCHED_OTHER;    return 1;
   default:  return 0;
   }
}

static void
display_sched_attr(int policy, struct sched_param *param)
{
   printf("    policy=%s, priority=%d\n",
           (policy == SCHED_FIFO)  ? "SCHED_FIFO" :
           (policy == SCHED_RR)    ? "SCHED_RR" :
           (policy == SCHED_OTHER) ? "SCHED_OTHER" :
           "???",
           param->sched_priority);
}

static void
display_thread_sched_attr(char *msg)
{
   int policy, s;
   struct sched_param param;

   s = pthread_getschedparam(pthread_self(), &policy, &param);
   if (s != 0)
       handle_error_en(s, "pthread_getschedparam");

   printf("%s\n", msg);
   display_sched_attr(policy, &param);
}

static void *
thread_start(void *arg)
{
   display_thread_sched_attr("Scheduler attributes of new thread");

   return NULL;
}

int
main(int argc, char *argv[])
{
   int s, opt, inheritsched, use_null_attrib, policy;
   pthread_t thread;
   pthread_attr_t attr;
   pthread_attr_t *attrp;
   char *attr_sched_str, *main_sched_str, *inheritsched_str;
   struct sched_param param;

   /* Process command-line options */

   use_null_attrib = 0;
   attr_sched_str = NULL;
   main_sched_str = NULL;
   inheritsched_str = NULL;

   while ((opt = getopt(argc, argv, "a:Ai:m:")) != -1) {
       switch (opt) {
       case 'a': attr_sched_str = optarg;      break;
       case 'A': use_null_attrib = 1;          break;
       case 'i': inheritsched_str = optarg;    break;
       case 'm': main_sched_str = optarg;      break;
       default:  usage(argv[0], "Unrecognized option\n");
       }
   }

   if (use_null_attrib &&
           (inheritsched_str != NULL || attr_sched_str != NULL))
       usage(argv[0], "Can't specify -A with -i or -a\n");

   /* Optionally set scheduling attributes of main thread,
      and display the attributes */

   if (main_sched_str != NULL) {
       if (!get_policy(main_sched_str[0], &policy))
           usage(argv[0], "Bad policy for main thread (-m)\n");
       param.sched_priority = strtol(&main_sched_str[1], NULL, 0);

       s = pthread_setschedparam(pthread_self(), policy, &param);
       if (s != 0)
           handle_error_en(s, "pthread_setschedparam");
   }

   display_thread_sched_attr("Scheduler settings of main thread");
   printf("\n");

   /* Initialize thread attributes object according to options */

   attrp = NULL;

   if (!use_null_attrib) {
       s = pthread_attr_init(&attr);
       if (s != 0)
           handle_error_en(s, "pthread_attr_init");
       attrp = &attr;
   }

   if (inheritsched_str != NULL) {
       if (inheritsched_str[0] == 'e')
           inheritsched = PTHREAD_EXPLICIT_SCHED;
       else if (inheritsched_str[0] == 'i')
           inheritsched = PTHREAD_INHERIT_SCHED;
       else
           usage(argv[0], "Value for -i must be 'e' or 'i'\n");

       s = pthread_attr_setinheritsched(&attr, inheritsched);
       if (s != 0)
           handle_error_en(s, "pthread_attr_setinheritsched");
   }

   if (attr_sched_str != NULL) {
       if (!get_policy(attr_sched_str[0], &policy))
           usage(argv[0],
                   "Bad policy for 'attr' (-a)\n");
       param.sched_priority = strtol(&attr_sched_str[1], NULL, 0);

       s = pthread_attr_setschedpolicy(&attr, policy);
       if (s != 0)
           handle_error_en(s, "pthread_attr_setschedpolicy");
       s = pthread_attr_setschedparam(&attr, &param);
       if (s != 0)
           handle_error_en(s, "pthread_attr_setschedparam");
   }

   /* If we initialized a thread attributes object, display
      the scheduling attributes that were set in the object */

   if (attrp != NULL) {
       s = pthread_attr_getschedparam(&attr, &param);
       if (s != 0)
           handle_error_en(s, "pthread_attr_getschedparam");
       s = pthread_attr_getschedpolicy(&attr, &policy);
       if (s != 0)
           handle_error_en(s, "pthread_attr_getschedpolicy");

       printf("Scheduler settings in 'attr'\n");
       display_sched_attr(policy, &param);

       s = pthread_attr_getinheritsched(&attr, &inheritsched);
       if (s != 0)
           handle_error_en(s, "pthread_attr_getinheritsched");
       printf("    inheritsched is %s\n",
               (inheritsched == PTHREAD_INHERIT_SCHED)  ? "INHERIT" :
               (inheritsched == PTHREAD_EXPLICIT_SCHED) ? "EXPLICIT" :
               "???");
       printf("\n");
   }

   /* Create a thread that will display its scheduling attributes */

   s = pthread_create(&thread, attrp, &thread_start, NULL);
   if (s != 0)
       handle_error_en(s, "pthread_create");

   /* Destroy unneeded thread attributes object */

   if (!use_null_attrib) {
       s = pthread_attr_destroy(&attr);
       if (s != 0)
           handle_error_en(s, "pthread_attr_destroy");
   }

   s = pthread_join(thread, NULL);
   if (s != 0)
       handle_error_en(s, "pthread_join");

   exit(EXIT_SUCCESS);
}

@mischief

this workaround here seems to fix the issue but i haven't found the underlying cause yet. what other systems have you tried this on and what are the results?

@onedsc
Author

onedsc commented Jul 13, 2015

Yes. That fixed it - thank you.
sysctl -w kernel.sched_rt_runtime_us=-1

On all of the other systems our code runs on, it works out of the box without setting this parameter:
CentOS 5 and 6, Debian 7, SLES 11 SP3, SLES 12, and the Ubuntu 12 and 14 LTS releases. If you want, I can run this test program on them as well, but our code calls the same functions as the example.
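
The sysctl named above is also readable as the file /proc/sys/kernel/sched_rt_runtime_us, so a program can log the value before it asks for a real-time policy. A minimal sketch follows; the helper name read_long_from_file is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: read a single integer from a procfs-style file.
   Returns 0 and stores the value in *out on success, -1 on failure. */
int
read_long_from_file(const char *path, long *out)
{
    FILE *fp = fopen(path, "r");
    int ok;

    if (fp == NULL)
        return -1;

    ok = (fscanf(fp, "%ld", out) == 1);
    fclose(fp);
    return ok ? 0 : -1;
}
```

Passing "/proc/sys/kernel/sched_rt_runtime_us" as the path should yield -1 once the workaround has been applied.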

Update: Here is Ubuntu 14.04 LTS
(Linux trusty 3.13.0-57-generic #95-Ubuntu SMP Fri Jun 19 09:28:15 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux)

root@trusty:~# sysctl kernel.sched_rt_runtime_us
kernel.sched_rt_runtime_us = 950000

root@trusty# ./sched_test -ar20 -ie
Scheduler settings of main thread
    policy=SCHED_OTHER, priority=0

Scheduler settings in 'attr'
    policy=SCHED_RR, priority=20
    inheritsched is EXPLICIT

Scheduler attributes of new thread
    policy=SCHED_RR, priority=20

Here is Arch Linux:
(Linux vagrant-arch.vagrantup.com 4.0.7-2-ARCH #1 SMP PREEMPT Tue Jun 30 07:50:21 UTC 2015 x86_64 GNU/Linux)

[root@vagrant-arch cppTest]#  sysctl kernel.sched_rt_runtime_us
kernel.sched_rt_runtime_us = 950000

[root@vagrant-arch cppTest]# ./sched_test -ar20 -ie
Scheduler settings of main thread
    policy=SCHED_OTHER, priority=0

Scheduler settings in 'attr'
    policy=SCHED_RR, priority=20
    inheritsched is EXPLICIT

Scheduler attributes of new thread
    policy=SCHED_RR, priority=20

@onedsc
Author

onedsc commented Jul 17, 2015

@mischief: I have updated my reply with two examples. Do you need more information from me to investigate this bug? I am more than willing to give output on other distributions if you think that will help.

@FirefighterBlu3

what may or may not be very related, systemd defaults put ssh connections into a non-RT cgroup. if you're trying to ssh in and run FIFO/RR tasks, it just won't work. you need to move your shell into the default cpu:/ group which does permit priority scheduling.

cgclassify -g cpu:/ $$

your systemd unit file can start your service and get priority scheduling if you put ControlGroup=cpu:/ under [Service]

@onedsc
Author

onedsc commented Jul 22, 2015

@FirefighterBlu3 Thank you - this is good information. I just built an image from the CoreOS SDK and booted it using VMware Fusion. I tried the same test from the console window and it failed. I am not up on all of systemd yet, so this is all good info.

Here is the screenshot from the console of my coreos-image:
[screenshot: 2015-07-22, 8:57:55 am]

@FirefighterBlu3

yup. now run cgclassify -g cpu:/ $$

then retry your sched_test program.

@onedsc
Author

onedsc commented Jul 22, 2015

@FirefighterBlu3
Unless I missed something, the cgclassify command is not part of the CoreOS distro.

@FirefighterBlu3

you can do it alternatively with: echo $$ > /sys/fs/cgroup/cpu/tasks

(check path, this is from memory)
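
A program can make the same move for itself before it calls pthread_create(). The following is a sketch under stated assumptions: it takes the tasks path as a parameter because the path quoted above is from memory, and the helper name move_pid_to_cgroup is hypothetical.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write pid into a cgroup "tasks" file, which is
   what "echo $$ > .../tasks" does for the shell.
   Returns 0 on success, -1 if the file cannot be opened or written. */
int
move_pid_to_cgroup(const char *tasks_path, pid_t pid)
{
    FILE *fp = fopen(tasks_path, "w");

    if (fp == NULL)
        return -1;

    if (fprintf(fp, "%ld\n", (long) pid) < 0) {
        fclose(fp);
        return -1;
    }

    /* stdio buffers the write, so a rejected move surfaces at fclose(). */
    return fclose(fp) == 0 ? 0 : -1;
}
```

In the shell version $$ is the shell's own PID and the test program inherits the shell's cgroup; here the process would pass getpid() and the path that actually exists on the system.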

@onedsc
Author

onedsc commented Jul 22, 2015

That worked. It looks like even the console is restricted. Very interesting.
Clearly I have much to learn about systemd. Where can I get more information about what systemd is doing with cgroups? Is this an attribute of systemd, of CoreOS, or a cgroups best practice?

[screenshot: 2015-07-22, 9:39:44 am]

@onedsc
Author

onedsc commented Jul 22, 2015

@FirefighterBlu3
What's also interesting is that other distros don't appear to do this out of the box. Here is Arch Linux's output from an ssh session:

[root@vagrant-arch vagrant]# cat /proc/$$/cgroup
8:blkio:/
7:memory:/
6:cpuset:/
5:net_cls:/
4:devices:/user.slice
3:cpu,cpuacct:/
2:freezer:/
1:name=systemd:/user.slice/user-1000.slice/session-c2.scope
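
Each line of that output has the form hierarchy-ID:controller-list:path, so finding the group that holds the cpu controller is a small parsing job. A minimal sketch, with the hypothetical helper name cgroup_path_for, that handles one line at a time:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: if the controller list of one /proc/<pid>/cgroup
   line contains want, copy the cgroup path into buf and return 1.
   Returns 0 if the controller is absent or the line is malformed. */
int
cgroup_path_for(const char *line, const char *want, char *buf, size_t buflen)
{
    const char *c1 = strchr(line, ':');
    const char *c2 = (c1 != NULL) ? strchr(c1 + 1, ':') : NULL;
    char ctrls[128];
    char *tok;
    size_t n;

    if (c2 == NULL || buflen == 0)
        return 0;

    /* Copy the comma-separated controller list so strtok() can split it. */
    n = (size_t) (c2 - c1 - 1);
    if (n >= sizeof(ctrls))
        return 0;
    memcpy(ctrls, c1 + 1, n);
    ctrls[n] = '\0';

    for (tok = strtok(ctrls, ","); tok != NULL; tok = strtok(NULL, ",")) {
        if (strcmp(tok, want) == 0) {
            snprintf(buf, buflen, "%s", c2 + 1);
            buf[strcspn(buf, "\n")] = '\0';   /* drop trailing newline */
            return 1;
        }
    }
    return 0;
}
```

Feeding it the lines of /proc/self/cgroup with want set to "cpu" gives "/" for the Arch session shown above; a different path would point at a child group like the one discussed earlier in the thread.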

@FirefighterBlu3

perhaps it depends on your installation. my Arch installation that i'm working on does exactly this and i have a special need for sched_fifo so i had to figure out why it didn't work when by all expected accounts, it should have. unfortunately the documentation searches on google for cgroup information are still rather sparse but they're starting to get a lot better. arch has some good documentation, redhat has some too. if i had stumbled upon https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/sec-Moving_a_Process_to_a_Control_Group.html some time back, then i wouldn't have been pulling my hair out for hours.

@crawford
Contributor

Still an issue with 1262.0.0.

@bgilbert
Contributor

Thank you for reporting this issue. Unfortunately, we don't think we'll end up addressing it in Container Linux.

We're now working on Fedora CoreOS, the successor to Container Linux, and we expect most major development to occur there instead. Meanwhile, Container Linux will be fully maintained into 2020 but won't see many new features. We appreciate your taking the time to report this issue and we're sorry that we won't be able to address it.
