likwid accessD

Thomas Gruber edited this page Jul 27, 2020 · 4 revisions

likwid-accessD: Access daemon for MSR and PCI based hardware performance counters

Introduction

Because MSR device files that are readable and writable by ordinary users are a security problem, LIKWID supports a daemon-based solution for security-sensitive environments such as computing centers, companies or HPC clusters. The daemon is also required if you want to access the Uncore on recent Intel server processors, which is implemented by means of PCI devices.

Basics

The idea is that the LIKWID applications as a whole do not run with elevated privileges. Only the part that needs them does: the MSR and PCI read and write routines.

The LIKWID applications that need read/write access to MSRs and PCI devices (likwid-perfctr, likwid-powermeter and likwid-agent) start one or more daemons at startup. Only the daemon has access to the MSR and PCI device files. Communication between the LIKWID application and the daemon is implemented with UNIX domain sockets. When the application exits, the daemon is terminated. This enables a secure setup for MSR and PCI device access without complicating things for the user. Of course, some overhead is introduced compared to the direct access mode, which requires the users to have special privileges.

Update for Linux kernel 5.9 and newer: With Linux 5.9, the msr kernel module received some security fixes. The major change for LIKWID is that all MSRs are now non-writable by default. To enable writes again, add msr.allow_writes=on to the boot options of your operating system. This affects only ACCESSMODE=direct and ACCESSMODE=accessdaemon. If you use the perf_event backend, you don't have to change anything.
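To check whether the boot option is present, one can inspect the kernel command line. The following is a minimal sketch, not part of LIKWID: the function names are hypothetical, and the sample command line is made up for illustration. It only looks for the msr.allow_writes token and assumes that the last occurrence wins.

```python
def msr_allow_writes_setting(cmdline):
    """Return the value of msr.allow_writes on a kernel command line, or None.

    Hypothetical helper for illustration; if the option appears more than
    once, the last occurrence is returned (an assumption of this sketch).
    """
    setting = None
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == "msr.allow_writes" and sep:
            setting = value
    return setting


def needs_boot_option(cmdline):
    """True if msr.allow_writes=on is not set on the given command line."""
    return msr_allow_writes_setting(cmdline) != "on"


# Made-up sample command line, only for demonstration.
sample = "BOOT_IMAGE=/vmlinuz root=/dev/sda1 ro quiet msr.allow_writes=on"
print(msr_allow_writes_setting(sample), needs_boot_option(sample))
```

On a running Linux system, the string to pass in could be read from /proc/cmdline.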

Enable and Build

To enable communication with the daemon, the variable ACCESSDAEMON in config.mk must be set to a valid absolute path to the likwid-accessD executable. The access daemon is not built if BUILDDAEMON is set to false. The build can be triggered explicitly with make likwid-accessD. The standard make install target installs the daemon together with the other tools and sets its permissions. Two configuration scenarios are possible:

  • You set the daemon setuid root. This is the easiest way.
  • You set the daemon setgid to some LIKWID-related group that has access rights on the MSR device files.

For access to the PCI device files, the daemon must be setuid root because the devices are only writable by root. You can, however, change the access permissions of the PCI devices when they are created by using udev rules.
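To verify which of the two scenarios an installed daemon binary matches, the permission bits of the executable can be inspected. The sketch below is an illustration only: the function names and the returned labels are hypothetical, and the path to likwid-accessD must be supplied by the caller.

```python
import os
import stat


def daemon_privilege_mode(st_mode, st_uid):
    """Classify a file by its setuid/setgid bits (labels are hypothetical)."""
    if st_mode & stat.S_ISUID:
        # setuid only grants root privileges if root owns the file
        return "setuid-root" if st_uid == 0 else "setuid-non-root"
    if st_mode & stat.S_ISGID:
        return "setgid"
    return "unprivileged"


def check_daemon(path):
    """Stat the given executable and classify it."""
    st = os.stat(path)
    return daemon_privilege_mode(st.st_mode, st.st_uid)
```

A result of "setuid-root" corresponds to the first scenario and "setgid" to the second. Anything else means the daemon would run without the privileges it needs.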

Protocol

Every LIKWID instance starts its own daemon. In normal cases a single daemon is enough. Multiple daemons are started only for threaded applications instrumented with the Marker API. Each client-server pair communicates through a socket file at /tmp/likwid-$PID. The daemon accepts only one connection. As soon as the connection is established, the socket file is deleted.
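The rendezvous described here can be illustrated with a small Python sketch. It is not the LIKWID implementation (which is written in C). The function names, the echo handler, and the use of a temporary directory instead of /tmp are assumptions for the demonstration, as is the choice to let the server side remove the socket file.

```python
import os
import socket
import tempfile
import threading


def serve_one_client(sock_path, ready, handle):
    """Accept exactly one connection, then remove the socket file."""
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(sock_path)
    server.listen(1)
    ready.set()                      # tell the client it may connect now
    conn, _ = server.accept()
    server.close()                   # no further connections are accepted
    os.unlink(sock_path)             # the established connection stays usable
    with conn:
        handle(conn)


def echo_once(conn):
    """Placeholder handler: send one message back unchanged."""
    conn.sendall(conn.recv(64))


def demo():
    # A private temporary directory stands in for /tmp in this sketch.
    sock_path = os.path.join(tempfile.mkdtemp(), "likwid-%d" % os.getpid())
    ready = threading.Event()
    daemon = threading.Thread(target=serve_one_client,
                              args=(sock_path, ready, echo_once))
    daemon.start()
    ready.wait()
    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    client.connect(sock_path)
    client.sendall(b"ping")
    reply = client.recv(64)
    client.close()
    daemon.join()
    return reply, os.path.exists(sock_path)


if __name__ == "__main__":
    print(demo())
```

Removing the socket file after the connection is established means no second process can connect through the same path, while the already open connection keeps working.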

From there, the communication consists of write/read pairs issued by the client (the LIKWID application). Before the daemon performs any actual MSR access, it checks whether the target is a valid performance counter register. Accesses to other registers are dropped, logged to syslog and reported back to the LIKWID library. For PCI devices there is currently no such check, but one is planned.
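The register check can be sketched as an allowlist lookup in front of the actual access. Everything in this example is an assumption made for illustration: the request and reply types, the status names, the dictionary that stands in for the MSR device file, and the two-entry allowlist. The real daemon uses its own per-architecture register lists and message format.

```python
from dataclasses import dataclass
from enum import Enum


class Op(Enum):
    READ = 1
    WRITE = 2


class Status(Enum):
    OK = 0
    REJECTED = 1


# Illustrative allowlist only: IA32_PMC0 (0xC1) and IA32_PERFEVTSEL0 (0x186).
ALLOWED_MSRS = {0xC1, 0x186}


@dataclass
class Request:
    op: Op
    reg: int
    value: int = 0


@dataclass
class Reply:
    status: Status
    value: int = 0


def handle(req, registers, log):
    """Answer one request; `registers` is a dict standing in for the device."""
    if req.reg not in ALLOWED_MSRS:
        # Drop the access, log it, and report the rejection to the client.
        log("rejected access to MSR %#x" % req.reg)
        return Reply(Status.REJECTED)
    if req.op is Op.WRITE:
        registers[req.reg] = req.value
        return Reply(Status.OK)
    return Reply(Status.OK, registers.get(req.reg, 0))
```

Each client request produces exactly one reply, which matches the write/read pairing on the socket. In this sketch the `log` callable takes the place of syslog.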

On shutdown, the client terminates the daemon with an exit message.

The daemon has the following error handling:

  • To avoid leftover daemons that were not stopped correctly, the daemon has a timeout on startup.
  • If the client disconnects prematurely, the daemon terminates.
  • If the client disconnects between a read and a write, the daemon catches SIGPIPE and disconnects.
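The three cases above can be sketched in Python as follows. This is an illustration under assumptions, not the daemon's C code: the function names and return strings are hypothetical. Python ignores SIGPIPE by default and raises BrokenPipeError instead, so that exception takes the place of the signal handler here.

```python
import socket


def wait_for_client(server, timeout_s):
    """Return the accepted connection, or None if nobody connects in time."""
    server.settimeout(timeout_s)
    try:
        conn, _ = server.accept()
    except socket.timeout:
        return None                  # startup timeout: the daemon gives up
    conn.settimeout(None)            # use blocking I/O on the connection
    return conn


def serve(conn, handle):
    """Answer requests until the client goes away; return the reason."""
    try:
        while True:
            data = conn.recv(64)
            if not data:             # orderly or premature disconnect
                return "client disconnected"
            conn.sendall(handle(data))
    except (BrokenPipeError, ConnectionResetError):
        # The peer vanished between our read and our write.
        return "client vanished before reply"
    finally:
        conn.close()
```

In every case the function returns, so the surrounding process can exit instead of lingering without a client.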

Security

The daemon was written with security in mind. The allowed registers are hardcoded in the access daemon code for MSR and PCI devices, so it is not possible to read or write a register that may harm the system. In April 2015 the code was reviewed by the IT security department of the University of Erlangen-Nuremberg, and some changes were made based on their comments. There may still be problematic sections in the code, and we, the developers, take no responsibility for any security problems, but we will fix reported bugs as fast as possible.