Skip to content

Commit

Permalink
tests/bench_timers: Add benchmark for periph_timer and xtimer
Browse files Browse the repository at this point in the history
This test is intended to collect statistics about the runtime delays in
the periph_timer implementation. This tool is mainly intended to detect
problems in the low level driver implementation which can be difficult
to detect from higher level systems such as xtimer.
  • Loading branch information
Joakim Nohlgård committed May 29, 2018
1 parent 5be4b7f commit 857eaf4
Show file tree
Hide file tree
Showing 8 changed files with 1,777 additions and 0 deletions.
54 changes: 54 additions & 0 deletions tests/bench_timers/Makefile
Original file line number Diff line number Diff line change
@@ -0,0 +1,54 @@
include ../Makefile.tests_common

# These boards only have a single timer in their periph_conf.h, needs special
# CFLAGS configuration to build properly
SINGLE_TIMER_BOARDS = \
native \
nucleo32-f031 \
nucleo32-f042 \
#

# These boards have too little RAM to support collecting detailed statistics
# with the default settings of TEST_MIN and TEST_MAX, so DETAILED_STATS will be
# disabled by default for these boards unless explicitly enabled
SMALL_RAM_BOARDS = \
nucleo32-f031 \
#

FEATURES_REQUIRED = periph_timer

# Use RTT as a wall clock reference, if available
FEATURES_OPTIONAL = periph_rtt

USEMODULE += random
USEMODULE += fmt
USEMODULE += matstat
USEMODULE += xtimer

ifeq (,$(findstring TIM_TEST_DEV,$(CFLAGS)))
ifneq (,$(filter $(BOARD),$(SINGLE_TIMER_BOARDS)))
CFLAGS += -DTIM_TEST_DEV=TIMER_DEV\(0\) -DTIM_REF_DEV=TIMER_DEV\(0\)
endif
endif

ifeq (,$(findstring DETAILED_STATS,$(CFLAGS)))
ifneq (,$(filter $(BOARD),$(SMALL_RAM_BOARDS)))
CFLAGS += -DDETAILED_STATS=0
endif
endif

# Shortcut to configure the build for testing xtimer against a periph_timer reference
.PHONY: test-xtimer
test-xtimer: CFLAGS+=-DTEST_XTIMER -DTIM_TEST_FREQ=XTIMER_HZ -DTIM_TEST_DEV=XTIMER_DEV
test-xtimer: all

# Shortcut to configure the build for testing Kinetis LPTMR against a PIT reference
# Usage: make BOARD=frdm-k22f test-kinetis-lptmr flash
.PHONY: test-kinetis-lptmr
test-kinetis-lptmr: CFLAGS+=-DTIM_TEST_DEV=TIMER_LPTMR_DEV\(0\) -DTIM_TEST_FREQ=32768 -DTIM_REF_DEV=TIMER_PIT_DEV\(0\)
test-kinetis-lptmr: all

# Reset the default goal.
.DEFAULT_GOAL :=

include $(RIOTBASE)/Makefile.include
341 changes: 341 additions & 0 deletions tests/bench_timers/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,341 @@
# Benchmark test for RIOT timers

This test is intended to collect statistics about the runtime delays in the
periph_timer implementation. This tool is mainly intended to detect problems in
the low level driver implementation which can be difficult to detect from
higher level systems such as xtimer.

## Testing periph/timer

The basic idea of the test is to generate a wide variety of calls to the
periph/timer API, in order to detect problems with the software implementation.
The test consists of a single thread of execution which will set random
timeouts on a periph_timer device. A reference timer is used to measure the
time it takes until the callback is called. The difference between the expected
time and the measured time is computed, and the mean and variance of the
recorded values are calculated. The results are printed as a table on stdout
every 30 seconds. All of the test scenarios used in this application are
based on experience from real world bugs encountered during the development of
RIOT and other systems.

### periph/timer functions tested

Both timer_set and timer_set_absolute calls are mixed in a random order, to
ensure that both functions are working correctly.

### Rescheduling

Some bugs may occur only when the application attempts to replace an already
set timer by calling timer_set again while the timer is running. This is dubbed
"resched" in this application. The application will issue rescheduling timer
calls in the normal operation of the test to catch this class of problems.

### Setting stopped timers

Another class of bugs is the kind where a timer is not behaving correctly if it
was stopped before setting a timer target. The timer should set the new target,
but not begin counting towards the target until timer_start is called. This is
covered by this application under the "stopped" category of test inputs.

## General benchmark behaviour

The benchmark application uses some techniques to improve the quality of the
tests.

### Avoiding phase lock

The CPU time used for generation and setting of the next timer target in
software is approximately constant across iterations. When the application flow
is driven by the timer under test, via a mutex which is unlocked from the timer
callback, the application will be "phase locked" to the timer under test. This
peculiarity may hide certain implementation race conditions which only occur at
specific phases of the timer ticks. In the test application, this problem is
avoided by inserting random CPU delay loops at key locations:

- After timer_stop
- Before timer_set/timer_set_absolute
- Before timer_start

### Estimating benchmark CPU overhead

An estimation of the overhead resulting from the CPU processing delays inside
the benchmarking code is performed during benchmark initialization. The
estimation algorithm attempts to perform the exact same actions that the real
benchmark will perform, but without setting any timers. The measured delay is
assumed to originate in the benchmark code and will be subtracted when
computing the difference between expected and actual values. A warning is
printed when the variance of the estimated CPU overhead is too high, this can
be a sign that some other process is running on the CPU and disrupting the
estimation, or a sign of serious problems inside the timer_read function.

## Testing xtimer

The benchmark application can be reconfigured for using xtimer instead of the
low level periph/timer interface. Use the Makefile target test-xtimer to build
a test for xtimer.

### xtimer functions tested

The application will mix calls to the following xtimer functions:

- `_xtimer_set`
- `_xtimer_set_absolute`
- `_xtimer_periodic_wakeup`
- `_xtimer_spin`

These functions cover almost all uses of xtimer. The higher level functions
such as `xtimer_usleep` and `xtimer_set_msg` all use these functions internally
in the implementations.

## Results

When the test has run for a certain amount of time, the current results will be
presented as a table on stdout. To assist with finding the source of any
discrepancies, the results are split according to three parameters:

- Function used: timer_set, timer_set_absolute
- Rescheduled: yes/no
- Stopped: yes/no

### Expected results

It is difficult to set a general expected result which applies to all
platforms. However, the mean and variance of the differences should be small,
in relation to the tick duration of the timer under test. What is "small" also
depends the CPU core frequency, because of CPU processing time spent in the
driver code and in the test code. For example, when testing a 1 MHz timer and
comparing with another 1 MHz timer, the mean difference should likely be in
single digits on a Cortex-M CPU. Testing a 32768 Hz timer and referencing a 1
MHz timer on the other hand should have a mean difference in double digits, the
variance will also be greater because of the quantization errors and rounding
in the tick conversion routine. If the mean or variance of the difference is
exceptionally large, the row will be marked with "SIC!" to draw attention to
the fact.

### Interpreting timer_set_xxx statistics

The first part of the results compares the reference time elapsed against the
expected time. A negative value means that the timer alarm is triggered too
soon, compared to the expected time. A positive value means that the timer
alarm is triggered late, compared to the expected time.

#### Expected variance

The variance of the results should generally be as small as possible, except
for the case where the reference timer is running at a higher frequency than
the timer under test. There will always be a truncation error when setting a
timer target because of the discrete timer counts that can be set. The
benchmark will compute limits for the expected variance and show a message if
the variance is less than or greater than the expected variance.

##### Variance too low

A variance lower than expected indicates that there is a phase correlation
between the reference timer and the timer under test. This can only be
explained by a software bug or a hardware failure. The primary suspect will be
the reference timer.

##### Variance too high

A variance greater than expected indicates that there is some extra randomness
in the timeout length, which can be caused by other processes interfering with
the test, a software bug in the timer under test, or hardware failure.

##### Mathematical explanation

Let `X0` be the exact time that a timer_set call occurs, `X0` is a real number.
Let `T` be the desired timeout, `T` is a positive integer because the timer can
only count integer number of ticks. Let `x0 = floor(X0)` be the count that the
timer is currently on when the timer is set, `x0` is an integer. If `X0` is
uniformly random, then the truncation error `X0' = (X0 - x0)` will be a sample
from a continuous uniform distribution in the interval `[0,1)` ticks. The timer
target, `x1`, in the timer under test can be expressed as `x1 = x0 + T`. Let
`k` be a real number that satisfies `Y = kx`, where `x` is an interval in the
unit used by the timer under test, and `Y` is an interval in the unit used by
the reference timer, `Y` is a real number. The time measured by the reference
timer can be expressed as
`Y = k(x1 - X0) = k(x0 + T - X0) = k(X0 - X0' + T - X0) = k(T - X0')`.
`Y` will be a sample from a uniform random distribution in the interval
`[kT, k(T + 1)]`. The variance of `Y` is defined as
`Var(Y) = (k(T + 1) - kT)^2 / 12`.

### Interpreting timer_read statistics

A separate table is displayed for statistics on timer_read. This table compares
the expected timer under test time after a timer has triggered, against the
actual reported value from `timer_read(TIM_TEST_DEV)`. A positive value means
that the TUT time has passed the timer target time. A negative value means
that, according to the timer under test, the alarm target time has not yet been
reached, even though the timer callback has been executed.

### A note on BOARD=native

When running on native, the application will be running as a normal process in
a multi process system which means that most results will have a much greater
variance than expected and the resulting differences will also be much larger
than on a bare metal system. Be careful when drawing conclusions on results
from native.

## Configuration

The timer under test and the reference timer can be chosen at compile time by
defining TIM_TEST_DEV and TIM_REF_DEV, respectively. The frequencies for
these timers can be configured through TIM_DEV_FREQ and TIM_REF_FREQ. For
example, to compare timer device 2 running at 32768 Hz against a reference timer
on timer device 0 (default) running at 1 MHz (default) on Mulle:

CFLAGS='-DTIM_TEST_DEV=TIMER_DEV\(2\) -DTIM_TEST_FREQ=32768' BOARD=mulle make flash

### Default configuration

reference timer: TIMER_DEV(0)
timer under test: TIMER_DEV(1)
timer frequency: 1 MHz

Define USE_REFERENCE=0 to compare a timer against itself. Be aware that this
may hide a class of errors where the timer is losing ticks in certain
situations, which a separate reference timer would be able to detect.

### Example output

Below is a short sample of a test of a 32768 Hz timer with a 1 MHz reference:

------------- BEGIN STATISTICS --------------
Limits: mean: [-10, 41], variance: [58, 99]
Target error (actual trigger time - expected trigger time), in reference timer ticks
positive: timer is late, negative: timer is early
=== timer_set running ===
interval count sum sum_sq min max mean variance
1 - 2: 25705 455646 2002373 2 35 17 77
3 - 4: 25533 457326 1973191 2 35 17 77
5 - 8: 25412 451046 1970014 2 35 17 77
9 - 16: 25699 458563 2017198 2 36 17 78
17 - 32: 25832 457768 2018909 1 35 17 78
33 - 64: 25758 440454 1978830 1 35 17 76
65 - 128: 25335 414320 1942196 0 34 16 76
129 - 256: 25254 367689 1979616 -3 32 14 78
257 - 512: 25677 285822 2020736 -7 31 11 78
TOTAL 230205 3788634 18459378 -7 36 16 80
...

The statistics above show that the timer implementation introduces a small
delay on short timers. Note however that it is not clear whether this delay is
in timer_set, or timer_read, but the combined effect of all error sources make
the timer overshoot its target by on average 17 microseconds. The difference
between min and max is close to the expected difference of approximately
1/32768 s = 30.51 µs. The lower min and mean for the longer timer intervals are
most likely caused by a drift between the 1 MHz clock and the 32.768 kHz clocks
inside the CPU, and is expected when using two separate clock sources for the
timers being compared.

Below is a short sample of a test where there is something wrong with the timer
implementation, again a 32768 Hz timer tested with a 1 MHz reference:

=== timer_set_absolute resched ===
interval count sum sum_sq min max mean variance
1 - 2: 152217 82324291 13339518497 38 1071 540 87635 <=== SIC!
3 - 4: 152199 82254909 13301794832 38 1071 540 87397 <=== SIC!
5 - 8: 152023 82085454 13264873203 38 1071 539 87256 <=== SIC!
9 - 16: 152156 82236539 13265351119 38 1071 540 87183 <=== SIC!
17 - 32: 152639 82606815 13267666812 38 1071 541 86922 <=== SIC!
33 - 64: 151883 81836155 13268661551 37 1070 538 87361 <=== SIC!
65 - 128: 152114 82151495 13228100922 36 1070 540 86962 <=== SIC!
129 - 256: 152363 81806042 13278055359 34 1069 536 87148 <=== SIC!
257 - 512: 152360 81235185 13303811951 29 1066 533 87318 <=== SIC!
TOTAL 1369954 738536885 119197349719 29 1071 539 87008 <=== SIC!

We can see that the variance, the maximum error, and the mean are very large.
This particular timer implementation needs some work on its timer_set_absolute
implementation.

## Configuration details

Configuration macros used by the application are described below

### Settings related to timer hardware

#### TIM_TEST_DEV

Timer under test. Default: `TIMER_DEV(1)`

#### TIM_REF_DEV

Reference timer used for measuring the callback time error. Default: `TIMER_DEV(0)`

#### TIM_TEST_FREQ

Frequency of the timer under test, used when calling timer_init during the
application startup. Default: `1000000`, 1 MHz

#### TIM_REF_FREQ

Frequency of the reference timer, used when calling timer_init during the
application startup. Default: `1000000`, 1 MHz

#### TIM_TEST_CHAN

Timer channel used on the timer under test. Default: `0`

#### USE_REFERENCE

Shorthand for setting the reference timer to the same as the timer under test.
Default: `0`

### Settings related to result processing


#### DETAILED_STATS

Keep statistics per timer target offset value if set to 1, otherwise keep
statistics only per scenario. Default: `1`

#### LOG2_STATS

Only used when `DETAILED_STATS == 1`. Statistics are grouped according to
2-logarithms of the timer offset value, e.g. 1-2, 3-4, 5-8, 9-16 etc. This
reduces memory consumption and creates a shorter result table for easier
overview. Default: `1`

#### TEST_PRINT_INTERVAL_TICKS

The result table will be printed to standard output when this many reference
timer ticks have passed since the last printout. Default: `((TIM_REF_FREQ) * 30)`

### Settings related to timer input generation

#### TEST_MIN

Minimum timer offset tested for `timer_set_absolute`, in timer under test
ticks. Default: `1` for timers < 1 MHz, `16` otherwise, to prevent setting a
timer target in the past

#### TEST_MIN_REL

Minimum timer offset tested for `timer_set`, in timer under test ticks.
Default: `0`

#### TEST_MAX

Maximum timer offset tested, in timer under test ticks. Default: `128`

#### SPIN_MAX_TARGET

The CPU busy wait loop will be calibrated during application start up so that
the maximum random spin length is approximately equal to this many timer under
test ticks. Default: `16`

#### TEST_UNEXPECTED_STDDEV

Mark the output row if the computed variance implies a standard deviation which
is greater than this value. This value will be automatically recomputed to
compensate for the variance resulting from the truncation in the tick
conversion if the reference timer is running at a higher frequency than the
timer under test. Default: `4`

#### TEST_UNEXPECTED_MEAN

Mark the output row if the computed mean absolute difference is greater than this
number of timer under test ticks. This value will be automatically recomputed
to compensate for the error resulting from the truncation in the tick
conversion if the reference timer is running at a higher frequency than the
timer under test. Default: `2`
Loading

0 comments on commit 857eaf4

Please sign in to comment.