Skip to content

Files

Latest commit

0c1ad73 · May 18, 2020

History

History

branch-misprediction

Branch misprediction

This example demonstrates mispredictions in the CPU branch-predictor. Modern CPUs have predictors that try to guess whether a branch will be taken or not. This information is vital for performance because the CPU instruction pipeline may be flushed if a wrong guess was made. If you sort your data in a way that makes a frequently executed branch easily predictable, you could gain a significant performance advantage.

Branch mispredictions belong to a group of control hazards that can affect pipeline execution (there are also data and structural hazards).

Usage:

$ branch-misprediction <sort>

The branch-misprediction program will fill a large array with uniformly distributed random numbers from 1 to 10. Then it will repeatedly go through the whole array and sum elements that are smaller than 6. If sort equals 1, the array will be sorted in ascending order at the beginning of the program. If sort equals 0 the array will not be sorted.

If the array is sorted, the branch predictor will learn very quickly that it should expect the array[i] < 6 condition to be true in the first half of the array and false in the second half. I used floats, therefore the compiler can't use CMOV (you can try it with an integer, then there should be no difference). If you don't observe a large performance difference between the sort/nosort scenarios, make sure that your compiler is not doing any vector tricks with the if followed by the sum.

The branch miss count can be easily observed with perf. Here's how it looks on my computer:

$  perf stat -e branch-misses branch-misprediction 0
2304
209 499 279      branch-misses

$  perf stat -e branch-misses branch-misprediction 1
460
70 238 390      branch-misses

The program has more than two times more branch mispredictions without the sort and runs almost six times slower. It's actually much faster even when counting the time to sort the array. There will probably be a point where the array is too large and it's no longer faster to sort it.

Branch target misprediction

Usage:

$ branch-target-misprediction <sort>

The branch-target-misprediction demonstrates the branch target predictor, which guesses where an indirect jump will go. The program creates a simple class hierarchy. struct A has a virtual method handle and two child classes, B and C. At the beginning of the program, a large array of objects is created, its objects are instances of B or C (chosen randomly). Then the program repeatedly goes over the array and calls the virtual method handle on all objects. The effect should be similar to the branch-misprediction, so if the array is not sorted, the performance should degrade. It is caused by the fact that the branch target predictor will be able to predict the target virtual method much better if the array is sorted by instance type (class).

Perf output for the branch target misprediction program on my computer:

$  perf stat -e branch-misses branch-target-misprediction 0
877
63 329 025      branch-misses

$  perf stat -e branch-misses branch-target-misprediction 1
792
16 139 166      branch-misses

You can use the provided benchmark.py script to plot the relative speeds of the sort/nosort variant.

$ python3 benchmark.py <path-to-executable>

Branch prediction explanation: https://en.wikipedia.org/wiki/Branch_predictor