This repository contains a benchmark application for testing the performance of a ROS2 system. To run the benchmark, the user provides a topology to simulate, in the form of a .json file. The application loads the complete ROS2 system from the topology file and starts passing dummy messages between the nodes. Meanwhile, statistical data are collected, such as resource usage (CPU utilization and RAM consumption) and message latencies. The application runs for a user-specified amount of time and outputs the results as human-readable log files. Tools to plot these results are provided.
Two default topologies are provided in the `topology` folder, called Sierra Nevada and Mont Blanc. Sierra Nevada is a light 10-node system, while Mont Blanc is a heavier and more complex 20-node system.
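The topology format itself is defined by the performance_test framework; the snippet below is only a hypothetical illustration of the kind of information such a file carries (all node, topic, and field names here are made up, so refer to the provided files in the `topology` folder for the actual schema):

```json
{
  "nodes": [
    {
      "node_name": "example_talker",
      "publishers": [
        { "topic_name": "example_topic", "msg_type": "stamped10b", "period_ms": 100 }
      ]
    },
    {
      "node_name": "example_listener",
      "subscribers": [
        { "topic_name": "example_topic", "msg_type": "stamped10b" }
      ]
    }
  ]
}
```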
Follow the instructions for building the performance_test framework.
First, source the environment:
```
source performances_ws/install/local_setup.bash
```
Run:
```
cd performances_ws/install/lib/irobot_benchmark
./irobot_benchmark topology/sierra_nevada.json -t 60 --ipc on
```
This will run Sierra Nevada for 60 seconds with Intra-Process Communication (IPC) enabled.
For more options, run `./irobot_benchmark --help`.
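To automate multiple runs, e.g., for comparing IPC on and off, the binary can be driven from a small wrapper script. This is a minimal sketch, assuming the working directory and topology path from the commands above; the log-folder renaming is illustrative, not part of the benchmark:

```python
import shutil
import subprocess

# Hypothetical sweep comparing IPC on vs. off.
for ipc in ("on", "off"):
    subprocess.run(
        ["./irobot_benchmark", "topology/sierra_nevada.json", "-t", "60", "--ipc", ipc],
        check=True,
    )
    # The benchmark writes its results to ./log; move it aside between runs.
    shutil.move("log", f"log_ipc_{ipc}")
```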
After running the application, a `log` folder will be created containing four files:
- latency_all.txt
- latency_total.txt
- resources.txt
- events.txt
The following are sample files obtained by running Sierra Nevada on a Raspberry Pi 3.
latency_all.txt:
```
node topic size[b] received[#] late[#] too_late[#] lost[#] mean[us] sd[us] min[us] max[us] freq[hz] duration[s]
lyon amazon 36 12001 11 0 0 602 145 345 4300 100 120
hamburg danube 8 12001 15 0 0 796 233 362 5722 100 120
hamburg ganges 16 12001 10 0 0 557 119 302 4729 100 120
hamburg nile 16 12001 18 0 0 658 206 300 5258 100 120
hamburg tigris 16 12000 17 0 0 736 225 310 5994 100 120
osaka parana 12 12001 32 0 0 636 236 346 4343 100 120
mandalay danube 8 12001 16 0 0 791 189 418 6991 100 120
mandalay salween 48 1201 1 0 0 663 297 391 6911 10 120
ponce danube 8 12001 15 0 0 882 203 437 7270 100 120
ponce missouri 10000 1201 0 0 0 881 245 434 3664 10 120
ponce volga 8 241 0 0 0 954 586 413 4010 2 120
barcelona mekong 100 241 0 0 0 844 297 425 2074 2 120
georgetown lena 50 1201 1 0 0 707 302 368 8392 10 120
geneva congo 16 1201 1 0 0 691 298 353 7218 10 120
geneva danube 8 12001 26 0 0 1008 227 480 7025 100 120
geneva parana 12 12001 40 0 0 760 275 368 4351 100 120
arequipa arkansas 16 1201 1 2 0 810 1079 379 37064 10 120
```
latency_total.txt:
```
received[#] mean[us] late[#] late[%] too_late[#] too_late[%] lost[#] lost[%]
126496 744 204 0.1613 2 0.001581 0 0
```
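The percentage columns are derived directly from the counts in this file; for the sample row above:

```python
received = 126496
late, too_late = 204, 2

# Reproduce the late[%] and too_late[%] columns from the raw counts.
print(f"late[%]     = {100 * late / received:.4f}")      # 0.1613
print(f"too_late[%] = {100 * too_late / received:.6f}")  # 0.001581
```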
Messages are classified according to their latency.
A message is classified as too_late when its latency is greater than `min(period, 50ms)`, where `period` is the publishing period of that particular topic.
A message is classified as late if it is not too_late but its latency is greater than `min(0.2*period, 5ms)`.
The idea is that a real system could still work with a few late messages, but not with too_late messages.
Note that there are command-line options to change these thresholds (for more info: `./irobot_benchmark --help`).
A lost message is a message that never arrived.
We can detect a lost message when the subscriber receives a message with a tracking number greater than the expected one.
The assumption here is that messages always arrive in chronological order, i.e., a message A sent before a message B will either arrive before B or get lost, but will never arrive after B.
The rest of the messages are classified as on_time.
Message classifications by their latency:

```
 +---------------+-------------------------+---------------------------->
 |    on_time    |          late           |          too_late
 +---------------+-------------------------+---------------------------->
 0      min(0.2*period, 5ms)       min(period, 50ms)
 <----------------------------------------->
                   period
```
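As a sketch of these rules in code (the thresholds are the defaults described above and are configurable via the command-line options; this helper is illustrative, not part of the benchmark):

```python
def classify(latency_us: int, period_us: int) -> str:
    """Classify a received message by latency, following the rules above.

    too_late: latency > min(period, 50 ms)
    late:     latency > min(0.2 * period, 5 ms), but not too_late
    on_time:  everything else
    """
    too_late_threshold_us = min(period_us, 50_000)
    late_threshold_us = min(0.2 * period_us, 5_000)
    if latency_us > too_late_threshold_us:
        return "too_late"
    if latency_us > late_threshold_us:
        return "late"
    return "on_time"

# Lost messages are detected from tracking numbers instead: if a subscriber
# expects message N but receives N+k (k > 0), the k skipped messages are
# counted as lost, under the in-order-arrival assumption stated above.
def count_lost(expected: int, received_tracking_number: int) -> int:
    return max(0, received_tracking_number - expected)
```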
resources.txt (trimmed to fit):
```
time[ms] cpu[%] arena[KB] in_use[KB] mmap[KB] rss[KB] vsz[KB]
0 0 0 0 0 0 0
500 27 61080 60712 0 24948 305212
1000 51 167860 167539 0 47060 544012
1500 50 202844 202753 0 55496 660724
2000 43 202860 202769 0 55496 660724
2500 39 202876 202786 0 55496 660724
3000 36 202900 202801 0 55496 660724
3500 34 202928 202822 0 55496 660724
4000 33 202936 202837 0 55496 660724
4500 31 202948 202853 0 55496 660724
5000 31 202968 202868 0 55496 660724
5500 30 202992 202893 0 55496 660724
6000 29 203012 202909 0 55496 660724
6500 29 203028 202925 0 55496 660724
7000 28 203036 202940 0 55496 660724
7500 28 203052 202955 0 55496 660724
8000 28 203068 202971 0 55496 660724
8500 27 203092 202986 0 55496 660724
9000 27 203104 203001 0 55496 660724
9500 27 203140 203038 0 55496 660724
10000 27 203280 203054 0 55792 661748
10500 26 203284 203069 0 55792 661748
11000 26 203296 203086 0 55792 661748
11500 26 203296 203102 0 55792 661748
12000 26 203304 203117 0 55792 661748
12500 26 203312 203133 0 55792 661748
13000 26 203312 203148 0 55792 661748
13500 26 203320 203163 0 55792 661748
14000 25 203324 203179 0 55792 661748
14500 25 203332 203194 0 55792 661748
15000 25 203340 203209 0 55792 661748
15500 25 203348 203225 0 55792 661748
16000 25 203368 203240 0 55792 661748
16500 25 203380 203255 0 55792 661748
17000 25 203392 203271 0 55792 661748
17500 25 203408 203286 0 55792 661748
18000 25 203424 203301 0 55792 661748
18500 25 203444 203317 0 55792 661748
19000 25 203524 203379 0 55792 661748
19500 25 203536 203394 0 55792 661748
20000 25 203536 203409 0 55792 661748
```
Resource utilization is sampled periodically every 0.5 seconds (the period can be changed with the `--sampling` option).
The CPU utilization is measured across all cores, i.e., 100% CPU utilization on a 4-core platform means that all 4 cores are fully busy.
The fields `arena`, `in_use` (uordblks) and `mmap` (hblkhd) are obtained by calling `mallinfo()`.
These fields represent the total memory allocated by the `sbrk()` and `mmap()` system calls.
The field rss is the actual allocated memory that was mapped into physical memory.
Note that an allocated memory page is not mapped into physical memory until the executing process demands it (demand paging).
vsz represents the size of the virtual memory space.
For our benchmark, rss is the most important memory metric.
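All of these log files are plain whitespace-separated tables, so they are also easy to post-process with your own tools. A minimal sketch that extracts the peak rss and an average steady-state CPU value from a resources.txt (assuming the column layout shown above, and arbitrarily treating everything after the 5-second mark as steady state):

```python
def parse_resources(path: str) -> list[dict]:
    """Parse resources.txt into a list of dicts, one per sample."""
    with open(path) as f:
        header = f.readline().split()  # time[ms] cpu[%] arena[KB] ...
        return [dict(zip(header, map(int, line.split()))) for line in f]

samples = parse_resources("log/resources.txt")
print("peak rss [KB]:", max(s["rss[KB]"] for s in samples))

# Skip the first samples, where startup and discovery inflate the CPU load.
steady = [s["cpu[%]"] for s in samples if s["time[ms]"] >= 5000]
print("steady-state cpu [%]:", sum(steady) / len(steady))
```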
events.txt (trimmed to fit):
```
Time[ms] Caller Code Description
90 SYSTEM 0 [discovery] PDP completed
151 SYSTEM 0 [discovery] EDP completed
156 amazon->lyon 1 msg 0 late. 4300us > 2000us
156 danube->mandalay 1 msg 0 late. 4082us > 2000us
250 danube->ponce 1 msg 10 late. 2303us > 2000us
250 danube->geneva 1 msg 10 late. 2617us > 2000us
338 arkansas->arequipa 2 msg 0 too late. 184382us > 50000us
338 arkansas->arequipa 2 msg 1 too late. 136936us > 50000us
339 arkansas->arequipa 1 msg 2 late. 37064us > 5000us
2314 parana->geneva 1 msg 216 late. 2097us > 2000us
2314 parana->osaka 1 msg 216 late. 2131us > 2000us
3614 parana->osaka 1 msg 346 late. 2044us > 2000us
3615 parana->geneva 1 msg 346 late. 2682us > 2000us
3644 nile->hamburg 1 msg 349 late. 2234us > 2000us
3645 tigris->hamburg 1 msg 349 late. 2081us > 2000us
4650 danube->mandalay 1 msg 450 late. 2432us > 2000us
5145 parana->osaka 1 msg 499 late. 3243us > 2000us
5149 danube->ponce 1 msg 500 late. 2120us > 2000us
5149 danube->geneva 1 msg 500 late. 2140us > 2000us
5155 parana->osaka 1 msg 500 late. 2440us > 2000us
5155 parana->geneva 1 msg 500 late. 2536us > 2000us
5784 parana->geneva 1 msg 563 late. 2157us > 2000us
6647 ganges->hamburg 1 msg 650 late. 2302us > 2000us
7004 parana->geneva 1 msg 685 late. 2172us > 2000us
7004 parana->osaka 1 msg 685 late. 2138us > 2000us
8147 tigris->hamburg 1 msg 799 late. 4431us > 2000us
8148 ganges->hamburg 1 msg 800 late. 3775us > 2000us
11149 danube->mandalay 1 msg 1100 late. 2019us > 2000us
11150 danube->hamburg 1 msg 1100 late. 2736us > 2000us
12650 danube->hamburg 1 msg 1250 late. 2314us > 2000us
12650 danube->geneva 1 msg 1250 late. 2897us > 2000us
```
This file stores special events with their associated timestamp, such as:
- late message
- too late message
- lost message
- system nodes discovery
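Since the file is a whitespace-separated table whose last column may contain spaces, it can be tallied with a few lines of Python; a minimal sketch, assuming the four-column layout shown above:

```python
from collections import Counter

def count_events(path: str) -> Counter:
    """Tally events.txt entries by (caller, code)."""
    counts = Counter()
    with open(path) as f:
        next(f)  # skip the 'Time[ms] Caller Code Description' header
        for line in f:
            time_ms, caller, code, description = line.split(maxsplit=3)
            counts[(caller, code)] += 1
    return counts

for (caller, code), n in count_events("log/events.txt").most_common(5):
    print(f"{caller:25s} code={code} count={n}")
```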
The target performance for different topologies on specific platforms can be found in the `performance_target` folder. For example, `sierra_nevada_rpi3.json`:
```json
{
  "topology_file": "sierra_nevada.json",
  "platform": "rpi3 b 1.2",
  "additional_options": "-t 600 --ipc on -s 1000 --late-percentage 20 --late-absolute 5000 --too-late-percentage 100 --too-late-absolute 50000",
  "comments": "scaling governor should be set to 'performance' at 800MHz",
  "resources": {
    "cpu[%]": 15,
    "rss[KB]": 10240
  },
  "latency_total": {
    "late[%]": 1.9,
    "too_late[%]": 0.1,
    "lost[%]": 0.0
  }
}
```
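The evaluation script mentioned below performs this comparison for you; conceptually it boils down to something like the following sketch, assuming the latency_total.txt layout shown earlier and treating each target value as an upper bound on the measured value (file paths are examples, and only the latency_total block is checked here):

```python
import json

def read_latency_total(path: str) -> dict:
    """Read the single data row of latency_total.txt into a dict."""
    with open(path) as f:
        header = f.readline().split()
        return dict(zip(header, map(float, f.readline().split())))

with open("performance_target/sierra_nevada_rpi3.json") as f:
    target = json.load(f)
measured = read_latency_total("log/latency_total.txt")

# Compare each measured percentage against its target upper bound.
for key, limit in target["latency_total"].items():
    status = "OK" if measured[key] <= limit else "FAIL"
    print(f"{key:<12} measured={measured[key]:g} target<={limit:g} {status}")
```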
After you have run the application, you can plot the results using the plot scripts described in the performance_test library.
Moreover, it's possible to directly compare the results with a performance target defined in a .json file. For example, you can run:
```
python3 <path_to_performance_test_pkg>/scripts/visualization/benchmark_app_evaluation.py --target <path_to_benchmark_pkg>/performance_target/sierra_nevada_rpi3.json --resources log/resources.txt --latency log/latency_total.txt
```
You can also use the performance target .json file together with the `cpu_ram_plot.py` script:

```
python3 <path_to_performance_test_pkg>/scripts/visualization/cpu_ram_plot.py log/resources.txt --x time --y cpu --y2 rss --target <path_to_benchmark_pkg>/perf_target.json
```
For reference only, these are the results obtained by running the default topologies on an RPi3 using ROS2 Dashing.