Benchmarks
The benchmarks on this page are not intended to be definitive. They are intended to show the order of magnitude of the overhead associated with using PMDA++ over PCP's C API. If performance is absolutely critical for you (for example, you're sampling hardware devices at very high rates, or monitoring highly constrained devices), then you should perform your own benchmarking on the relevant device(s) / platform(s).
Benchmarking was performed using a basic custom benchmark.sh script, which works as follows:
- Start a pmie instance to sample a specified metric (such as trivial.time) from the PMDA-under-test, at a specified rate (such as once per millisecond, or 1 kHz).
- Use pmval to fetch the PMDA's user or system time over a specified period (such as 10 seconds).
- Stop the pmie instance.
The above process was executed for both the C and C++ versions of the PMDA being tested to compare the overhead of the C++ implementation over the C implementation.
Note: depending on your version of PCP and its access controls, you may need to run the benchmark script as root, or as some other user with permission to monitor the PMDA-under-test via the proc PMDA.
The following charts show the system and user time consumed by both the C and C++ (PMDA++) versions of the relevant metric, using sample intervals from 10 seconds down to 0.5 milliseconds.
These graphs show that the performance overhead (if any) of using PMDA++ over the standard PCP API is less than the variations caused by other services running on the test machine. If someone has access to a really consistent, idle, spare machine and would like to do some longer-running tests (these were 60-second runs), then that would be great.
Other notes:
- Sampling metrics once every half a millisecond (0.0005 seconds) only just got the system time up to ~0.16%. At this rate, pmcd was consuming roughly 50% of one core.
- Sampling faster than once per 0.0005 seconds resulted in pmie reporting issues with clock skew.
- PCP really is quite lean.
The actual user and system times reported by the above tests are shown here, in the form used with Google Charts to create the above diagrams.
var simpleNowData = google.visualization.arrayToDataTable([
[ 'x', 'sys C', 'usr C', 'sys C++', 'usr C++'],
[ 10, 0.00000, 0.00000, 0.00000, 0.00000 ],
[ 1, 0.00050, 0.00000, 0.00033, 0.00000 ],
[ 0.1, 0.00267, 0.00000, 0.00383, 0.00433 ],
[ 0.01, 0.03233, 0.00650, 0.03450, 0.00683 ],
[ 0.001, 0.09866, 0.02250, 0.10400, 0.01650 ],
[ 0.0005, 0.17016, 0.03800, 0.16849, 0.02817 ]
]);
var simpleNumfetchData = google.visualization.arrayToDataTable([
[ 'x', 'sys C', 'usr C', 'sys C++', 'usr C++'],
[ 10, 0.00000, 0.00000, 0.00000, 0.00000 ],
[ 1, 0.00050, 0.00017, 0.00050, 0.00017 ],
[ 0.1, 0.00350, 0.00150, 0.00000, 0.00067 ],
[ 0.01, 0.02950, 0.00650, 0.02817, 0.01233 ],
[ 0.001, 0.10000, 0.02400, 0.10366, 0.01133 ],
[ 0.0005, 0.15933, 0.03333, 0.16566, 0.03217 ]
]);
var trivialData = google.visualization.arrayToDataTable([
[ 'x', 'sys C', 'usr C', 'sys C++', 'usr C++'],
[ 10, 0.00000, 0.00000, 0.00000, 0.00000 ],
[ 1, 0.00033, 0.00000, 0.00000, 0.00000 ],
[ 0.1, 0.00333, 0.00100, 0.00000, 0.00467 ],
[ 0.01, 0.03183, 0.00850, 0.03383, 0.00217 ],
[ 0.001, 0.10449, 0.00217, 0.09416, 0.00217 ],
[ 0.0005, 0.16300, 0.00567, 0.16383, 0.00500 ]
]);
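
To compare the C and C++ figures without the charts, the rows above can be totalled and differenced directly. The following is a minimal sketch, not part of the original benchmark tooling: the `benchmarks` object, `compare` and `maxAbsDelta` are hypothetical names introduced here for illustration, and the rows are copied from the tables above into plain arrays so the snippet runs without the Google Charts library.

```javascript
// Rows copied from the tables above: [sample interval (s), sys C, usr C, sys C++, usr C++].
// The keys mirror the chart variable names; the values are in the units reported by the benchmark.
const benchmarks = {
  simpleNow: [
    [10, 0.00000, 0.00000, 0.00000, 0.00000],
    [1, 0.00050, 0.00000, 0.00033, 0.00000],
    [0.1, 0.00267, 0.00000, 0.00383, 0.00433],
    [0.01, 0.03233, 0.00650, 0.03450, 0.00683],
    [0.001, 0.09866, 0.02250, 0.10400, 0.01650],
    [0.0005, 0.17016, 0.03800, 0.16849, 0.02817],
  ],
  simpleNumfetch: [
    [10, 0.00000, 0.00000, 0.00000, 0.00000],
    [1, 0.00050, 0.00017, 0.00050, 0.00017],
    [0.1, 0.00350, 0.00150, 0.00000, 0.00067],
    [0.01, 0.02950, 0.00650, 0.02817, 0.01233],
    [0.001, 0.10000, 0.02400, 0.10366, 0.01133],
    [0.0005, 0.15933, 0.03333, 0.16566, 0.03217],
  ],
  trivial: [
    [10, 0.00000, 0.00000, 0.00000, 0.00000],
    [1, 0.00033, 0.00000, 0.00000, 0.00000],
    [0.1, 0.00333, 0.00100, 0.00000, 0.00467],
    [0.01, 0.03183, 0.00850, 0.03383, 0.00217],
    [0.001, 0.10449, 0.00217, 0.09416, 0.00217],
    [0.0005, 0.16300, 0.00567, 0.16383, 0.00500],
  ],
};

// For each row, total sys + usr for C and for C++, and report the difference (C++ minus C).
function compare(rows) {
  return rows.map(([interval, sysC, usrC, sysCpp, usrCpp]) => {
    const c = sysC + usrC;
    const cpp = sysCpp + usrCpp;
    return { interval, hz: 1 / interval, c, cpp, delta: cpp - c };
  });
}

// Largest absolute C++-vs-C difference across all sample intervals in one table.
function maxAbsDelta(rows) {
  return Math.max(...compare(rows).map((r) => Math.abs(r.delta)));
}

for (const [name, rows] of Object.entries(benchmarks)) {
  for (const r of compare(rows)) {
    console.log(
      `${name} @ ${r.interval}s: C=${r.c.toFixed(5)} C++=${r.cpp.toFixed(5)} delta=${r.delta.toFixed(5)}`
    );
  }
  console.log(`${name}: max |delta| = ${maxAbsDelta(rows).toFixed(5)}`);
}
```

A negative delta means the C++ version consumed less time than the C version in that run. In all three tables the sign of the difference changes from one sample interval to the next, which is what you would expect if the differences are dominated by noise rather than by a consistent overhead.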