Paul Colby edited this page Dec 18, 2013 · 15 revisions

Disclaimer

The benchmarks on this page are not intended to be definitive. They are intended to show the order of magnitude of the overhead associated with using PMDA++ over PCP's C API. If performance is absolutely critical for you (for example, you are sampling hardware devices at very high speed, or monitoring highly constrained devices), then you should perform your own benchmarking on the relevant devices and platforms.

Methodology

Benchmarking was performed using a basic custom benchmark.sh script, which works as follows:

  1. Start a pmie instance to sample a specified metric (such as trivial.time) from the PMDA-under-test, at a specified rate (such as once per millisecond, that is, 1 kHz).
  2. Use pmval to fetch the PMDA's user or system time over a specified period (such as 10 seconds).
  3. Stop the pmie instance.

The above process was executed for both the C and C++ versions of the PMDA being tested to compare the overhead of the C++ implementation over the C implementation.

Note: depending on your version of PCP and its access controls, you may need to run the benchmark script as root, or as some other user with permission to monitor the PMDA-under-test via the proc PMDA.

Results

The following charts show the system and user time consumed by both the C and C++ (PMDA++) versions of the relevant metric, using sample intervals from 10 seconds down to 0.5 milliseconds.

[Charts: simple.now, simple.numfetch, trivial.time]

These graphs show that the performance overhead (if any) of using PMDA++ over the standard PCP API is smaller than the variation caused by other services running on the test machine. If someone has access to a consistently idle spare machine and would like to run some longer tests (these were 60-second runs), that would be great.
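
To make the comparison concrete, the following sketch subtracts the C figures from the C++ figures for each sample interval, using the trivial.time rows from the Chart Data section of this page. The `overheadRows` helper and its field names are hypothetical, introduced only for illustration; they are not part of PMDA++ or PCP.

```javascript
// Rows copied from the trivial.time table in the Chart Data section:
// [sample interval (s), sys C, usr C, sys C++, usr C++]
const trivialRows = [
  [10,     0.00000, 0.00000, 0.00000, 0.00000],
  [1,      0.00033, 0.00000, 0.00000, 0.00000],
  [0.1,    0.00333, 0.00100, 0.00000, 0.00467],
  [0.01,   0.03183, 0.00850, 0.03383, 0.00217],
  [0.001,  0.10449, 0.00217, 0.09416, 0.00217],
  [0.0005, 0.16300, 0.00567, 0.16383, 0.00500],
];

// Hypothetical helper: returns C++ minus C for system and user time.
// Differences are rounded to 5 decimal places, matching the source data.
function overheadRows(rows) {
  return rows.map(([interval, sysC, usrC, sysCpp, usrCpp]) => ({
    interval,
    sysDelta: Number((sysCpp - sysC).toFixed(5)),
    usrDelta: Number((usrCpp - usrC).toFixed(5)),
  }));
}

// Print one line per sample interval.
for (const row of overheadRows(trivialRows)) {
  console.log(`${row.interval}s  sys ${row.sysDelta}  usr ${row.usrDelta}`);
}
```

The differences change sign from one interval to the next: for example, the C++ system time is lower than the C figure at the 0.001 second interval and slightly higher at 0.0005 seconds. That is what you would expect if the differences were dominated by noise rather than by a consistent overhead.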

Other notes:

  • Sampling metrics once every half millisecond (0.0005 seconds) only just got the system time up to ~0.16%. At this rate, PCP's pmcd process was consuming roughly 50% of one core, making the PMDA's CPU usage negligible by comparison.
  • When sampling faster than once per half millisecond, pmie began reporting issues with clock skew.

Chart Data

The actual user and system times reported by the above tests are shown here, as used with Google Charts to create the above diagrams.

    var simpleNowData = google.visualization.arrayToDataTable([
      [ 'x', 'sys C', 'usr C', 'sys C++', 'usr C++'],
      [ 10,     0.00000, 0.00000, 0.00000, 0.00000 ],
      [ 1,      0.00050, 0.00000, 0.00033, 0.00000 ],
      [ 0.1,    0.00267, 0.00000, 0.00383, 0.00433 ],
      [ 0.01,   0.03233, 0.00650, 0.03450, 0.00683 ],
      [ 0.001,  0.09866, 0.02250, 0.10400, 0.01650 ],
      [ 0.0005, 0.17016, 0.03800, 0.16849, 0.02817 ]
    ]);

    var simpleNumfetchData = google.visualization.arrayToDataTable([
      [ 'x', 'sys C', 'usr C', 'sys C++', 'usr C++'],
      [ 10,     0.00000, 0.00000, 0.00000, 0.00000 ],
      [ 1,      0.00050, 0.00017, 0.00050, 0.00017 ],
      [ 0.1,    0.00350, 0.00150, 0.00000, 0.00067 ],
      [ 0.01,   0.02950, 0.00650, 0.02817, 0.01233 ],
      [ 0.001,  0.10000, 0.02400, 0.10366, 0.01133 ],
      [ 0.0005, 0.15933, 0.03333, 0.16566, 0.03217 ]
    ]);

    var trivialData = google.visualization.arrayToDataTable([
      [ 'x',    'sys C', 'usr C', 'sys C++', 'usr C++'],
      [ 10,     0.00000, 0.00000, 0.00000, 0.00000 ],
      [ 1,      0.00033, 0.00000, 0.00000, 0.00000 ],
      [ 0.1,    0.00333, 0.00100, 0.00000, 0.00467 ],
      [ 0.01,   0.03183, 0.00850, 0.03383, 0.00217 ],
      [ 0.001,  0.10449, 0.00217, 0.09416, 0.00217 ],
      [ 0.0005, 0.16300, 0.00567, 0.16383, 0.00500 ]
    ]);
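
For readers who want to re-plot the data, here is a minimal sketch of how a table like the ones above might be rendered as a line chart. It is not the code used for the original charts. It assumes the current Google Charts loader script is already included in the page, and the `chartOptions` and `drawChart` helpers, the `trivial-chart` element id, the axis titles and the choice of a reversed logarithmic x-axis are all assumptions made for illustration.

```javascript
// A shortened copy of the trivial.time table, header row first.
const sampleRows = [
  ['x',    'sys C', 'usr C', 'sys C++', 'usr C++'],
  [0.01,   0.03183, 0.00850, 0.03383, 0.00217],
  [0.001,  0.10449, 0.00217, 0.09416, 0.00217],
  [0.0005, 0.16300, 0.00567, 0.16383, 0.00500],
];

// Hypothetical helper: builds chart options. The sample intervals span
// several orders of magnitude, so a log-scale x-axis is assumed here, and
// direction -1 reverses it so that faster sampling appears further right.
function chartOptions(title) {
  return {
    title,
    hAxis: { title: 'Sample interval (seconds)', logScale: true, direction: -1 },
    vAxis: { title: 'CPU time' },
  };
}

// Hypothetical helper: draws one table into the element with the given id.
function drawChart(rows, title, elementId) {
  const data = google.visualization.arrayToDataTable(rows);
  const chart = new google.visualization.LineChart(document.getElementById(elementId));
  chart.draw(data, chartOptions(title));
}

// Only attempt to draw when the Google Charts loader is present (in a browser).
if (typeof google !== 'undefined' && google.charts) {
  google.charts.load('current', { packages: ['corechart'] });
  google.charts.setOnLoadCallback(() => drawChart(sampleRows, 'trivial.time', 'trivial-chart'));
}
```

Keeping the options in a separate function means each of the three tables can share the same axis settings while supplying its own title.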