Paul Colby edited this page Dec 18, 2013 · 15 revisions

Disclaimer

The benchmarks on this page are not intended to be definitive. They are intended to show the order of magnitude of the overhead associated with using PMDA++ over PCP's C API. If performance is absolutely critical for you (for example, you are sampling hardware devices at very high rates, or monitoring highly constrained devices), then you should perform your own benchmarking on the relevant device(s) / platform(s).

Methodology

Benchmarking was performed by a basic benchmark.sh script, which works as follows:

  1. Start a pmie instance to sample a specified metric (such as trivial.time) from the PMDA-under-test, at a specified rate (such as once per millisecond, or 1 kHz).
  2. Use pmval to fetch the PMDA's user or system time over a specified period (such as 10 seconds).
  3. Stop the pmie instance.

The above process is executed for both the C and C++ versions of the PMDA being tested to compare the overhead of the C++ wrapper over the underlying C API.
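
The sample rate in step 1 can be written either as an interval or as a frequency. As a minimal sketch of that conversion, applied to the intervals listed under Chart Data (the helper name `intervalToHz` is hypothetical and is not part of benchmark.sh):

```javascript
// Sample intervals in seconds, as used for the x column under Chart Data.
const intervals = [10, 1, 0.1, 0.01, 0.001, 0.0005];

// Hypothetical helper: samples per second for a given interval in seconds.
function intervalToHz(seconds) {
  return 1 / seconds;
}

for (const s of intervals) {
  console.log(`${s} s -> ${intervalToHz(s)} Hz`);
}
```

An interval of 0.001 seconds corresponds to the 1 kHz rate mentioned in step 1.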

Note: depending on your version of PCP and its access controls, you may need to run the benchmark script as root, or as some other user with permission to monitor the PMDA-under-test via the proc PMDA.

Results

The following charts show the system and user time consumed by both the C and C++ (PMDA++) versions of the relevant metric, using sample intervals from 10 seconds down to 0.5 milliseconds.

[Charts: simple now, simple numfetch, trivial time]

These graphs show that the performance overhead (if any) of using PMDA++ over the standard PCP API is less than the variations caused by other services running on the test machine. These were 60-second runs; if someone has access to a consistently idle spare machine and would like to do some longer-running tests, that would be great.
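
One way to inspect that claim numerically is to subtract the C figures from the C++ figures row by row. The following is a rough sketch, not part of the original benchmark: the rows are copied from the simple now table under Chart Data, and the helper name `overheadByInterval` is hypothetical.

```javascript
// Rows copied from the simple now table under Chart Data:
// [sample interval (s), sys C, usr C, sys C++, usr C++]
const simpleNowRows = [
  [10,     0.00000, 0.00000, 0.00000, 0.00000],
  [1,      0.00050, 0.00000, 0.00033, 0.00000],
  [0.1,    0.00267, 0.00000, 0.00383, 0.00433],
  [0.01,   0.03233, 0.00650, 0.03450, 0.00683],
  [0.001,  0.09866, 0.02250, 0.10400, 0.01650],
  [0.0005, 0.17016, 0.03800, 0.16849, 0.02817]
];

// Hypothetical helper: C++ minus C for each row. A negative value means
// the C++ PMDA used less time than the C PMDA in that run.
function overheadByInterval(rows) {
  return rows.map(([interval, sysC, usrC, sysCpp, usrCpp]) => ({
    interval,
    sys: sysCpp - sysC,
    usr: usrCpp - usrC
  }));
}

for (const { interval, sys, usr } of overheadByInterval(simpleNowRows)) {
  console.log(`${interval} s  sys ${sys.toFixed(5)}  usr ${usr.toFixed(5)}`);
}
```

For these rows the sign of the difference changes from one interval to the next, for both system and user time, which is what run-to-run noise would look like rather than a consistent C++ penalty.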

Other notes:

  • Sampling metrics once every half a millisecond (0.0005 seconds) only just got the system time up to ~0.16%. At this rate, pmcd was consuming roughly 50% of one core.
  • Sampling faster than once per 0.0005 seconds resulted in pmie reporting issues with clock skew.
  • PCP really is quite lean.

Chart Data

    // Create and populate the data table.
    var simpleNowData = google.visualization.arrayToDataTable([
      [ 'x', 'sys C', 'usr C', 'sys C++', 'usr C++'],
      [ 10,     0.00000, 0.00000, 0.00000, 0.00000 ],
      [ 1,      0.00050, 0.00000, 0.00033, 0.00000 ],
      [ 0.1,    0.00267, 0.00000, 0.00383, 0.00433 ],
      [ 0.01,   0.03233, 0.00650, 0.03450, 0.00683 ],
      [ 0.001,  0.09866, 0.02250, 0.10400, 0.01650 ],
      [ 0.0005, 0.17016, 0.03800, 0.16849, 0.02817 ]
    ]);

    // Create and populate the data table.
    var simpleNumfetchData = google.visualization.arrayToDataTable([
      [ 'x', 'sys C', 'usr C', 'sys C++', 'usr C++'],
      [ 10,     0.00000, 0.00000, 0.00000, 0.00000 ],
      [ 1,      0.00050, 0.00017, 0.00050, 0.00017 ],
      [ 0.1,    0.00350, 0.00150, 0.00000, 0.00067 ],
      [ 0.01,   0.02950, 0.00650, 0.02817, 0.01233 ],
      [ 0.001,  0.10000, 0.02400, 0.10366, 0.01133 ],
      [ 0.0005, 0.15933, 0.03333, 0.16566, 0.03217 ]
    ]);

    // Create and populate the data table.
    var trivialData = google.visualization.arrayToDataTable([
      [ 'x',    'sys C', 'usr C', 'sys C++', 'usr C++'],
      [ 10,     0.00000, 0.00000, 0.00000, 0.00000 ],
      [ 1,      0.00033, 0.00000, 0.00000, 0.00000 ],
      [ 0.1,    0.00333, 0.00100, 0.00000, 0.00467 ],
      [ 0.01,   0.03183, 0.00850, 0.03383, 0.00217 ],
      [ 0.001,  0.10449, 0.00217, 0.09416, 0.00217 ],
      [ 0.0005, 0.16300, 0.00567, 0.16383, 0.00500 ]
    ]);