Performance Measurement/Tracking #445

Open · jjhursey opened this issue Jun 30, 2016 · 5 comments

Comments

@jjhursey
Member

We should investigate better integration of performance benchmarks into the new MTT infrastructure. Performance regressions are currently hard to see; automated tracking would give us visibility into performance issues when commits happen, instead of when we are ramping up to a release.

See open-mpi/ompi#1831 for one case where this would be useful for Open MPI.

We need to discuss how to store the data, and how to organize the DB structure and REST interface so that apples-to-apples comparisons are easy to access. We looked at this in the past, and it's harder than one might think.

@gpaulsen
Member

gpaulsen commented Aug 31, 2016

At the face-to-face, we recommended not trusting "stored away" numbers, and instead running both an old and a new version in sequence to mitigate cluster changes, environmental changes, and other temporal abnormalities in the data.
I'd recommend a process of:

  1. BUILD OLD
  2. BUILD NEW
  3. BUILD TEST with OLD
  4. RUN TEST with OLD runtime
  5. RUN TEST with NEW runtime
  6. COMPARE performance results.

Storing these performance diffs in the database might be more useful than storing raw results.
And if we don't recompile the test application, we might also discover any binary compatibility bugs introduced between OLD and NEW.
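
A minimal sketch of that build/run/compare sequence, assuming hypothetical helper scripts (build_ompi.sh, run_benchmark.sh, compare_perf.py) rather than real MTT stages:

```python
# Sketch only: the script names and install prefixes below are placeholders,
# not part of MTT or Open MPI.
import subprocess

def sh(cmd):
    """Run a shell command and fail loudly on error."""
    subprocess.run(cmd, shell=True, check=True)

# 1-2. Build both MPI versions side by side.
sh("./build_ompi.sh --version=OLD --prefix=/opt/ompi-old")
sh("./build_ompi.sh --version=NEW --prefix=/opt/ompi-new")

# 3. Build the benchmark once, against OLD only (this is also what lets
#    step 5 surface binary compatibility bugs).
sh("PATH=/opt/ompi-old/bin:$PATH make -C benchmarks clean all")

# 4-5. Run the same binary under each runtime, back to back, so that
#      cluster/environment drift affects both runs roughly equally.
sh("PATH=/opt/ompi-old/bin:$PATH ./run_benchmark.sh --out old.json")
sh("PATH=/opt/ompi-new/bin:$PATH ./run_benchmark.sh --out new.json")

# 6. Compare the two result sets (comparison logic sketched further down).
sh("./compare_perf.py old.json new.json")
```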

@jsquyres
Member

jsquyres commented Sep 8, 2016

👍 on what @gpaulsen said. Except I'd make the backwards compatibility tests separate from these performance tests (because the performance characteristics may/will be desirable to test over a longer period of time than our backwards compatibility guarantees).

Some possible requirements for the performance testing:

  1. Let's start with 3 easy benchmarks: latency, bandwidth, and message injection rates.
    • Each of these has a simple performance curve: the measured value (Y) vs. message size (X).
    • Disk space is cheap: I'd store the individual X/Y data points, not just the difference between OLD/NEW.
  2. The comparison of OLD vs. NEW can be a simple subtraction (sketched below):
    • For latency: calculate (NEW_y - OLD_y) for each MESSAGE_SIZE_x. If any value is greater than Z% of OLD_y (where Z% is TBD/parameter of the test checker), FAIL the test.
    • For bandwidth and message rate: calculate (OLD_y - NEW_y) for each MESSAGE_SIZE_x. If any value is greater than Z% of OLD_y (where Z% is TBD/parameter of the test checker), FAIL the test.

We as a community just need to determine the versions of OLD that we want to compare against.
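
A minimal sketch of the Z% check above, assuming results arrive as plain message-size → value mappings (not an agreed MTT data format); the numbers in the example are made up:

```python
# Sketch of the Z% regression check described above. The data layout and
# the example values are illustrative assumptions only.
def check_regression(old, new, metric, threshold_pct):
    """old/new map message size (X) to the measured value (Y).
    For latency a regression is NEW above OLD; for bandwidth and
    message rate it is NEW below OLD."""
    failures = []
    for size, old_y in old.items():
        new_y = new[size]
        delta = (new_y - old_y) if metric == "latency" else (old_y - new_y)
        if delta > (threshold_pct / 100.0) * old_y:
            failures.append((size, old_y, new_y))
    return failures

# Example: allow up to a 5% latency regression at each message size.
old_lat = {1: 1.10, 1024: 1.85, 65536: 12.40}   # usec, made-up values
new_lat = {1: 1.12, 1024: 2.10, 65536: 12.45}
print(check_regression(old_lat, new_lat, "latency", threshold_pct=5))
```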

@rhc54
Contributor

rhc54 commented Sep 9, 2016

Let's also remember that we have plugin support in the new MTT. So there is no problem creating a plugin that compares against some stored "good" measurement, another that does old vs. new, and another that does whatever someone wants for their own purposes. If we write the plugins intelligently so that data retrieval can be shared code, it will be relatively easy to add new comparison algorithms.
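
A rough sketch of that split (shared retrieval code, pluggable comparison logic); the class and method names are hypothetical, not the actual MTT plugin API:

```python
# Hypothetical shapes only; the real MTT plugin interfaces will differ.
class PerfDataSource:
    """Shared retrieval code that every comparison plugin can reuse."""
    def fetch(self, version, benchmark):
        """Return {message_size: value} for one version/benchmark pair."""
        raise NotImplementedError  # e.g. read from the MTT database/REST API

class ComparisonPlugin:
    """Base class: subclasses only decide what gets compared."""
    def __init__(self, source, threshold_pct=5.0):
        self.source = source
        self.threshold_pct = threshold_pct

    def within_threshold(self, old, new):
        """True if no per-size delta exceeds the threshold (latency-style)."""
        return all(new[s] - old[s] <= (self.threshold_pct / 100.0) * old[s]
                   for s in old)

class StoredGoodComparison(ComparisonPlugin):
    """Compare a fresh run against a stored 'known good' measurement."""
    def check(self, benchmark, candidate):
        baseline = self.source.fetch("known-good", benchmark)
        return self.within_threshold(baseline, candidate)

class OldVsNewComparison(ComparisonPlugin):
    """Compare two back-to-back runs (OLD vs. NEW) from the same session."""
    def check(self, benchmark, old_results, new_results):
        return self.within_threshold(old_results, new_results)
```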

@jjhursey
Member Author

Below are some thoughts...

Running Tests

Ability to run in the following modes (a version could be a release tarball or a git hash); a command-line sketch follows the list:

  1. Run perf test for the currently installed build only
    • Use case: single data point, developer point test during development
  2. Run perf test for version A only
    • Use case: single data point, bisect through history
  3. Run perf test for the currently installed build and version A
    • Use case: developer progress test against a baseline version
  4. Run perf test for version X and version Y
    • Use case: Compare delta between two points in time
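
A small command-line sketch covering those four modes; the flags are illustrative only, not existing MTT options:

```python
# Sketch of a driver front end; none of these flags exist in MTT today.
import argparse

parser = argparse.ArgumentParser(description="MTT perf test driver (sketch)")
parser.add_argument("--installed", action="store_true",
                    help="include the currently installed build in the run")
parser.add_argument("versions", nargs="*", metavar="VERSION",
                    help="release tarballs or git hashes to build and run")
args = parser.parse_args()

# Mode 1: --installed            -> currently installed build only
# Mode 2: A                      -> version A only (useful for bisecting)
# Mode 3: --installed A          -> installed build vs. baseline version A
# Mode 4: X Y                    -> version X vs. version Y
targets = (["<installed>"] if args.installed else []) + args.versions
print("will run perf tests for:", targets)
```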

Collecting/Reporting Data

  • Raw data is collected and archived by client
    • Use plugins to determine the set of archive methods (e.g., XML, JSON, local/remote database(s), custom formats)
  • Option to push data to the server to share more broadly.
    • It must be possible to run this as a separate, manual step after local review of the results.
    • It may also be part of the automated process.
  • Data is going to be specific to a particular configuration, so the reporting needs to make the following clear (a strawman record layout is sketched after this list):
    • System configuration (e.g., arch, network, ...)
      • User should set a description for the system - human readable
      • Maybe also add additional discovery via config.log and hwloc?
      • Would need any device/hw specific configurations too
    • Build configuration
      • User should set a description for this - human readable
      • Maybe also pull the configure results...
    • Runtime configuration
      • User should set a description for this - human readable
      • We need the command line, plus any environment variables, binding, ...
    • I think a human readable configuration line would help in understanding the testing environment. We will not be able to automatically discover everything that is necessary to report.
  • Ability to delete/hide/promote results that are pushed to the DB.
    • So we can remove known bad results
    • So we can easily share useful comparisons. (current permalinks mechanism could be used here)
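
As a strawman, one archived record could bundle the raw X/Y points with the three human-readable descriptions above; the field names and values are placeholders, not a settled MTT schema:

```python
import json

# Illustrative record only: every field name and value here is a placeholder.
record = {
    "benchmark": "latency",
    "system": {
        "description": "example cluster: 2 nodes, EDR InfiniBand",  # human readable
        "hwloc_xml": None,        # optional automatic discovery
        "config_log": None,       # optional automatic discovery
    },
    "build": {
        "description": "master @ <git hash>, gcc, default optimization",
        "configure_args": [],
    },
    "runtime": {
        "description": "2 ranks, 1 per node, bound to core",
        "command_line": "mpirun -np 2 --map-by node --bind-to core ./latency",
        "environment": {},
    },
    # Raw X/Y data points (disk is cheap), not just an OLD/NEW diff.
    "results": [{"message_size": 1, "value": 1.10, "unit": "usec"}],
}
print(json.dumps(record, indent=2))
```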

Rendering Data

  • Each perf test has a description of how the data should be organized/compared
  • Comparison represented in tabular format
  • Comparison represented in graph format
  • Comparison can be rendered entirely on the client side
  • Alarm / Flag option
    • When a particular comparison is out of the normal range, it is flagged in an obvious way (see the rendering sketch below).
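
A minimal client-side rendering sketch with such an out-of-range flag; the threshold and layout are illustrative assumptions:

```python
# Print an OLD-vs-NEW comparison table and flag out-of-range rows.
def render_table(old, new, threshold_pct=5.0):
    print(f"{'size':>10} {'old':>10} {'new':>10} {'delta %':>8}  flag")
    for size in sorted(old):
        pct = 100.0 * (new[size] - old[size]) / old[size]
        flag = "<== CHECK" if abs(pct) > threshold_pct else ""
        print(f"{size:>10} {old[size]:>10.2f} {new[size]:>10.2f} {pct:>7.1f}%  {flag}")

# Example call with made-up latency numbers (usec):
# render_table({1: 1.10, 1024: 1.85}, {1: 1.12, 1024: 2.10})
```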

Other notes

  • We need to decide on a set of useful perf tests. Start small and gradually grow from there.
  • Must have the option to be selective in what is reported where.
    • Sometimes developers want frequent status updates on the performance impact
    • Some performance numbers should not be made public.
  • A useful rendering might be
    • Every week show me the performance difference between:
      • Release vX.Y.Z and the current HEAD of master
      • Release vX.Y.Z and the current HEAD of release branch
    • Ability to integrate git bisect to find where in history performance changed. This might get tricky...
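
For the git bisect idea, one option is a small wrapper script that git bisect run can call; the build/run helper scripts and the baseline number below are placeholders:

```python
# bisect_perf.py (sketch): exit 0 = good commit, 1 = performance regressed.
# Hypothetical usage:
#   git bisect start <bad> <good>
#   git bisect run python bisect_perf.py
# Note: git bisect run treats exit code 125 as "skip", which is the right
# thing to return if the checked-out commit fails to build.
import subprocess
import sys

BASELINE_USEC = 1.10   # measured once on the known-good commit (placeholder)
THRESHOLD_PCT = 5.0

try:
    subprocess.run("./build_ompi.sh --prefix=/tmp/ompi-bisect",
                   shell=True, check=True)
except subprocess.CalledProcessError:
    sys.exit(125)  # build failure: tell bisect to skip this commit

out = subprocess.run("./run_latency.sh", shell=True, check=True,
                     capture_output=True, text=True)
latency = float(out.stdout.strip())
regressed = latency > BASELINE_USEC * (1.0 + THRESHOLD_PCT / 100.0)
sys.exit(1 if regressed else 0)
```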

@jjhursey
Member Author

This would be great to do one day if someone is interested in tinkering with it.

@jjhursey jjhursey removed their assignment Mar 25, 2021