[HPC] Proposal: Exclude data movement from timing #507
Comments
My comments:
I believe it is essential to include data movement in this benchmark suite to distinguish it from the MLPerf Training benchmarks, since HPC applications typically involve large datasets that stress the I/O subsystem. In addition, this can be studied in detail by the storage group in the next year or so. How about including data movement in time-to-train (strong-scaling mode) and possibly excluding it from throughput (weak-scaling mode)? I agree that we should still report the data movement timing, which can be excluded from the final metric.
Differentiating from MLPerf-T should not be sufficient justification for policy development. It is true that this change will blur the line between MLPerf-T and MLPerf-HPC even more, but the truth is that the two benchmarks are indeed very similar. Including data movement (even partly, e.g., only for the closed division) does not alleviate the high cost of submission, which is the most common feedback we received (by a very wide margin). Please remember that the motivation for these proposals is to increase MLPerf-HPC's popularity and participation. Having an optional report of data movement timing still adds complexity, this time in parsing the results. Describing data movement optimization strategies in READMEs could be a good middle ground. The MLPerf-Storage approach might be the best solution, and it also results in cleanly separated scopes across the various MLCommons suites.
Hi @nvaprodromou, are potential submitters willing to guarantee that they would submit to a version without storage? I have been pondering this, and it is a difficult trade-off: fidelity vs. submission quantity. Is there a way we can de-risk it? How would we feel if we drop storage and data movement and then no additional submitters appear?
I don't think we can get any formal guarantees on this. I asked this question myself as well and can dig into it some more, but I doubt we'll get any commitments. Even if we do, these are still likely to be NVIDIA submissions, which only solves part of the problem (participation and competition both need to rise).

Furthermore, changing the rules by itself is not going to change things. We'll need to run some sort of campaign to advertise that (I'm making numbers up) submissions are now 100x easier than they used to be, results (i.e., return on investment) have a guaranteed lifespan, and (this is primarily for businesses) results are more useful to entities that seek to purchase an HPC system.

Even though no guarantees can be made, easier submissions, guaranteed returns, and a good campaign can't really hurt the existing participation numbers. On the other hand, if we change the rules and no additional submitters appear, I would argue we are in the same place we were before: even though the quality of the benchmark was reduced compared to v2.0, the primary problem remains attracting participation and competition. We can have a shiny thing few care about, or a less shiny thing few care about.
This was accepted and implemented in the rules, so I think it can be closed now. Correct, @nvaprodromou?
Introduction:
After collecting feedback from engineers, clients, and the press, NVIDIA presented a list of proposals that aim to improve the popularity of the MLPerf HPC benchmark suite. Please see our slide deck for more information on our feedback-gathering process and insights.
Proposal: Exclude data movement from timing (start clock after data retrieval, before caching. Same as MLPerf-T).
Slide 14 in proposals slide deck.
This proposal aims to improve the popularity of the MLPerf HPC benchmark suite by improving on the following aspects:
Note: We strongly believe that the filesystem is an extremely important part of the system, and we always advise potential clients to consider the interplay of all parts of a system (FS + compute + network). However, we received a strong signal from some clients that including data movement in the timing makes it harder to use MLPerf-HPC scores for apples-to-apples comparisons, as the FS and compute are sometimes not purchased at the same time.
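The timing rule proposed above (start the clock after data retrieval and staging, before any caching benefit, as in MLPerf Training) can be sketched as follows. This is an illustrative Python sketch only, not actual benchmark harness code; `stage_data` and `train` are hypothetical placeholders:

```python
import time


def stage_data(path):
    # Hypothetical placeholder for data movement: e.g., copying the
    # dataset from the parallel filesystem to node-local storage.
    time.sleep(0.05)  # simulate staging cost
    return [0.0] * 1000


def train(dataset):
    # Hypothetical placeholder for the training run.
    return sum(dataset)


def run_benchmark(data_path):
    # Data movement happens BEFORE the clock starts, per the proposal.
    stage_start = time.perf_counter()
    dataset = stage_data(data_path)
    staging_time = time.perf_counter() - stage_start

    # Timed region: begins after staging, before any caching benefit,
    # so the reported metric covers compute only.
    start = time.perf_counter()
    train(dataset)
    time_to_train = time.perf_counter() - start

    # Staging time can still be reported alongside the metric,
    # even though it is excluded from the final score.
    return {"time_to_train": time_to_train,
            "staging_time_reported": staging_time}
```

Reporting the staging time separately, as in the dictionary above, keeps the data-movement information available without folding it into the scored metric.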
Discussion
Pros:
Cons: