Allow baseline per category #617

mgravell · 2018-01-09T09:11:11Z

Edit: if there's already a way to do this and I'm just being ignorant: please say!

Often I'm trying to compare alternative implementations that can impact multiple scenarios - essentially baseline vs option 1 (vs option 2 etc), but for multiple separate tests A, B, C.

The Baseline = true feature is great, but only really allows a single baseline. If multiple methods are marked as Baseline, then the test runner fails and complains at you.

However, [BenchmarkCategory(...)] exists (via #248). It is currently only used to filter tests to run, but it could be much richer:

the result grids could be split by category
the relative comparisons against baseline could be computed by category

so instead of:

                   Method |         Mean |      Error |     StdDev |      Op/s | Scaled | ScaledSD |    Gen 0 |   Gen 1 |  Allocated |
------------------------- |-------------:|-----------:|-----------:|----------:|-------:|---------:|---------:|--------:|-----------:|
         'single, Stream' |     61.32 us |  1.2102 us |  1.8842 us | 16,309.04 |   1.00 |     0.00 |   2.2583 |  0.0292 |   13.88 KB |
 'single, ReadOnlyBuffer' |     52.98 us |  0.1676 us |  0.1568 us | 18,874.14 |   0.86 |     0.03 |   2.2583 |  0.0042 |   13.88 KB |
          'multi, Stream' | 14,285.07 us | 54.3681 us | 50.8560 us |     70.00 | 233.18 |     6.95 | 564.1667 | 10.0000 | 3470.77 KB |
  'multi, ReadOnlyBuffer' | 13,219.59 us | 34.1087 us | 31.9053 us |     75.65 | 215.79 |     6.41 | 564.1667 |  0.8333 | 3470.76 KB |

we could have:


Category: 'multi'

                   Method |         Mean |      Error |     StdDev |      Op/s | Scaled | ScaledSD |    Gen 0 |   Gen 1 |  Allocated |
------------------------- |-------------:|-----------:|-----------:|----------:|-------:|---------:|---------:|--------:|-----------:|
                 'Stream' |     61.32 us |  1.2102 us |  1.8842 us | 16,309.04 |   1.00 |     0.00 |   2.2583 |  0.0292 |   13.88 KB |
         'ReadOnlyBuffer' |     52.98 us |  0.1676 us |  0.1568 us | 18,874.14 |   0.86 |     0.03 |   2.2583 |  0.0042 |   13.88 KB |

Category: 'single'

                   Method |         Mean |      Error |     StdDev |      Op/s | Scaled | ScaledSD |    Gen 0 |   Gen 1 |  Allocated |
------------------------- |-------------:|-----------:|-----------:|----------:|-------:|---------:|---------:|--------:|-----------:|
                 'Stream' | 14,285.07 us | 54.3681 us | 50.8560 us |     70.00 |   1.00 |     *.** | 564.1667 | 10.0000 | 3470.77 KB |
         'ReadOnlyBuffer' | 13,219.59 us | 34.1087 us | 31.9053 us |     75.65 |   0.93 |     *.** | 564.1667 |  0.8333 | 3470.76 KB |

with Scaled / ScaledSD being relative to the Baseline (if one) in that same category.

(*.** is just where I haven't "done the math" by hand; to be clear: "single" and "multi" here are completely different tests - it isn't just more of the same - naming is hard)

If necessary, this could be an opt-in SplitResultsByCategory feature on custom options. Or it could be implicit: "there's multiple baselines == split by category" (since this won't have worked previously, this can't change existing working behaviour)

The text was updated successfully, but these errors were encountered:

adamsitnik · 2018-01-09T14:14:35Z

I like this idea! @AndreyAkinshin what about you?

AndreyAkinshin · 2018-01-09T18:30:29Z

I think that it's better to introduce a unified way for grouping results. In each group, we should get exactly one baseline.
Here are some use cases for it:

Compare the performance of different methods (current default basline behavior).
Compare the performance of different methods in a category (this issue).
Compare the performance of the same method for different jobs (e.g. .NET Core vs. .NET Framework for the same method).
Compare the performance of the same method for different param values (e.g. I have a CacheStrategy enum property and I would like to compare different strategies for the same method).
A combination of these use cases.

With a "unified grouping approach", it should be easy to write simple API for specific use cases (e.g. SplitResultsByCategory).

A few additional thoughts:

Such "grouping" should be useful not only for baselines but also for ranking
The feature should be combined with OrderProviders (we should be able to order lines inside each group and order the groups)
Grouping should be obvious (when I look at 0.34 in the baseline column, it should be obvious where is the 1.0 value for this line; when I look at 5 in the rank column, it should be obvious where are the 1, 2, 3, 4 which correspond to this line). I like the approach by @mgravell with several tables, but I'm a little concerned that this approach takes too much space. It would be great to keep one table for all results, but I don't know how to make grouping obvious for all users (especially for those who look at the BenchmarkDotNet summary table for the first time).

It's a definitely a very useful feature and it should be easy to implement it (I guess, I need one hour for the first proof-of-concept implementation). The most difficult task here is simple, obvious, and user-friendly API and the summary table presentation.

Any thoughts?

mgravell · 2018-01-09T21:02:08Z

An extra double-line / separator row in a single table may be more than sufficient, to avoid repetitive column headers and other space issues

…

On 9 Jan 2018 6:30 p.m., "Andrey Akinshin" ***@***.***> wrote: I think that it's better to introduce a unified way for grouping results. In each group, we should get exactly one baseline. Here are some use cases for it: 1. Compare the performance of different methods (current default basline behavior). 2. Compare the performance of different methods in a category (this issue). 3. Compare the performance of the same method for different jobs (e.g. .NET Core vs. .NET Framework for the same method). 4. Compare the performance of the same method for different param values (e.g. I have a CacheStrategy enum property and I would like to compare different strategies for the same method). 5. A combination of these use cases. With a "unified grouping approach", it should be easy to write simple API for specific use cases (e.g. SplitResultsByCategory). A few additional thoughts: - Such "grouping" should be useful not only for baselines but also for ranking <http://benchmarkdotnet.org/Configs/Columns.htm#ranking> - The feature should be combined with OrderProviders <http://benchmarkdotnet.org/Configs/OrderProvider.htm> (we should be able to order lines inside each group and order the groups) - Grouping should be obvious (when I look at 0.34 in the baseline column, it should be obvious where is the 1.0 value for this line; when I look at 5 in the rank column, it should be obvious where are the 1, 2, 3, 4 which correspond to this line). I like the approach by @mgravell <https://github.com/mgravell> with several tables, but I'm a little concerned that this approach takes too much space. It would be great to keep one table for all results, but I don't know how to make grouping obvious for all users (especially for those who look at the BenchmarkDotNet summary table for the first time). It's a definitely a very useful feature and it should be easy to implement it (I guess, I need one hour for the first proof-of-concept implementation). The most difficult task here is simple, obvious, and user-friendly API and the summary table presentation. Any thoughts? — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#617 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AABDsIrPbHEcuFHiYYxkPFT5vDgMd6U7ks5tI7BNgaJpZM4RXhBU> .

AndreyAkinshin · 2018-01-09T22:00:09Z

@mgravell, LGTM.
@adamsitnik, what do you think?

adamsitnik · 2018-01-10T05:40:09Z

@AndreyAkinshin LGTM

Logical group support, fixes #617

SnakyBeaky · 2021-02-25T17:14:46Z

For anyone that might end up in this issue while looking for this functionality, the attribute GroupBenchmarksByAttribute was added and allows multiple baselines (and with CategoriesColumnAttribute too).

For example, this benchmark:

[SimpleJob, GroupBenchmarksBy(BenchmarkLogicalGroupRule.ByCategory), CategoriesColumn]
public class SomeBenchmark
{
    [Benchmark(Baseline = true), BenchmarkCategory("fast")]
    public async Task FastA()
    {
        await Task.Delay(100);
    }
    
    [Benchmark, BenchmarkCategory("fast")]
    public async Task FastB()
    {
        await Task.Delay(150);
    }
    
    [Benchmark(Baseline = true), BenchmarkCategory("slow")]
    public async Task SlowA()
    {
        await Task.Delay(1000);
    }
    
    [Benchmark, BenchmarkCategory("slow")]
    public async Task SlowB()
    {
        await Task.Delay(1500);
    }
}

Will output something like:

| Method | Categories |       Mean |   Error |  StdDev | Ratio | RatioSD |
|------- |----------- |-----------:|--------:|--------:|------:|--------:|
|  FastA |       fast |   110.0 ms | 1.75 ms | 1.64 ms |  1.00 |    0.00 |
|  FastB |       fast |   157.2 ms | 1.50 ms | 1.17 ms |  1.43 |    0.03 |
|        |            |            |         |         |       |         |
|  SlowA |       slow | 1,007.5 ms | 3.67 ms | 3.43 ms |  1.00 |    0.00 |
|  SlowB |       slow | 1,509.3 ms | 4.60 ms | 4.31 ms |  1.50 |    0.01 |

AndreyAkinshin added enhancement discussion labels Jan 9, 2018

AndreyAkinshin self-assigned this Jan 11, 2018

AndreyAkinshin removed the discussion label Jan 11, 2018

AndreyAkinshin added a commit that referenced this issue Jan 11, 2018

Logical group support, fixes #617

c8f2099

AndreyAkinshin added a commit that referenced this issue Jan 11, 2018

Logical group support, fixes #617

a87cfa4

AndreyAkinshin added a commit that referenced this issue Jan 12, 2018

Logical group support, fixes #617

d9c5cde

adamsitnik closed this as completed in 8bb28b3 Jan 12, 2018

adamsitnik added a commit that referenced this issue Jan 12, 2018

Merge pull request #619 from dotnet/logical-groups

ae87c6d

Logical group support, fixes #617

AndreyAkinshin added this to the v0.10.12 milestone Jan 12, 2018

alinasmirnova pushed a commit to alinasmirnova/BenchmarkDotNet that referenced this issue Sep 22, 2018

Logical group support, fixes dotnet#617

792bfd4

khellang mentioned this issue May 24, 2019

RunAllJoined should consider baseline per type #1164

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allow baseline per category #617

Allow baseline per category #617

mgravell commented Jan 9, 2018 •

edited

Loading

adamsitnik commented Jan 9, 2018

AndreyAkinshin commented Jan 9, 2018

mgravell commented Jan 9, 2018 via email

AndreyAkinshin commented Jan 9, 2018

adamsitnik commented Jan 10, 2018

SnakyBeaky commented Feb 25, 2021

Allow baseline per category #617

Allow baseline per category #617

Comments

mgravell commented Jan 9, 2018 • edited Loading

adamsitnik commented Jan 9, 2018

AndreyAkinshin commented Jan 9, 2018

mgravell commented Jan 9, 2018 via email

AndreyAkinshin commented Jan 9, 2018

adamsitnik commented Jan 10, 2018

SnakyBeaky commented Feb 25, 2021

mgravell commented Jan 9, 2018 •

edited

Loading