Results Comparer #165

Merged
merged 11 commits into from
Dec 15, 2018

adamsitnik
Member

Results Comparer

This simple tool allows for easy comparison of provided benchmark results.

It can be used to compare:

  • historical results (e.g. before and after my changes)
  • results for different OSes (e.g. Windows vs Ubuntu)
  • results for different CPU architectures (e.g. x64 vs ARM64)
  • results for different target frameworks (e.g. .NET Core 2.1 vs 2.2)

All you need to provide is:

  • --base - path to folder/file with baseline results
  • --diff - path to folder/file with diff results
  • --threshold - threshold for Statistical Test. Examples: 5%, 10ms, 100ns, 1s

Optional arguments:

  • --top - filter the diff to top/bottom N results

Sample: compare the results stored in C:\results\windows vs C:\results\ubuntu using 1% threshold and print only TOP 10.

dotnet run --base "C:\results\windows" --diff "C:\results\ubuntu" --threshold 1% --top 10

Note: the tool supports only *full.json results exported by BenchmarkDotNet. This exporter is enabled by default in this repository.

Note: if you have run your benchmarks for multiple jobs (e.g. -r netcoreapp2.1 netcoreapp2.2) and you want to compare these historical results, you can use --merged to point to the folder/file with such results.

Reading the results

Sample results:

SLOWER:
-30,35% BenchmarksGame.SpectralNorm_3.RunBench [can have several modes]
-5,45% BenchmarksGame.KNucleotide_9.RunBench [bimodal]
-4,27% BenchmarksGame.BinaryTrees_2.RunBench
-2,42% BenchmarksGame.MandelBrot_7.Bench(size: 4000, lineLength: 500, checksum: "C7-E6-66-43-66-73-F8-A8-D3-B4-D7-97-2F-FC-A1-D3")

FASTER:
3,59% BenchmarksGame.FannkuchRedux_5.RunBench(n: 10, expectedSum: 38)

Explanation:

  • if there is no difference, the results are omitted
  • if there is no match (we use full benchmark names to match the benchmarks), the results are omitted
  • every line contains:
    • ratio (1.0 - diffMedian / baseMedian)
    • id (full benchmark name, the same we use in BenchView)
    • optional information about the modality of the benchmark

@brianrob
Member

brianrob commented Dec 4, 2018

@adamsitnik, will take a look. Also, I suspect that @billwert will want to review as well, but he's OOF today.

Member

@AndyAyersMS AndyAyersMS left a comment

This doesn't handle the files I generated recently...

Unhandled Exception: Newtonsoft.Json.JsonSerializationException: Error converting value {null} to type 'System.Int32'. Path 'HostEnvironmentInfo.PhysicalProcessorCount', line 8, position 35. ---> System.InvalidCastException: Null object cannot be converted to a value type.
   at System.Convert.ChangeType(Object value, Type conversionType, IFormatProvider provider)
   at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.EnsureType(JsonReader reader, Object value, CultureInfo culture, JsonContract contract, Type targetType) in /_/Src/Newtonsoft.Json/Serialization/JsonSerializerInternalReader.cs:line 982

Not sure if this is because I was using preliminary arm64 bits or something else. But it would be nice to catch this sort of error and indicate which file is problematic.

@AndyAyersMS
Member

Looks like my arm64 files have some null entries:

{
   "Title":"BenchmarksGame.BinaryTrees_2",
   "HostEnvironmentInfo":{
      "BenchmarkDotNetCaption":"BenchmarkDotNet",
      "BenchmarkDotNetVersion":"0.11.3.886-nightly",
      "OsVersion":"ubuntu 16.04",
      "ProcessorName":"Unknown processor",
      "PhysicalProcessorCount":null,
      "PhysicalCoreCount":null,
      "LogicalCoreCount":null,
      "RuntimeVersion":".NET Core 3.0.0-preview-27122-01 (CoreCLR 4.6.27121.03, CoreFX 4.7.18.57103)",
      "Architecture":"64bit",
      "HasAttachedDebugger":false,

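One possible way to tolerate such null entries and to name the offending file, sketched under assumptions: the model below is a trimmed-down, hypothetical version of the result type, and it uses `System.Text.Json` rather than the Newtonsoft.Json seen in the stack trace, only to keep the example free of external packages. `ResultReader` is an illustrative name, not something from this PR.

```csharp
using System.IO;
using System.Text.Json;

// Hypothetical, trimmed-down model: only the fields relevant to the error above.
public class HostEnvironmentInfo
{
    public string ProcessorName { get; set; }
    public int? PhysicalProcessorCount { get; set; } // nullable, so a JSON null is accepted
    public int? PhysicalCoreCount { get; set; }
    public int? LogicalCoreCount { get; set; }
}

public class BdnResult
{
    public string Title { get; set; }
    public HostEnvironmentInfo HostEnvironmentInfo { get; set; }
}

public static class ResultReader
{
    public static BdnResult Parse(string json, string sourceName)
    {
        try
        {
            return JsonSerializer.Deserialize<BdnResult>(json);
        }
        catch (JsonException ex)
        {
            // Re-throw with the source name so the problematic file is easy to find.
            throw new InvalidDataException($"Could not parse '{sourceName}': {ex.Message}", ex);
        }
    }

    public static BdnResult ReadFile(string path) => Parse(File.ReadAllText(path), path);
}
```

With nullable counts, a file reporting `"PhysicalProcessorCount":null` deserializes to a `null` value instead of throwing, and any remaining parse failure mentions the file it came from.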
@adamsitnik
Member Author

@AndyAyersMS I have updated the code, could you try it one more time?

public class CommandLineOptions
{
[Option("base", HelpText = "Path to the folder/file with base results.")]
public string BasePath { get; set; }
Member

You should perform input validation here. For example,

private string _base;

public string BasePath
{
  get => _base;
  set
  {
    // reject empty input before touching the file system
    if (string.IsNullOrWhiteSpace(value))
      throw new ArgumentException("some message");
    if (!Directory.Exists(value))
      throw new DirectoryNotFoundException("some message");
    // maybe check that the directory has the right files?
    _base = value;
  }
}

[Option("diff", HelpText = "Path to the folder/file with diff results.")]
public string DiffPath { get; set; }

[Option("merged", HelpText = "Path to the folder/file with results merged for multiple jobs in the same file.")]
Member

Is this the output directory? If so, it should be stated in the help string.

Member Author

I was not sure what to call it (naming...)

So when we run the benchmarks with --runtimes netcoreapp2.1 netcoreapp2.2, BDN is going to create one JSON file with the results for both 2.1 and 2.2 inside.

This option allows comparing the perf for such files (results for several jobs are merged into one file).

@jorive I am open to better name suggestions ;p

Member

The name is fine - maybe clearer help text explaining it is for single runs of BDN with many runtimes. "merged" suggests I did something to merge them beforehand, which isn't the case, right?

.Select(resultFile => JsonConvert.DeserializeObject<BdnResult>(File.ReadAllText(resultFile)))
.SelectMany(result => result.Benchmarks)
.GroupBy(result => result.FullName)
.SelectMany(sameKey => sameKey
Member

nit: remove extra spaces.

@AndyAyersMS
Member

Thanks, it's working on my data now:

SLOWER:
-67880.50% BenchmarksGame.SpectralNorm_3.RunBench [can have several modes]
-8524.79% System.Memory.ReadOnlySpan.IndexOfString(input: "?", value: "?", comparisonType: InvariantCulture)
-2063.05% System.Memory.ReadOnlySpan.GetPinnableReference
-1638.45% System.Memory.ReadOnlySpan.StringAsSpan [bimodal]
-980.13% PerfLabTests.CastingPerf2.CastingPerf.ObjInt
-840.98% PerfLabTests.CastingPerf2.CastingPerf.ObjObjrefValueType
-839.70% PerfLabTests.CastingPerf2.CastingPerf.ObjScalarValueType
-809.60% PerfLabTests.LowLevelPerf.GenericClassWithSTringGenericInstanceMethod
-799.23% Functions.MathTests.PowSingleBenchmark
-768.68% System.Threading.Tests.Perf_Interlocked.Decrement_int

FASTER:
78.23% BenchmarksGame.ReverseComplement_6.RunBench
64.30% BenchmarksGame.MandelBrot_7.Bench(size: 4000, lineLength: 500, checksum: "C7-E6-66-43-66-73-F8-A8-D3-B4-D7-97-2F-FC-A1-D3") [can have several modes]
42.61% Burgers.Test1
41.78% Burgers.Test0
30.05% Benchstone.BenchI.Pi.Test
13.01% System.Memory.ReadOnlySpan.IndexOfString(input: "5555555555", value: "5", comparisonType: InvariantCulture)
11.28% System.Memory.ReadOnlySpan.IndexOfString(input: "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX", value: "x", comparisonType: InvariantCultureIgnoreCase)

Couple of things I'd like to see:

  • ratios instead of percentages
  • option to show base and diff values as well
  • option for csv or other structured output so I can do further analysis on the distribution of ratios

@jorive
Member

jorive commented Dec 4, 2018

Maybe we should have two more options:
--format output format: md, csv, etc.
--comparison-output [ base/diff | diff/base | percentage] ?

if (noiseResult.Conclusion == EquivalenceTestConclusion.Same)
continue;

var ratio = (1.0 - pair.diffResult.Statistics.Median / pair.baseResult.Statistics.Median);
Member

why ratio instead of percentage delta? I think all of our other reporting tools typically report delta, not ratio.

@billwert
Member

billwert commented Dec 4, 2018

@adamsitnik what's the intended use of this tool? Is this designed for devs to just quickly iterate on their own or is this intended to be part of the reporting infrastructure of the performance automation system?

@adamsitnik
Member Author

Is this designed for devs to just quickly iterate on their own or is this intended to be part of the reporting infrastructure of the performance automation system?

one thing is Windows vs Ubuntu or x64 vs ARM64

the other is: I run the benchmarks before applying any changes and save the results somewhere, then I apply the changes and save them to other location and compare the perf.

@adamsitnik
Member Author

adamsitnik commented Dec 4, 2018

ratios instead of percentages

@AndyAyersMS would --display ratio|%|delta be OK?

option to show base and diff values as well

ok, then I most probably need to introduce a table

option for csv or other structured output so I can do further analysis on the distribution of ratios

@AndyAyersMS would sth like following simple JSON be enough?

{
    "Benchmarks":[
    {
        "FullName": "BenchmarksGame.BinaryTrees_2.RunBench",
        "Base": [ 1, 2, 3, 4, 5],
        "Diff": [ 1, 2, 3, 4, 5],
        "Conclusion": "Same"
    },
    {
        "FullName": "BenchmarksGame.BinaryTrees_5.RunBench",
        "Base": [ 1, 2, 3, 4, 5],
        "Diff": [ 2, 3, 4, 4, 5],
        "Conclusion": "Slower"
    } ]
}

@AndyAyersMS
Member

AndyAyersMS commented Dec 4, 2018

would --display ratio|%|delta be OK?

Yes

would sth like following simple JSON be enough?

Ideally something that I can easily import into Excel or R ... I would think CSV would be the simplest. It looks like Excel can import JSON but it wasn't obvious to me how to get what I wanted.

@jorive
Member

jorive commented Dec 5, 2018

@adamsitnik Should we add *.sln to .gitignore?

@adamsitnik
Member Author

Ideally something that I can easily import into Excel or R ... I would think CSV would be the simplest.

@AndyAyersMS would sth like this be OK?

Conclusion;Id;Values
Base;System.Memory.Span.SomeMethod;0.01;0.02;0.03;
Slower;System.Memory.Span.SomeMethod;0.1;0.2;0.3;
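
A minimal sketch of how rows in this shape could be written, assuming each row carries a conclusion, a benchmark id and its measured values. `CsvSketch.ToCsv` and the tuple layout are hypothetical, and invariant-culture number formatting is an assumption made so the values import the same way regardless of the machine's locale.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Text;

public static class CsvSketch
{
    // Writes the header shown above, then one line per row: Conclusion;Id;v1;v2;...;
    public static string ToCsv(IEnumerable<(string Conclusion, string Id, double[] Values)> rows)
    {
        var sb = new StringBuilder();
        sb.Append("Conclusion;Id;Values\n");
        foreach (var row in rows)
        {
            sb.Append(row.Conclusion).Append(';').Append(row.Id).Append(';');
            foreach (double value in row.Values)
                sb.Append(value.ToString(CultureInfo.InvariantCulture)).Append(';');
            sb.Append('\n');
        }
        return sb.ToString();
    }
}
```

One reason a semicolon separator is convenient here: full benchmark names often contain commas (for example in argument lists), so a comma-separated file would need quoting. The sketch does no escaping, so an id that itself contains a semicolon would still need extra handling.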

@billwert
Member

billwert commented Dec 5, 2018

one thing is Windows vs Ubuntu or x64 vs ARM64
the other is: I run the benchmarks before applying any changes and save the results somewhere, then I apply the changes and save them to other location and compare the perf.

@adamsitnik sure, makes sense. I'm more curious about whether or not this is something we'd want to incorporate into our automation strategy in general (think reporting), because modularizing it would be nice in that case. We can cross this bridge later however.

One other question: Why is this stand alone and not a mode in Benchmark.NET?

@jorive
Member

jorive commented Dec 5, 2018

One other question: Why is this stand alone and not a mode in Benchmark.NET?

@billwert personally, I think it's better to have tools that build on top of each other and that perform one task very well, instead of a monolithic tool that becomes too bloated and complex to maintain (the linux way some would say?). This way you could think of benchmark/runner/reporter

@adamsitnik
Member Author

Ok, I have added the export to a table and CSV. The table is GH markdown friendly, can be copy-pasted from the console to GH directly.

To keep things simple I have also:

  • removed the --merged option - I am 100% sure I would be the only user of it, and it would cause more confusion than good
  • replaced % with base/diff for improvements (the bigger the better) and diff/base for regressions (the bigger the worse)

| Slower | diff/base | Base Median (ns) | Diff Median (ns) | Modality |
| --- | ---: | ---: | ---: | --- |
| PerfLabTests.BlockCopyPerf.CallBlockCopy(numElements: 100) | 1.60 | 9.22 | 14.76 | |
| System.Tests.Perf_String.Trim_CharArr(s: "Test", c: [' ', ' ']) | 1.41 | 6.18 | 8.72 | |

| Faster | base/diff | Base Median (ns) | Diff Median (ns) | Modality |
| --- | ---: | ---: | ---: | --- |
| System.Tests.Perf_Array.ArrayCopy3D | 1.31 | 372.71 | 284.73 | |
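
The convention described in the second bullet could look roughly like this. It is a sketch, not the code from this PR: `Conclusion` is a hypothetical stand-in for the statistical test's conclusion type, and medians are passed as plain doubles instead of benchmark objects.

```csharp
public enum Conclusion { Same, Faster, Slower } // hypothetical stand-in type

public static class DisplayRatio
{
    // Improvements are shown as base/diff, regressions as diff/base,
    // so the displayed value is above 1.0 in both tables.
    public static double GetRatio(Conclusion conclusion, double baseMedian, double diffMedian)
        => conclusion == Conclusion.Faster
            ? baseMedian / diffMedian   // the bigger the better
            : diffMedian / baseMedian;  // the bigger the worse
}
```

Checked against the rows above: `GetRatio(Conclusion.Slower, 9.22, 14.76)` is about 1.60 and `GetRatio(Conclusion.Faster, 372.71, 284.73)` is about 1.31.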

@adamsitnik
Member Author

Sample results:

| Slower | diff/base | Base Median (ns) | Diff Median (ns) | Modality |
| --- | ---: | ---: | ---: | --- |
| Burgers.Test1 | 11.93 | 267235588.00 | 3187167559.50 | |
| System.Globalization.Tests.Perf_CompareInfo.IsSortable(text: "Hello Worldbbbbbbbbbbbbbbbbbbbbbbbbbbb | 11.63 | 1115.77 | 12974.74 | |
| System.Globalization.Tests.Perf_CompareInfo.IsSortable(text: "More Test's") | 11.24 | 68.36 | 768.48 | |
| System.Globalization.Tests.Perf_CompareInfo.IsSortable(text: "Exhibit A") | 11.10 | 56.58 | 628.19 | |
| System.Globalization.Tests.Perf_CompareInfo.IsSortable(text: "TestFooBA`RnotsolongTELLme") | 11.00 | 163.97 | 1804.41 | |
| System.Numerics.Tests.Perf_Vector4.DotBenchmark | 10.13 | 1.50 | 15.16 | |
| System.Globalization.Tests.Perf_CompareInfo.IsSortable(text: "foo") | 9.86 | 21.21 | 209.16 | |
| System.Globalization.Tests.Perf_CompareInfo.IsSortable(text: "$") | 7.57 | 9.46 | 71.62 | |
| System.Globalization.Tests.Perf_CompareInfo.IsSortable(text: "?") | 7.45 | 9.45 | 70.45 | |
| Burgers.Test0 | 7.43 | 440849666.00 | 3275757908.00 | |
| System.Numerics.Tests.Perf_Vector4.AddFunctionBenchmark | 6.21 | 1.82 | 11.30 | |
| System.Numerics.Tests.Perf_Vector4.AddOperatorBenchmark | 6.07 | 1.82 | 11.07 | |
| FractalPerf.Launch.Test | 5.75 | 174073993.00 | 1000937680.50 | |
| Benchstone.BenchI.Array2.Test | 4.80 | 501337097.00 | 2405957355.00 | |
| Benchstone.BenchI.BubbleSort2.Test | 2.94 | 35372702.64 | 103894135.00 | can have several modes |
| System.Tests.Perf_Double.ToStringWithFormat(format: "R", number: 1.79769313486232E+308, innerIterati | 2.92 | 372.24 | 1088.56 | |
| System.Tests.Perf_Double.ToStringWithFormat(format: "R", number: -1.79769313486232E+308, innerIterat | 2.90 | 375.55 | 1088.52 | |
| SciMark2.kernel.benchSparseMult | 2.68 | 767957871.00 | 2054656061.50 | |
| Burgers.Test2 | 2.58 | 267264201.00 | 689646694.00 | |
| Benchstone.BenchI.IniArray.Test | 2.54 | 112425878.50 | 285823839.00 | |
| Benchstone.BenchF.BenchMrk.Test | 2.43 | 184005617.00 | 446363534.50 | |
| System.Net.Http.Tests.SocketsHttpHandlerPerfTest.Get(ssl: True, chunkedResponse: False, responseLeng | 2.41 | 1620154.00 | 3910963.00 | |
| System.Net.Http.Tests.SocketsHttpHandlerPerfTest.Get(ssl: True, chunkedResponse: True, responseLengt | 2.38 | 1664260.00 | 3955205.50 | |
| SeekUnroll.Test(boxedIndex: 27) | 2.32 | 2416673122.00 | 5597261928.00 | |
| BenchmarksGame.FannkuchRedux_2.RunBench(n: 10, expectedSum: 73196) | 2.20 | 171421112.00 | 377649577.00 | |
| System.Diagnostics.Perf_Process.StartAndKillDotNetVersion | 2.19 | 723188.79 | 1584806.65 | |
| System.Tests.Perf_Array.ArrayCreate1D | 2.18 | 1066.30 | 2324.38 | |
| System.Tests.Perf_Array.ArrayCreate3D | 2.17 | 1191.00 | 2580.83 | |
| Benchstone.BenchF.Lorenz.Test | 2.13 | 273039368.00 | 581460525.00 | |
| SeekUnroll.Test(boxedIndex: 19) | 2.11 | 2368672138.00 | 5004807345.00 | |
| System.Collections.CtorGivenSizeNonGeneric.Hashtable(Size: 512) | 2.11 | 1249.62 | 2632.27 | |
| System.Tests.Perf_Array.ArrayCreate2D | 2.08 | 1192.54 | 2483.94 | |
| SeekUnroll.Test(boxedIndex: 11) | 2.04 | 2453818595.00 | 5004047922.00 | |
| System.Tests.Perf_StringBuilder.StringBuilderAppend | 1.94 | 196651.17 | 381427.68 | |
| BenchmarksGame.Mandelbrot_2.Bench(width: 4000, checksum: "C7-E6-66-43-66-73-F8-A8-D3-B4-D7-97-2F-FC- | 1.93 | 1353477955.00 | 2609159425.00 | |
| Benchstone.BenchF.Romber.Test | 1.82 | 167909839.00 | 305344466.50 | |
| System.Memory.ReadOnlySpan.StringAsSpan | 1.71 | 6.70 | 11.44 | |

| Faster | base/diff | Base Median (ns) | Diff Median (ns) | Modality |
| --- | ---: | ---: | ---: | --- |
| System.Tests.Perf_String.Contains(size: 1000) | 16.01 | 370.07 | 23.12 | |
| System.Memory.ReadOnlySpan.IndexOfString(input: "Hello Worldbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbareally | 8.42 | 293.70 | 34.89 | |
| System.Globalization.Tests.Perf_CompareInfo.IndexOf(culture: en-US, source: "Hello Worldbbbbbbbbbbbb | 7.58 | 260.58 | 34.36 | |
| XmlDocumentTests.XmlDocumentTests.Perf_XmlDocument.Create | 4.15 | 2446.32 | 589.30 | |
| System.Tests.Perf_String.IndexOf(options: Ordinal) | 4.09 | 15972.11 | 3901.83 | |
| System.Memory.ReadOnlySpan.IndexOfString(input: "Hello Worldbbbbbbbbbbbbbbcbbbbbbbbbbbbbbbbbbba!", v | 3.90 | 108.62 | 27.82 | |
| System.Tests.Perf_Boolean.Parse_str | 3.20 | 56.32 | 17.61 | |
| System.Globalization.Tests.Perf_CompareInfo.IsPrefix(culture: en-US, source: "StrIng", prefix: "str" | 3.14 | 32.82 | 10.46 | |
| System.Tests.Perf_String.Contains(text: "This is a very nice sentence", value: "bad", comparisonType | 2.98 | 45.03 | 15.12 | |
| System.Globalization.Tests.Perf_CompareInfo.IndexOf(culture: es-ES, source: "Hello Worldbbbbbbbbbbbb | 2.88 | 80.92 | 28.14 | |
| System.Globalization.Tests.Perf_CompareInfo.IsPrefix(culture: ja-JP, source: "XXXXXXXXXXXXXXXXXXXXXX | 2.72 | 29.56 | 10.87 | |
| System.Globalization.Tests.Perf_CompareInfo.IsPrefix(culture: , source: "XXXXXXXXXXXXXXXXXXXXXXXXXXX | 2.64 | 29.05 | 11.02 | |
| System.Net.Primitives.Tests.CredentialCacheTests.GetCredential_HostPort(host: "name5", hostPortCount | 2.45 | 230.66 | 94.31 | |
| System.Tests.Perf_String.Compare(strings: ["Thé quick brown fox", "Thé quick BROWN fox"], comparison | 2.41 | 15.87 | 6.58 | |
| System.Tests.Perf_String.GetHashCode(s: "") | 2.38 | 5.57 | 2.34 | |
| System.Net.Primitives.Tests.CredentialCacheTests.GetCredential_HostPort(host: "notfound", hostPortCo | 2.36 | 183.45 | 77.86 | |
| System.Tests.Perf_UInt64.Parse(value: "18446744073709551615") | 2.34 | 159.40 | 68.21 | |
| System.Tests.Perf_String.Compare(strings: ["The quick brown fox", "THE QUICK BROWN FOX"], comparison | 2.28 | 13.43 | 5.90 | |
| System.Tests.Perf_Int64.Parse(value: "9223372036854775807") | 2.21 | 150.47 | 68.17 | |
| System.Tests.Perf_Int64.Parse(value: "-9223372036854775808") | 2.13 | 151.89 | 71.18 | |
| System.Globalization.Tests.Perf_CompareInfo.IsPrefix(culture: , source: "5555555555", prefix: "AAAAA | 2.06 | 16.87 | 8.20 | |
| System.Globalization.Tests.Perf_CompareInfo.IsPrefix(culture: es-ES, source: "?????????????????????? | 2.05 | 16.65 | 8.13 | |
| System.Net.Primitives.Tests.IPAddressPerformanceTests.TryFormat(address: 143.24.20.36) | 2.02 | 72.92 | 36.01 | |
| System.Tests.Perf_UInt32.Parse(value: "4294967295") | 1.95 | 109.66 | 56.11 | |
| System.Globalization.Tests.Perf_CompareInfo.IsPrefix(culture: , source: "foobardzsdzs", prefix: "Foo | 1.90 | 50.68 | 26.73 | |
| System.Tests.Perf_String.ToLowerInvariant(s: "test") | 1.89 | 22.71 | 12.04 | |
| System.Tests.Perf_Guid.ctor_str | 1.88 | 328.81 | 174.58 | |
| System.Tests.Perf_String.Contains(size: 100) | 1.88 | 192.41 | 102.19 | |
| System.Tests.Perf_String.GetHashCode(s: "TeSt!") | 1.87 | 6.76 | 3.61 | |
| System.Tests.Perf_Int32.Parse(value: "-2147483648") | 1.86 | 110.69 | 59.52 | |
| System.Globalization.Tests.Perf_CompareInfo.IsPrefix(culture: es-ES, source: "Hello Worldbbbbbbbbbbb | 1.84 | 21.31 | 11.55 | |
| System.Tests.Perf_Int32.Parse(value: "2147483647") | 1.83 | 108.12 | 59.12 | |
| PerfLabTests.LowLevelPerf.GenericGenericMethod | 1.81 | 323704.26 | 178350.41 | |
| System.Tests.Perf_UInt64.Parse(value: "12345") | 1.79 | 91.38 | 51.08 | |
| LinqBenchmarks.Where01LinqQueryX | 1.78 | 692132246.00 | 388216530.50 | |
| System.Tests.Perf_Int64.Parse(value: "12345") | 1.78 | 90.71 | 50.90 | |
| System.Memory.ReadOnlySpan.IndexOfString(input: "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | 1.78 | 62.91 | 35.37 | |
| LinqBenchmarks.Where01LinqMethodX | 1.78 | 692317830.00 | 389405790.00 | |
| System.Globalization.Tests.Perf_CompareInfo.IsSuffix(culture: es-ES, source: "?????????????????????? | 1.75 | 22.64 | 12.91 | |
| System.Memory.ReadOnlySpan.IndexOfString(input: "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | 1.75 | 40.40 | 23.05 | |
| System.Memory.ReadOnlySpan.IndexOfString(input: "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx | 1.75 | 40.38 | 23.06 | |
| System.Memory.ReadOnlySpan.IndexOfString(input: "??????????????????????????????????????????????????? | 1.73 | 39.82 | 23.06 | |
| LinqBenchmarks.Where01LinqMethodNestedX | 1.72 | 777704640.00 | 452272892.00 | |
| PerfLabTests.LowLevelPerf.EmptyStaticFunction5Arg | 1.72 | 3112353.87 | 1814303.51 | |
| System.Tests.Perf_Int32.Parse(value: "12345") | 1.71 | 87.02 | 51.00 | |
| System.Net.Primitives.Tests.IPAddressPerformanceTests.GetAddressBytes(address: 143.24.20.36) | 1.70 | 19.81 | 11.63 | |
| System.Net.Http.Tests.SocketsHttpHandlerPerfTest.Get(ssl: False, chunkedResponse: False, responseLen | 1.69 | 205232.42 | 121376.53 | |

return null;
}

private static double GetRatio(EquivalenceTestConclusion conclusion, Benchmark baseResult, Benchmark diffResult)
Member

I think my comment got lost in the refactor - why are we reporting straight ratios instead of relative percentages? Typically I expect to see this calculation look something like (diff-base)/base (in the case where a higher number is better.) I'd also like to see slower (regressions) represented as negative deltas instead of reversing the calculation as is done here.

Member Author

I think that the choice between base/diff, diff/base, and (diff-base)/base is very subjective. For example: your preferences are different from @AndyAyersMS's, which are also different from @stephentoub's ;)

What I have learned in BDN is that users want to customize everything, but it typically adds too much complexity to the code.

For example here to keep everyone happy I would need to introduce a new console argument, add docs for it and handle all cases in sorting the results, formatting them and aligning in the table. I don't have time for it, but I would be happy to review a PR if somebody is willing to implement it.

Member

I don't think it's that subjective. Using straight ratios leads to weird things sometimes - going from 10 to 8 by your method would show as 1.25, but I would think of it as a -20% regression from 10. Reversing the terms so that smaller ratios are "worse", always dividing by the base, results in 0.8, which is also a non-obvious way to present the data but closer to something that makes sense in terms of how a data point relates to the previous data point. It gets weird in another way when you invert and higher numbers are worse - imagine a working set measurement going from 1235 pages loaded to 2342 pages. Ratio would tell us it is 1.9, while it is a -47% regression.

For this very specific purpose it may not matter much, but I think there is value in a consistent method for reporting data like this which makes sense across contexts. Given that this tool is not likely to be used for things other than manual investigations we can let it lie, but I'd like to take this back up in a more global sense as we move forward.
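
To make the 10-to-8 example concrete, here is a small sketch that computes the three representations discussed in this thread side by side. `Measures` and its method names are hypothetical and only illustrate the arithmetic.

```csharp
public static class Measures
{
    // base/diff: the straight ratio with the terms swapped (10 -> 8 gives 1.25).
    public static double BaseOverDiff(double baseValue, double diffValue) => baseValue / diffValue;

    // diff/base: always dividing by the base (10 -> 8 gives 0.8).
    public static double DiffOverBase(double baseValue, double diffValue) => diffValue / baseValue;

    // (diff - base) / base: the relative delta (10 -> 8 gives -0.2, i.e. -20%).
    public static double RelativeDelta(double baseValue, double diffValue) => (diffValue - baseValue) / baseValue;
}
```

Note that `RelativeDelta` is just `DiffOverBase` minus 1, so those two sort results in the same order; `BaseOverDiff` sorts them in the opposite order.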

Member

Sounds like I will have to polish up my newsletter explaining all the ways ratios are vastly superior to other comparative measures...

More importantly, though: I always prefer to see things reported as diff/base, whether as a percentage or ratio or whatnot, so a single column sort can order things and we can plot distributions without having to do extra math.

Member

@billwert Different developers have different preferences when it comes down to how their data is represented. In the past developers have asked for base/diff, diff/base, (diff-base)/base, etc. We should just make sure that we are consistent and transparent about the way output data is presented.

Member

@AndyAyersMS I would like to read that newsletter. (Though I don't quite understand how ratios achieve sorting in ways that deltas do not.)

@jorive

We should just make sure that we are consistent and transparent about the way output data is presented.

Agreed. I'm trying to identify which way that should go. :)

@adamsitnik
Member Author

@AndyAyersMS if you don't have any more feature requests could you please accept this PR? I need at least 1 approving review to merge it and all other perf folks are already enjoying their holidays ;)

Member

@AndyAyersMS AndyAyersMS left a comment

Thanks!


6 participants