Results Comparer #165
Conversation
@adamsitnik, will take a look. Also, I suspect that @billwert will want to review as well, but he's OOF today.
This doesn't handle the files I generated recently...
Unhandled Exception: Newtonsoft.Json.JsonSerializationException: Error converting value {null} to type 'System.Int32'. Path 'HostEnvironmentInfo.PhysicalProcessorCount', line 8, position 35. ---> System.InvalidCastException: Null object cannot be converted to a value type.
at System.Convert.ChangeType(Object value, Type conversionType, IFormatProvider provider)
at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.EnsureType(JsonReader reader, Object value, CultureInfo culture, JsonContract contract, Type targetType) in /_/Src/Newtonsoft.Json/Serialization/JsonSerializerInternalReader.cs:line 982
Not sure if this is because I was using preliminary arm64 bits or something else, but it would be nice to catch this sort of error and indicate which file is problematic.
Looks like my arm64 files have some null entries:

{
"Title":"BenchmarksGame.BinaryTrees_2",
"HostEnvironmentInfo":{
"BenchmarkDotNetCaption":"BenchmarkDotNet",
"BenchmarkDotNetVersion":"0.11.3.886-nightly",
"OsVersion":"ubuntu 16.04",
"ProcessorName":"Unknown processor",
"PhysicalProcessorCount":null,
"PhysicalCoreCount":null,
"LogicalCoreCount":null,
"RuntimeVersion":".NET Core 3.0.0-preview-27122-01 (CoreCLR 4.6.27121.03, CoreFX 4.7.18.57103)",
"Architecture":"64bit",
"HasAttachedDebugger":false, |
@AndyAyersMS I have updated the code, could you try it one more time?
public class CommandLineOptions
{
    [Option("base", HelpText = "Path to the folder/file with base results.")]
    public string BasePath { get; set; }
You should perform input validation here. For example,
private string _base;

public string BasePath
{
    get => _base;
    set
    {
        if (string.IsNullOrWhiteSpace(value))
            throw new ArgumentException("some message");
        if (!Directory.Exists(value))
            throw new DirectoryNotFoundException("some message");
        // maybe check that the directory has the right files?
        _base = value;
    }
}
[Option("diff", HelpText = "Path to the folder/file with diff results.")] | ||
public string DiffPath { get; set; } | ||
|
||
[Option("merged", HelpText = "Path to the folder/file with results merged for multiple jobs in the same file.")] |
Is this the output directory? If so, it should be stated in the help string.
I was not sure what to call it (naming..)
So when we run the Benchmarks with --runtimes netcoreapp2.1 netcoreapp2.2,
BDN is going to create one json file with the results for both 2.1 and 2.2 inside.
This option allows comparing the perf for such files (results for a few jobs are merged into one file).
@jorive I am open to better name suggestions ;p
the name is fine - maybe clearer help text explaining it is for single runs of BDN with many runtimes. "merged" suggests I did something to merge them beforehand, which isn't the case, right?
src/tools/ResultsComparer/Program.cs
Outdated
.Select(resultFile => JsonConvert.DeserializeObject<BdnResult>(File.ReadAllText(resultFile)))
.SelectMany(result => result.Benchmarks)
.GroupBy(result => result.FullName)
.SelectMany(sameKey => sameKey
nit: remove extra spaces.
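For context, a rough self-contained sketch of the kind of pipeline quoted above: reading every `*full.json` file from a folder and pairing base and diff benchmarks by FullName. The BdnResult/Benchmark/Statistics shapes are assumptions based on the property names that appear in this PR, not the actual model classes.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using Newtonsoft.Json;

public class Statistics { public double Median { get; set; } }
public class Benchmark { public string FullName { get; set; } public Statistics Statistics { get; set; } }
public class BdnResult { public List<Benchmark> Benchmarks { get; set; } }

public static class ResultsLoader
{
    // Read every *full.json under the given folder and flatten all benchmarks into one sequence.
    public static IEnumerable<Benchmark> ReadResults(string path)
        => Directory.GetFiles(path, "*full.json", SearchOption.AllDirectories)
            .Select(resultFile => JsonConvert.DeserializeObject<BdnResult>(File.ReadAllText(resultFile)))
            .SelectMany(result => result.Benchmarks);

    // Pair base and diff results that share the same FullName so they can be compared.
    public static IEnumerable<(Benchmark baseResult, Benchmark diffResult)> Pair(string basePath, string diffPath)
        => ReadResults(basePath).Join(
            ReadResults(diffPath),
            b => b.FullName,
            d => d.FullName,
            (baseResult, diffResult) => (baseResult, diffResult));
}
```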
Thanks, it's working on my data now:
A couple of things I'd like to see:

Maybe we should have two more options:
src/tools/ResultsComparer/Program.cs
Outdated
if (noiseResult.Conclusion == EquivalenceTestConclusion.Same)
    continue;

var ratio = (1.0 - pair.diffResult.Statistics.Median / pair.baseResult.Statistics.Median);
why ratio instead of percentage delta? I think all of our other reporting tools typically report delta, not ratio.
[Option("diff", HelpText = "Path to the folder/file with diff results.")] | ||
public string DiffPath { get; set; } | ||
|
||
[Option("merged", HelpText = "Path to the folder/file with results merged for multiple jobs in the same file.")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the name is fine - maybe clearer help text explaining it is for single runs of BDN with many runtimes. "merged" suggests I did something to merge them before hand which isn't the case, right?
@adamsitnik what's the intended use of this tool? Is this designed for devs to just quickly iterate on their own or is this intended to be part of the reporting infrastructure of the performance automation system?
One thing is Windows vs Ubuntu or x64 vs ARM64; the other is: I run the benchmarks before applying any changes and save the results somewhere, then I apply the changes, save the new results to another location, and compare the perf.
@AndyAyersMS would
ok, then I most probably need to introduce a table
@AndyAyersMS would sth like the following simple JSON be enough?

{
"Benchmarks":[
{
"FullName": "BenchmarksGame.BinaryTrees_2.RunBench",
"Base": [ 1, 2, 3, 4, 5],
"Diff": [ 1, 2, 3, 4, 5],
"Conclusion": "Same"
},
{
"FullName": "BenchmarksGame.BinaryTrees_5.RunBench",
"Base": [ 1, 2, 3, 4, 5],
"Diff": [ 2, 3, 4, 4, 5],
"Conclusion": "Slower"
} ]
}
Yes
Ideally something that I can easily import into Excel or R ... I would think CSV would be the simplest. It looks like Excel can import JSON, but it wasn't obvious to me how to get what I wanted.
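A hedged sketch of what a CSV export along these lines could look like; the column set, units and separator here are illustrative choices, not necessarily what the PR ended up shipping.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;

public static class CsvExporter
{
    // One row per benchmark: name, both medians and the statistical test conclusion.
    // A semicolon separator is used because benchmark FullNames can contain commas.
    public static void Export(string path,
        IEnumerable<(string fullName, double baseMedian, double diffMedian, string conclusion)> rows)
    {
        var lines = new[] { "FullName;BaseMedian(ns);DiffMedian(ns);Conclusion" }
            .Concat(rows.Select(row => string.Join(";",
                row.fullName,
                row.baseMedian.ToString(CultureInfo.InvariantCulture),
                row.diffMedian.ToString(CultureInfo.InvariantCulture),
                row.conclusion)));

        File.WriteAllLines(path, lines);
    }
}
```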
@adamsitnik Should we add
@AndyAyersMS would sth like this be OK?
@adamsitnik sure, makes sense. I'm more curious about whether or not this is something we'd want to incorporate into our automation strategy in general (think reporting), because modularizing it would be nice in that case. We can cross this bridge later, however. One other question: why is this standalone and not a mode in Benchmark.NET?
@billwert personally, I think it's better to have tools that build on top of each other and that perform one task very well, instead of a monolithic tool that becomes too bloated and complex to maintain (the Linux way, some would say?). This way you could think of benchmark/runner/reporter.
Ok, I have added the export to a table and CSV. The table is GH markdown friendly and can be copy-pasted from the console to GH directly. To keep things simple I have also:

Sample results:
    return null;
}

private static double GetRatio(EquivalenceTestConclusion conclusion, Benchmark baseResult, Benchmark diffResult)
I think my comment got lost in the refactor - why are we reporting straight ratios instead of relative percentages? Typically I expect to see this calculation look something like (diff-base)/base (in the case where a higher number is better.) I'd also like to see slower (regressions) represented as negative deltas instead of reversing the calculation as is done here.
I think that the choice between base/diff, diff/base, and (diff-base)/base is very subjective. For example: your preferences are different than @AndyAyersMS's, which are also different than @stephentoub's ;)
What I have learned in BDN is that users want to customize everything, but it typically adds too much complexity to the code.
For example, here, to keep everyone happy I would need to introduce a new console argument, add docs for it, and handle all cases when sorting the results, formatting them, and aligning them in the table. I don't have time for it, but I would be happy to review a PR if somebody is willing to implement it.
I don't think it's that subjective. Using straight ratios leads to weird things sometimes - going from 10 to 8 by your method would show as 1.25, but I would think of it as a -20% regression from 10. Reversing the terms so that smaller ratios are "worse", always dividing by the base, results in 0.8, which is also a non-obvious way to present the data but closer to something that makes sense in terms of how a data point relates to the previous data point. It gets weird in another way when you invert and higher numbers are worse - imagine a working set measurement going from 1235 pages loaded to 2342 pages. Ratio would tell us it is 1.9, while it is a -47% regression.
For this very specific purpose it may not matter much, but I think there is value in a consistent method for reporting data like this which makes sense across contexts. Given that this tool is not likely to be used for things other than manual investigations we can let it lie, but I'd like to take this back up in a more global sense as we move forward.
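To put concrete numbers on the conventions being debated, a tiny sketch using the 10 → 8 example from the comment above (just the three formulas side by side, no claim about which one the tool should adopt):

```csharp
using System;

class RatioVsDelta
{
    static void Main()
    {
        // The 10 -> 8 example from the comment above, under the conventions being discussed.
        double baseMedian = 10, diffMedian = 8;

        Console.WriteLine(baseMedian / diffMedian);                // 1.25  (base/diff)
        Console.WriteLine(diffMedian / baseMedian);                // 0.8   (diff/base)
        Console.WriteLine((diffMedian - baseMedian) / baseMedian); // -0.2, i.e. a -20% delta
    }
}
```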
Sounds like I will have to polish up my newsletter explaining all the ways ratios are vastly superior to other comparative measures...
More importantly, though: I always prefer to see things reported as diff/base, whether as a percentage or ratio or whatnot, so a single column sort can order things and we can plot distributions without having to do extra math.
@billwert Different developers have different preferences when it comes down to how their data is represented. In the past developers have asked for base/diff, diff/base, (diff-base)/base, etc. We should just make sure that we are consistent and transparent w.r.t. the way output data is presented.
@AndyAyersMS I would like to read that newsletter. (Though I don't quite understand how ratios achieve sorting in ways that deltas do not.)
We should just make sure that we are consistent and transparent w.r.t. the way output data is presented.
Agreed. I'm trying to identify which way that should go. :)
@AndyAyersMS if you don't have any more feature requests could you please accept this PR? I need at least 1 approving review to merge it and all other perf folks are already enjoying their holidays ;)
Thanks!
Results Comparer
This simple tool allows for easy comparison of provided benchmark results.
It can be used to compare:
All you need to provide is:

- `--base` - path to folder/file with baseline results
- `--diff` - path to folder/file with diff results
- `--threshold` - threshold for the Statistical Test. Examples: 5%, 10ms, 100ns, 1s

Optional arguments:

- `--top` - filter the diff to the top/bottom N results

Sample: compare the results stored in `C:\results\windows` vs `C:\results\ubuntu` using a 1% threshold and print only the TOP 10.
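For example (the `dotnet run --` form and quoting are assumptions about how the tool is launched; the option names come from this README):

```
dotnet run -- --base "C:\results\windows" --diff "C:\results\ubuntu" --threshold 1% --top 10
```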
Note: the tool supports only `*full.json` results exported by BenchmarkDotNet. This exporter is enabled by default in this repository.

Note: if you have run your benchmarks for multiple jobs (eg. `-r netcoreapp2.1 netcoreapp2.2`) and you want to compare these historical results, you can use `--merge` to point to the folder/file with such results.

Reading the results
Sample results:
Explanation: the reported ratio is computed as (`1.0 - diffMedian / baseMedian`).