-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Regressions in System.Xml.Linq.Perf_XName (FullPGO) #64626
Comments
Tagging subscribers to this area: @dotnet/area-system-security, @vcsjones Issue DetailsRun Information
Regressions in System.Xml.Linq.Perf_XName
Reprogit clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net6.0 --filter 'System.Xml.Linq.Perf_XName*' PayloadsHistogramSystem.Xml.Linq.Perf_XName.NonEmptyNameSpaceToString
Description of detection logic
DocsProfiling workflow for dotnet/runtime repository
|
Tagging subscribers to this area: @dotnet/area-system-xml Issue DetailsRun Information
Regressions in System.Xml.Linq.Perf_XName
Reprogit clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net6.0 --filter 'System.Xml.Linq.Perf_XName*' PayloadsHistogramSystem.Xml.Linq.Perf_XName.NonEmptyNameSpaceToString
Description of detection logic
DocsProfiling workflow for dotnet/runtime repository
|
@adamsitnik this seems a bit like a noise - looking at graph my gut tells me this is well within stdev, any clues? |
@krwq that was my initial impression as well, but then I've clicked on "Toggle annotations" (1), the blue line (2) and moving average (3) and it's rather an actual regression: |
@AndyAyersMS PTAL. |
Regression in FramedAllocateString might be a sign of a regression in GC e.g. GC Regions. |
cc @Maoni0. |
@krwq in the profiles above are the two runs measuring the same amount of work (that is, same values for And, were you filtering the profile to just the actual stages? Assuming so, it does not look like a codegen issue, the jit-generated code is actually a good deal faster. |
Also we haven't historically run triage on full PGO results; I'm a bit surprised we even opened this issue. If we look at perf of this test with the default settings we see For purposes of comparison: (full pgo) Just for completeness here's the other bit of data we have: The regressions in the PGO modes on Jan 31 (which inspired this issue) seem to well correlated with #64521. But clearly since then all modes have regressed. The most troubling of these is the recent regression in default perf: We have a reporting outage around the first and largest regression. Commit range there has 66 commits: bceefa9...e117479. I think @DrewScoggins is trying to backfill data for that gap. Source of the second regression is unclear; implicated commit range is 9814b1b...e3b2af3 which is 16 commits. For PGO there have been several regressions over the past months. It will take some time to sort through them all. |
PGO regressions (best guesses)
|
@krwq any info you can add here? If not, will start looking at this more closely tomorrow. |
@AndyAyersMS yes, they measured exactly same amount of work. Unfortunately I don't have more info I could add here. I only observed newer code being consistently slower on my machine which is inline with what the issue description is showing and seems the only thing the XML code is doing is string concatenation and given PerfView also showed string related stuff as being slower it seems root cause is somewhere in coreclr but I don't know that code enough to pinpoint anything in particular. |
Looking at default behavior for now (that is, product defaults, not Full PGO), I see 10% regressions vs .NET 6: BenchmarkDotNet=v0.13.1.1847-nightly, OS=Windows 11 (10.0.22000.856/21H2) PowerPlanMode=00000000-0000-0000-0000-000000000000 InvocationCount=9983200 IterationCount=25
Comparing filtered BDN profiles hasn't been illuminating yet, though they are similar to the ones from @krwq above in that the native part of the allocation seems slower. Per discussion in #74771 the .NET 7 regression seems greatly reduced if we don't use region-based GC: BenchmarkDotNet=v0.13.1.1847-nightly, OS=Windows 11 (10.0.22000.856/21H2) EnvironmentVariables=DOTNET_GCName=clrgc.dll PowerPlanMode=00000000-0000-0000-0000-000000000000 Runtime=.NET 7.0
and a similar with BenchmarkDotNet=v0.13.1.1847-nightly, OS=Windows 11 (10.0.22000.856/21H2) EnvironmentVariables=DOTNET_GCWriteBarrier=3 PowerPlanMode=00000000-0000-0000-0000-000000000000 Runtime=.NET 7.0
|
Moving this to area-gc since it seems to be related to recent gc changes. Again, this is mainly about the non-PGO behavior. |
Tagging subscribers to this area: @dotnet/gc Issue DetailsRun Information
Regressions in System.Xml.Linq.Perf_XName
Reprogit clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net6.0 --filter 'System.Xml.Linq.Perf_XName*' PayloadsHistogramSystem.Xml.Linq.Perf_XName.NonEmptyNameSpaceToString
Description of detection logic
DocsProfiling workflow for dotnet/runtime repository
|
@AndyAyersMS, are you still looking into this? Should we assign a different owner in GC team? |
I'll take over from here since this is prospectively a GC issue. Investigation in Progress. |
DataWithout GC NormalizationThere seems to be a regression within a 5% margin:
Baseline - SegmentsBenchmarkDotNet=v0.13.1.1847-nightly, OS=Windows 11 (10.0.22000.856/21H2) EnvironmentVariables=COMPlus_GCName=clrgc.dll PowerPlanMode=00000000-0000-0000-0000-000000000000 IterationTime=250.0000 ms
RegionsBenchmarkDotNet=v0.13.1.1847-nightly, OS=Windows 11 (10.0.22000.856/21H2) PowerPlanMode=00000000-0000-0000-0000-000000000000 IterationTime=250.0000 ms MaxIterationCount=20
Regions w/ More Precise WritebarrierBenchmarkDotNet=v0.13.1.1847-nightly, OS=Windows 11 (10.0.22000.856/21H2) EnvironmentVariables=COMPlus_GCWriteBarrier=3,COMPlus_EnableWriteXorExecute=0 PowerPlanMode=00000000-0000-0000-0000-000000000000 IterationTime=250.0000 ms
With GC Normalization
Baseline - SegmentsBenchmarkDotNet=v0.13.1.1847-nightly, OS=Windows 11 (10.0.22000.856/21H2) OutlierMode=DontRemove EnvironmentVariables=COMPlus_GCName=clrgc.dll PowerPlanMode=00000000-0000-0000-0000-000000000000
RegionsBenchmarkDotNet=v0.13.1.1847-nightly, OS=Windows 11 (10.0.22000.856/21H2) OutlierMode=DontRemove PowerPlanMode=00000000-0000-0000-0000-000000000000 Force=False
Regions w/ More Precise WritebarrierBenchmarkDotNet=v0.13.1.1847-nightly, OS=Windows 11 (10.0.22000.856/21H2) OutlierMode=DontRemove EnvironmentVariables=COMPlus_GCWriteBarrier=3,COMPlus_EnableWriteXorExecute=0 PowerPlanMode=00000000-0000-0000-0000-000000000000
|
We have been working through root-causing and fixing this issue and here are our conclusions:
Results
Details
Full comparison table with the virtual commit and decommit data: Compare.xlsx Tracking these issues here: #73592 CC: @dotnet/gc |
Thanks @mrsharm. @adamsitnik checking that you agree with the analysis of ensuring the GC impact is equitable across runs and with that the regression is not too high? |
I have seen this effect when comparing things too (say OSR on vs off) -- if the two variants behave differently BDN may change its measurement approach to the point where the measurement strategy becomes a contributing factor in the overall reported results. Keeping things the "same" across the two sets of measurements often resolves this, though not always. |
is this the .NET equivalent of the uncertainty principle :) |
Definitely, that's why we kept an eye on standard error and standard error normalized by the mean as well to ensure the repeatability of these experiments:
Getting to this low standard error was a matter of discerning a high enough invocation count described above. The result was a consistent and equitable comparison i.e., we ran this experiment for 20+ times and got similar results. |
Assume its ok to close this now since the normalized regression after the decommit fix is reasonable? |
Run Information
Regressions in System.Xml.Linq.Perf_XName
Test Report
Repro
Payloads
Baseline
Compare
Histogram
System.Xml.Linq.Perf_XName.NonEmptyNameSpaceToString
Description of detection logic
Docs
Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository
The text was updated successfully, but these errors were encountered: