
[Mono] Restore old code to solve the recent SpanHelpers regressions #75917

Merged
6 commits merged into dotnet:main on Sep 22, 2022

Conversation

@adamsitnik (Member) commented on Sep 20, 2022

Fixes #75709

As discovered in dotnet/perf-autofiling-issues#7976 (and #74395), the SpanHelpers vectorization work in #73768 that went into RC1 caused severe Span.IndexOf performance regressions on mono. This PR restores the previous implementation for mono while retaining the performance gains seen for coreclr.

This is a temporary workaround for the regression on mono, but .NET 7.0 will ship with this workaround in place. For .NET 8, we have #75801 to explore a better long-term resolution.
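For illustration only (this is not the PR's actual diff): the split roughly amounts to an ifdef around the runtime-specific code, as the commit list further down puts it ("Use an ifdef around clr code instead of a separate file"). A minimal sketch, assuming a MONO define symbol and an illustrative overload:

```csharp
// Sketch only: keep the old scalar implementation for Mono/WASM behind an ifdef,
// while CoreCLR keeps the vectorized code from #73768. The MONO symbol and this
// particular overload are assumptions for illustration, not the PR's actual code.
using System.Runtime.CompilerServices;

internal static partial class SpanHelpers
{
#if MONO
    // Mono/WASM: simple scalar scan that predates the vectorized rewrite.
    public static int IndexOf(ref char searchSpace, char value, int length)
    {
        for (int i = 0; i < length; i++)
        {
            if (Unsafe.Add(ref searchSpace, i) == value)
                return i;
        }
        return -1;
    }
#else
    // CoreCLR: the vectorized implementation introduced in #73768 lives here.
#endif
}
```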

This PR does not include the changes introduced in the following PRs:

#73481 (it caused the JSON comment parsing regression: #74442)
#73368 and #73469, where I removed the AdvSimd.Arm64 code path and started using Vector128/256 everywhere. WASM/Mono does not support Vector128/256 for all configs (#73469 (comment)), so those changes are not included either (a rough sketch of the guarded Vector128 approach follows this list).
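For illustration only (this is not the code from #73368/#73469): the guarded Vector128 pattern looks roughly like the sketch below. On configs where Vector128 is not hardware accelerated, everything runs through the scalar loop, which is why relying on these paths is problematic for WASM/Mono. The type and method names here are assumptions.

```csharp
using System;
using System.Numerics;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;

static class IndexOfSketch
{
    // Vectorized fast path guarded by IsHardwareAccelerated, scalar fallback otherwise.
    public static int IndexOfByte(ReadOnlySpan<byte> span, byte value)
    {
        int i = 0;
        if (Vector128.IsHardwareAccelerated && span.Length >= Vector128<byte>.Count)
        {
            ref byte searchSpace = ref MemoryMarshal.GetReference(span);
            Vector128<byte> target = Vector128.Create(value);
            int lastVectorStart = span.Length - Vector128<byte>.Count;

            for (; i <= lastVectorStart; i += Vector128<byte>.Count)
            {
                // Compare 16 bytes at a time; a non-zero mask means a match in this block.
                Vector128<byte> block = Vector128.LoadUnsafe(ref searchSpace, (nuint)i);
                uint mask = Vector128.ExtractMostSignificantBits(Vector128.Equals(block, target));
                if (mask != 0)
                    return i + BitOperations.TrailingZeroCount(mask);
            }
        }

        // Scalar tail, and the whole search on configs without Vector128 acceleration.
        for (; i < span.Length; i++)
        {
            if (span[i] == value)
                return i;
        }
        return -1;
    }
}
```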

TODO:

  • ensure the CI is green for all the configs
  • verify the performance for Mono AOT and WASM

cc @jkotas @vargaz

@ghost commented on Sep 20, 2022

Tagging subscribers to this area: @dotnet/area-system-memory
See info in area-owners.md if you want to be subscribed.

Issue Details

TODO:

  • ensure the CI is green for all the configs
  • verify the performance for Mono AOT and WASM

cc @jkotas @vargaz

Author: adamsitnik
Assignees: -
Labels: area-System.Memory, tenet-performance
Milestone: -

@ghost assigned adamsitnik on Sep 20, 2022
@vargaz (Contributor) commented on Sep 20, 2022

/azp run runtime-wasm

@azure-pipelines: Azure Pipelines successfully started running 1 pipeline(s).

@adamsitnik (Member, Author) commented:

Many of the CI legs failed with "Git fetch failed with exit code: 128".

@vargaz (Contributor) commented on Sep 20, 2022

/azp run runtime

@azure-pipelines: Azure Pipelines successfully started running 1 pipeline(s).

@radical (Member) commented on Sep 20, 2022

/backport to release/7.0-rc2

@github-actions (Contributor): Started backporting to release/7.0-rc2: https://github.com/dotnet/runtime/actions/runs/3093620874

@jeffhandley (Member) left a comment:

@adamsitnik I tried comparing against SpanHelpers.T.cs at a couple of points in history and I'm not seeing a 1:1 match with what is brought in here for SpanHelpers.Mono.cs.

Can you edit the issue description to link to the baseline that was used to copy from and provide any other noteworthy details?

@adamsitnik (Member, Author) commented:

> Can you edit the issue description to link to the baseline that was used to copy from and provide any other noteworthy details?

@jeffhandley done

@adamsitnik (Member, Author) commented:

The microbenchmark results are surprising: some methods improve while others regress with the current state of this PR.

WASM AOT

better: 4, geomean: 1.293
worse: 12, geomean: 1.608
total diff: 16

| Slower | diff/base | Base Median (ns) | Diff Median (ns) | Modality |
| --- | --- | --- | --- | --- |
| System.Memory.Span[Int32].IndexOfAnyFourValues(Size: 512) | 2.31 | 1950.70 | 4509.10 | |
| System.Memory.Span[Byte].IndexOfAnyFourValues(Size: 512) | 2.14 | 1894.33 | 4053.56 | several? |
| System.Memory.Span[Char].IndexOfAnyFiveValues(Size: 512) | 2.08 | 566.80 | 1179.55 | several? |
| System.Memory.Span[Int32].IndexOfAnyThreeValues(Size: 512) | 2.02 | 358.98 | 724.70 | several? |
| System.Memory.ReadOnlySpan.IndexOfString(input: "StrIng", value: "string", compa… | 1.63 | 92.44 | 150.53 | bimodal |
| System.Memory.Span[Char].IndexOfAnyFourValues(Size: 512) | 1.51 | 10582.26 | 16016.39 | |
| System.Globalization.Tests.StringSearch.IndexOf_Word_NotFound(Options: (en-US, O… | 1.50 | 5917.21 | 8877.92 | several? |
| System.Memory.Span[Byte].LastIndexOfAnyValues(Size: 512) | 1.34 | 7411.45 | 9963.83 | bimodal |
| System.Memory.ReadOnlySpan.IndexOfString(input: "Hello Worldbbbbbbbbbbbbbbbbbbbb… | 1.33 | 3998.54 | 5308.80 | |
| System.Memory.ReadOnlySpan.IndexOfString(input: "だだだだだだだだだだだだだだだだだだだだだだだだだだだだだだだ… | 1.32 | 10470.40 | 13856.53 | |
| System.Tests.Perf_Array.IndexOfShort | 1.30 | 1491.77 | 1932.28 | |
| System.Memory.ReadOnlySpan.IndexOfString(input: "More Test's", value: "Tests", c… | 1.27 | 274.76 | 348.33 | several? |

| Faster | base/diff | Base Median (ns) | Diff Median (ns) | Modality |
| --- | --- | --- | --- | --- |
| System.Memory.Span[Char].IndexOfAnyThreeValues(Size: 512) | 1.53 | 12661.75 | 8275.87 | bimodal |
| System.Tests.Perf_Array.IndexOfChar | 1.44 | 2109.61 | 1466.15 | |
| System.Tests.Perf_String.IndexOfAny | 1.13 | 1927.32 | 1708.39 | |
| System.Memory.Span[Byte].IndexOfValue(Size: 512) | 1.13 | 7592.62 | 6743.64 | |

WASM

better: 5, geomean: 1.116
worse: 2, geomean: 1.157
total diff: 7

| Slower | diff/base | Base Median (ns) | Diff Median (ns) | Modality |
| --- | --- | --- | --- | --- |
| System.Memory.Span[Int32].IndexOfAnyThreeValues(Size: 512) | 1.24 | 12709.48 | 15758.57 | |
| System.Memory.Span[Byte].IndexOfAnyFourValues(Size: 512) | 1.08 | 24093.75 | 26007.18 | |

| Faster | base/diff | Base Median (ns) | Diff Median (ns) | Modality |
| --- | --- | --- | --- | --- |
| System.Memory.Span[Byte].IndexOfAnyTwoValues(Size: 512) | 1.29 | 10959.04 | 8524.58 | |
| System.Memory.Span[Byte].IndexOfAnyFiveValues(Size: 512) | 1.10 | 30364.02 | 27677.80 | |
| System.Memory.Span[Byte].IndexOfValue(Size: 512) | 1.08 | 5181.86 | 4801.36 | |
| System.Memory.Span[Byte].IndexOfAnyThreeValues(Size: 512) | 1.07 | 12186.64 | 11363.84 | |
| System.Memory.Span[Char].IndexOfAnyFiveValues(Size: 512) | 1.06 | 20108.57 | 18942.31 | |
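To read these tables: the Slower ratio column is diff/base (for example 4509.10 / 1950.70 ≈ 2.31), while the Faster ratio column is base/diff (for example 12661.75 / 8275.87 ≈ 1.53), so a larger value means a bigger change in either direction.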

I don't know why this is, but I am sure that we should not merge this PR right now. Again, I am sorry for the disappointment. I did not follow my usual perf regression routine in this case (profile => identify => solve), but blindly listened to "just copy all the code".

@adamsitnik (Member, Author) commented:

@radekdoulik has shared different results offline that show the fix actually helps. He ran the benchmarks in a different way, using the perf pipeline developed by @radical.

@radekdoulik (Member) commented:

> @radekdoulik has shared different results offline that show the fix actually helps. He ran the benchmarks in a different way, using the perf pipeline developed by @radical.

The results I mentioned in Teams are from our bench sample measurements, run on a dedicated Arm SBC, so they are not coming from the CI pipeline. They have smaller coverage, though. In our data the regression was visible in the Span.IndexOf measurements and in one of the JSON measurements that we have in the bench sample.

@adamsitnik do you also have your microbenchmark results for the regression itself? Was it showing only regressions, or some improvements too?

@jeffhandley merged commit 254844a into dotnet:main on Sep 22, 2022
jeffhandley added a commit that referenced this pull request Sep 22, 2022
…75917)

* bring back the old code...

* bring back more old code

* Use an ifdef around clr code instead of a separate file

* Delete SpanHelpers.Clr.cs

* Remove a remaining INumber<T> helper from mono

Co-authored-by: Jeff Handley <[email protected]>
@jeffhandley (Member) commented:
/backport to release/7.0-rc2

@github-actions (Contributor): Started backporting to release/7.0-rc2: https://github.com/dotnet/runtime/actions/runs/3102128783

Successfully merging this pull request may close these issues:

[wasm] Perf regression of Span.IndexOf (#75709)