Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[net6] Poor performance for macOS running on CoreCLR #15145

Closed
spouliot opened this issue May 27, 2022 · 6 comments
Closed

[net6] Poor performance for macOS running on CoreCLR #15145

spouliot opened this issue May 27, 2022 · 6 comments
Labels
CoreCLR Bugs related to our CoreCLR support dotnet An issue or pull request related to .NET (6) performance If an issue or pull request is related to performance
Milestone

Comments

@spouliot
Copy link
Contributor

spouliot commented May 27, 2022

Steps to Reproduce

  1. Run this Xamarin.Mac sample
  2. Run this net6.0-macios sample
  3. Compare the numbers

Expected Behavior

Same or better performance from CoreCLR (versus Xamarin.Mac legacy running on Mono)

Actual Behavior

Up to 4 times slower running on CoreCLR.

This is likely similar to unoplatform/uno#8890 but the workaround (to disable ObjC exception marshalling) is not possible on CoreCLR.

Environment

  • Xamarin.Mac
Xamarin.Mac
Version: 8.11.0.288 (Visual Studio Community)
Hash: 5946727d0
Branch: main
Build date: 2022-05-23 19:40:52-0400
  • net6.0
Installed Workload Ids      Installation Source
-----------------------------------------------
wasm-tools                  SDK 6.0.300
macos                       SDK 6.0.300
maui-maccatalyst            SDK 6.0.300
maui-ios                    SDK 6.0.300
maui-android                SDK 6.0.300
ios                         SDK 6.0.300
maccatalyst                 SDK 6.0.300
tvos                        SDK 6.0.300
android                     SDK 6.0.300

Logs

speedscope logs from dotnet trace
DopeTestUno.Mobile_20220526_215531.speedscope.json.zip

Logs shows 99+% time spend in unmanaged code (not surprising if related to #8890). However it also points out some places that are more costly than anticipated.

More details will be added in unoplatform/uno#8890

IsUserType

IsUserType takes 5.2% of the execution time (on the main thread). It is not cached (in flags) and does a few native calls. Also IIRC Selector.GetHandle is not optimized on macOS (at least is was not on mono, never checked with CoreCLR). Fixing this should be beneficial to all platforms/runtime.

Update: fixed with #15149 - it was not really anything specific to CoreCLR, but found in that data

Screen Shot 2022-05-27 at 10 16 52 AM

@rolfbjarne rolfbjarne added dotnet An issue or pull request related to .NET (6) CoreCLR Bugs related to our CoreCLR support labels May 27, 2022
@rolfbjarne rolfbjarne added this to the .NET 7 milestone May 27, 2022
@spouliot
Copy link
Contributor Author

The attached data identified a problem with CFArray (used in bindings) and #15146 should fix it. However it applies to both Mono and CoreCLR so it does not really change the performance regression reported here.

spouliot added a commit to spouliot/xamarin-macios that referenced this issue May 29, 2022
This is computed **twice** _per instance_, when the instance is
* created: `NSObject.CreateManagedRef`
* released: `NSObject.ReleaseManagedRef`

However its (boolean) value remains identical for all instances of the same type.

This shows up as consuming a lot of time in the logs attached to xamarin#15145

Basically two things are cached
* the selector for `xamarinSetGCHandle:flags:`, which makes it 15% faster;
	 * some platforms, but not macOS, have an optimization for the `Selector.GetHandle` API
* the result is cached into a `Dictionrary<IntPtr,bool>`

Note that the optimizations are not specific to CoreCLR nor macOS (so they are not fixes for the CoreCLR performance regression of the above mentioned issue).
OTOH it will also help performance for legacy and other net6 (mono) platforms.

```
BenchmarkDotNet=v0.12.1, OS=macOS 12.3.1 (21E258) [Darwin 21.4.0]
Apple M1, 1 CPU, 8 logical and 8 physical cores
.NET Core SDK=        6.0.100 [/usr/local/share/dotnet/sdk]
  [Host] : .NET Core 6.0 (CoreCLR 6.0.522.21309, CoreFX 6.0.522.21309), Arm64 RyuJIT

Job=InProcess  Toolchain=InProcessEmitToolchain  IterationCount=3
LaunchCount=1  WarmupCount=3
```

|           Method | Length |           Mean |        Error |      StdDev | Ratio |
|----------------- |------- |---------------:|-------------:|------------:|------:|
|         Original |     16 |     7,729.8 ns |    212.61 ns |    11.65 ns |  1.00 |
|   CachedSelector |     16 |     6,552.6 ns |    202.70 ns |    11.11 ns |  0.85 |
| CachedIsUserType |     16 |       162.0 ns |     14.86 ns |     0.81 ns |  0.02 |
|                  |        |                |              |             |       |
|         Original |    256 |   123,183.0 ns |  4,724.95 ns |   258.99 ns |  1.00 |
|   CachedSelector |    256 |   104,570.3 ns |  2,029.20 ns |   111.23 ns |  0.85 |
| CachedIsUserType |    256 |     2,489.5 ns |    390.86 ns |    21.42 ns |  0.02 |
|                  |        |                |              |             |       |
|         Original |   4096 | 1,970,381.7 ns | 66,393.09 ns | 3,639.23 ns |  1.00 |
|   CachedSelector |   4096 | 1,676,773.0 ns | 12,149.92 ns |   665.98 ns |  0.85 |
| CachedIsUserType |   4096 |    39,933.3 ns |  7,426.74 ns |   407.08 ns |  0.02 |

[Benchmark source code](https://gist.github.com/spouliot/42fd43e94c5a9ce90164f0f6f9b35018)
rolfbjarne pushed a commit that referenced this issue Jun 1, 2022
This is computed **twice** _per instance_, when the instance is
* created: `NSObject.CreateManagedRef`
* released: `NSObject.ReleaseManagedRef`

However its (boolean) value remains identical for all instances of the same type.

This shows up as consuming a lot of time in the logs attached to #15145

Basically two things are cached
* the selector for `xamarinSetGCHandle:flags:`, which makes it 15% faster;
	 * some platforms, but not macOS, have an optimization for the `Selector.GetHandle` API
* the result is cached into a `Dictionrary<IntPtr,bool>`

Note that the optimizations are not specific to CoreCLR nor macOS (so they are not fixes for the CoreCLR performance regression of the above mentioned issue).
OTOH it will also help performance for legacy and other net6 (mono) platforms.

```
BenchmarkDotNet=v0.12.1, OS=macOS 12.3.1 (21E258) [Darwin 21.4.0]
Apple M1, 1 CPU, 8 logical and 8 physical cores
.NET Core SDK=        6.0.100 [/usr/local/share/dotnet/sdk]
  [Host] : .NET Core 6.0 (CoreCLR 6.0.522.21309, CoreFX 6.0.522.21309), Arm64 RyuJIT

Job=InProcess  Toolchain=InProcessEmitToolchain  IterationCount=3
LaunchCount=1  WarmupCount=3
```

|           Method | Length |           Mean |        Error |      StdDev | Ratio |
|----------------- |------- |---------------:|-------------:|------------:|------:|
|         Original |     16 |     7,729.8 ns |    212.61 ns |    11.65 ns |  1.00 |
|   CachedSelector |     16 |     6,552.6 ns |    202.70 ns |    11.11 ns |  0.85 |
| CachedIsUserType |     16 |       162.0 ns |     14.86 ns |     0.81 ns |  0.02 |
|                  |        |                |              |             |       |
|         Original |    256 |   123,183.0 ns |  4,724.95 ns |   258.99 ns |  1.00 |
|   CachedSelector |    256 |   104,570.3 ns |  2,029.20 ns |   111.23 ns |  0.85 |
| CachedIsUserType |    256 |     2,489.5 ns |    390.86 ns |    21.42 ns |  0.02 |
|                  |        |                |              |             |       |
|         Original |   4096 | 1,970,381.7 ns | 66,393.09 ns | 3,639.23 ns |  1.00 |
|   CachedSelector |   4096 | 1,676,773.0 ns | 12,149.92 ns |   665.98 ns |  0.85 |
| CachedIsUserType |   4096 |    39,933.3 ns |  7,426.74 ns |   407.08 ns |  0.02 |

[Benchmark source code](https://gist.github.com/spouliot/42fd43e94c5a9ce90164f0f6f9b35018)
@spouliot
Copy link
Contributor Author

spouliot commented Jun 4, 2022

cross-posting unoplatform/uno#8890 (comment)

Comparing p/invoke wrappers enabled

I had to hack around an issue (in quite an hackish way) but I was able to get the numbers

macOS 12.3.1 / arm64 Build Change Reuse Grid
Xamarin.Mac Legacy (updated) 1594.97 465188.25 1903.63 6746.49
net-6.0 (w/pinvoke wrappers) 1460.71 376612.26 [1] [1]
Impact (slowdown) 0.92x 0.81x N/A N/A

So the wrappers are helping a lot, net6.0-macos numbers are at best yet (see previous numbers) - but not quite what XM legacy was able to achieve.

We can also compare the ratios (other numbers can't be compared due to other fixes that were applied) with the original numbers

macOS 12.3.1 / arm64 Build Change Reuse Grid
Xamarin.Mac Legacy 853.22 336635.53 909.64 5356.02
net6.0-macos 667.86 82385.28 [1] [1]
Impact (slowdown) 0.78x 0.24x N/A N/A

The performance gap, between mono and coreclr runtimes, as narrowed considerably but still exists.

[1] Code is currently commented inside the repo, comparison is not possible

@rolfbjarne rolfbjarne modified the milestones: .NET 7, .NET 8 Aug 26, 2022
@rolfbjarne rolfbjarne added the performance If an issue or pull request is related to performance label Aug 26, 2022
@rolfbjarne
Copy link
Member

I just tried this with .NET 7, and the .NET numbers looked consistently a little bit higher than the Xamarin.Mac ones (they jump around a lot though, and are the tests supposed to run indefinitely, or am I just impatient?)

@spouliot
Copy link
Contributor Author

spouliot commented Sep 7, 2023

Thanks for revisiting this issue @rolfbjarne :)

The test app does run indefinitely.

Out of curiosity

  • Did you use Uno's master for the app ?
  • Was it tested on x64, arm64 or both ?

@jeromelaban I think we can close this issue and the one on our side unoplatform/uno#8890 ?

@jeromelaban
Copy link
Contributor

Thanks for the update! If the performance is improved, the issue can be closed at your convenience :)

@rolfbjarne
Copy link
Member

Thanks for revisiting this issue @rolfbjarne :)

The test app does run indefinitely.

Out of curiosity

  • Did you use Uno's master for the app ?

This is the commit: unoplatform/performance@46a79f2

  • Was it tested on x64, arm64 or both ?

x64 on an M2 machine.

I had to disable the interpreter, since we don't support MonoVM on macOS anymore, only CoreCLR (in which case there's no interpreter):

https://github.com/unoplatform/performance/blob/46a79f2a9d8b64f59726a46bf48eb878d020ae15/src/dopes/Uno-dotnet6/DopeTestUno/DopeTestUno.Mobile/DopeTestUno.Mobile.csproj#L13

@spouliot spouliot closed this as completed Sep 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CoreCLR Bugs related to our CoreCLR support dotnet An issue or pull request related to .NET (6) performance If an issue or pull request is related to performance
Projects
None yet
Development

No branches or pull requests

3 participants