[net6] Poor performance for macOS running on CoreCLR #15145
Comments
The attached data identified a problem with |
This is computed **twice** _per instance_, when the instance is

* created: `NSObject.CreateManagedRef`
* released: `NSObject.ReleaseManagedRef`

However its (boolean) value remains identical for all instances of the same type.

This shows up as consuming a lot of time in the logs attached to xamarin#15145

Basically two things are cached

* the selector for `xamarinSetGCHandle:flags:`, which makes it 15% faster;
  * some platforms, but not macOS, have an optimization for the `Selector.GetHandle` API
* the result is cached into a `Dictionary<IntPtr,bool>`

Note that the optimizations are not specific to CoreCLR nor macOS (so they are not fixes for the CoreCLR performance regression of the above mentioned issue). OTOH it will also help performance for legacy and other net6 (mono) platforms.

```
BenchmarkDotNet=v0.12.1, OS=macOS 12.3.1 (21E258) [Darwin 21.4.0]
Apple M1, 1 CPU, 8 logical and 8 physical cores
.NET Core SDK= 6.0.100 [/usr/local/share/dotnet/sdk]
  [Host] : .NET Core 6.0 (CoreCLR 6.0.522.21309, CoreFX 6.0.522.21309), Arm64 RyuJIT

Job=InProcess  Toolchain=InProcessEmitToolchain  IterationCount=3
LaunchCount=1  WarmupCount=3
```

| Method           | Length |           Mean |        Error |      StdDev | Ratio |
|----------------- |------- |---------------:|-------------:|------------:|------:|
| Original         | 16     |     7,729.8 ns |    212.61 ns |    11.65 ns |  1.00 |
| CachedSelector   | 16     |     6,552.6 ns |    202.70 ns |    11.11 ns |  0.85 |
| CachedIsUserType | 16     |       162.0 ns |     14.86 ns |     0.81 ns |  0.02 |
|                  |        |                |              |             |       |
| Original         | 256    |   123,183.0 ns |  4,724.95 ns |   258.99 ns |  1.00 |
| CachedSelector   | 256    |   104,570.3 ns |  2,029.20 ns |   111.23 ns |  0.85 |
| CachedIsUserType | 256    |     2,489.5 ns |    390.86 ns |    21.42 ns |  0.02 |
|                  |        |                |              |             |       |
| Original         | 4096   | 1,970,381.7 ns | 66,393.09 ns | 3,639.23 ns |  1.00 |
| CachedSelector   | 4096   | 1,676,773.0 ns | 12,149.92 ns |   665.98 ns |  0.85 |
| CachedIsUserType | 4096   |    39,933.3 ns |  7,426.74 ns |   407.08 ns |  0.02 |

[Benchmark source code](https://gist.github.com/spouliot/42fd43e94c5a9ce90164f0f6f9b35018)
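The per-type caching described in the commit message can be illustrated with a minimal, self-contained sketch. It is not the actual Xamarin code: `UserTypeCache`, `ComputeIsUserType` and the "even handle" rule are hypothetical stand-ins, and the only thing taken from the message is the idea of storing the boolean in a `Dictionary<IntPtr,bool>` keyed by a class handle.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the runtime's per-class check (not the real
// Xamarin implementation). It counts the expensive calls so the effect of
// the cache can be observed.
public static class UserTypeCache
{
    static readonly Dictionary<IntPtr, bool> cache = new Dictionary<IntPtr, bool>();
    static readonly object cacheLock = new object();

    // Number of times the (assumed) expensive computation actually ran.
    public static int ExpensiveCalls { get; private set; }

    // Assumption: this represents the costly path that does a few native calls.
    static bool ComputeIsUserType(IntPtr classHandle)
    {
        ExpensiveCalls++;
        // Placeholder rule for the demo only: even handles are "user types".
        return (classHandle.ToInt64() & 1) == 0;
    }

    // The value is the same for every instance of a type, so it is computed
    // once per class handle and then served from the dictionary.
    public static bool IsUserType(IntPtr classHandle)
    {
        lock (cacheLock)
        {
            if (!cache.TryGetValue(classHandle, out bool result))
            {
                result = ComputeIsUserType(classHandle);
                cache[classHandle] = result;
            }
            return result;
        }
    }
}
```

Calling `UserTypeCache.IsUserType` repeatedly with the same handle runs the expensive path only on the first call. The lock is a conservative choice for the sketch; whether the real code needs one, or uses a different synchronization strategy, is not stated in the thread.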
Cross-posting unoplatform/uno#8890 (comment)
Comparing p/invoke wrappers enabled
I had to hack around an issue (in quite a hackish way) but I was able to get the numbers
So the wrappers are helping a lot: the net6.0-macos numbers are the best yet (see previous numbers), but not quite what XM legacy was able to achieve. We can also compare the ratios (other numbers can't be compared due to other fixes that were applied) with the original numbers
The performance gap between the mono and coreclr runtimes has narrowed considerably but still exists. [1] Code is currently commented out inside the repo, so a comparison is not possible
I just tried this with .NET 7, and the .NET numbers looked consistently a little bit higher than the Xamarin.Mac ones (they jump around a lot though, and are the tests supposed to run indefinitely, or am I just impatient?)
Thanks for revisiting this issue @rolfbjarne :) The test app does run indefinitely. Out of curiosity
@jeromelaban I think we can close this issue and the one on our side unoplatform/uno#8890 ?
Thanks for the update! If the performance is improved, the issue can be closed at your convenience :)
This is the commit: unoplatform/performance@46a79f2
x64 on an M2 machine. I had to disable the interpreter, since we don't support MonoVM on macOS anymore, only CoreCLR (in which case there's no interpreter):
Steps to Reproduce
Expected Behavior
Same or better performance from CoreCLR (versus Xamarin.Mac legacy running on Mono)
Actual Behavior
Up to 4 times slower running on CoreCLR.
This is likely similar to unoplatform/uno#8890 but the workaround (to disable ObjC exception marshalling) is not possible on CoreCLR.
Environment
Logs
speedscope logs from `dotnet trace`:
DopeTestUno.Mobile_20220526_215531.speedscope.json.zip
Logs show 99+% of the time spent in unmanaged code (not surprising if related to #8890). However they also point out some places that are more costly than anticipated.
More details will be added in unoplatform/uno#8890
IsUserType
`IsUserType` takes 5.2% of the execution time (on the main thread). It is not cached (in `flags`) and does a few native calls. Also, IIRC, `Selector.GetHandle` is not optimized on macOS (at least it was not on mono, never checked with CoreCLR). Fixing this should be beneficial to all platforms/runtimes.
Update: fixed with #15149 - it was not really anything specific to CoreCLR, but found in that data
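For the selector part, one common way to avoid repeating a lookup is to resolve the handle once and keep it in a static field. The sketch below only illustrates that idea under stated assumptions: `SelectorDemo` and `LookupSelector` are hypothetical, and the lookup is a counted placeholder rather than a call to the real `Selector.GetHandle`.

```csharp
using System;

public static class SelectorDemo
{
    // Number of times the (assumed) costly lookup actually ran.
    public static int LookupCalls { get; private set; }

    // Hypothetical stand-in for a selector lookup such as Selector.GetHandle.
    // The returned value is a placeholder, not a real Objective-C selector.
    static IntPtr LookupSelector(string name)
    {
        LookupCalls++;
        return new IntPtr(name.Length + 1);
    }

    // Resolved once, when the type is initialized, then reused by every caller.
    static readonly IntPtr setGCHandleSelector =
        LookupSelector("xamarinSetGCHandle:flags:");

    // Pays the lookup cost on every call.
    public static IntPtr Uncached() => LookupSelector("xamarinSetGCHandle:flags:");

    // Returns the handle stored in the static field.
    public static IntPtr Cached() => setGCHandleSelector;
}
```

Any number of `SelectorDemo.Cached()` calls performs a single lookup, while each `SelectorDemo.Uncached()` call performs another one.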