Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

Don't needlessly refresh proc count #27543

Merged
merged 1 commit into from
Oct 31, 2019
Merged

Conversation

benaadams
Copy link
Member

@benaadams benaadams commented Oct 30, 2019

Edit: Ah... my mistake, it isn't cached in the VM layer. How important is detecting changing proc count changes every 30 secs? (as used by SpinWait, SpinLock and CancellationTokenSource)

e.g. IsSingleProcessor is only ever set once and never refreshed

Its cached in the vm layer after first access so never changes; so it can be a static readonly.

I looked at changing Enviorment.ProcessorCount to be static readonly backed and then changing all the individual static readonly procCount caches to just call it directly.

However, that's a bit more complicated as there are Jit tests that rely on it being non-inlinable due to the p/invoke and it would cause complications for the shared source as Mono's p/invoke its the get property not the method that the get property then calls; so loosing the caches would likely cause regressions there.

/cc @jkotas @stephentoub

@benaadams
Copy link
Member Author

benaadams commented Oct 30, 2019

Edit: Ah... my mistake, it isn't cached in the VM layer. How important is detecting changing proc count changes every 30 secs?

This particular property is used by SpinWait, SpinLock and CancellationTokenSource

e.g. IsSingleProcessor is only ever set once and never refreshed

@jkotas
Copy link
Member

jkotas commented Oct 30, 2019

How important is detecting changing proc count changes every 30 secs?

The original version of this code was refreshing processor affinity every 30 secs. It is not doing that anymore after several refactorings - in most cases at least.

@jkotas
Copy link
Member

jkotas commented Oct 30, 2019

Environment.ProcessorCount to be static readonly backed and then changing all the individual static readonly procCount caches to just call it directly.

Yes, I think it would be best. If there are tests that failing because of that, they should be fixed or deleted.

@stephentoub
Copy link
Member

cc @kouvel, who changed IsSingleProcessor in https://github.com/dotnet/coreclr/pull/13670/files#diff-683db3e0122bda1b5528859535552c7fR357.

This whole PlatformHelper.ProcessorCount thing originated around 10 years ago, when hot-swappable CPUs were the rage and this was done to assist types like Parallel in adapting as the number of cores changed.

At this point, I don't think it's worth the trouble. Software architecture has evolved, and scalable services that need to adapt are likely better off being restarted than paying runtime costs trying to adapt.

I think it's fine to remove this whole PlatformHelper thing (in both coreclr and corefx) and just change everything to use Environment.ProcessorCount, and then make ProcessorCount fixed on first use.

@jkotas
Copy link
Member

jkotas commented Oct 30, 2019

We seem to also have ambiguity about what Environment.ProcessorCount returns. According to the docs, it should return "the number of processors on the current machine".

  • On Windows without NUMA, it returns SYSTEM_INFO::dwNumberOfProcessors. This sounds correct. I assume that this number cannot change during the process lifetime.
  • On Windows with NUMA, it returns CPUGroupInfo::GetNumActiveProcessors. Can this number change during the process lifetime?
  • On Unix, with HAVE_SCHED_GETAFFINITY, it returns the process affinity at the process start:
    nrcpus = CPU_COUNT(&cpuSet);
  • On Unix ARM and ARM64, it returns _SC_NPROCESSORS_CONF. This sounds correct.
  • On Unix on other platforms, it returns _SC_NPROCESSORS_ONLN at the process start. That does not sound right.

cc @janvorli @kouvel

@benaadams
Copy link
Member Author

Mono method is interesting with comments on trade offs https://github.com/mono/mono/blob/6dff5fc81321a4fd9dd493e215d4ace543475054/mono/utils/mono-proclib.c#L760-L880

@benaadams
Copy link
Member Author

benaadams commented Oct 30, 2019

In Mono the comments indicate its bypassing Android's power shutdown of CPUs to get the real CPU count:

/* Android tries really hard to save power by powering off CPUs on SMP phones which
* means the normal way to query cpu count returns a wrong value with userspace API.
* Instead we use /sys entries to query the actual hardware CPU count.
*/

However then does the same _SC_NPROCESSORS_ONLN to _SC_NPROCESSORS_CONF for compat with Coreclr and OpenJDK

@benaadams
Copy link
Member Author

benaadams commented Oct 30, 2019

If there are tests that failing because of that, they should be fixed or deleted.

@jkotas @AndyAyersMS this was the area I was concerned about (since it will now inline); but I'm not sure how the Jit tests work/run

// These two pinvokes should *not* be inline (filter)
//
// For the first call the jit won't inline the wrapper, so
// it just calls get_ProcessorCount.
//
// For the second call, the force inline works, and the
// subsequent inline of get_ProcessorCount exposes a call
// to the pinvoke GetProcessorCount. This pinvoke will
// not be inline.
catch (Exception) when (Environment.ProcessorCount == AsForceInline())

@jkotas
Copy link
Member

jkotas commented Oct 30, 2019

this was the area I was concerned about (since it will now inline); but I'm not sure how the Jit tests work/run

You can change that test to use Thread.Yield that is using the same pattern as Environment.ProcessorCount before.

Regardless, it is not ideal for JIT tests to depend on CoreLib internals.

@benaadams benaadams closed this Oct 31, 2019
@benaadams benaadams reopened this Oct 31, 2019
@benaadams
Copy link
Member Author

Squashed and rebased to try to give CI a kick

@kouvel kouvel merged commit 946e57b into dotnet:master Oct 31, 2019
Dotnet-GitSync-Bot pushed a commit to Dotnet-GitSync-Bot/corefx that referenced this pull request Oct 31, 2019
Dotnet-GitSync-Bot pushed a commit to Dotnet-GitSync-Bot/mono that referenced this pull request Oct 31, 2019
stephentoub pushed a commit to dotnet/corefx that referenced this pull request Oct 31, 2019
akoeplinger pushed a commit to mono/mono that referenced this pull request Oct 31, 2019
lewurm added a commit to lewurm/mono that referenced this pull request Oct 31, 2019
On top of mono/corefx#369 this improves the execution time of `System.Core-xunit` on Linux/ARM64 by 2x, so from:

```console
$ make -C mcs/class/System.Core run-xunit-test
[...]
=== TEST EXECUTION SUMMARY ===
   net_4_x_System.Core_xunit-test  Total: 48774, Errors: 0, Failed: 0, Skipped: 6, Time: 131.143s
```
to
```console
$ make -C mcs/class/System.Core run-xunit-test
[...]
=== TEST EXECUTION SUMMARY ===
   net_4_x_System.Core_xunit-test  Total: 48774, Errors: 0, Failed: 0, Skipped: 6, Time: 74.636s
```

This is only relevant for non-netcore. The CoreCLR folks just recently fixed something similar (thanks to Marek sharing this link): dotnet/coreclr#27543
@benaadams benaadams deleted the proc-count branch October 31, 2019 14:38
Dotnet-GitSync-Bot pushed a commit to Dotnet-GitSync-Bot/corert that referenced this pull request Oct 31, 2019
jkotas pushed a commit to dotnet/corert that referenced this pull request Oct 31, 2019
akoeplinger pushed a commit to mono/mono that referenced this pull request Oct 31, 2019
On top of mono/corefx#369 this improves the execution time of `System.Core-xunit` on Linux/ARM64 by 2x, so from:

```console
$ make -C mcs/class/System.Core run-xunit-test
[...]
=== TEST EXECUTION SUMMARY ===
   net_4_x_System.Core_xunit-test  Total: 48774, Errors: 0, Failed: 0, Skipped: 6, Time: 131.143s
```
to
```console
$ make -C mcs/class/System.Core run-xunit-test
[...]
=== TEST EXECUTION SUMMARY ===
   net_4_x_System.Core_xunit-test  Total: 48774, Errors: 0, Failed: 0, Skipped: 6, Time: 74.636s
```

This is only relevant for non-netcore. The CoreCLR folks just recently fixed something similar (thanks to Marek sharing this link): dotnet/coreclr#27543
@jashook
Copy link

jashook commented Nov 1, 2019

/cc @dotnet/coreclr-infra this change has broken pinvoke-examples it was not run during this change because it is an outerloop test

@jashook
Copy link

jashook commented Nov 1, 2019

@jashook
Copy link

jashook commented Nov 1, 2019

@AaronRobinsonMSFT will disable the test, @benaadams do you have cycles to fix the test after it is disabled?

@benaadams
Copy link
Member Author

@benaadams do you have cycles to fix the test after it is disabled?

Not sure; I changed the test in this PR to use Thread.Yield rather than Environment.ProcessorCount since its no longer a p\invoke, but if that's not working am not entirely sure what type of call the test wants.

@kouvel
Copy link
Member

kouvel commented Nov 1, 2019

I think the issue is that Thread.Yield may realistically return true or false, while Environment.ProcessorCount typically returned the same thing, and the return value is actually used by the tests in verification. I think maybe it would be ok to not rely on the return value of Yield. I'm not sure if that's part of what it is testing, but I didn't find any other QCall that returns a typically constant value. @AaronRobinsonMSFT?

@AaronRobinsonMSFT
Copy link
Member

Is a QCall needed or is that merely to simplify the dependency on a new native library? To answer the question specifically, I don't know of a QCall that would satisfy that requirement off the top of my head.

Additionally, can someone provide insight into what this test is meant to solve? Perhaps it could be rewritten to more accurately address the need - pinvoke-examples.cs doesn't indicate what the test is meant to do nor do any comments in the test itself. @dotnet/jit-contrib or @BruceForstall may have some insight into what this test is for.

@BruceForstall
Copy link
Member

BruceForstall commented Nov 1, 2019

The test was added by @AndyAyersMS --

commit 043fe32b7489952a62b2cb0cbe41de86895075a3
Author: Andy Ayers
Date:   Mon Nov 21 15:44:21 2016 -0800

    JIT: enable inline pinvoke in more cases

    An inline pinvoke is a pinvoke where the managed/native transition
    overhead is reduced by inlining parts of the transition bookkeeping
    around the call site. A normal pinvoke does this bookkeeping in
    a stub method that interposes between the managed caller and the
    native callee.

    Previously the jit would not allow pinvoke calls that came from inlines
    to be optimized via inline pinvoke. This sometimes caused performance
    surprises for users who wrap DLL imports with managed methods. See for
    instance #2373.

    This change lifts this limitation. Pinvokes from inlined method bodies
    are now given the same treatment as pinvokes in the root method. The
    legality check for inline pinvokes has been streamlined slightly to
    remove a redundant check. Inline pinvokes introduced by inlining are
    handled by accumulating the unmanaged method count with the value from
    inlinees, and deferring insertion of the special basic blocks until after
    inlining, so that if the only inline pinvokes come from inline instances
    they are still properly processed.

    Inline pinvokes are still disallowed in try and handler regions
    (catches, filters, and finallies).

    X87 liveness tracking was updated to handle the implicit inline frame
    var references. This was a pre-existing issue that now can show up more
    frequently. Added a test case that fails with the stock legacy jit
    (and also with the new enhancements to pinvoke). Now both the original
    failing case and this case pass.

    Inline pinvokes are also now suppressed in rarely executed blocks,
    for instance blocks leading up to throws or similar.

    The inliner is now also changed to preferentially report inline
    reasons as forced instead of always when both are applicable.

    This change adds a new test case that shows the variety of
    situations that can occur with pinvoke, inlining, and EH.

@AndyAyersMS
Copy link
Member

nor do any comments in the test itself

There is a comment in the test:

// Test cases showing interaction of inlining and inline pinvoke,
// along with the impact of EH.

It doesn't matter much what gets pinvoked, though it seems some adaptation will be needed. ProcessorCount just happened to be convenient and worked well cross-plat.

@AaronRobinsonMSFT
Copy link
Member

There is a comment in the test:

Oops. Sorry for missing that.

It doesn't matter much what gets pinvoked, though it seems some adaptation will be needed. ProcessorCount just happened to be convenient and worked well cross-plat.

Okay. Given that authoring a simple native project is so cheap, I suggest we create one for this test. Relying on implementation details in the runtime for testing seems like a bad idea.

@AaronRobinsonMSFT
Copy link
Member

I fixed the test - see #27614.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants