Experiment, parallelize some tests #17662

majocha · 2024-09-04T07:50:51Z

To run tests in parallel we must deal with global resources and global state accessed by the test cases.

Out of proc:
Tests running as separate processes share the file system. We must make sure they execute in their own temporary directories and don't overwrite any hardcoded paths. This is already done, mostly in a separate PR.

Hosted:
Many tests use the hosted compiler and FsiEvaluationSession, sharing global resources and global state within the runner process:

Console streams - this is swept under the rug for now by using a simple AsyncLocal stream splitter.
FileSystem, the global mutable file system shim - the few tests that mutate it must be excluded from parallelization.
Environment.CurrentDirectory - many tests executing in a hosted session were doing a variation of File.WriteAllText("test.ok", "ok"), all in the current directory, i.e. bin, leading to conflicts. This is replaced with a thread-safe mechanism.
Environment variables, Path - this mostly applies to DependencyManager, which is excluded from parallelization for now.
Async default cancellation token - the few tests that call Async.CancelDefaultToken() must be excluded from parallelization.
Global state used in conjunction with the --times option - these tests are excluded from parallelization.
Global mutable state in the form of multiple caches implemented as ConcurrentDictionary. This needs further investigation.
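To illustrate the Environment.CurrentDirectory item above: one possible way to avoid collisions is to give every test its own directory instead of writing into the shared current directory. This is only a sketch under that assumption, not necessarily the mechanism used in this PR; `createTestDirectory` and `writeOk` are hypothetical names.

```fsharp
open System
open System.IO

/// Hypothetical helper: creates a fresh directory per test so that marker
/// files such as "test.ok" never collide in a shared current directory.
let createTestDirectory () =
    let path =
        Path.Combine(Path.GetTempPath(), "fsharp-tests", Guid.NewGuid().ToString("N"))
    Directory.CreateDirectory path

/// Writes the marker file into the given test directory.
let writeOk (dir: DirectoryInfo) =
    File.WriteAllText(Path.Combine(dir.FullName, "test.ok"), "ok")

// Four "tests" running in parallel, each writing its own test.ok.
let dirs =
    Array.Parallel.init 4 (fun _ ->
        let dir = createTestDirectory ()
        writeOk dir
        dir)
```

Because no test relies on the process-wide current directory, nothing here needs a lock.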

I'll add to the above list if I recall anything else.

Problems:
Tests depending on tight timing, orchestrating stuff by combinations of Thread.Sleep, Async.Sleep and wait timeouts.
These are mostly excluded from parallelization, some attempts at fixing things were made.

Obscure compiler bugs revealed in this PR:

Internal error: value cannot be null - this mostly happens in coreClr, once or sometimes a few times during the test run.
Error creating evaluation session because of an NRE somewhere in TcImports.BuildNonFrameworkTcImports. This is rarer but may be related to the above.

These were related to concurrency issues: modifying frameworkTcImportsCache without a lock, and a bug in the custom lazy implementation in il.fs. Hopefully both are fixed now.

Running in parallel:
Xunit runners are configured with mostly default parallelization settings.

dotnet test .\FSharp.sln -c Release -f net9.0 will run all discovered test assemblies in parallel as soon as they're built.
This can be limited with the -m switch. For example,
dotnet test -m:2 .\FSharp.Compiler.Service.sln
will limit the test run to at most 2 simultaneous processes. Still, each test host process runs its test collections in parallel.

Some test collections are excluded from parallelization with the [<Collection(nameof DoNotRunInParallel)>] attribute.
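For illustration, a minimal sketch of how such an exclusion can be declared with xUnit v2, assuming the xunit package is referenced. The `DoNotRunInParallel` marker type mirrors the name used above, but its definition here is an assumption; the test module is hypothetical.

```fsharp
open Xunit

// Assumed definition of the marker collection. DisableParallelization tells
// xUnit not to run this collection alongside any other collection.
[<CollectionDefinition("DoNotRunInParallel", DisableParallelization = true)>]
type DoNotRunInParallel = class end

// Hypothetical test module that touches process-wide state and therefore
// opts out of parallel execution.
[<Collection(nameof DoNotRunInParallel)>]
module MutatesGlobalState =

    [<Fact>]
    let ``cancelling the default token does not throw`` () =
        Async.CancelDefaultToken()
```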

Running in the IDE with "Run tests in parallel" enabled will respect xunit.runner.json settings and the above exclusions.

TODO:

Make sure this keeps working properly with the BUILDING_USING_DOTNET scenario (Attempt to make FCS solution build without arcade and with the SDK specified in global.json #14677)

github-actions · 2024-09-04T07:51:34Z

❗ Release notes required

✅ Found changes and release notes in following paths:

Change path | Release notes path | Description
src/Compiler | docs/release-notes/.FSharp.Compiler.Service/9.0.200.md |

majocha · 2024-09-04T10:02:11Z

Yeah this will not do much.

A lot of test cases write to stdout. We do captureConsoleOutputs in CompilerAssert but in a way that does not allow parallel execution. Not to mention in a lot of places there are printfn calls that will mangle outputs even more. All of those tests must run sequentially.

This needs a systemic approach to deal with stdout capture; maybe xUnit has some mechanism to do this in parallel.

psfinaki · 2024-09-04T10:51:01Z

@majocha guess what, I was going to try the same today :)

Thanks for looking into that. Indeed, it seems like we should just damn stop printing everything to the console; it's likely an artifact of older testing approaches. I cannot see any reason to do this instead of just in-memory processing.

majocha · 2024-09-04T15:00:48Z

xUnit does not run tests from the same module in parallel. It also does not parallelize Theories.
This can slow things down even with parallel execution enabled. In FSharpSuite we have modules with lots of slow tests.

This can be mitigated by customizing xUnit in code, I think?

psfinaki · 2024-09-04T16:33:58Z

By customizing xUnit you mean setting up some special runner settings and assembly attributes?

We can do that. However, I am not a fan of this idea. xUnit's philosophy is really to apply good coding practices to tests. As in, write tests as you write code. Hence, compared to e.g. NUnit, it offers very limited test platform voodoo (think fixtures, setup/teardown and so on), instead making as much use as possible of the built-in language capabilities.

And so I would prefer sticking to this philosophy. If, by default, xUnit parallelizes execution on the module level, then we should actually split modules into smaller ones. That will improve code clarity and generally lead to better code organization :)

What do you think?

majocha · 2024-09-05T07:43:17Z

By customizing I meant something like https://www.meziantou.net/parallelize-test-cases-execution-in-xunit.htm

But this is not the most pressing thing and probably not needed if splitting modules would do.

The biggest hurdle for now is correctly isolating the console when running tests in parallel. Redirecting with Console.SetOut for each test won't work anymore when another thread can also redirect it unpredictably.

Console writes come from multiple sources:

1. In-process execution of fsi, probably also fsc.
2. Various printfn and log calls sprinkled through the test cases and helper code.
3. Source code compiled and executed in AssemblyLoadContext (or AppDomain in case of net472).

While we can manage 1. and 2., 3. is a bit harder.

majocha · 2024-09-05T07:55:09Z

I've been chatting with Bing / Copilot about it, and it actually proposed an idea that's not bad:

Don't redirect the Console at all for individual tests. Instead, install a custom thread-splitting TextWriter upfront.
That splitting writer will keep a ThreadLocal inner TextWriter and direct all writes to it.
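A minimal sketch of the splitter idea, under some assumptions: it uses AsyncLocal rather than ThreadLocal (the PR description mentions an AsyncLocal splitter) so the per-test writer also follows async continuations, and `SplittingWriter` and `capture` are hypothetical names, not the code in this PR.

```fsharp
open System
open System.IO
open System.Threading

/// Hypothetical splitter: forwards every write to the writer installed for
/// the current async flow, or to the fallback writer when none is installed.
type SplittingWriter(fallback: TextWriter) =
    inherit TextWriter()
    let current = AsyncLocal<TextWriter>()
    let target () =
        match current.Value with
        | null -> fallback
        | writer -> writer
    /// Installs the writer that receives output from the current async flow.
    member _.Set(writer: TextWriter) = current.Value <- writer
    override _.Encoding = fallback.Encoding
    override _.Write(value: char) = target().Write(value)
    override _.Write(value: string) = target().Write(value)
    override _.WriteLine(value: string) = target().WriteLine(value)

// Install the splitter once, up front, instead of redirecting per test.
let original = Console.Out
let splitter = new SplittingWriter(original)
Console.SetOut splitter

/// Runs a test body with its own capture buffer and returns what it printed.
let capture (body: unit -> unit) =
    async {
        let buffer = new StringWriter()
        splitter.Set buffer
        body ()
        return buffer.ToString()
    }

// Three bodies printing concurrently; each capture sees only its own output.
let outputs =
    [ for name in [ "a"; "b"; "c" ] -> capture (fun () -> printfn "hello from %s" name) ]
    |> Async.Parallel
    |> Async.RunSynchronously

Console.SetOut original
```

Note that this only isolates writes that go through Console.Out in the same process; output produced in a separate AppDomain or child process would need different handling.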

psfinaki · 2024-09-05T13:07:05Z

Thanks for the further investigations here.

In the spirit of my comment above - I just vote for reinventing as few wheels as possible and removing those we've already reinvented here :) Unit tests rarely need any output at all, but if they do - it's good to use those few means that xUnit provides for this, which are basically "plug in the writer if and when you need to".

I think it aligns with your thoughts above? It's important to make gradual changes here, probably actually in the way you outline it above. The current direction you're taking (removing stuff) looks promising!

Note, I am off until Monday with limited internet connection so cannot play with the code myself. Also, we discussed this PR internally yesterday and were all very happy that things are moving in this space!

majocha · 2024-09-06T11:41:51Z

This is at a state where it can be run locally in the VS test explorer or from the console with build -c Release -testCoreClr.
I think it's shaving a few minutes off build -testCoreClr locally. This could be further improved by breaking up large modules in ComponentTests and FSharpSuite. I definitely see much better CPU utilization.

In the CI there's that weird TaskCanceledException in random tests, no idea where it comes from.

Still, there are some minor fixes here that I'll try to extract to another PR.

majocha · 2024-10-07T15:26:57Z

This started today I think:
https://dev.azure.com/dnceng-public/public/_build/results?buildId=829440&view=logs&j=2f0d093c-1064-5c86-fc5b-b7b1eca8e66a&t=52d0a7a6-39c9-5fa2-86e8-78f84e98a3a2&l=45

./build.sh --ci --configuration Release --restore --build --pack --publish -bl /p:SourceBuildNonPortable= /p:ArcadeBuildFromSource=true /p:DotNetBuildSourceOnly=true /p:DotNetBuildRepo=true /p:AssetManifestFileName=SourceBuild_Managed.xml
./build.sh: line 16: /__w/1/s/eng/build.sh: Permission denied

No idea what it's about.

majocha · 2024-10-10T16:55:35Z

I'm not giving up on this. I'll squash this, clean up a bit and post another draft PR.
I have some wins wrt. utilizing xUnit standard output mechanism and parallel execution of theory cases / collection cases.

T-Gro · 2024-10-11T09:28:20Z

> I'm not giving up on this. I'll squash this, clean up a bit and post another draft PR. I have some wins wrt. utilizing xUnit standard output mechanism and parallel execution of theory cases / collection cases.

This is good news @majocha.
If you think this could be enabled per project/collection, this would be a good alternative to postpone solving some of the problems.

(e.g. FSharpSuite is the slowest one at CI, but likely avoids some of the issues because compilation is via separate .exe invocation. Therefore things like shared state inside the compiler should not matter here that much)

majocha · 2024-10-11T11:34:59Z

@T-Gro there are a lot of modifications in FSharp.Test.Utilities that I think are in use in basically all test projects so this cannot be just disabled selectively as in: use previous implementation. It can be selectively throttled down, even down to full sequential execution, per project, per module etc.

I've been timing the test runs locally a bit, what is really problematic and bottlenecked by something is the net472 target. The slowest here for me is ComponentTests, throwing additional cores at it does nothing in net4, there's just no CPU utilization. I suspect it's the thousands of appdomains it creates and unloads.

net9.0 -testCoreClr runs for me locally in around 4 minutes now.

What I've been struggling with atm is hanging processes because of files getting locked. For example the test run hangs for 5 minutes and the dump indicates ilread.fs waits to read "System.Security.Cryptography.Primitives.dll" in the dotnet sdk folder. wth?

Anyway, I'll just post what I got in another PR. It'd be good to test this locally on different machines.

See #17872

T-Gro · 2024-10-11T11:57:33Z

Which File IO call was it, was that visible in the stack trace?
We might check if we are using the best set of switches for a read-only operation.
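As a sketch of what a permissive set of switches for a read-only open could look like with the standard FileStream API: this is an illustration only, not what ilread.fs currently does, and `openReadShared` is a hypothetical helper.

```fsharp
open System.IO

/// Hypothetical helper: opens a file for reading only, while allowing other
/// handles to read (or delete) the same file at the same time.
let openReadShared (path: string) =
    new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read ||| FileShare.Delete)

let path = Path.GetTempFileName()
File.WriteAllBytes(path, [| 1uy; 2uy; 3uy |])

// Two readers hold the file open at once without blocking each other.
let first = openReadShared path
let second = openReadShared path
let a = first.ReadByte()
let b = second.ReadByte()
first.Dispose()
second.Dispose()
File.Delete path
```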

majocha · 2024-10-11T12:00:58Z

> Which File IO call was it, was that visible in the stack trace? We might check if we are using the best set of switches for a read-only operation.

It was here IIRC:

fsharp/src/Compiler/AbstractIL/ilread.fs

Line 250 in 20f1408

| Some(start, length) -> stream.ReadBytes(start, length)

I don't think I still have the dump file, but I added --blame-hang-timeout to the script so if it reproduces in CI, should be possible to debug.

majocha · 2024-10-11T12:03:02Z

Closing in favor of #17872

T-Gro · 2024-10-11T12:12:04Z

There might be a race condition in the way the ilModuleReaderCache works, or how the flags are set.
But system dlls should only ever be read from in the build, so it would be good to make it work without shadow copying.

fix compile

4471183

psfinaki changed the title ~~Expreriment, parallelize some tests~~ Experiment, parallelize some tests Sep 4, 2024

threadlocal console splitter

932d12c

majocha force-pushed the parallelize-tests branch from 37918b1 to 932d12c Compare September 5, 2024 09:04

majocha added 6 commits September 5, 2024 11:06

Merge branch 'main' into parallelize-tests

1c945b5

fix

8ff95d4

just to be sure

003399a

fix

219b1de

fix failing

c2ec08b

gaah

1700504

majocha added 6 commits September 5, 2024 15:59

wip

7501dc3

wip

7f9798f

unused?

c6ab38d

wip

4e9e461

Merge branch 'main' into parallelize-tests

249b1cf

maybe?

ef14588

majocha added 5 commits September 6, 2024 16:21

deny appdomain, tests should be isolated already?

87ad15c

give it some more time

3d32f29

try to fix some more tests

e8e8413

core mailbox > tasks

67ef5f5

try fix tests

6af4cc6

majocha force-pushed the parallelize-tests branch from 3befef0 to c632695 Compare October 6, 2024 17:07

majocha added 8 commits October 6, 2024 19:59

cpu limit in ci

0276755

Merge branch 'main' into fsc-stdout

5d2ccd6

merge fsc-stdout

f961823

wip

e75f796

wip

f77ec61

wip

0bee824

wip

76fc354

that's a funny thing to time out

346ddb7

majocha mentioned this pull request Oct 7, 2024

Asorted tests improvements #17840

Merged

majocha added 2 commits October 7, 2024 16:49

Merge branch 'main' into parallelize-tests

5643336

.

ae81bd6

majocha force-pushed the parallelize-tests branch from aca0134 to ae81bd6 Compare October 7, 2024 15:11

majocha closed this Oct 7, 2024

majocha reopened this Oct 7, 2024

majocha added 5 commits October 7, 2024 19:46

Merge branch 'main' into parallelize-tests

601bf47

wip

22ee1c6

unskip depman test

635ba14

Merge branch 'main' into parallelize-tests

aa570c6

Merge branch 'main' into parallelize-tests

24b6f00

majocha mentioned this pull request Oct 11, 2024

Parallelize tests #17872

Merged

8 tasks

majocha closed this Oct 11, 2024
