Consider splitting up `System.Text.Json` tests to allow running them completely with wasm/AOT #87078

radical · 2023-06-02T20:43:01Z

Running System.Text.Json.Tests with wasm/AOT crashes V8 and Chrome. Issue: #86164

[22:06:45] info: [STRT] System.Text.Json.Serialization.Tests.StreamTests_DeserializeAsyncEnumerable.DeserializeAsyncEnumerable_ReadSimpleObjectAsync(count: 1000, bufferSize: 1000)
[22:06:56] info: 
[22:06:56] info: <--- Last few GCs --->
[22:06:56] info: 
[22:06:56] info: [70:0x560a9068c280]    63266 ms: Mark-Compact (reduce) 2048.2 (2071.2) -> 2048.2 (2071.2) MB, 667.36 / 0.00 ms  (average mu = 0.728, current mu = 0.001) allocation failure; scavenge might not succeed
[22:06:56] info: 
[22:06:56] info: 
[22:06:56] info: <--- JS stacktrace --->
[22:06:56] info: 
[22:06:56] info: 
[22:06:56] info: #
[22:06:56] info: # Fatal JavaScript out of memory: Reached heap limit
[22:06:56] info: #
[22:06:56] info: 
[22:07:23] info: /home/helixbot/.jsvu/bin/v8: line 2:    70 Trace/breakpoint trap   (core dumped) "/home/helixbot/.jsvu/engines/v8/v8" --snapshot_blob="/home/helixbot/.jsvu/engines/v8/snapshot_blob.bin" "$@"
[22:07:23] info: Process v8 exited with 133

We run into limits while trying to run this test suite, and try to work around that in different ways. It would be extremely useful to run the tests in parts, but I'm not sure how we could split this up. We could either split this into multiple projects, if there is a good logical boundary for that, or pick a handful of big classes and explicitly run them in a separate helix job.

cc @kg @pavelsavara for more details on the issues we are running into.
cc @eiriktsarpalis


ghost · 2023-06-02T20:43:09Z

Tagging subscribers to this area: @dotnet/area-system-text-json, @gregsdennis
See info in area-owners.md if you want to be subscribed.

Author:	radical
Assignees:	-
Labels:	`area-System.Text.Json`, `test-enhancement`
Milestone:	-

kg · 2023-06-02T20:44:40Z

S.T.J is also very close to the limit in the wasm interpreter configuration. If you raise the Jiterpreter limit by a few more megabytes the test suite will also hit V8's heap size limit and crash. So I think we've been just barely staying below this 2GB threshold for a while.

eiriktsarpalis · 2023-06-03T14:03:00Z

In order to understand the problem correctly, is this an issue of assembly size, number of tests, or specific tests having a very deep stack?

MichalStrehovsky · 2023-06-07T04:19:11Z

@eiriktsarpalis this is also a problem for Native AOT. The Json tests have been failing in the runtime-extra-platforms runs ever since they got enabled:

artifacts/bin/coreclr/linux.arm64.Release/build/Microsoft.NETCore.Native.targets(271,5): error MSB3073: (NETCORE_ENGINEERING_TELEMETRY=Build) The command ""/__w/1/s/artifacts/bin/coreclr/linux.arm64.Release/x64/ilc/ilc" @"/__w/1/s/artifacts/obj/System.Text.Json.SourceGeneration.Roslyn4.4.Tests/Release/net8.0/native/System.Text.Json.SourceGeneration.Roslyn4.4.Tests.ilc.rsp"" exited with code 137

It's an OOM. The test does compile for me locally but takes forever, consumes gigabytes of memory, and compiles into a 150 MB executable with a 1.5 GB PDB. That's rather large.

There are three issues:

1. Generic recursion. The compiler does detect this and cuts it off, but it still causes quite a bit of damage because the cutoff only kicks in for deep nesting, to prevent compiling forever or until memory runs out. It also produces warnings about this, but unfortunately we disable them for test builds so they are not surfaced:

   runtime/eng/testing/tests.singlefile.targets

   Line 35 in 10222f9

   <SuppressAotAnalysisWarnings>true</SuppressAotAnalysisWarnings>

2. There are tens of megabytes of JsonConverter<T> instances. Instances over value types are not shared and there are a lot of value types being serialized. Some of them might not be intentional: for example, ClassWithReadOnlyPropertyQueue says it's a class but it's in fact a struct, and there are dozens of these ClassWith types that are structs. Each unique JsonConverter<T> instance over a struct costs more than 100 kB in size in the test. About 90% of this 100+ kB is caused by:
3. The CreateCastingConverter generic method. This is a generic virtual method on a generic type. This pattern causes an N * M generic expansion in AOT. The test ends up calling the method with about 250 unique T's (we only count value types). This means that each of the JsonConverter<T> instances in the test gets an extra 250 specializations. There are hundreds of JsonConverter instances in the app, so this multiplies into unreasonable numbers (tens of thousands of CreateCastingConverter method bodies) very quickly.

Getting rid of the generic recursion would probably be the best first step. If we can turn some of the structs in the test into classes, that would help too. The generic virtual method is a huge problem because of the recursion, but it will probably scale pretty poorly for large JSON models in general.
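To put rough numbers on the N * M expansion described above, here is a minimal sketch in C. The function name `estimate_gvm_bodies` is made up for illustration, and the figure of 200 converter instantiations used in the note below is an assumption (the comment only says "hundreds"); the 250 comes from the count of unique value-type T's mentioned above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of ILC or System.Text.Json): estimates how
 * many method bodies an AOT compiler has to emit for one generic virtual
 * method declared on a generic type. Every instantiation of the type (N)
 * is paired with every type argument the method is called with (M).
 */
uint64_t estimate_gvm_bodies(uint64_t type_instantiations,
                             uint64_t method_type_args)
{
    /* Guard against overflow of the product; realistic inputs are tiny. */
    assert(method_type_args == 0 ||
           type_instantiations <= UINT64_MAX / method_type_args);
    return type_instantiations * method_type_args;
}
```

With an assumed 200 converter instantiations and roughly 250 type arguments, `estimate_gvm_bodies(200, 250)` comes to 50,000 method bodies, which is in line with the "tens of thousands" estimate.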

I have reasons to believe the recursion is still around handling of Queue<T> because the warnings look like this:

ILC: AOT analysis warning IL3054: System.Text.Json.Serialization.Metadata.JsonTypeInfo`1.<DeserializeAsyncEnumerable>d__4<Queue`1<Queue`1<Queue`1<Queue`1<SByte[]>>>>>: Generic expansion to 'System.Text.Json.Serialization.Metadata.JsonTypeInfo`1<Queue`1<Queue`1<Queue`1<Queue`1<Queue`1<SByte[]>>>>>>' was aborted due to generic recursion. An exception will be thrown at runtime if this codepath is ever reached. Generic recursion also negatively affects compilation speed and the size of the compilation output. It is advisable to remove the source of the generic recursion by restructuring the program around the source of recursion. The source of generic recursion might include: 'System.Text.Json.Serialization.Metadata.JsonTypeInfo`1', 'System.Text.Json.Serialization.JsonConverter`1', 'System.Text.Json.Serialization.JsonResumableConverter`1', 'System.Text.Json.Serialization.JsonDictionaryConverter`1', 'System.Text.Json.Serialization.JsonConverter.CreateCastingConverter<TTarget>()', 'System.Text.Json.Serialization.JsonConverterFactory.CreateCastingConverter<TTarget>()', 'System.Text.Json.Serialization.JsonConverter`1.CreateCastingConverter<TTarget>()', 'System.Text.Json.Serialization.Converters.CastingConverter`1', 'System.Text.Json.Serialization.Converters.JsonMetadataServicesConverter`1', 'System.Text.Json.Serialization.Metadata.DefaultJsonTypeInfoResolver.DeterminePropertyAccessors<T>(JsonPropertyInfo`1<T>,MemberInfo)', 'System.Text.Json.Serialization.Metadata.JsonPropertyInfo`1', 'System.Text.Json.Serialization.Metadata.JsonTypeInfo`1.<DeserializeAsync>d__1', 'System.Text.Json.Serialization.Metadata.JsonTypeInfo`1.<DeserializeAsyncEnumerable>d__4', 'System.Text.Json.Serialization.Metadata.JsonTypeInfo`1.<DeserializeAsObjectAsync>d__6', 'System.Text.Json.Serialization.Metadata.JsonTypeInfo`1.<SerializeAsync>d__10', 'System.Text.Json.Serialization.Metadata.JsonParameterInfo`1'

MichalStrehovsky · 2023-06-07T04:24:37Z

If you want to see the extent of it, run `dotnet tool install sizoscope --global --framework net7.0-windows`, then run `sizoscope` and open this file in the GUI:

System.Text.Json.SourceGeneration.Roslyn4.4.Tests.zip

You'll see something like this:

kg · 2023-06-07T04:42:25Z

> In order to understand the problem correctly, is this an issue of assembly size, number of tests, or specific tests having a very deep stack?

AFAIK stack depth is not the problem here, for the wasm interp scenario the native heap as a whole is growing until it hits a limit. If I had to guess it's probably a mix of type info, generated interpreter code, jitted wasm code (for the interp+jiterp scenario), etc. I don't know offhand how to get measurements out of the mono runtime on this stuff but if you need measurements I can probably figure it out. In the V8 console scenario, the native heap has to approach 2GB of allocated space before this happens. I'm not sure why the wasm AOT version is running out of memory, to be honest - I would expect it to use less memory, but it's possible that loaded AOT binaries count against V8's heap size limit.

We're getting OOM-killed in the CI. See dotnet#87078. I'm also changing how we do warning suppressions. We still need to suppress most of AOT warnings because xUnit/tests are not warning clean, but the warning about generic recursion is critical for the product and should fail the build.


eiriktsarpalis · 2023-06-07T09:06:56Z

> There are tens of megabytes of JsonConverter instances. Instances over value types are not shared and there's a lot of valuetypes being serialized. Some of them might not be intentional - for example ClassWithReadOnlyPropertyQueue says it's a class but it's in fact a struct - there are dozens of these ClassWith that are structs.

I think that's unavoidable given the nature of testing serializers, effectively we tend to define one new type per test and many of these tests need to include coverage for structs (to make sure that each facet of the serializer correctly accounts for struct semantics). I think we could try cutting a few struct definitions here and there when not strictly necessary from a test coverage perspective but I wouldn't expect huge gains from that exercise.

> CreateCastingConverter generic method. This is a generic virtual method on a generic type. This patterns causes an N * M generic expansion in AOT.

I recently merged #80755 that made changes to CreateCastingConverter out of similar size concerns, and it appears that this method no longer needs to be defined on JsonConverter<T>. I'll follow up with a PR.

eiriktsarpalis · 2023-06-07T20:30:37Z

@MichalStrehovsky new sizes after #87211 got merged.

eiriktsarpalis · 2023-06-08T14:19:58Z

With #87276 it looks like the total size has dropped to 60 MB. Should we try re-enabling the tests once merged?

radical · 2023-06-08T17:44:32Z

> With #87276 it looks like the total size has dropped to 60 MB. Should we try re-enabling the tests once merged?

I have opened #87256 which we can try once #87276 is merged.

eiriktsarpalis · 2023-06-08T18:39:40Z

PR is merged, feel free to rebase your PR branch.

radical · 2023-06-12T20:52:43Z

> With #87276 it looks like the total size has dropped to 60 MB. Should we try re-enabling the tests once merged?

It still fails:

[19:22:37] info: [STRT] System.Text.Json.Serialization.Tests.EnumConverterTests+UInt16EnumDictionary.SerilizeDictionaryWhenCacheIsFull
[19:22:43] info: 
[19:22:43] info: <--- Last few GCs --->
[19:22:43] info: 
[19:22:43] info: [92:0x563b327ef2f0]    89765 ms: Mark-Compact 2034.6 (2069.8) -> 2034.6 (2069.8) MB, 672.20 / 0.00 ms  (average mu = 0.913, current mu = 0.006) allocation failure; scavenge might not succeed
[19:22:43] info: [92:0x563b327ef2f0]    90441 ms: Mark-Compact 2037.5 (2072.8) -> 2037.5 (2072.8) MB, 674.53 / 0.00 ms  (average mu = 0.827, current mu = 0.002) allocation failure; scavenge might not succeed
[19:22:43] info: 
[19:22:43] info: 
[19:22:43] info: <--- JS stacktrace --->
[19:22:43] info: 
[19:22:43] info: 
[19:22:43] info: #
[19:22:43] info: # Fatal JavaScript out of memory: Reached heap limit
[19:22:43] info: #
[19:22:43] info: 
[19:23:11] info: /home/helixbot/.jsvu/bin/v8: line 2:    92 Trace/breakpoint trap   (core dumped) "/home/helixbot/.jsvu/engines/v8/v8" --snapshot_blob="/home/helixbot/.jsvu/engines/v8/snapshot_blob.bin" "$@"

pavelsavara · 2023-06-12T21:04:23Z

Adding a pre-test yield to a few test classes will allow the JS engine to run timer loops. It may give our GC a chance to run.

xunit syntax is

        public async Task InitializeAsync()
        {
            await Task.Yield();
        }

But v8 timer loop is strange, still worth trying, I think.

lewing · 2023-07-12T16:57:51Z

cc @BrzVlad @vargaz

vargaz · 2023-07-13T02:02:51Z

Did some experiments:
Running with -s MAXIMUM_MEMORY=1073741824 (1GB) succeeds.
Running with -s MAXIMUM_MEMORY=536870912 (512 MB) fails in a controlled manner:

  info: [STRT] System.Text.Json.Serialization.Tests.StreamTests_AsyncWithSmallBuffer.VeryLargeJsonFileTest(payloadSize: 1000, ignoreNull: False, writeIndented: False)
  info: console.debug: [MONO] GC_MINOR: (Nursery full) time 3.62ms, stw 3.62ms promoted 3302K major size: 41984K in use: 33033K los size: 62016K in use: 51968K
  info: console.debug: [MONO] GC_MINOR: (Nursery full) time 3.92ms, stw 3.92ms promoted 3710K major size: 44608K in use: 36825K los size: 62016K in use: 51968K
  info: console.debug: [MONO] GC_MINOR: (Nursery full) time 3.67ms, stw 3.67ms promoted 3709K major size: 48128K in use: 40615K los size: 62016K in use: 51968K
  fail: [FAIL] System.Text.Json.Serialization.Tests.StreamTests_AsyncWithSmallBuffer.VeryLargeJsonFileTest(payloadSize: 1000, ignoreNull: False, writeIndented: False)
  info: System.OutOfMemoryException : Out of memory
  info:    at System.Text.Json.JsonReaderHelper.TranscodeHelper(ReadOnlySpan`1 )
  info:    at System.Text.Json.JsonSerializer.WriteString[List`1](List`1& , JsonTypeInfo`1 )
  info:    at System.Text.Json.Serialization.Tests.StreamTests.VeryLargeJsonFileTest(Int32 payloadSize, Boolean ignoreNull, Boolean writeIndented)
  info:    at System.Text.Json.JsonReaderHelper.TranscodeHelper(ReadOnlySpan`1 )
  info:    at System.Text.Json.JsonSerializer.WriteString[List`1](List`1& , JsonTypeInfo`1 )

So it looks like the linear memory usage is somewhere between 512mb and 1gb. Not sure what causes v8 to consume 2gb of memory.
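The two `MAXIMUM_MEMORY` values used in these experiments are exact powers of two. A small sketch in C that decodes them, assuming the standard 64 KiB WebAssembly page size; the helper names are made up for illustration and are not Emscripten APIs.

```c
#include <assert.h>
#include <stdint.h>

/* WebAssembly linear memory is sized in fixed 64 KiB pages. */
#define WASM_PAGE_BYTES (64u * 1024u)

/* Hypothetical helper: converts mebibytes to bytes. */
uint64_t mib_to_bytes(uint64_t mib)
{
    return mib * 1024u * 1024u;
}

/* Hypothetical helper: number of wasm pages covering a byte limit. */
uint64_t wasm_pages_for(uint64_t bytes)
{
    /* This sketch only handles page-aligned limits. */
    assert(bytes % WASM_PAGE_BYTES == 0);
    return bytes / WASM_PAGE_BYTES;
}
```

`mib_to_bytes(1024)` is 1073741824 and `mib_to_bytes(512)` is 536870912, which correspond to 16384 and 8192 wasm pages respectively.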

vargaz · 2023-07-13T02:35:14Z

I can reproduce this when running with v8 --max-old-space-size=2024.

lewing · 2023-07-13T02:56:04Z

@pavelsavara @kg are there js objects we could be leaking/retaining?

kg · 2023-07-13T02:59:24Z

Could this be heap growth? i.e. growing from 512mb to 1gb requires being able to allocate contiguously 512mb + 1gb in the v8 perm gen, and it's not big enough
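The hypothesis above is easy to check numerically. The sketch below assumes, as the comment does, that growing a buffer needs the old and the new allocation to be live at the same time. `peak_bytes_during_growth` and `growth_fits` are hypothetical names, and the 2 GiB limit is taken from the V8 heap figures earlier in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MIB (1024ull * 1024ull)

/* Peak footprint if the old buffer stays alive while the new one is allocated. */
uint64_t peak_bytes_during_growth(uint64_t old_bytes, uint64_t new_bytes)
{
    assert(old_bytes <= UINT64_MAX - new_bytes); /* overflow guard */
    return old_bytes + new_bytes;
}

/* True if the growth peak plus everything else on the heap stays within the limit. */
bool growth_fits(uint64_t old_bytes, uint64_t new_bytes,
                 uint64_t other_bytes, uint64_t limit_bytes)
{
    uint64_t peak = peak_bytes_during_growth(old_bytes, new_bytes);
    assert(peak <= UINT64_MAX - other_bytes); /* overflow guard */
    return peak + other_bytes <= limit_bytes;
}
```

Growing from 512 MiB to 1 GiB peaks at 1536 MiB under this assumption, so with a 2048 MiB limit it only fits if everything else on the heap stays at or below 512 MiB.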

pavelsavara · 2023-07-13T09:31:30Z

> @pavelsavara @kg are there js objects we could be leaking/retaining?

That unit test doesn't do any JS interop. The only one is that we hold Task/Promise pair for the main method.

I'm thinking about emscripten's posix emulation leaking.
If there are VFS files created as part of the test, they would consume JS heap.
What else ?

And I guess I should better understand how we allocate/free memory. I don't know what #ifdef in major_free_swept_blocks means yet.

runtime/src/mono/mono/sgen/sgen-marksweep.c

Lines 2129 to 2138 in b9c9964

    
    major_free_swept_blocks (size_t section_reserve)
    {
    	SGEN_ASSERT (0, sweep_state == SWEEP_STATE_SWEPT, "Sweeping must have finished before freeing blocks");

    #if defined(HOST_WIN32) || defined(HOST_ORBIS) || defined (HOST_WASM)
    		/*
    		 * sgen_free_os_memory () asserts in mono_vfree () because windows doesn't like freeing the middle of
    		 * a VirtualAlloc ()-ed block.
    		 */
    		return;

BrzVlad · 2023-07-13T10:14:52Z

A major GC block is 16 KB, and when we allocate blocks we allocate them in chunks (of 32 blocks at a time). Each block will be in a free list and will be used by the GC when needed. After a collection, in major_free_swept_blocks we attempt to free blocks, also in various chunk sizes, which are not identical to the original allocation. So we attempt to unmap in the middle of mapped memory, which I know doesn't work on Windows. Not sure how much benefit we get from this complication of avoiding calls into OS map/unmap.
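For a sense of scale, the block and chunk sizes above work out as follows. This is a sketch in C using the figures from the comment (16 KB blocks, 32 blocks per allocation); the constant and function names are placeholders, not the identifiers used in `sgen-marksweep.c`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Figures quoted in the comment above; the names are illustrative only. */
#define BLOCK_BYTES      ((size_t)16 * 1024)
#define BLOCKS_PER_CHUNK ((size_t)32)

/* Bytes requested from the OS for one chunk of blocks. */
size_t chunk_bytes(void)
{
    return BLOCK_BYTES * BLOCKS_PER_CHUNK;
}

/* Number of blocks needed to back a major heap of the given size. */
size_t blocks_for(size_t heap_bytes)
{
    assert(heap_bytes <= SIZE_MAX - (BLOCK_BYTES - 1)); /* overflow guard */
    /* Round up so a partially used block is still counted. */
    return (heap_bytes + BLOCK_BYTES - 1) / BLOCK_BYTES;
}
```

One chunk is 524288 bytes (512 KB), and the "major size: 48128K" reported in the GC log earlier in the thread corresponds to 3008 blocks.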

pavelsavara · 2023-07-13T12:12:00Z

To do more testing see #88825 (comment)

eiriktsarpalis · 2023-07-19T15:51:28Z

It's unlikely we'll be able to address this in .NET 8, moving to future.

pavelsavara · 2023-07-27T14:01:33Z

From #88825 (comment)

It would be useful to get some GC log for those offending test suites. Env: MONO_LOG_LEVEL=debug and MONO_LOG_MASK=gc

lewing · 2024-02-09T06:13:38Z

cc @vitek-karas @artl93

closing as resolved but the large trim unfriendly test assemblies are still very much a thing

radical added area-System.Text.Json test-enhancement Improvements of test source code labels Jun 2, 2023

ghost added the untriaged New issue has not been triaged by the area owner label Jun 2, 2023

MichalStrehovsky mentioned this issue Jun 7, 2023

Disable S.T.Json tests on NativeAOT #87203

Merged

eiriktsarpalis removed the untriaged New issue has not been triaged by the area owner label Jun 7, 2023

eiriktsarpalis added this to the 8.0.0 milestone Jun 7, 2023

eiriktsarpalis mentioned this issue Jun 7, 2023

Move the CreateCastingConverter<T>() method to the base JsonConverter type. #87211

Merged

eiriktsarpalis mentioned this issue Jun 8, 2023

Fix DeserializeAsyncEnumerable generic recursion issue. #87276

Merged

eiriktsarpalis mentioned this issue Jun 13, 2023

Remove test exclusions for Json tests #87457

Merged

vargaz mentioned this issue Jul 14, 2023

[mono][interp] Fix memory leaks for interpreted dynamic methods. #88892

Merged

eiriktsarpalis modified the milestones: 8.0.0, Future Jul 19, 2023

MichalStrehovsky mentioned this issue Jul 21, 2023

Move tests from full facade assemblies into type forward destination #89230

Closed

26 tasks

vargaz mentioned this issue Jul 22, 2023

[wasm] v8 crashing at the end with "Reached heap limit" - for AOT builds #88497

Closed

jtschuster mentioned this issue Dec 13, 2023

ilc crash while compiling System.Text.Json.SourceGeneration.Roslyn tests #91885

Closed

lewing closed this as completed Feb 9, 2024

github-actions bot locked and limited conversation to collaborators Mar 10, 2024
