Improve serializer performance #57327
Conversation
Tagging subscribers to this area: @eiriktsarpalis, @layomia
@@ -16,6 +16,9 @@ internal class ObjectDefaultConverter<T> : JsonObjectConverter<T> where T : notn
{
    internal override bool CanHaveIdMetadata => true;

#if NET6_0_OR_GREATER
    [MethodImpl(MethodImplOptions.AggressiveOptimization)]
Out of curiosity, what are some of the optimizations that were being missed without this attribute? I had tried using it a few months back on the `JsonConverter<T>.TryWrite` method (which is also fairly large) and did not see any perf improvements.
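For context, applying the attribute looks like this. This is a minimal sketch with a hypothetical method, not the actual converter code:

```csharp
using System.Runtime.CompilerServices;

public static class Hot
{
    // AggressiveOptimization opts this method out of tiered compilation:
    // it is compiled directly with full optimizations instead of starting
    // at tier 0 and being recompiled after it proves hot.
    [MethodImpl(MethodImplOptions.AggressiveOptimization)]
    public static int SumLoop(int[] values)
    {
        int sum = 0;
        for (int i = 0; i < values.Length; i++)
        {
            sum += values[i];
        }
        return sum;
    }
}
```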
I am not sure how to find this out, with tiered jitting and all (e.g. hit a breakpoint after running 1,000 times to inspect the native code?). Also, I have not run with crossgen2 yet, but typically that has the same results once the warm-up phase is done.
UPDATE: the assembly can be viewed in a checked build by setting `DOTNET_JitDisasm=methodName`.
The results are small when there are a couple of properties but should be greater when there are more. I had to run the benchmarks a few times to make sure noise wasn't a factor. I welcome investigation from others on this.
Here are some results with and without `AggressiveOptimization`. Again, I assume the more properties, the more savings. Also, I tend to focus on the `Min` column more than `Mean`.
With AO and without
Using a class with 2 string properties; I ran 5 times and took the fastest.
With AO:
Method | Mean | Error | StdDev | Gen 0 | Allocated |
---|---|---|---|---|---|
Serialize | 197.1 ns | 0.34 ns | 0.27 ns | - | - |
Deserialize | 330.1 ns | 0.42 ns | 0.38 ns | 0.0161 | 104 B |
Without AO:
Method | Mean | Error | StdDev | Gen 0 | Allocated |
---|---|---|---|---|---|
Serialize | 198.8 ns | 0.10 ns | 0.09 ns | - | - |
Deserialize | 333.1 ns | 1.14 ns | 1.01 ns | 0.0161 | 104 B |
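A minimal standalone way to sanity-check numbers like these is a rough `Stopwatch` loop over `System.Text.Json` directly. This is a sketch, not the BenchmarkDotNet harness that produced the tables above, and `TwoStrings` is an illustrative type:

```csharp
using System.Diagnostics;
using System.Text.Json;

public class TwoStrings
{
    public string First { get; set; } = "hello";
    public string Second { get; set; } = "world";
}

public static class QuickBench
{
    // Returns a rough nanoseconds-per-call figure for serialization.
    public static double NanosPerSerialize(int iterations = 100_000)
    {
        var value = new TwoStrings();

        // Warm up so tiered compilation has promoted the hot path.
        for (int i = 0; i < 1_000; i++) JsonSerializer.Serialize(value);

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) JsonSerializer.Serialize(value);
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds * 1_000_000 / iterations;
    }
}
```

Stopwatch timing like this is noisy; BenchmarkDotNet exists precisely to handle warm-up, outliers, and statistics, so treat such numbers only as a smoke test.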
Running `ReadJson<LargeStructWithProperties>` and `WriteJson<LargeStructWithProperties>` since it has 10 properties. I ran the benchmarks 3 times and took the fastest. The "Stream" cases didn't get any faster -- not sure why, although that does have a different code path in the object converter.
With AO:
Method | Mean | Error | StdDev | Median | Min | Max | Gen 0 | Allocated |
---|---|---|---|---|---|---|---|---|
SerializeToString | 650.1 ns | 5.49 ns | 4.87 ns | 648.0 ns | 643.8 ns | 656.6 ns | 0.0765 | 488 B |
SerializeToUtf8Bytes | 618.7 ns | 7.81 ns | 7.31 ns | 615.9 ns | 609.7 ns | 633.3 ns | 0.0577 | 376 B |
SerializeToStream | 766.6 ns | 10.86 ns | 9.63 ns | 766.1 ns | 754.1 ns | 783.0 ns | 0.0368 | 232 B |
SerializeObjectProperty | 831.6 ns | 13.94 ns | 11.64 ns | 828.7 ns | 819.1 ns | 862.5 ns | 0.1347 | 848 B |
Without AO:
Method | Mean | Error | StdDev | Median | Min | Max | Gen 0 | Allocated |
---|---|---|---|---|---|---|---|---|
SerializeToString | 667.1 ns | 2.42 ns | 2.15 ns | 667.5 ns | 661.5 ns | 669.4 ns | 0.0754 | 488 B |
SerializeToUtf8Bytes | 620.1 ns | 3.00 ns | 2.66 ns | 620.0 ns | 615.0 ns | 624.4 ns | 0.0598 | 376 B |
SerializeToStream | 760.7 ns | 0.80 ns | 0.67 ns | 760.6 ns | 759.8 ns | 762.1 ns | 0.0370 | 232 B |
SerializeObjectProperty | 839.5 ns | 8.38 ns | 7.84 ns | 838.3 ns | 826.0 ns | 851.7 ns | 0.1340 | 848 B |
With AO:
Method | Mean | Error | StdDev | Median | Min | Max | Gen 0 | Allocated |
---|---|---|---|---|---|---|---|---|
DeserializeFromString | 1.114 us | 0.0083 us | 0.0070 us | 1.114 us | 1.104 us | 1.131 us | 0.0318 | 200 B |
DeserializeFromUtf8Bytes | 1.042 us | 0.0118 us | 0.0111 us | 1.040 us | 1.028 us | 1.065 us | 0.0289 | 200 B |
DeserializeFromStream | 1.519 us | 0.0189 us | 0.0176 us | 1.516 us | 1.493 us | 1.556 us | 0.0478 | 328 B |
Without AO:
Method | Mean | Error | StdDev | Median | Min | Max | Gen 0 | Allocated |
---|---|---|---|---|---|---|---|---|
DeserializeFromString | 1.151 us | 0.0046 us | 0.0041 us | 1.152 us | 1.143 us | 1.156 us | 0.0276 | 200 B |
DeserializeFromUtf8Bytes | 1.063 us | 0.0141 us | 0.0132 us | 1.065 us | 1.043 us | 1.081 us | 0.0295 | 200 B |
DeserializeFromStream | 1.497 us | 0.0053 us | 0.0047 us | 1.496 us | 1.489 us | 1.507 us | 0.0479 | 328 B |
In the end, after running thousands of times, the jitted (native) code may be more or less the same with or without the [AO]. But [AO] should produce faster code sooner.
Can we please avoid [AO]?
Methods with [AO] aren't instrumented, so they won't be able to benefit from PGO (devirtualized calls, moving cold parts of methods further from hot ones, etc.). E.g. run your benchmark again with `DOTNET_ReadyToRun=0`, `DOTNET_TC_QuickJitForLoops=1`, and `DOTNET_TieredPGO=1`, with and without [AO].
Also, when a method is promoted to tier 1 naturally (without [AO]) we can:
- Get rid of static initializations in codegen
- Convert `static readonly` fields to constants
- The inliner is able to resolve `call` tokens and produce better results
/cc @AndyAyersMS
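The `static readonly` point above can be illustrated with a small sketch (the JIT behavior described in the comments is simplified and assumed, and the type is hypothetical):

```csharp
public static class Limits
{
    // When this method is recompiled at tier 1, the class is already
    // initialized, so the JIT can read this static readonly field and
    // bake its value into the generated code as a constant, potentially
    // folding the comparison below. Under [AggressiveOptimization] the
    // method is compiled eagerly, before that tier-1 recompilation pass,
    // so the field stays a real memory load.
    private static readonly int MaxDepth = 64;

    public static bool TooDeep(int depth) => depth > MaxDepth;
}
```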
[AO] should be avoided. It causes methods to be optimized prematurely. We use it only in a handful of places where we feel like we can't afford to ever run unoptimized code.
Thanks, I'll re-run with those settings.
See #58209 for reverting the 4 usages of [AO]. Note I was going to keep the 2 in `JsonConverter<T>` since there do seem to be benefits at least sometimes, but for V7 I think it makes sense to remove them based upon offline discussion with @EgorBo and "DynamicPGO" scenarios. I don't plan on porting that PR to V6.
#if NET6_0_OR_GREATER
    [MethodImpl(MethodImplOptions.AggressiveOptimization)]
I was not able to squeeze out any performance gains when applying aggressive optimizations to this method in the past.
WriteCore(converter, writer, value, jsonTypeInfo.Options, ref state);
// For performance, the code below is a lifted WriteCore() above.
if (converter is JsonConverter<TValue> typedConverter)
Assuming generic type tests are expensive, is this also an opportunity for us to improve the hot path?
@eiriktsarpalis what kind of converter is the most popular? (Is it `DefaultObjectConverter<T>`?)
Possibly, alongside all the common primitive type converters. Why do you ask?
I was just wondering if we could introduce a fast path for the most popular one if it's `sealed`. Casts to sealed classes are almost free.
Since this concerns root-level types, `ObjectDefaultConverter<T>` would probably be the most common type. If we do try this out, we should make sure it doesn't regress other common root-level converter types like `JsonNodeConverter` or custom converters registered by users.
/// Performance optimization. Overridden by the nullable converter to prevent boxing of values by the JIT.
/// Although the JIT added support to detect this IL pattern (box+isinst+br), it still manifests in TryWrite().
/// </summary>
internal virtual bool IsNull(in T value) => value is null;
Ran a few experiments locally and it seems like the `in` modifier is the culprit for the boxing behavior. Changing it to
private bool IsNull(T value) => value is null;
makes the regression go away. It comes at the cost of having to pass the parameter by value, but that's negligible compared to boxing. This also avoids needing to make a virtual call. cc @EgorBo
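A minimal sketch of the two shapes being compared (the type and method names are illustrative, not the actual converter code):

```csharp
public class NullCheck<T>
{
    // Passing by readonly reference (`in`): in this discussion this was
    // observed to defeat the JIT's box+isinst+br pattern recognition,
    // causing a real box allocation when T is a value type.
    public bool IsNullByIn(in T value) => value is null;

    // Passing by value: the JIT recognizes the null-test pattern and,
    // for a non-nullable value type T, folds it to a constant `false`
    // with no boxing.
    public bool IsNullByValue(T value) => value is null;
}
```

Both methods return the same results; the difference is purely in the generated code.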
Thanks. Based on this and your other comment about using `default(T) is null` in the line in question, I re-ran some micro benchmarks and found, as expected, that value types such as `int` are faster, while reference types including `string` will be a bit slower. However, removing the extra caching and making the code simpler is also a win, so I plan on making these changes in the next commit and re-running the full benchmarks.
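The `default(T) is null` technique relies on the test being a constant per generic instantiation; a rough sketch, with a hypothetical helper standing in for the serializer code:

```csharp
public static class NullHandling<T>
{
    public static bool TryHandleNull(T value)
    {
        // For a non-nullable value type T, `default(T) is null` is false
        // at JIT time, so this whole branch is dropped from the generated
        // code for that instantiation. For reference types and Nullable<T>
        // it is true, and only the runtime `value is null` test remains.
        if (default(T) is null && value is null)
        {
            // ... the real serializer would write a JSON null here ...
            return true;
        }
        return false;
    }
}
```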
@eiriktsarpalis how can I reproduce the boxing? E.g. I tried SharpLab and don't see any boxing.
Could this be a reproduction? Not sure what `L000e: call 0x00007ffd016a1b10` is doing, but it seems consistent with what I'm seeing when I profile the repro:
FWIW here's a commit that reproduces the boxing behavior: eiriktsarpalis@a44c0ee
It can be observed by running the benchmark added in dotnet/performance#1920.
Thanks!
Primarily addresses serializer overhead of a given call, but there are general gains throughout.
Changes include:
- …`if` statements, and the smaller size appears to help the jitter.
- …`JsonTypeInfo<T>` in unnecessary cases during serialization; this is expected to help perf for source-gen when falling back to the serializer.
- …`[MethodImplOptions.AggressiveOptimization]` in hot, looping methods.
- …`null` when serializing either `Nullable<T>` and non-`Nullable<T>`.

It primarily resolves this issue with overhead:
resolves #56993
Overhead
A class with no members:
(benchmark tables elided: 5.0, 6.0 before this PR, 6.0 with this PR)
Serialize overhead: 1.24x faster
Deserialize overhead: 1.015x faster
Also, if this test is changed to add 2 String properties:
(benchmark tables elided: 5.0, 6.0 before this PR, 6.0 with this PR)
Serialize: 1.1x faster compared to before this PR
Deserialize: 1.03x faster compared to before this PR
but also should resolve these (verification needed for some):
resolves #57093
resolves #52640
resolves #52311
resolves #43413
Improvements in benchmarks with this PR
6.0 status (with this PR) of benchmarks compared to 5.0