Improve serializer performance #57327

Merged: 7 commits merged on Aug 15, 2021

@steveharter (Member) commented Aug 13, 2021

Primarily addresses the serializer's per-call overhead, but there are general gains throughout.

Changes include:

  • Manual inlining / lifting of serializer calls. This avoids some if statements, and the smaller size appears to help the JIT.
  • Avoids a cast to JsonTypeInfo<T> where it is unnecessary during serialization; this is expected to help source-gen performance when falling back to the serializer.
  • Use of [MethodImpl(MethodImplOptions.AggressiveOptimization)] in hot, looping methods.
  • An optimized null check when serializing either Nullable<T> or non-Nullable<T> values.
  • Miscellaneous other changes, including some consistency changes to variable and method names.
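The AggressiveOptimization bullet can be illustrated with a minimal sketch of where the attribute goes. SumLengths is a hypothetical method, not code from this PR; the #if NET6_0_OR_GREATER guard mirrors the one in the diff. Reviewers on this PR later asked for the attribute to be removed, because such methods are not instrumented for tiered PGO.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical hot, looping helper; it only shows where the attribute is applied.
// AggressiveOptimization asks the runtime to compile the method fully optimized
// right away instead of starting at tier 0. Per the review discussion, methods
// marked this way are also skipped by tiered PGO instrumentation.
#if NET6_0_OR_GREATER
[MethodImpl(MethodImplOptions.AggressiveOptimization)]
#endif
static int SumLengths(string[] values)
{
    int total = 0;
    foreach (string value in values)
    {
        total += value.Length;
    }
    return total;
}

Console.WriteLine(SumLengths(new[] { "a", "bc", "def" })); // prints 6
```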

It primarily resolves this overhead issue:
resolves #56993

Overhead

A class with no members:

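5.0

| Method | Mean | Error | StdDev | Gen 0 | Allocated |
|---|---|---|---|---|---|
| Serialize | 85.53 ns | 0.375 ns | 0.351 ns | - | - |
| Deserialize | 268.87 ns | 1.627 ns | 1.522 ns | 0.0049 | 32 B |

6.0 before this PR

| Method | Mean | Error | StdDev | Gen 0 | Allocated |
|---|---|---|---|---|---|
| Serialize | 106.5 ns | 0.55 ns | 0.49 ns | - | - |
| Deserialize | 267.9 ns | 0.55 ns | 0.49 ns | 0.0049 | 32 B |

6.0 with this PR

| Method | Mean | Error | StdDev | Gen 0 | Allocated |
|---|---|---|---|---|---|
| Serialize | 85.88 ns | 0.527 ns | 0.493 ns | - | - |
| Deserialize | 263.78 ns | 2.824 ns | 2.503 ns | 0.0049 | 32 B |

Serialize overhead: 1.24x faster compared to before this PR
Deserialize overhead: 1.015x faster compared to before this PR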
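The speedup figures are ratios of mean times, old divided by new. A small sketch that recomputes them from the tables; Speedup is an illustrative helper, not part of the benchmark tooling. The serialize ratio comes out at about 1.240, and the deserialize ratio at about 1.016, which the summary gives as 1.015.

```csharp
using System;

// Speedup as used in this PR: old mean / new mean, so a value above 1.0
// means the new build is faster.
static double Speedup(double oldMeanNs, double newMeanNs) => oldMeanNs / newMeanNs;

// Means (ns) for the class with no members: 6.0 before vs. 6.0 with this PR.
double serialize = Speedup(106.5, 85.88);
double deserialize = Speedup(267.9, 263.78);

Console.WriteLine($"Serialize overhead:   {serialize:F3}x faster");
Console.WriteLine($"Deserialize overhead: {deserialize:F3}x faster");
```

The same helper applied to the 2-property class (218.2 ns vs. 198.4 ns) gives the 1.1x serialize figure.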

Also, if the test class is changed to have 2 String properties:

5.0

| Method | Mean | Error | StdDev | Gen 0 | Allocated |
|---|---|---|---|---|---|
| Serialize | 196.3 ns | 1.31 ns | 1.16 ns | - | - |
| Deserialize | 341.1 ns | 0.74 ns | 0.70 ns | 0.0161 | 104 B |

6.0 before this PR

| Method | Mean | Error | StdDev | Gen 0 | Allocated |
|---|---|---|---|---|---|
| Serialize | 218.2 ns | 1.06 ns | 0.94 ns | - | - |
| Deserialize | 334.5 ns | 0.71 ns | 0.63 ns | 0.0161 | 104 B |

6.0 with this PR

| Method | Mean | Error | StdDev | Gen 0 | Allocated |
|---|---|---|---|---|---|
| Serialize | 198.4 ns | 2.19 ns | 2.05 ns | - | - |
| Deserialize | 324.2 ns | 2.35 ns | 2.08 ns | 0.0161 | 104 B |

Serialize: 1.1x faster compared to before this PR
Deserialize: 1.03x faster compared to before this PR

This PR should also resolve these issues (verification is still needed for some):

resolves #57093
resolves #52640
resolves #52311
resolves #43413

Improvements in benchmarks with this PR
note: threshold is 2%
summary:
better: 31, geomean: 1.057
worse: 2, geomean: 1.071
total diff: 33
| Slower | diff/base | Base Median (ns) | Diff Median (ns) | Modality |
|---|---|---|---|---|
| `System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromStream` | 1.11 | 1386.49 | 1540.57 | bimodal |
| `System.Text.Json.Serialization.Tests.WriteJson.Serializ` | 1.03 | 377918.23 | 390056.88 | |

| Faster | base/diff | Base Median (ns) | Diff Median (ns) | Modality |
|---|---|---|---|---|
| `System.Text.Json.Serialization.Tests.WriteJson.SerializeToUtf8Bytes` | 1.19 | 119.98 | 100.69 | |
| `System.Text.Json.Serialization.Tests.WriteJson.SerializeToString` | 1.16 | 137.06 | 117.90 | |
| `System.Text.Json.Serialization.Tests.WriteJson.SerializeObjectProperty` | 1.09 | 246.90 | 225.56 | |
| `System.Text.Json.Serialization.Tests.WriteJson.SerializeObjectPr` | 1.09 | 505.15 | 462.27 | |
| `System.Text.Json.Serialization.Tests.WriteJson.SerializeToString` | 1.08 | 7678.84 | 7140.75 | |
| `System.Text.Json.Serialization.Tests.WriteJson.Seria` | 1.07 | 251.04 | 234.99 | |
| `System.Text.Json.Serialization.Tests.WriteJson.Serial` | 1.06 | 622.98 | 586.76 | |
| `System.Text.Json.Serialization.Tests.WriteJson.SerializeToUtf8By` | 1.06 | 303.54 | 286.05 | |
| `System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromUtf` | 1.06 | 393.55 | 371.04 | |
| `System.Text.Json.Serialization.Tests.WriteJson.Seria` | 1.06 | 412.13 | 388.94 | |
| `System.Text.Json.Serialization.Tests.WriteJson.Serial` | 1.06 | 862.66 | 815.38 | |
| `System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromString` | 1.06 | 124.69 | 118.10 | |
| `System.Text.Json.Serialization.Tests.ReadJson.Deseri` | 1.05 | 559.86 | 530.84 | |
| `System.Text.Json.Serialization.Tests.WriteJson.SerializeToString` | 1.05 | 331.31 | 314.80 | |
| `System.Text.Json.Serialization.Tests.WriteJson.Seria` | 1.05 | 225.08 | 214.83 | |
| `System.Text.Json.Serialization.Tests.WriteJson.SerializeToString` | 1.05 | 810.09 | 774.23 | |
| `System.Text.Json.Serialization.Tests.WriteJson.SerializeToUtf8Bytes` | 1.05 | 7249.32 | 6936.89 | |
| `System.Text.Json.Serialization.Tests.ReadJson.Deseri` | 1.04 | 334.56 | 320.96 | |
| `System.Text.Json.Serialization.Tests.WriteJson.SerializeToUtf8Bytes` | 1.04 | 390.61 | 374.94 | |
| `System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromStr` | 1.04 | 729.07 | 700.01 | |
| `System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromUtf8Bytes` | 1.04 | 1076.71 | 1035.66 | |
| `System.Text.Json.Serialization.Tests.WriteJson.SerializeToUtf8Bytes` | 1.04 | 779.50 | 750.04 | |
| `System.Text.Json.Serialization.Tests.WriteJson.SerializeToString` | 1.04 | 543.79 | 523.71 | |
| `System.Text.Json.Serialization.Tests.WriteJson.Serial` | 1.04 | 658.87 | 636.45 | |
| `System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromUtf8Bytes` | 1.03 | 74.08 | 71.59 | |
| `System.Text.Json.Serialization.Tests.WriteJson.Serial` | 1.03 | 774.83 | 748.87 | |
| `System.Text.Json.Serialization.Tests.WriteJson.Serializ` | 1.03 | 379279.76 | 367081.10 | |
| `System.Text.Json.Serialization.Tests.ReadJson.Deseria` | 1.03 | 1145.78 | 1114.74 | |
| `System.Text.Json.Serialization.Tests.ReadJson.Deseria` | 1.03 | 1556.95 | 1515.01 | |
| `System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromStream` | 1.03 | 298.83 | 290.81 | |
| `System.Text.Json.Serialization.Tests.WriteJson<HashSet>.SerializeToStrin` | 1.03 | 6444.12 | 6282.43 | |
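Each summary reports a geomean over the listed rows. Assuming it is the plain geometric mean of the ratios shown in the table (diff/base for slower rows, base/diff for faster rows), this sketch recomputes the "worse" figure from the two Slower rows; it comes out at about 1.071, matching the summary. GeoMean is an illustrative helper, not part of the comparison tool.

```csharp
using System;
using System.Linq;

// Geometric mean of ratios: exp(average(ln(ratio))).
static double GeoMean(double[] ratios) => Math.Exp(ratios.Select(r => Math.Log(r)).Average());

// diff/base ratios of the medians (ns) for the two "Slower" rows.
double[] slower =
{
    1540.57 / 1386.49,      // ReadJson.DeserializeFromStream
    390056.88 / 377918.23,  // WriteJson.Serializ... (name truncated in the report)
};

Console.WriteLine($"worse: {slower.Length}, geomean: {GeoMean(slower):F3}");
```

A geometric mean is used rather than an arithmetic one so that a 2x speedup and a 2x slowdown cancel out.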
6.0 status (with this PR) of benchmarks compared to 5.0
note: threshold is 2%
summary:
better: 22, geomean: 1.268
worse: 2, geomean: 1.065
total diff: 24
| Slower | diff/base | Base Median (ns) | Diff Median (ns) | Modality |
|---|---|---|---|---|
| `System.Text.Json.Serialization.Tests.ReadJson.Deseri` | 1.10 | 482.68 | 530.03 | |
| `System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromStr` | 1.03 | 668.86 | 690.92 | |

| Faster | base/diff | Base Median (ns) | Diff Median (ns) | Modality |
|---|---|---|---|---|
| `System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromString` | 2.13 | 47511.77 | 22281.48 | |
| `System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromStream` | 2.10 | 47456.53 | 22649.40 | |
| `System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromUtf8Byte` | 2.07 | 46860.63 | 22584.86 | |
| `System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromString` | 1.88 | 67083.49 | 35639.60 | |
| `System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromUtf8Byte` | 1.76 | 63251.21 | 35890.00 | |
| `System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromStream` | 1.74 | 63594.48 | 36651.60 | |
| `System.Text.Json.Serialization.Tests.ReadJson<ImmutableDictionary<String, String` | 1.14 | 45314.11 | 39891.80 | |
| `System.Text.Json.Serialization.Tests.ReadJson<ImmutableDictionary<String, String` | 1.12 | 45394.56 | 40385.56 | |
| `System.Text.Json.Serialization.Tests.ReadJson<HashSet>.DeserializeFromUt` | 1.11 | 11901.75 | 10722.24 | |
| `System.Text.Json.Serialization.Tests.ReadJson<Dictionary<String, String>>.Deseri` | 1.11 | 21191.19 | 19168.64 | |
| `System.Text.Json.Serialization.Tests.ReadJson<HashSet>.DeserializeFromSt` | 1.10 | 12139.81 | 11039.48 | |
| `System.Text.Json.Serialization.Tests.ReadJson<ImmutableSortedDictionary<String,` | 1.10 | 75880.24 | 69099.21 | |
| `System.Text.Json.Serialization.Tests.ReadJson<ImmutableDictionary<String, String` | 1.09 | 47138.25 | 43118.00 | |
| `System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromStr` | 1.09 | 29213.97 | 26797.27 | |
| `System.Text.Json.Serialization.Tests.ReadJson<Dictionary<String, String>>.Deseri` | 1.08 | 20351.18 | 18857.99 | |
| `System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromUtf` | 1.07 | 27666.04 | 25944.15 | |
| `System.Text.Json.Serialization.Tests.ReadJson<Dictionary<String, String>>.Deseri` | 1.07 | 22345.68 | 20956.26 | |
| `System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromUtf8Bytes` | 1.06 | 1105.41 | 1039.88 | |
| `System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromString` | 1.06 | 1178.27 | 1113.83 | |
| `System.Text.Json.Serialization.Tests.ReadJson<ImmutableSortedDictionary<String,` | 1.05 | 75538.87 | 72250.75 | |
| `System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromString` | 1.04 | 587.58 | 564.27 | |
| `System.Text.Json.Serialization.Tests.ReadJson<HashSet>.DeserializeFromSt` | 1.03 | 12883.77 | 12530.95 | |

@steveharter steveharter added this to the 6.0.0 milestone Aug 13, 2021
@steveharter steveharter self-assigned this Aug 13, 2021
@ghost commented Aug 13, 2021

Tagging subscribers to this area: @eiriktsarpalis, @layomia
See info in area-owners.md if you want to be subscribed.

Issue Details

Improvements in benchmarks with this PR
note: threshold is 2%
summary:
better: 22, geomean: 1.055
worse: 4, geomean: 1.034
total diff: 26
| Slower | diff/base | Base Median (ns) | Diff Median (ns) | Modality |
|---|---|---|---|---|
| `System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromStream` | 1.04 | 1386.49 | 1442.69 | |
| `System.Text.Json.Serialization.Tests.WriteJson<Dictionary<String, String>>.Seria` | 1.04 | 10160.87 | 10520.05 | |
| `System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromStr` | 1.03 | 28357.84 | 29236.77 | |
| `System.Text.Json.Serialization.Tests.WriteJson<Dictionary<String, String>>.Seria` | 1.03 | 10253.93 | 10566.39 | |

| Faster | base/diff | Base Median (ns) | Diff Median (ns) | Modality |
|---|---|---|---|---|
| `System.Text.Json.Serialization.Tests.WriteJson.SerializeToUtf8Bytes` | 1.11 | 119.98 | 108.35 | |
| `System.Text.Json.Serialization.Tests.WriteJson.SerializeToString` | 1.11 | 137.06 | 123.94 | |
| `System.Text.Json.Serialization.Tests.WriteJson.SerializeObjectProperty` | 1.10 | 246.90 | 224.10 | |
| `System.Text.Json.Serialization.Tests.WriteJson.SerializeToStream` | 1.08 | 213.16 | 198.04 | |
| `System.Text.Json.Serialization.Tests.ReadJson.Deseria` | 1.07 | 1556.95 | 1449.40 | |
| `System.Text.Json.Serialization.Tests.WriteJson.Seria` | 1.07 | 225.08 | 210.27 | |
| `System.Text.Json.Serialization.Tests.WriteJson.Seria` | 1.06 | 251.04 | 236.62 | |
| `System.Text.Json.Serialization.Tests.ReadJson.Deseri` | 1.06 | 559.86 | 530.03 | |
| `System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromStr` | 1.06 | 729.07 | 690.92 | |
| `System.Text.Json.Serialization.Tests.WriteJson.SerializeToString` | 1.05 | 331.31 | 314.87 | |
| `System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromString` | 1.05 | 37465.99 | 35639.60 | |
| `System.Text.Json.Serialization.Tests.WriteJson.SerializeToUtf8By` | 1.05 | 303.54 | 290.19 | |
| `System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromString` | 1.05 | 124.69 | 119.23 | |
| `System.Text.Json.Serialization.Tests.WriteJson.SerializeObjectPr` | 1.05 | 505.15 | 483.23 | |
| `System.Text.Json.Serialization.Tests.WriteJson.SerializeToUtf8Bytes` | 1.04 | 779.50 | 747.46 | |
| `System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromStream` | 1.04 | 298.83 | 287.05 | |
| `System.Text.Json.Serialization.Tests.WriteJson.SerializeToUtf8Bytes` | 1.04 | 390.61 | 375.73 | |
| `System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromUtf8Bytes` | 1.04 | 74.08 | 71.26 | |
| `System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromUtf8Bytes` | 1.04 | 1076.71 | 1039.88 | |
| `System.Text.Json.Serialization.Tests.ReadJson.Deseri` | 1.03 | 334.56 | 324.45 | |
| `System.Text.Json.Serialization.Tests.WriteJson.SerializeToString` | 1.03 | 810.09 | 789.99 | |
| `System.Text.Json.Serialization.Tests.WriteJson<HashSet>.SerializeToStrea` | 1.02 | 6130.58 | 5996.71 | |
Author: steveharter
Assignees: steveharter
Labels: area-System.Text.Json, tenet-performance
Milestone: 6.0.0

@@ -16,6 +16,9 @@ internal class ObjectDefaultConverter<T> : JsonObjectConverter<T> where T : notn

```csharp
{
internal override bool CanHaveIdMetadata => true;

#if NET6_0_OR_GREATER
[MethodImpl(MethodImplOptions.AggressiveOptimization)]
```
@eiriktsarpalis (Member), Aug 13, 2021

Out of curiosity, what are some of the optimizations that were being missed without this attribute? I had tried using it a few months back on the JsonConverter<T>.TryWrite method (which is also fairly large) and did not see any perf improvements.

@steveharter (Member, Author), Aug 13, 2021

I am not sure how to find this out with tiered JIT compilation in play (e.g., hit a breakpoint after running 1,000 times to inspect the native code?). Also, I have not run with crossgen2 yet, but that typically gives the same results once the warm-up phase is done.

UPDATE: the generated assembly can be viewed with a checked build of the runtime by setting DOTNET_JitDisasm=methodName.

The gains are small with only a couple of properties but should grow as the number of properties increases. I had to run the benchmarks several times to make sure noise isn't a factor. I welcome investigation from others on this.

Here are some results with and without AggressiveOptimization. Again, I assume that more properties mean more savings. Also, I tend to focus on the Min column more than the Mean.

With AO and without

Using a class with 2 string properties; I ran each benchmark 5 times and took the fastest run.

With AO:

| Method | Mean | Error | StdDev | Gen 0 | Allocated |
|---|---|---|---|---|---|
| Serialize | 197.1 ns | 0.34 ns | 0.27 ns | - | - |
| Deserialize | 330.1 ns | 0.42 ns | 0.38 ns | 0.0161 | 104 B |

Without AO:

| Method | Mean | Error | StdDev | Gen 0 | Allocated |
|---|---|---|---|---|---|
| Serialize | 198.8 ns | 0.10 ns | 0.09 ns | - | - |
| Deserialize | 333.1 ns | 1.14 ns | 1.01 ns | 0.0161 | 104 B |

Next, ReadJson<LargeStructWithProperties> and WriteJson<LargeStructWithProperties>, since that type has 10 properties. I ran the benchmarks 3 times and took the fastest run. The "Stream" cases didn't get any faster; I'm not sure why, although they do use a different code path in the object converter.

With AO:

| Method | Mean | Error | StdDev | Median | Min | Max | Gen 0 | Allocated |
|---|---|---|---|---|---|---|---|---|
| SerializeToString | 650.1 ns | 5.49 ns | 4.87 ns | 648.0 ns | 643.8 ns | 656.6 ns | 0.0765 | 488 B |
| SerializeToUtf8Bytes | 618.7 ns | 7.81 ns | 7.31 ns | 615.9 ns | 609.7 ns | 633.3 ns | 0.0577 | 376 B |
| SerializeToStream | 766.6 ns | 10.86 ns | 9.63 ns | 766.1 ns | 754.1 ns | 783.0 ns | 0.0368 | 232 B |
| SerializeObjectProperty | 831.6 ns | 13.94 ns | 11.64 ns | 828.7 ns | 819.1 ns | 862.5 ns | 0.1347 | 848 B |

Without AO:

| Method | Mean | Error | StdDev | Median | Min | Max | Gen 0 | Allocated |
|---|---|---|---|---|---|---|---|---|
| SerializeToString | 667.1 ns | 2.42 ns | 2.15 ns | 667.5 ns | 661.5 ns | 669.4 ns | 0.0754 | 488 B |
| SerializeToUtf8Bytes | 620.1 ns | 3.00 ns | 2.66 ns | 620.0 ns | 615.0 ns | 624.4 ns | 0.0598 | 376 B |
| SerializeToStream | 760.7 ns | 0.80 ns | 0.67 ns | 760.6 ns | 759.8 ns | 762.1 ns | 0.0370 | 232 B |
| SerializeObjectProperty | 839.5 ns | 8.38 ns | 7.84 ns | 838.3 ns | 826.0 ns | 851.7 ns | 0.1340 | 848 B |

With AO:

| Method | Mean | Error | StdDev | Median | Min | Max | Gen 0 | Allocated |
|---|---|---|---|---|---|---|---|---|
| DeserializeFromString | 1.114 us | 0.0083 us | 0.0070 us | 1.114 us | 1.104 us | 1.131 us | 0.0318 | 200 B |
| DeserializeFromUtf8Bytes | 1.042 us | 0.0118 us | 0.0111 us | 1.040 us | 1.028 us | 1.065 us | 0.0289 | 200 B |
| DeserializeFromStream | 1.519 us | 0.0189 us | 0.0176 us | 1.516 us | 1.493 us | 1.556 us | 0.0478 | 328 B |

Without AO:

| Method | Mean | Error | StdDev | Median | Min | Max | Gen 0 | Allocated |
|---|---|---|---|---|---|---|---|---|
| DeserializeFromString | 1.151 us | 0.0046 us | 0.0041 us | 1.152 us | 1.143 us | 1.156 us | 0.0276 | 200 B |
| DeserializeFromUtf8Bytes | 1.063 us | 0.0141 us | 0.0132 us | 1.065 us | 1.043 us | 1.081 us | 0.0295 | 200 B |
| DeserializeFromStream | 1.497 us | 0.0053 us | 0.0047 us | 1.496 us | 1.489 us | 1.507 us | 0.0479 | 328 B |

@steveharter (Member, Author), Aug 13, 2021

In the end, after running thousands of times, the jitted (native) code may be more or less the same with or without [AO]. But [AO] should produce faster code sooner.

@EgorBo (Member), Aug 14, 2021

Can we please avoid [AO]?
Methods with [AO] aren't instrumented, so they can't benefit from PGO (devirtualizing calls, moving cold parts of methods away from hot ones, etc.). For example, run your benchmark again with DOTNET_ReadyToRun=0, DOTNET_TC_QuickJitForLoops=1, and DOTNET_TieredPGO=1, with and without [AO].

Also, when a method is promoted to tier 1 naturally (without [AO]), we can:

  1. Get rid of static initialization checks in codegen
  2. Convert static readonly fields to constants
  3. Let the inliner resolve call tokens and produce better results

/cc @AndyAyersMS

(Member)

[AO] should be avoided. It causes methods to be optimized prematurely. We use it only in a handful of places where we feel we can never afford to run unoptimized code.

(Member, Author)

Thanks, I'll re-run with those settings.

(Member, Author)

See #58209 for reverting the 4 usages of [AO]. Note that I was going to keep the 2 in JsonConverter<T>, since there do seem to be benefits at least sometimes, but for V7 I think it makes sense to remove them based on an offline discussion with @EgorBo and "DynamicPGO" scenarios. I don't plan on porting that PR to V6.


```csharp
#if NET6_0_OR_GREATER
[MethodImpl(MethodImplOptions.AggressiveOptimization)]
```

@eiriktsarpalis (Member), Aug 13, 2021

I was not able to squeeze out any performance gains when applying aggressive optimizations to this method in the past.


```csharp
WriteCore(converter, writer, value, jsonTypeInfo.Options, ref state);
// For performance, the code below is a lifted WriteCore() above.
if (converter is JsonConverter<TValue> typedConverter)
```
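The lifted code tests whether the converter is already a JsonConverter<TValue> and, if so, calls it directly. A rough sketch of the same shape using only public System.Text.Json APIs (JsonSerializerOptions.GetConverter, JsonConverter<T>.Write, JsonSerializer.Serialize); WriteRoot is a hypothetical helper. The real code works on internal types such as JsonTypeInfo and its write state, and handles cases this sketch ignores.

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical helper: resolve the converter once and, when it is already
// strongly typed for T, call its Write method directly instead of going
// through a more general path.
static string WriteRoot<T>(T value, JsonSerializerOptions options)
{
    using var stream = new MemoryStream();
    using (var writer = new Utf8JsonWriter(stream))
    {
        JsonConverter converter = options.GetConverter(typeof(T));
        if (converter is JsonConverter<T> typedConverter)
        {
            // Fast path: strongly typed call, no boxing of value.
            typedConverter.Write(writer, value, options);
        }
        else
        {
            // Fallback: let the serializer resolve everything itself.
            JsonSerializer.Serialize(writer, value, options);
        }
    } // disposing the writer flushes it into the stream

    return Encoding.UTF8.GetString(stream.ToArray());
}

Console.WriteLine(WriteRoot(42, new JsonSerializerOptions())); // prints 42
```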
(Member)

Assuming generic type tests are expensive, is this also an opportunity for us to improve the hot path?

@EgorBo (Member), Aug 13, 2021

@eiriktsarpalis what kind of converter is the most popular? (is it DefaultObjectConverter<T> ?)

(Member)

Possibly, alongside with all the common primitive type converters. Why do you ask?

(Member)

I was just wondering if we could introduce a fast path for the most popular one if it's sealed. Casts to sealed classes are almost free.

(Member)

Since this concerns root-level types, ObjectDefaultConverter<T> would probably be the most common type. If we do try this out, we should make sure it doesn't regress other common root-level converter types like JsonNodeConverter or custom converters registered by users.

```csharp
/// Performance optimization. Overridden by the nullable converter to prevent boxing of values by the JIT.
/// Although the JIT added support to detect this IL pattern (box+isinst+br), it still manifests in TryWrite().
/// </summary>
internal virtual bool IsNull(in T value) => value is null;
```

@eiriktsarpalis (Member), Aug 13, 2021

I ran a few experiments locally, and it seems the in modifier is the culprit for the boxing behavior. Changing it to

```csharp
private bool IsNull(T value) => value is null;
```

makes the regression go away. It comes at the cost of passing the parameter by value, but that is negligible compared to boxing. This also avoids a virtual call. cc @EgorBo
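A minimal sketch of the by-value null check suggested here, together with the default(T) is null test raised later in the thread. IsNull and CanBeNull are illustrative local functions, not the serializer's code. Whether the JIT actually removes the box in a particular caller is what the thread was measuring, so the JIT remarks in the comments are expectations, not guarantees.

```csharp
using System;

// By-value null check. Semantically: always false for a non-nullable value
// type, true for an empty Nullable<T>, and a plain null test for reference
// types. The JIT can usually fold the box + null-test IL this compiles to.
static bool IsNull<T>(T value) => value is null;

// True only when T can hold null (reference types and Nullable<T>). For a
// given instantiation the JIT can typically treat this as a constant.
static bool CanBeNull<T>() => default(T) is null;

Console.WriteLine(IsNull(42));           // False
Console.WriteLine(IsNull<int?>(null));   // True
Console.WriteLine(IsNull<string>(null)); // True
Console.WriteLine(CanBeNull<int>());     // False
Console.WriteLine(CanBeNull<int?>());    // True
```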

(Member, Author)

Thanks. Based on this and your other comment about using default(T) is null on the line in question, I re-ran some micro benchmarks and found, as expected, that value types such as int are faster, while reference types, including string, are a bit slower. However, removing the extra caching and making the code simpler is also a win, so I plan on making these changes in the next commit and re-running the full benchmarks.

(Member)

@eiriktsarpalis how can I reproduce the boxing? For example, I tried SharpLab and don't see any boxing.

(Member)

Could this be a reproduction? I'm not sure what L000e: call 0x00007ffd016a1b10 is doing, but it seems consistent with what I'm seeing when I profile the repro:

[image]

(Member)

FWIW here's a commit that reproduces the boxing behavior: eiriktsarpalis@a44c0ee

It can be observed by running the benchmark added in dotnet/performance#1920.

@eiriktsarpalis (Member) left a comment

Thanks!

@steveharter steveharter merged commit 6b943e3 into dotnet:main Aug 15, 2021
@steveharter steveharter deleted the OverheadPerf branch August 15, 2021 20:30
thaystg added a commit to thaystg/runtime that referenced this pull request Aug 16, 2021
…information

# By dotnet-maestro[bot] (4) and others
# Via GitHub
* origin/main: (58 commits)
  Localized file check-in by OneLocBuild Task (dotnet#57384)
  [debugger][wasm] Support DebuggerProxyAttribute (dotnet#56872)
  Account for type mismatch of `FIELD_LIST` members in LSRA (dotnet#57450)
  Qualify `sorted_table` allocation with `nothrow` (dotnet#57467)
  Rename transport packages to follow convention (dotnet#57504)
  Generate proper DWARF reg num for ARM32 (dotnet#57443)
  Enable System.Linq.Queryable and disable dotnet#50712 (dotnet#57464)
  Mark individual tests for 51211 (dotnet#57463)
  Fix Length for ReadOnlySequence created out of sliced Memory owned by MemoryManager (dotnet#57479)
  Add JsonConverter.Write/ReadAsPropertyName APIs (dotnet#57302)
  Remove workaround for dotnet/sdk#19482 (dotnet#57453)
  Do not drain HttpContentReadStream if the connection is disposed (dotnet#57287)
  [mono] Fix a few corner case overflow operations (dotnet#57407)
  make use of ports in SPN optional (dotnet#57159)
  Fixed H/3 stress server after the last Kestrel change (dotnet#57356)
  disable a failing stress test. (dotnet#57473)
  Eliminate temporary byte array allocations in the static constructor of `IPAddress`. (dotnet#57397)
  Update dependencies from https://github.com/dotnet/emsdk build 20210815.1 (dotnet#57447)
  [main] Update dependencies from mono/linker (dotnet#57344)
  Improve serializer performance (dotnet#57327)
  ...

# Conflicts:
#	src/mono/wasm/debugger/BrowserDebugProxy/MemberReferenceResolver.cs
#	src/mono/wasm/debugger/BrowserDebugProxy/MonoProxy.cs
#	src/mono/wasm/debugger/BrowserDebugProxy/MonoSDBHelper.cs
@kunalspathak (Member) commented Aug 17, 2021

Improvements: dotnet/perf-autofiling-issues#612, dotnet/perf-autofiling-issues#611, dotnet/perf-autofiling-issues#551

@ghost ghost locked as resolved and limited conversation to collaborators Nov 3, 2021
6 participants