Intrinsicify (Sse, Avx2, Arm64, wasm) JsonReaderHelper.IndexOfOrLessThan #41097

Closed
wants to merge 3 commits into from

benaadams
Member

@benaadams benaadams commented Aug 20, 2020

Change the wasm build of IndexOfOrLessThan to be a simple for loop in C#, chosen via packaging, as suggested in #40705 (comment).

Update the vectorized path to sync with the SpanHelpers.Byte.cs method it is a variant of, adding intrinsics for Sse, Avx2, and Arm64 (up to +20% improvement for ReadJson on x64).

/cc @danroth27, @steveharter

@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Aug 20, 2020
@benaadams
Member Author

Correct artefacts are being generated

net5.0-Browser-Release

image

net5.0-Release

image

@benaadams
Member Author

x64 performance changes (up to +20% improvement)

better: 30, geomean: 1.113
worse: 1, geomean: 1.066
total diff: 31

| Slower                                                                           | diff/base | Base Median (ns) | Diff Median (ns) | Modality|
| -------------------------------------------------------------------------------- | ---------:| ----------------:| ----------------:| --------:|
| System.Text.Json.Serialization.Tests.ReadJson<SimpleStructWithProperties>.Deseri |      1.07 |           288.23 |           307.28 |         |

| Faster                                                                           | base/diff | Base Median (ns) | Diff Median (ns) | Modality|
| -------------------------------------------------------------------------------- | ---------:| ----------------:| ----------------:| --------:|
| System.Text.Json.Serialization.Tests.ReadJson<Dictionary<String, String>>.Deseri |      1.22 |         22031.18 |         18102.24 |         |
| System.Text.Json.Serialization.Tests.ReadJson<Dictionary<String, String>>.Deseri |      1.20 |         21554.46 |         17944.26 |         |
| System.Text.Json.Serialization.Tests.ReadJson<MyEventsListerViewModel>.Deseriali |      1.20 |        341085.15 |        284277.87 |         |
| System.Text.Json.Serialization.Tests.ReadJson<IndexViewModel>.DeserializeFromUtf |      1.19 |         29706.85 |         24903.98 |         |
| System.Text.Json.Serialization.Tests.ReadJson<MyEventsListerViewModel>.Deseriali |      1.19 |        330481.64 |        277947.73 |         |
| System.Text.Json.Serialization.Tests.ReadJson<MyEventsListerViewModel>.Deseriali |      1.19 |        399397.91 |        336085.63 |         |
| System.Text.Json.Serialization.Tests.ReadJson<IndexViewModel>.DeserializeFromStr |      1.18 |         31986.63 |         27064.34 |         |
| System.Text.Json.Serialization.Tests.ReadJson<Dictionary<String, String>>.Deseri |      1.18 |         23919.94 |         20268.43 |         |
| System.Text.Json.Serialization.Tests.ReadJson<IndexViewModel>.DeserializeFromStr |      1.18 |         31000.14 |         26348.41 |         |
| System.Text.Json.Serialization.Tests.ReadJson<LoginViewModel>.DeserializeFromStr |      1.15 |           522.04 |           454.92 |         |
| System.Text.Json.Serialization.Tests.ReadJson<Hashtable>.DeserializeFromUtf8Byte |      1.12 |         68993.04 |         61636.12 |         |
| System.Text.Json.Serialization.Tests.ReadJson<HashSet<String>>.DeserializeFromUt |      1.10 |         12663.49 |         11477.42 |         |
| System.Text.Json.Serialization.Tests.ReadJson<ImmutableSortedDictionary<String,  |      1.10 |         78887.97 |         71927.75 |         |
| System.Text.Json.Serialization.Tests.ReadJson<Hashtable>.DeserializeFromStream   |      1.10 |         69968.66 |         63797.29 |         |
| System.Text.Json.Serialization.Tests.ReadJson<LoginViewModel>.DeserializeFromStr |      1.09 |           763.93 |           698.34 |         |
| System.Text.Json.Serialization.Tests.ReadJson<HashSet<String>>.DeserializeFromSt |      1.09 |         12745.33 |         11734.38 |         |
| System.Text.Json.Serialization.Tests.ReadJson<ImmutableDictionary<String, String |      1.09 |         48372.57 |         44572.25 |         |
| System.Text.Json.Serialization.Tests.ReadJson<ArrayList>.DeserializeFromStream   |      1.09 |         51673.60 |         47617.28 |         |
| System.Text.Json.Serialization.Tests.ReadJson<ImmutableDictionary<String, String |      1.08 |         51226.78 |         47249.65 |         |
| System.Text.Json.Serialization.Tests.ReadJson<HashSet<String>>.DeserializeFromSt |      1.08 |         13979.59 |         12973.07 |         |
| System.Text.Json.Serialization.Tests.ReadJson<ImmutableDictionary<String, String |      1.08 |         48102.61 |         44652.30 |         |
| System.Text.Json.Serialization.Tests.ReadJson<BinaryData>.DeserializeFromStream  |      1.07 |           807.95 |           755.26 |         |
| System.Text.Json.Serialization.Tests.ReadJson<ArrayList>.DeserializeFromString   |      1.07 |         49947.46 |         46735.64 |         |
| System.Text.Json.Serialization.Tests.ReadJson<BinaryData>.DeserializeFromUtf8Byt |      1.07 |           488.59 |           458.07 |         |
| System.Text.Json.Serialization.Tests.ReadJson<LoginViewModel>.DeserializeFromUtf |      1.06 |           426.56 |           400.78 |         |
| System.Text.Json.Serialization.Tests.ReadJson<ImmutableSortedDictionary<String,  |      1.06 |         78155.94 |         73498.13 |         |
| System.Text.Json.Serialization.Tests.ReadJson<Hashtable>.DeserializeFromString   |      1.06 |         67906.38 |         63896.07 |         |
| System.Text.Json.Serialization.Tests.ReadJson<ArrayList>.DeserializeFromUtf8Byte |      1.06 |         50083.93 |         47157.52 |         |
| System.Text.Json.Serialization.Tests.ReadJson<ImmutableSortedDictionary<String,  |      1.05 |         73213.63 |         69703.93 |         |
| System.Text.Json.Serialization.Tests.ReadJson<Int32>.DeserializeFromUtf8Bytes    |      1.03 |            90.42 |            87.57 |         |

@eerhardt
Member

@benaadams - Do we actually need to cross compile for Browser to make this kind of change?

Blazor WASM apps are run through the ILLinker during publish. At link time, the ILLinker is told that Sse2.IsSupported, AdvSimd.Arm64.IsSupported, and Vector.IsHardwareAccelerated are all statically false, so it will link away any code that is conditioned on them.

So I think you should naturally write the code as you typically would:

            if (Sse2.IsSupported || AdvSimd.Arm64.IsSupported)
            {
                // intrinsics code
            }
            else if (Vector.IsHardwareAccelerated)
            {
                // Vector code
            }
            else
            {
                // normal software fallback
            }

And for a Blazor WASM app, all that will be in the published assembly will be the //normal software fallback code.

You don't need to build specifically for Browser to get "size opts". The tooling is set up to make it just work.
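
For illustration, the `// Vector code` branch in the pattern above could look roughly like the following. This is a minimal sketch, not the code in this PR: the class name `VectorSketch` is hypothetical, and it assumes the helper searches for a quote, a backslash, or any byte below 0x20. It only uses `Vector<T>` to decide whether a block contains a match and leaves locating the exact index to the scalar loop.

```csharp
using System;
using System.Numerics;
using System.Runtime.InteropServices;

// Hypothetical sketch class; not part of System.Text.Json.
public static class VectorSketch
{
    // Returns the index of the first '"', '\\', or byte < 0x20, or -1 if none is found.
    public static int IndexOfQuoteOrAnyControlOrBackSlash(ReadOnlySpan<byte> span)
    {
        int i = 0;

        if (Vector.IsHardwareAccelerated && span.Length >= Vector<byte>.Count)
        {
            var quotes = new Vector<byte>((byte)'"');
            var backSlashes = new Vector<byte>((byte)'\\');
            var spaces = new Vector<byte>((byte)0x20);

            // Reinterpret the leading bytes as whole vectors; the tail is handled below.
            ReadOnlySpan<Vector<byte>> blocks = MemoryMarshal.Cast<byte, Vector<byte>>(span);

            for (int j = 0; j < blocks.Length; j++)
            {
                Vector<byte> block = blocks[j];
                Vector<byte> matches = Vector.BitwiseOr(
                    Vector.BitwiseOr(Vector.Equals(block, quotes), Vector.Equals(block, backSlashes)),
                    Vector.LessThan(block, spaces));

                if (!Vector.EqualsAll(matches, Vector<byte>.Zero))
                {
                    break; // a match is somewhere in this block; the scalar loop locates it
                }

                i += Vector<byte>.Count;
            }
        }

        // Scalar loop: finishes the tail, or pinpoints the match inside the flagged block.
        for (; i < span.Length; i++)
        {
            byte b = span[i];
            if (b == (byte)'"' || b == (byte)'\\' || b < 0x20)
            {
                return i;
            }
        }

        return -1;
    }
}
```

A production version would typically extract the match position from the comparison mask rather than rescanning the block, which is what the intrinsics path is for.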

@benaadams
Member Author

Do we actually need to cross compile for Browser to make this kind of change?

Blazor WASM apps are run through the ILLinker during publish. At link time, the ILLinker is told that Sse2.IsSupported, AdvSimd.Arm64.IsSupported, and Vector.IsHardwareAccelerated are all statically false, so it will link away any code that is conditioned on them.

I don't think the wasm likes the ref address taking or gotos in the main method; adding a branch to move to a simple for loop method then doesn't get eliminated for the non-wasm platforms so will regress them. There was a lot of discussion in #39733 (comment) /cc @jkotas @steveharter, @ahsonkhan

@benaadams
Member Author

benaadams commented Aug 20, 2020

Adding an intrinsic to the mono interpreter for this method was also looked at in #40705, but was considered too fragile to do for an upstack lib.

@eerhardt
Member

I don't think the wasm likes the ref address taking or gotos in the main method; adding a branch to move to a simple for loop method then doesn't get eliminated for the non-wasm platforms so will regress them.

A simple, top-level condition at the root of this method wouldn't get tier-JIT'd away?

public static int IndexOfQuoteOrAnyControlOrBackSlash(this ReadOnlySpan<byte> span)
{
    if (Sse2.IsSupported || AdvSimd.Arm64.IsSupported || Vector.IsHardwareAccelerated)
    {
        return IndexOfQuoteOrAnyControlOrBackSlashIntrinsic(span);
    }
    else
    {
        return IndexOfQuoteOrAnyControlOrBackSlashForLoop(span);
    }
}
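
For reference, the for-loop side of that split might be as small as the sketch below. The method name follows the snippet above, while the class name `ScalarSketch` and the constants are hypothetical, and it assumes "control" means any byte below 0x20.

```csharp
using System;

// Hypothetical sketch class; not the implementation in this PR.
public static class ScalarSketch
{
    private const byte Quote = (byte)'"';
    private const byte BackSlash = (byte)'\\';
    private const byte Space = 0x20; // assumption: bytes below this count as control characters

    // Returns the index of the first quote, backslash, or control byte, or -1 if none is found.
    public static int IndexOfQuoteOrAnyControlOrBackSlashForLoop(ReadOnlySpan<byte> span)
    {
        for (int i = 0; i < span.Length; i++)
        {
            byte b = span[i];
            if (b == Quote || b == BackSlash || b < Space)
            {
                return i;
            }
        }

        return -1;
    }
}
```

There are no refs, gotos, or unrolling here, which is the shape the thread suggests an interpreter handles best.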

@benaadams
Member Author

A simple, top-level condition at the root of this method wouldn't get tier-JIT'd away?

Arm would then go down the for loop path; rather than the unrolled path?

@eerhardt
Member

Arm would then go down the for loop path; rather than the unrolled path?

Arm32? Yes. Is that a concern?

@benaadams
Member Author

Arm would then go down the for loop path; rather than the unrolled path?

Arm32? Yes. Is that a concern?

It was highlighted as an issue in #39733 (comment); I assume not regressing platforms when it's not too hard to avoid is a desirable feature? /cc @jkotas

@jkotas
Member

jkotas commented Aug 20, 2020

We do support ARM32, and a number of our customers are deploying on ARM32.

E.g. #41089 is about poor performance of System.Text.Json on Xamarin platforms, which means ARM.

@layomia layomia added area-System.Text.Json tenet-performance Performance related issue and removed area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI labels Aug 20, 2020
@layomia layomia added this to the 5.0.0 milestone Aug 20, 2020
@eerhardt
Member

My uber concern here is unnecessarily building more upstack libraries for -Browser. In this case it is just for a single method. cc @ericstj

Maybe this is another case for RuntimeFeature.IsInterpreting (see #38439 (comment)). If an unrolled loop is slower than a simple for loop when running on an interpreter, but faster when running as JIT'd code, then it would seem we should have some way of knowing which environment we are running on.

For 5.0, what if we used RuntimeFeature.IsDynamicCodeCompiled to check if we were interpreting (that will be the only case where IsDynamicCodeCompiled == false)? Then in 6.0 we make RuntimeFeature.IsInterpreting a real thing, and switch System.Text.Json over to use that instead?
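
A rough sketch of what that 5.0 suggestion could look like follows. `RuntimeFeature.IsDynamicCodeCompiled` is an existing property; `RuntimeFeature.IsInterpreting` is only a proposal and is not used here. The class and helper names are hypothetical, the 4-way unrolling is just a stand-in for the real unrolled path, and treating `IsDynamicCodeCompiled == false` as "interpreted" is the assumption made in the comment above.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical sketch class; not the implementation in this PR.
public static class DispatchSketch
{
    public static int IndexOfQuoteOrAnyControlOrBackSlash(ReadOnlySpan<byte> span)
    {
        // Assumption from the thread: false here means the code is being interpreted.
        return RuntimeFeature.IsDynamicCodeCompiled
            ? IndexOfUnrolled(span)
            : IndexOfForLoop(span);
    }

    private static bool IsMatch(byte b) => b == (byte)'"' || b == (byte)'\\' || b < 0x20;

    // Simple loop: intended for the interpreter.
    private static int IndexOfForLoop(ReadOnlySpan<byte> span)
    {
        for (int i = 0; i < span.Length; i++)
        {
            if (IsMatch(span[i])) return i;
        }
        return -1;
    }

    // Manually unrolled loop: intended for JIT-compiled code.
    private static int IndexOfUnrolled(ReadOnlySpan<byte> span)
    {
        int i = 0;
        for (; i + 4 <= span.Length; i += 4)
        {
            if (IsMatch(span[i])) return i;
            if (IsMatch(span[i + 1])) return i + 1;
            if (IsMatch(span[i + 2])) return i + 2;
            if (IsMatch(span[i + 3])) return i + 3;
        }
        for (; i < span.Length; i++)
        {
            if (IsMatch(span[i])) return i;
        }
        return -1;
    }
}
```

Whether the property check itself folds away in optimized code on every runtime is exactly the kind of detail the rest of this thread debates, so this shows only the shape of the dispatch.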

@jkotas
Member

jkotas commented Aug 20, 2020

For 5.0

I do not see this change meeting the bar for backporting to release/5.0.

@ericstj
Member

ericstj commented Aug 20, 2020

We shouldn't be forking this library. Make this a runtime check please. I'm not an expert on the intricacies of getting this right so that the final JIT'ed code still has ideal perf, but other places that needed some special behavior for the interpreter were using linker substitutions too, right?

I'd also like to block on @steveharter's feedback. He's out this week but should be able to take a look next week.

@jkotas
Member

jkotas commented Aug 20, 2020

running on JIT'd code it is faster to use an unrolled loop,

This is not necessarily true for non-optimized code. Maybe what we need is bool IsCurrentMethodOptimized { get; } that will be true/false depending on whether the current method body has codegen optimizations applied to it.

@benaadams
Member Author

I do not see this change meeting the bar for backporting to release/5.0.

For 6.0 will it run as precompiled wasm? Thus have different considerations?

@jkotas
Member

jkotas commented Aug 20, 2020

For 6.0 will it run as precompiled wasm?

That is the plan so far.

@jeffhandley
Member

I do not see this change meeting the bar for backporting to release/5.0.

I agree. I'm going to go ahead and move this to 6.0.0.

@jeffhandley jeffhandley removed this from the 5.0.0 milestone Aug 24, 2020
@ericstj ericstj assigned eiriktsarpalis and layomia and unassigned steveharter Mar 8, 2021
@ericstj
Member

ericstj commented Mar 8, 2021

@layomia / @eiriktsarpalis can you take a look here?

@steveharter
Member

| Slower                                                                           | diff/base | Base Median (ns) | Diff Median (ns) | Modality|
| -------------------------------------------------------------------------------- | ---------:| ----------------:| ----------------:| --------:|
| System.Text.Json.Serialization.Tests.ReadJson<SimpleStructWithProperties>.Deseri |      1.07 |           288.23 |           307.28 |         |

@benaadams thoughts on this regression? Fluke or is something else like small payloads?

In general, the ~11% mean improvement is great!
better: 30, geomean: 1.113

@benaadams benaadams force-pushed the Json branch 2 times, most recently from 955f26f to 2241f2c Compare March 12, 2021 20:20
@benaadams
Member Author

Rebased and applied feedback

<ItemGroup Condition="'$(TargetFramework)' == '$(NetCoreAppCurrent)'">
<Compile Include="System\Text\Json\Reader\JsonReaderHelper.Intrinsics.cs" />
</ItemGroup>
<ItemGroup Condition="'$(TargetFramework)' != '$(NetCoreAppCurrent)'">
Member

Note that many of our libraries that are inbox and OOB do this:

    <TargetFrameworks>$(NetCoreAppCurrent);netstandard2.0;netcoreapp3.0;net461</TargetFrameworks>
    <ExcludeCurrentNetCoreAppFromPackage>true</ExcludeCurrentNetCoreAppFromPackage>

This means that NetCoreAppCurrent is the build that goes in the runtimepack/shared framework. But in the NuGet package, only netstandard2.0;netcoreapp3.0;net461 builds get packaged in there. So that means if someone references the OOB System.Text.Json v7 NuGet package, even on net6.0, they will use the netcoreapp3.0 version from the package, because that's the highest version available.

So I really think these conditions should be '$(TargetFramework)' == '$(NetCoreAppCurrent)' or '$(TargetFramework)' == 'netcoreapp3.0' - Include .Intrinsics. Else - include .NoIntrinsics.
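
Spelled out as project-file conditions, that suggestion might look like the fragment below. This is only a sketch: `JsonReaderHelper.NoIntrinsics.cs` is an assumed file name based on the ".NoIntrinsics" shorthand above, and whether the Arm64 intrinsics compile for netcoreapp3.0 is a separate question.

```xml
<!-- Sketch only: JsonReaderHelper.NoIntrinsics.cs is an assumed file name. -->
<ItemGroup Condition="'$(TargetFramework)' == '$(NetCoreAppCurrent)' or '$(TargetFramework)' == 'netcoreapp3.0'">
  <Compile Include="System\Text\Json\Reader\JsonReaderHelper.Intrinsics.cs" />
</ItemGroup>
<ItemGroup Condition="'$(TargetFramework)' != '$(NetCoreAppCurrent)' and '$(TargetFramework)' != 'netcoreapp3.0'">
  <Compile Include="System\Text\Json\Reader\JsonReaderHelper.NoIntrinsics.cs" />
</ItemGroup>
```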

Member Author

The intrinsics will break on netcoreapp3.0 though because they include the arm variants? Should I include net5.0 and treat that as the netcoreapp3.0 in your example?

Member

This probably needs the experts.

@ericstj @steveharter @layomia - how do you want to handle Arm intrinsics in STJ? Do you want to ship a net5.0 TFM in the OOB package?

Member Author

Could type forward, but not sure how that works either 😅

Member

Type forwards wouldn't work here. I assume to support Arm we will need a new TFM in the package. But I think that's up to the System.Text.Json owners to decide. I'm just a fly-by reviewer 😄

Member

The OOB packages make your app bigger and slower by design. For example, the OOB package does not come with R2R compiled code, so your app will startup slower when you reference the OOB package.

Not necessarily, I might build self-contained and run cross-gen myself. Or I might be running on an aot runtime.

If you want to depend on latest version, you better update the runtime too.

That's not always an option. Consider what happens in .NET 7.0. We're going to add API to this library and I bet the Azure SDK (or some other libs) will want to use it. At that point any 6.0 app that uses Azure SDK will have no choice but to use the package version.

I'd prefer we keep a simple rule here. If the configuration differs in conditions/ifdefs from the packaged configurations then it should also be included in the package.

Member

At that point any 6.0 app that uses Azure SDK will have no choice but to use the package version.

Yes, it will continue to work - with small perf hit - even if we keep the status quo.

It is ok with me if we want to build extra targets to ensure that newer OOB packages on downlevel platforms always get all performance tweaks possible.

There are other assemblies with this problem. For example, https://github.com/dotnet/runtime/blob/main/src/libraries/System.Text.Encodings.Web/src/System.Text.Encodings.Web.csproj uses ARM intrinsics inbox, but there is no OOB build with ARM intrinsics. I assume that we would want to fix all these.

Member Author

Consider what happens in .NET 7.0. We're going to add API to this library and I bet the Azure SDK (or some other libs) will want to use it. At that point any 6.0 app that uses Azure SDK will have no choice but to use the package version.

Sounds like an issue to resolve before next year's 7.0 release? 😉

Member

Nothing gets resolved by that point. New API in the package means people will use that new API, so consuming a package that's newer than the framework is super common. Folks shouldn't have to pay a penalty when they do that. My position on this is: don't use ExcludeCurrentNetCoreAppFromPackage if there is anything different about the NetCoreAppCurrent build. Don't split hairs by saying it's OK because it's only a small perf fix. You might be able to convince yourself of that, but what about the next person who doesn't notice it's excluded? What about the customer who cares about that perf fix?
The package growth problem is less of an issue now that we're starting to drop out-of-support frameworks from packages, thanks to the precedent that @ViktorHofer is setting.
Let's keep our rules simple: if you ifdef/conditionally compile, then don't exclude from the package.

Member

Regarding the initial feedback, please don't set ExcludeCurrentNetCoreAppFromPackage, as we no longer want to exclude the latest .NETCoreApp asset from packages. See #53439 for more details.

As @ericstj mentioned, in the past we excluded specific TFMs from our packages, even though they were shipping in some form. Under the new package support policy we no longer carry unsupported .NETCoreApp assets along, which bounds the number of .NETCoreApp assets in a package.

@benaadams
Member Author

Is there more to do here?

@@ -0,0 +1,193 @@
// Licensed to the .NET Foundation under one or more agreements.
Member

Why two copies of the file, rather than an internal shim for intrinsics that always returns false/throws?

Member

Looks like I asked this a while back here: #41097 (comment) and then @steveharter asked for a file split here: #41097 (comment)

I am really not a fan of duplicating the Vector<T> and fallback logic across two files just so the intrinsic logic can be cleanly separated. The JIT (even full framework) should correctly handle dead code elimination for properties that simply return false and it will not be pleasant remembering to update both copies of the files for half the code.
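
To make the shim idea concrete, here is a minimal sketch under stated assumptions. `Sse2Shim` and `FirstMatchIndex` are hypothetical names (a real shim would mirror the names and signatures of the intrinsics class it stands in for), and the search helper is simplified to a single quote search. The shim would be compiled only for targets that lack the real intrinsics API.

```csharp
using System;

// Hypothetical shim: reports "not supported" so one copy of the calling code
// can compile for every target framework.
public static class Sse2Shim
{
    public static bool IsSupported => false;

    // Never reached when callers guard on IsSupported; throws if they do not.
    public static int FirstMatchIndex(ReadOnlySpan<byte> span, byte value)
        => throw new PlatformNotSupportedException();
}

public static class ShimUsageSketch
{
    public static int IndexOfQuote(ReadOnlySpan<byte> span)
    {
        if (Sse2Shim.IsSupported)
        {
            // Dead code with the shim: the property is a constant false.
            return Sse2Shim.FirstMatchIndex(span, (byte)'"');
        }

        // Shared fallback, written once instead of once per file.
        for (int i = 0; i < span.Length; i++)
        {
            if (span[i] == (byte)'"') return i;
        }
        return -1;
    }
}
```

The trade-off raised above is that this relies on the JIT eliminating the guarded branch, in exchange for not maintaining two copies of the Vector<T> and fallback logic.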

Member Author

@benaadams benaadams Apr 20, 2021

Don't see any *.PlatformNotSupported.cs file references in System.Text.Encodings.Web.csproj?

Member

Looks like @GrabYourPitchforks removed them in #49373

@GrabYourPitchforks
Member

@benaadams Can we get some extra unit test coverage on these APIs to help exercise edge cases in the SIMD / vectorized logic and prevent regressions? In particular, I'd like to see a unit test specifically exercise the IndexOfQuoteOrAnyControlOrBackSlash code path.

We have some types like BoundedMemory that can help with unit testing this. See Latin1UtilityTests.cs for some examples. The tests in that file create spans with interesting data at a bunch of different offsets to try to exercise edge cases in the SIMD logic, especially with regard to any pre-loop alignment or post-loop draining. The end of that file also shows using a delegate to invoke a byte*-consuming method, though in your case you'd probably want a ref byte-consuming method.
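
As a rough idea of the offset-sweeping style described above, the sketch below compares a candidate implementation against a naive reference for every buffer length and match position up to a limit. It does not use BoundedMemory (an internal test type), and `ByteFinder`, `OffsetSweepSketch`, and `CountMismatches` are hypothetical names, not existing test helpers. It also assumes the method matches a quote, a backslash, or any byte below 0x20.

```csharp
using System;

// Custom delegate because ReadOnlySpan<byte> cannot be a generic type argument of Func<>.
public delegate int ByteFinder(ReadOnlySpan<byte> span);

// Hypothetical test helper; not part of the runtime test suite.
public static class OffsetSweepSketch
{
    // Naive reference: first quote, backslash, or byte below 0x20.
    public static int Reference(ReadOnlySpan<byte> span)
    {
        for (int i = 0; i < span.Length; i++)
        {
            byte b = span[i];
            if (b == (byte)'"' || b == (byte)'\\' || b < 0x20) return i;
        }
        return -1;
    }

    // Counts inputs where the candidate disagrees with the reference.
    public static int CountMismatches(ByteFinder candidate, int maxLength = 96)
    {
        byte[] interesting = { (byte)'"', (byte)'\\', 0x00, 0x1F };
        int mismatches = 0;

        for (int length = 0; length <= maxLength; length++)
        {
            byte[] buffer = new byte[length];
            Array.Fill(buffer, (byte)0x20); // 0x20 sits just above the control range and must not match

            if (candidate(buffer) != Reference(buffer)) mismatches++;

            // Place each interesting byte at every offset to cover pre-loop,
            // in-loop, and post-loop positions of a vectorized implementation.
            for (int position = 0; position < length; position++)
            {
                foreach (byte value in interesting)
                {
                    buffer[position] = value;
                    if (candidate(buffer) != Reference(buffer)) mismatches++;
                    buffer[position] = 0x20;
                }
            }
        }

        return mismatches;
    }
}
```

A real test would pass the method under test as the candidate and assert that the count is zero. Unlike BoundedMemory, this cannot detect reads past the end of the buffer.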

@steveharter
Member

@benaadams based on your timings this is an important PR. Do you plan on responding to @GrabYourPitchforks about potential regressions? If not, we can have someone else investigate. Thanks

@steveharter
Member

The perf numbers look great, but at this point in the release it is risky to add this IMO. We should consider this for 7.0 early.

@steveharter steveharter closed this Jul 8, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Aug 7, 2021