Developers apps using JSON serialization start up and run faster #1568

Closed
1 task
jkotas opened this issue Sep 27, 2019 · 23 comments
Labels
area-System.Text.Json; Cost:XL (work that requires one engineer more than 4 weeks); Priority:0 (work that we can't release without); Team:Libraries; tenet-performance (performance related issue); User Story (a single user-facing feature; can be grouped under an epic)

Comments

@jkotas
Member

jkotas commented Sep 27, 2019

Original proposal by @jkotas:
The generation of Json serializers via reflection at runtime has non-trivial startup costs. This has been identified as a bottleneck during prototyping of fast small cloud-first micro-services:

Repro: https://gist.github.com/jkotas/b0671e154791e287c38a627ca81d7197

The JSON serializer generated using reflection at runtime has a startup cost of ~30 ms. The manually written JSON serializer has a startup cost of ~1 ms.


Edited by @kevinwkt and @layomia :

Background

There are comprehensive documents detailing the needs and benefits of generating JSON serializers at compile time. These benefits include improved startup time, reduced private memory usage, faster serialization and deserialization throughput, and ILLinker friendliness because reflection is avoided at run-time. There is also an opportunity to reduce the size of the trimmed System.Text.Json.dll after source generation and linker trimming, because code-paths that use reflection can potentially be removed, along with unused built-in JsonConverter<T>s such as those for Uri, UInt64, etc.

After discussing some approaches and pros/cons of some of them we decided to implement this feature using Roslyn source generators. Implementation details and code/usage examples can be seen in the design document. This document will outline the roadmap for the initial experiment and highlight actionable items.

This project requires numerous API changes and the design is still being iterated on, which is why we will be using the dotnet/runtimelab repository instead of dotnet/runtime. The main goal is to get something up and running while changing the implementation and iterating on the public API without committing to dotnet/runtime master. We hope to share the project and get feedback for a potential release in .NET 6.0. Until then, the project will be consumable through a prerelease package. Progress can be tracked through the JSON Code Gen project board in dotnet/runtimelab.

Approach

There are three main parts to this project: type discovery, source code generation, and integration of the generated source code with user applications.

Type discovery

Type discovery can be approached in two ways: an implicit model, where the user does not have to specify which types to generate code for, and an explicit model, where the user specifies through code or configuration which types to generate code for.

Various implicit approaches have been discussed, such as generating source for all partial classes or scanning for calls into the serializer using the Roslyn syntax tree. These models can be revisited in the future as the value and feasibility of the approach become clearer based on user feedback. Note that the downsides of such a model include missing types that need generated source, or generating source for types that do not need it, because of bugs or edge cases we didn't consider.

The proposed approach for type discovery requires an explicit indication of serializable types by the user. This model supports indicating both owned and non-owned types. A new JsonSerializableAttribute will be used to detect these types. There are two patterns for JsonSerializableAttribute. The first consists of applying the attribute to a type that the user owns; the second consists of the user passing a non-owned serializable type into the attribute's constructor.

We believe that an explicit model using attributes is a simple first approach to the problem. Within the Roslyn source generator, we parse the syntax tree to find usages of JsonSerializableAttribute. The output of this phase is a list of input types for the generator, which then generates code recursively for each type in all the object graphs.
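
The two attribute patterns described above might look roughly like this. It is only a sketch under stated assumptions: DemoJsonSerializableAttribute is a hypothetical stand-in declared locally so the snippet compiles on its own, and placing the non-owned pattern at assembly level is an assumption. The reflection-based listing at the end exists only to make the root types visible when the sample runs; the generator described here discovers them at compile time from the syntax tree, not at run-time.

```csharp
#nullable enable
using System;
using System.Linq;
using System.Reflection;

// Pattern 2 (assumed placement): a non-owned type is passed to the attribute's constructor.
[assembly: TypeDiscoverySketch.DemoJsonSerializable(typeof(System.Uri))]

namespace TypeDiscoverySketch
{
    // Hypothetical stand-in for the proposed JsonSerializableAttribute.
    [AttributeUsage(AttributeTargets.Class | AttributeTargets.Struct | AttributeTargets.Assembly,
                    AllowMultiple = true)]
    public sealed class DemoJsonSerializableAttribute : Attribute
    {
        public DemoJsonSerializableAttribute() { }
        public DemoJsonSerializableAttribute(Type type) => Type = type;

        // Set only when the attribute names a non-owned type.
        public Type? Type { get; }
    }

    // Pattern 1: the attribute is applied directly to a type the user owns.
    [DemoJsonSerializable]
    public class WeatherForecast
    {
        public DateTime Date { get; set; }
        public int TemperatureC { get; set; }
    }

    public static class Discovery
    {
        // Collects the root types for code generation. Run-time reflection is used
        // here purely for illustration; a source generator would do this at build time.
        public static Type[] RootTypes(Assembly assembly)
        {
            var owned = assembly.GetTypes()
                .Where(t => t.GetCustomAttributes<DemoJsonSerializableAttribute>().Any());

            var nonOwned = assembly.GetCustomAttributes<DemoJsonSerializableAttribute>()
                .Where(a => a.Type != null)
                .Select(a => a.Type!);

            return owned.Concat(nonOwned).ToArray();
        }
    }

    public static class Program
    {
        public static void Main()
        {
            foreach (Type root in Discovery.RootTypes(typeof(Program).Assembly))
                Console.WriteLine(root.FullName);
        }
    }
}
```

From these roots, a generator would then walk each type's properties recursively to cover the full object graph.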

Source code generation

The design for the generated source focuses mainly on performance gains and on extending existing JsonSerializer functionality. Performance is improved in two ways. The first is first-time (warm-up) performance, for both CPU and memory: the costly reflection that builds up a Type metadata cache at run-time is moved to compile time. This type metadata is then represented as JsonTypeInfo classes that can be used for (de)serialization at run-time. The second is throughput: generating an instance of the type's JsonTypeInfo (metadata) avoids the initial metadata-dictionary lookup on calls to the serializer. These instances will be passed to new (de)serialize overloads.

We will use the types discovered in the type discovery phase and recurse through the type graph in order to source generate the functions mentioned above within each JsonTypeInfo and register them inside the user-facing wrapper JsonSerializerContext.

Generated source code integration

There are ongoing discussions regarding integration of the generated metadata source code with user apps. The proposed approach consists of the generator creating a context class (JsonSerializerContext) which takes an options instance and contains references to the generated JsonTypeInfos for each type seen above. This relies on the new serializer overloads mentioned above, which accept the JsonTypeInfo instances retrieved from the context. An example of the overload and usage can be seen here, while examples and details of the end to end approach can be seen in the design document.
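
For orientation, here is a minimal sketch of the context-class pattern in the form that later shipped in .NET 6, which differs in detail from the experimental design described in this post: the attribute is applied to a user-declared partial context class rather than to the serializable type. WeatherForecast and AppJsonContext are illustrative names, and the snippet assumes a .NET 6 or later project where the System.Text.Json source generator is available.

```csharp
#nullable enable
using System;
using System.Text.Json;
using System.Text.Json.Serialization;

namespace ContextSketch
{
    public class WeatherForecast
    {
        public DateTime Date { get; set; }
        public int TemperatureC { get; set; }
        public string? Summary { get; set; }
    }

    // The source generator fills in this partial class at compile time,
    // including a JsonTypeInfo<WeatherForecast> property named WeatherForecast.
    [JsonSerializable(typeof(WeatherForecast))]
    internal partial class AppJsonContext : JsonSerializerContext
    {
    }

    public static class Program
    {
        public static string Serialize(WeatherForecast forecast) =>
            // Overload that takes the generated metadata instead of looking it up via reflection.
            JsonSerializer.Serialize(forecast, AppJsonContext.Default.WeatherForecast);

        public static WeatherForecast? Deserialize(string json) =>
            JsonSerializer.Deserialize(json, AppJsonContext.Default.WeatherForecast);

        public static void Main()
        {
            var forecast = new WeatherForecast
            {
                Date = new DateTime(2021, 7, 22),
                TemperatureC = 21,
                Summary = "Mild"
            };

            string json = Serialize(forecast);
            Console.WriteLine(json);

            WeatherForecast? roundTripped = Deserialize(json);
            Console.WriteLine(roundTripped?.TemperatureC);
        }
    }
}
```

Passing the JsonTypeInfo explicitly is what lets the serializer skip the per-call metadata lookup mentioned in the previous section.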

Action items

Progress of this effort can be observed through the JSON Code Gen project board in dotnet/runtimelab.

The source generator (System.Text.Json.SourceGeneration.dll) and updated System.Text.Json.dll can be consumed via an experimental NuGet package. Issues can be logged at https://github.com/dotnet/runtimelab/issues/new?labels=area-JsonCodeGen with the area-JsonCodeGen label.

cc @jkotas @davidfowl @stephentoub @mjsabby @terrajobst @pranavkm @ericstj @layomia @steveharter @chsienki

@huoyaoyuan
Member

In theory, any startup-only reflection/delegate initialization can be done AOT. Popular scenarios include:

  • Serialization
  • Entity framework queries
  • LINQ to Objects

Please consider building some infrastructure to let libraries provide AOT generation.
Support for custom converters in AOT serialization is also important.

@steveharter
Member

steveharter commented Oct 1, 2019

The existing design depends on either manual storage of the JsonSerializerOptions class (e.g. held by your own static variable) or by using the default instance which is in a private static variable. Using the global ensures the options are not re-initialized unnecessarily.

However, there is a first-time perf hit of initializing the options for each new Type encountered; this involves using reflection to look up the properties and various attributes.

See issue #1562 which could be used to help facilitate custom converters per POCO type and collection type which for performance will likely be generated IL (run-time or ahead-of-time) and\or Roslyn generated source pending requirements\design. This wouldn't require the reflection hit.

@ericstj ericstj transferred this issue from dotnet/corefx Jan 9, 2020
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added area-System.Text.Json untriaged New issue has not been triaged by the area owner labels Jan 9, 2020
@layomia layomia added tenet-performance Performance related issue and removed untriaged New issue has not been triaged by the area owner labels Feb 20, 2020
@layomia layomia added this to the 5.0 milestone Feb 20, 2020
@kevinwkt kevinwkt changed the title from "Generate Json serializers at build time to reduce startup time" to "Generate Json serializers at build time" Jul 29, 2020
@kevinwkt kevinwkt self-assigned this Jul 29, 2020
@kevinwkt kevinwkt modified the milestones: 5.0.0, 6.0.0 Jul 29, 2020
@kevinwkt kevinwkt added the Epic Groups multiple user stories. Can be grouped under a theme. label Jul 29, 2020
@ericstj
Member

ericstj commented Aug 3, 2020

Cool to see progress on this @kevinwkt @layomia. For folks interested please check out @kevinwkt's post above and the links to work going on.

@steveharter
Member

@mconnew again thank you for insights and experience here.

One note as @layomia also pointed out is that the current code-gen is not about generating self-contained "serializers" but about generating metadata and callbacks:

  • The list of properties and fields.
  • Attributes of each property\field including the type, name, whether it is get-only, custom converters, etc.
  • A callback for each property\field set\get.
  • A callback for creation.
  • An optional callback for serialize()\deserialize(). This is only called when the options instance + serializable type + the generated code + version checks are compatible, otherwise the normal serialize\deserialize is used.

This achieves the primary goals of fast startup and minimal private bytes, both by avoiding reflection and reflection emit. A secondary goal of increased throughput is met when the generated callbacks for serialize()\deserialize() can be used. Another secondary goal is to support the ILLinker in order to reduce the size of STJ.dll.
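
The "metadata and callbacks" shape listed above can be sketched as follows. Every type here (DemoPropertyInfo, DemoTypeInfo, DemoSerializer, GeneratedPersonMetadata) is hypothetical and hand-written for illustration; none of it is the System.Text.Json API. The toy writer also skips string escaping, and the fastPathCompatible flag simply stands in for the options and version compatibility check.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

namespace MetadataSketch
{
    public sealed class Person
    {
        public string Name { get; set; } = "";
        public int Age { get; set; }
    }

    // Hypothetical per-property metadata: name, type, and get/set callbacks.
    public sealed class DemoPropertyInfo<T>
    {
        public DemoPropertyInfo(string name, Type type, Func<T, object?> get, Action<T, object?> set)
        {
            Name = name; PropertyType = type; Get = get; Set = set;
        }
        public string Name { get; }
        public Type PropertyType { get; }
        public Func<T, object?> Get { get; }
        public Action<T, object?> Set { get; }
    }

    // Hypothetical per-type metadata: creation callback, properties, optional fast path.
    public sealed class DemoTypeInfo<T>
    {
        public DemoTypeInfo(Func<T> create, IReadOnlyList<DemoPropertyInfo<T>> properties,
                            Action<StringBuilder, T>? fastSerialize)
        {
            Create = create; Properties = properties; FastSerialize = fastSerialize;
        }
        public Func<T> Create { get; }
        public IReadOnlyList<DemoPropertyInfo<T>> Properties { get; }
        public Action<StringBuilder, T>? FastSerialize { get; }
    }

    // What a generator might emit for Person: no reflection is needed to build this.
    public static class GeneratedPersonMetadata
    {
        public static readonly DemoTypeInfo<Person> Instance = new DemoTypeInfo<Person>(
            () => new Person(),
            new[]
            {
                new DemoPropertyInfo<Person>("Name", typeof(string), p => p.Name, (p, v) => p.Name = (string)v!),
                new DemoPropertyInfo<Person>("Age", typeof(int), p => p.Age, (p, v) => p.Age = (int)v!),
            },
            (sb, p) => sb.Append("{\"Name\":\"").Append(p.Name).Append("\",\"Age\":")
                         .Append(p.Age.ToString(CultureInfo.InvariantCulture)).Append('}'));
    }

    public static class DemoSerializer
    {
        public static string Serialize<T>(T value, DemoTypeInfo<T> info, bool fastPathCompatible)
        {
            var sb = new StringBuilder();
            if (fastPathCompatible && info.FastSerialize != null)
            {
                info.FastSerialize(sb, value); // optional generated serialize() callback
                return sb.ToString();
            }

            // Fallback: the general path, driven by the generated metadata.
            sb.Append('{');
            for (int i = 0; i < info.Properties.Count; i++)
            {
                if (i > 0) sb.Append(',');
                DemoPropertyInfo<T> property = info.Properties[i];
                sb.Append('"').Append(property.Name).Append("\":");
                WriteValue(sb, property.Get(value));
            }
            return sb.Append('}').ToString();
        }

        private static void WriteValue(StringBuilder sb, object? value)
        {
            switch (value)
            {
                case null: sb.Append("null"); break;
                case string s: sb.Append('"').Append(s).Append('"'); break; // no escaping in this toy writer
                case IFormattable f: sb.Append(f.ToString(null, CultureInfo.InvariantCulture)); break;
                default: throw new NotSupportedException(value.GetType().FullName);
            }
        }
    }

    public static class Program
    {
        public static void Main()
        {
            var ada = new Person { Name = "Ada", Age = 36 };
            Console.WriteLine(DemoSerializer.Serialize(ada, GeneratedPersonMetadata.Instance, true));
            Console.WriteLine(DemoSerializer.Serialize(ada, GeneratedPersonMetadata.Instance, false));
        }
    }
}
```

Both calls produce the same JSON; the difference is only whether the generated serialize callback or the metadata-driven path does the writing.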

Incorrect ownership of pre-generated serializer
Multiple versions of pre-generated serializers

The current design and constraints of Roslyn source generators mean:

  • The types declared in a specific assembly have their own generated code in that same assembly.
  • Types declared in an external assembly either:
    • Have public compatible generated code. These generated types can be used by the caller and are thus shared.
    • Do not have public compatible generated code. Code is then generated in the caller assembly for these types.

Greater application startup time
Higher memory usage even when not needed

The [JsonSerializable] attribute is only used during ahead-of-time code generation, and not at run-time. There is no "global" assembly walk at run-time of all types that have [JsonSerializable].

The new "context class" programming model is pay-to-play, meaning generated code is called directly at run-time for each type, and thus only that code is JITted. If there are 1,000 generated types, for example, only the ones accessed at run-time (by calling the appropriate member on the context class) should be JITted (along with any dependent generated types). This lazy JIT assumption should of course be verified.
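
A rough way to picture the pay-to-play idea is a context whose members build their metadata only on first access. The sketch below uses hypothetical types (DemoContext, DemoTypeInfo) and tracks lazy initialization only; it does not measure JIT behaviour, which is the part that still needs verifying.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

namespace LazyContextSketch
{
    public sealed class Order { }
    public sealed class Customer { }

    // Hypothetical metadata holder; stands in for generated per-type code.
    public sealed class DemoTypeInfo<T>
    {
        public string TypeName { get; } = typeof(T).Name;
    }

    // Hypothetical context class: each member creates its metadata on first use.
    // Not thread-safe; a real implementation would need to consider concurrency.
    public sealed class DemoContext
    {
        private DemoTypeInfo<Order>? _orderInfo;
        private DemoTypeInfo<Customer>? _customerInfo;

        // Records which types have actually been initialized.
        public List<string> Initialized { get; } = new List<string>();

        public DemoTypeInfo<Order> OrderInfo => _orderInfo ??= Create<Order>();
        public DemoTypeInfo<Customer> CustomerInfo => _customerInfo ??= Create<Customer>();

        private DemoTypeInfo<T> Create<T>()
        {
            var info = new DemoTypeInfo<T>();
            Initialized.Add(info.TypeName);
            return info;
        }
    }

    public static class Program
    {
        public static void Main()
        {
            var context = new DemoContext();
            _ = context.OrderInfo; // first access builds the Order metadata
            _ = context.OrderInfo; // second access reuses it

            // Customer was never touched, so nothing was built for it.
            Console.WriteLine(string.Join(",", context.Initialized));
        }
    }
}
```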

@terrajobst
Member

terrajobst commented Nov 24, 2020

@steveharter @layomia

Can we update the issue description to make sure this item tracks both perf improvements and trimming? IOW, we need to make it clear that the path towards making JSON serializable types trimmable is via source generation.

@terrajobst terrajobst added the Cost:XL Work that requires one engineer more than 4 weeks label Nov 25, 2020
@layomia
Contributor

layomia commented Nov 25, 2020

Thanks @terrajobst. I've added notes about goals to facilitate more trimming (removing unused converters, reflection code-paths) and be linker friendly (due to avoiding run-time reflection) and action items (#36782, https://github.com/dotnet/runtimelab/projects/1#card-49468644).

@marek-safar marek-safar added tracking This issue is tracking the completion of other related issues. and removed User Story A single user-facing feature. Can be grouped under an epic. labels Nov 27, 2020
@marek-safar marek-safar changed the title from "Generate Json serializers at build time" to "Developers can safely trim their apps which use System.Text.Json to reduce the size of their apps" Nov 27, 2020
@marek-safar marek-safar added User Story A single user-facing feature. Can be grouped under an epic. and removed tracking This issue is tracking the completion of other related issues. labels Nov 27, 2020
@layomia layomia changed the title from "Developers can safely trim their apps which use System.Text.Json to reduce the size of their apps" to "Use C# source generators to yield better performance for apps using System.Text.Json" Dec 1, 2020
@layomia
Contributor

layomia commented Dec 1, 2020

This issue was originally created to track multiple goals achievable with AOT source generation including:

  • Improved start up perf
  • Improved run-time throughput
  • Reduced private bytes usage
  • ILLinker friendliness due to avoiding runtime reflection
  • Reduced application size: facilitating linker removal of unused reflection-based code-paths of the serializer and unused converters.

I created #45441 to track the user story "Developers can safely trim their apps which use System.Text.Json to reduce the size of their apps" which depends on the work in this issue.

@danmoseley danmoseley changed the title from "Use C# source generators to yield better performance for apps using System.Text.Json" to "Developers apps using JSON serialization start up and run faster" Dec 1, 2020
@danmoseley
Member

danmoseley commented Dec 1, 2020

@layomia we're trying to title all our User Stories in terms of customer benefit (WHO gets WHAT), so we focus on the result we are aiming for. Stories can depend on each other, but the actual work is in the issues parented by the stories.

So your title for #45441 is perfect, and I've retitled this story in that format as well. Feel free to adjust.

@danmoseley
Member

danmoseley commented Dec 1, 2020

@layomia looking a little more here, I think this user story is missing the child issues that encompass the work required to achieve it. We should have issues for the various parts of the source generator work -- I assume you have an idea what those parts are, you will want to create them at some point and parent them under this story.

I suggest something like

Developers apps using JSON serialization start up and run faster #1568 (User Story)
|---------- JSON source generator (just issue)
|                      |---------------
|                      |-------------- various issues breaking up the work for the source generator
|
Developers can safely trim their apps which use System.Text.Json to reduce the size of their apps #45441 (User Story)
|----------JSON source generator (same issue as above - it has two parents)
                    |----------- etc.

does that seem reasonable?

@layomia
Contributor

layomia commented Dec 2, 2020

Thanks @danmosemsft that makes sense. I created #45448 (to be further fleshed out) to track the source generation work items which should satisfy these user stories

@layomia
Contributor

layomia commented Jul 22, 2021

With the JSON source generator checked in, we can consider this work done. Please see "Try the new System.Text.Json source generator" for an end-to-end overview of how the generator works and its benefits.

@layomia layomia closed this as completed Jul 22, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Aug 22, 2021