-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
System.Text.Json's generator causes considerable package size increase #77897
Comments
Tagging subscribers to this area: @dotnet/area-system-text-json, @gregsdennis Issue DetailsDescriptionWe're currently working on migrating all our JSON serialization from Newtonsoft.Json to System.Text.Json in the Microsoft Store, and we're hitting some issues with respect to binary size after enabling the source generators for all our types. Our data models for JSON responses the Store client handles from service are about 260, which once annotated over a context, transitively cause a very large number of types (ie. When trying this out, this caused our package size to regress from 59MB to about 76MB, so that's a 17MB (~29%) increase. This is for a final package with x86, x64 and Arm64 architectures, compiled fully AOT with .NET Native, with trimming enabled. For reference, with trimming disabled we baseline around 80MB without System.Text.Json, so I'd expect the version with System.Text.Json to be around the 100MB mark. We'll have to investigate whether the tradeoff for this size regression is worth the benefits of the source generators here (specifically for us, faster performance and less memory use when deserializing responses, and reliable behavior that's trimmer-safe), but I wanted to open this issue to investigate whether the generator can also be improved to reduce the metadata increase it causes. For context: all these models are generated in metadata only mode, so the size increase is with just the metadata support code. No serialization fast-path is generated. One thing I noticed is, we have dozens and dozens of files like this, generated due to the transitive closure of the types we have annotated: Generated code (click to expand):// <auto-generated/>
#nullable enable annotations
#nullable disable warnings
// Suppress warnings about [Obsolete] member usage in generated code.
#pragma warning disable CS0618
namespace MyProject
{
internal sealed partial class MyJsonSerializerContext
{
private global::System.Text.Json.Serialization.Metadata.JsonTypeInfo<global::System.Collections.Generic.ICollection<global::SomeModelType>>? _ICollectionSomeModelType;
/// <summary>
/// Defines the source generated JSON serialization contract metadata for a given type.
/// </summary>
public global::System.Text.Json.Serialization.Metadata.JsonTypeInfo<global::System.Collections.Generic.ICollection<global::SomeModelType>> ICollectionSomeModelType
{
get => _ICollectionSomeModelType ??= Create_ICollectionSomeModelType(Options);
}
// Intentionally not a static method because we create a delegate to it. Invoking delegates to instance
// methods is almost as fast as virtual calls. Static methods need to go through a shuffle thunk.
private global::System.Text.Json.Serialization.Metadata.JsonTypeInfo<global::System.Collections.Generic.ICollection<global::SomeModelType>> Create_ICollectionSomeModelType(global::System.Text.Json.JsonSerializerOptions options)
{
global::System.Text.Json.Serialization.Metadata.JsonTypeInfo<global::System.Collections.Generic.ICollection<global::SomeModelType>>? jsonTypeInfo = null;
global::System.Text.Json.Serialization.JsonConverter? customConverter;
if (options.Converters.Count > 0 && (customConverter = GetRuntimeProvidedCustomConverter(options, typeof(global::System.Collections.Generic.ICollection<global::SomeModelType>))) != null)
{
jsonTypeInfo = global::System.Text.Json.Serialization.Metadata.JsonMetadataServices.CreateValueInfo<global::System.Collections.Generic.ICollection<global::SomeModelType>>(options, customConverter);
}
else
{
global::System.Text.Json.Serialization.Metadata.JsonCollectionInfoValues<global::System.Collections.Generic.ICollection<global::SomeModelType>> info = new global::System.Text.Json.Serialization.Metadata.JsonCollectionInfoValues<global::System.Collections.Generic.ICollection<global::SomeModelType>>()
{
ObjectCreator = null,
NumberHandling = default,
SerializeHandler = null
};
jsonTypeInfo = global::System.Text.Json.Serialization.Metadata.JsonMetadataServices.CreateICollectionInfo<global::System.Collections.Generic.ICollection<global::SomeModelType>, global::SomeModelType>(options, info);
}
return jsonTypeInfo;
}
}
} In this case, the Generated code (click to expand):private global::System.Text.Json.Serialization.Metadata.JsonTypeInfo<global::System.Collections.Generic.ICollection<T>> Create_ICollectionForType<T>(global::System.Text.Json.JsonSerializerOptions options)
{
global::System.Text.Json.Serialization.Metadata.JsonTypeInfo<global::System.Collections.Generic.ICollection<T>>? jsonTypeInfo = null;
global::System.Text.Json.Serialization.JsonConverter? customConverter;
if (options.Converters.Count > 0 && (customConverter = GetRuntimeProvidedCustomConverter(options, typeof(global::System.Collections.Generic.ICollection<T>))) != null)
{
jsonTypeInfo = global::System.Text.Json.Serialization.Metadata.JsonMetadataServices.CreateValueInfo<global::System.Collections.Generic.ICollection<T>>(options, customConverter);
}
else
{
global::System.Text.Json.Serialization.Metadata.JsonCollectionInfoValues<global::System.Collections.Generic.ICollection<T>> info = new global::System.Text.Json.Serialization.Metadata.JsonCollectionInfoValues<global::System.Collections.Generic.ICollection<T>>()
{
ObjectCreator = null,
NumberHandling = default,
SerializeHandler = null
};
jsonTypeInfo = global::System.Text.Json.Serialization.Metadata.JsonMetadataServices.CreateICollectionInfo<global::System.Collections.Generic.ICollection<T>, T>(options, info);
}
return jsonTypeInfo;
} And then the property would just do: public global::System.Text.Json.Serialization.Metadata.JsonTypeInfo<global::System.Collections.Generic.ICollection<global::SomeModelType>> ICollectionSomeModelType
{
get => _ICollectionSomeModelType ??= Create_ICollectionForType<global::SomeModelType>(Options);
} This trick can work for all "default" cases for several collection types as well (maybe for some concrete types as well, would have to check). Point is: the generator should identify all generated methods that have a high amount of shared code with others, and if possible rewrite them to be a single shared, generic method that all other consumers can invoke, instead of having their own copy. This would likely give us very nice improvements in terms of code size, if you consider this multiplied over several dozens of types. Let me know if we want a separate meta issue to track general size improvements for the generators, or if we just want to use this one as reference 🙂 cc. @eiriktsarpalis Reproduction Stepsn/a Expected behaviorn/a Actual behaviorn/a Regression?No response Known WorkaroundsNo response Configuration
Other informationNo response
|
Assuming the |
It's the Microsoft Store, so it's unfortunately not public 😅 |
Increase in the static footprint of a source generated application is to be expected, to an extent. Effectively it is trading off application size with runtime code generation. That being said, there certainly are opportunities to reduce the size of source generated code, e.g. by sharing metadata instantiation code between multiple types such as repeated generic type instantiations. |
@eiriktsarpalis Yup exactly, we'd be fine with some increase, but it just feels like we can do a bit better at least 🙂
|
We should definitely try to grab all of the low hanging fruits :-) Once we do that we can re-asses if we need/can do more. |
It is intentional. The idea is that ignored properties should still be accessible to users wishing to modify the JSON contract for a given type. |
Ah, I see. That's unfortunate in our case, as it seems to be causing a pretty big number of types to be brought in even though they're not actually needed. I'll see if maybe there's a way we could modify our models to remove those properties entirely, but it would really be nice if there was also a way to just tell the generator to really skip a property. Like, random idea, a new EDIT: looking at our codebase, it seems this amounts to almost half of the generated code 😥 |
@eiriktsarpalis I think it makes sense to remove them completely, users can add them back if they need them, right? |
We could, although it's likely some users will take a dependency on the .NET 7 behavior. We'd need to file a breaking change. |
Having that be changed to be opt-in sounds like a great solution to me. I'd also be happy to help if you wanted to try this out with a nightly build from a PR, I could use it in the Store and see how much of an impact it'd have, so we can get an idea 🙂 |
cc: @SilentCC who volunteered to help with JsonIgnore investigation |
Sharing another slightly different example we also just hit in the Store. Consider this: [JsonSerializable(typeof(MyModel))]
public partial class MyContext : JsonSerializerContext
{
}
public class MyModel
{
private List<string> _list = new();
public IReadOnlyList<string> List => _list;
} This will cause the generator to also generate type info metadata for This also makes me think: @eiriktsarpalis would it make sense to consider adding a new |
|
Possibly a dumb question, but how would one create instances of these models to serialize then 😅 I will also say I understand the point about diminishing returns though, I think maybe it's less noticeable now given that there's still the issue with too much generated code not being shared and also non-public members and ignored ones still being included in the generated metadata (ie. a mix of this issue, #77675 and #66679). Once those are addressed I agree it's possible that such a serialization-only mode wouldn't seem worth it anymore. Either way, it does seem like something we should only potentially reconsider again once these other issues are addressed, so yeah makes sense to shelve that for now 🙂 |
Bringing in this conversation from #74492. From earlier discussion from @agocke / @eiriktsarpalis / @davidfowlOK, this makes sense then, thanks.
This is good to know. Given that code size is a metric we're tracking for Native AOT we should keep an eye on how this scales out with larger apps. No action for now, though. From @layomia@eiriktsarpalis @agocke @davidfowl it might be good to think about it now. In the first release of the source generator, we decided not to introduce a global cache for sharing/abstracting metadata between assemblies. I think we should revisit the decision. Some potential challenges with this approach which would be good to keep in mind are documented here. My idea at the time (toward the end of .NET 6 dev time) was to use Assembly.GetCallingAssembly as a mechanism to partition the cache but @jkotas pointed out that it was slow and unreliable. His thought was that the solution requires new thinking (in .NET 7 or after) across the ASP.NET, serializer(s), source generators, and IL linker. Wanted to share this now in case it sparks some ideas. From @jkotasThe callsite rewriting gives you fast reliable equivalent of Assembly.GetCallingAssembly, but I do not see how it can help to solve the versioning problems described in the blog post. From @layomiaGotcha. If we have a design to reduce the size of source generated code (#77897) with the primary mechanism being shared caches between assemblies, do you think it would be incomplete if it doesn't solve the versioning issues? I recognize this area needs careful evaluation of the trade-offs (guided by .NET 8 goals). From @jkotasI think large projects have high probability of running into versioning issues. It is very common for large projects to run into versioning issues with Newtonsoft.Json today. We do not want source generators turn into a new source of versioning problems that are impossible to solve. If we are considering schemes that allow sharing of source generated serializers, I think we need to design it such that there is always an option to make explicit choice about when to share and when not share the source generated serializers. From @layomia👍 |
Just to add more context on this - at least for our use case in the Store, the size increase was not caused by duplicate generated code, as we only have a single JSON context (I mean, two, but one is very small and with separate types entirely). The issues in our cases are that:
This is not to say the global cache isn't useful - it sure seems to be! Just pointing out that size regressions caused by source generators aren't necessarily caused by duplication for the same types across different assemblies, is all 🙂 Also, as I mentioned in #74492 (comment), we're doing a bunch of work from our side at least to mitigate these issues, and we'll share updated size regression numbers as soon as we have a new working prototype with the new changes. I'm personally very curious to see if/how much the size regression will have been mitigated with just tweaks on the JSON model side, without yet having any new source generator improvements (which I'm still very much looking forwards too though ahah) 😄 |
Thanks for pointing this out. I agree with your assessment. It would be awesome to see what the size diff would be with your prototype. Nice to have all the related concerns in one spot. I'm just excited that a global cache is now a possibility given that there are many aspects of the generator's design that could have benefited from it earlier on. |
Tagging subscribers to 'size-reduction': @eerhardt, @SamMonoRT, @marek-safar Issue DetailsWe're currently working on migrating all our JSON serialization from Newtonsoft.Json to System.Text.Json in the Microsoft Store, and we're hitting some issues with respect to binary size after enabling the source generators for all our types. Our data models for JSON responses the Store client handles from service are about 260, which once annotated over a context, transitively cause a very large number of types (ie. When trying this out, this caused our package size to regress from 59MB to about 76MB, so that's a 17MB (~29%) increase. This is for a final package with x86, x64 and Arm64 architectures, compiled fully AOT with .NET Native, with trimming enabled. For reference, with trimming disabled we baseline around 80MB without System.Text.Json, so I'd expect the version with System.Text.Json to be around the 100MB mark. We'll have to investigate whether the tradeoff for this size regression is worth the benefits of the source generators here (specifically for us, faster performance and less memory use when deserializing responses, and reliable behavior that's trimmer-safe), but I wanted to open this issue to investigate whether the generator can also be improved to reduce the metadata increase it causes. For context: all these models are generated in metadata only mode, so the size increase is with just the metadata support code. No serialization fast-path is generated. One thing I noticed is, we have dozens and dozens of files like this, generated due to the transitive closure of the types we have annotated: Generated code (click to expand):// <auto-generated/>
#nullable enable annotations
#nullable disable warnings
// Suppress warnings about [Obsolete] member usage in generated code.
#pragma warning disable CS0618
namespace MyProject
{
internal sealed partial class MyJsonSerializerContext
{
private global::System.Text.Json.Serialization.Metadata.JsonTypeInfo<global::System.Collections.Generic.ICollection<global::SomeModelType>>? _ICollectionSomeModelType;
/// <summary>
/// Defines the source generated JSON serialization contract metadata for a given type.
/// </summary>
public global::System.Text.Json.Serialization.Metadata.JsonTypeInfo<global::System.Collections.Generic.ICollection<global::SomeModelType>> ICollectionSomeModelType
{
get => _ICollectionSomeModelType ??= Create_ICollectionSomeModelType(Options);
}
// Intentionally not a static method because we create a delegate to it. Invoking delegates to instance
// methods is almost as fast as virtual calls. Static methods need to go through a shuffle thunk.
private global::System.Text.Json.Serialization.Metadata.JsonTypeInfo<global::System.Collections.Generic.ICollection<global::SomeModelType>> Create_ICollectionSomeModelType(global::System.Text.Json.JsonSerializerOptions options)
{
global::System.Text.Json.Serialization.Metadata.JsonTypeInfo<global::System.Collections.Generic.ICollection<global::SomeModelType>>? jsonTypeInfo = null;
global::System.Text.Json.Serialization.JsonConverter? customConverter;
if (options.Converters.Count > 0 && (customConverter = GetRuntimeProvidedCustomConverter(options, typeof(global::System.Collections.Generic.ICollection<global::SomeModelType>))) != null)
{
jsonTypeInfo = global::System.Text.Json.Serialization.Metadata.JsonMetadataServices.CreateValueInfo<global::System.Collections.Generic.ICollection<global::SomeModelType>>(options, customConverter);
}
else
{
global::System.Text.Json.Serialization.Metadata.JsonCollectionInfoValues<global::System.Collections.Generic.ICollection<global::SomeModelType>> info = new global::System.Text.Json.Serialization.Metadata.JsonCollectionInfoValues<global::System.Collections.Generic.ICollection<global::SomeModelType>>()
{
ObjectCreator = null,
NumberHandling = default,
SerializeHandler = null
};
jsonTypeInfo = global::System.Text.Json.Serialization.Metadata.JsonMetadataServices.CreateICollectionInfo<global::System.Collections.Generic.ICollection<global::SomeModelType>, global::SomeModelType>(options, info);
}
return jsonTypeInfo;
}
}
} In this case, the Generated code (click to expand):private global::System.Text.Json.Serialization.Metadata.JsonTypeInfo<global::System.Collections.Generic.ICollection<T>> Create_ICollectionForType<T>(global::System.Text.Json.JsonSerializerOptions options)
{
global::System.Text.Json.Serialization.Metadata.JsonTypeInfo<global::System.Collections.Generic.ICollection<T>>? jsonTypeInfo = null;
global::System.Text.Json.Serialization.JsonConverter? customConverter;
if (options.Converters.Count > 0 && (customConverter = GetRuntimeProvidedCustomConverter(options, typeof(global::System.Collections.Generic.ICollection<T>))) != null)
{
jsonTypeInfo = global::System.Text.Json.Serialization.Metadata.JsonMetadataServices.CreateValueInfo<global::System.Collections.Generic.ICollection<T>>(options, customConverter);
}
else
{
global::System.Text.Json.Serialization.Metadata.JsonCollectionInfoValues<global::System.Collections.Generic.ICollection<T>> info = new global::System.Text.Json.Serialization.Metadata.JsonCollectionInfoValues<global::System.Collections.Generic.ICollection<T>>()
{
ObjectCreator = null,
NumberHandling = default,
SerializeHandler = null
};
jsonTypeInfo = global::System.Text.Json.Serialization.Metadata.JsonMetadataServices.CreateICollectionInfo<global::System.Collections.Generic.ICollection<T>, T>(options, info);
}
return jsonTypeInfo;
} And then the property would just do: public global::System.Text.Json.Serialization.Metadata.JsonTypeInfo<global::System.Collections.Generic.ICollection<global::SomeModelType>> ICollectionSomeModelType
{
get => _ICollectionSomeModelType ??= Create_ICollectionForType<global::SomeModelType>(Options);
} This trick can work for all "default" cases for several collection types as well (maybe for some concrete types as well, would have to check). Point is: the generator should identify all generated methods that have a high amount of shared code with others, and if possible rewrite them to be a single shared, generic method that all other consumers can invoke, instead of having their own copy. This would likely give us very nice improvements in terms of code size, if you consider this multiplied over several dozens of types. Let me know if we want a separate meta issue to track general size improvements for the generators, or if we just want to use this one as reference 🙂 cc. @eiriktsarpalis Known WorkaroundsDon't use the generators at all. Not desireable due to performance and trimming concerns. Configuration
|
Hey there! We're making good progress with the System.Text.Json migration in the Microsoft Store and we're finally ready to start testing out a full migration of our entire backend. We've done a bunch of changes to reduce the number of types to generated metadata for, to avoid unnecessary combinations of types (eg. we unified all of our collection types to just always use I re-run our pipeline to get updated size diffs (consider this is for x86 + x64 + Arm64):
Delta: +7.842 KB (+12.65%). @eiriktsarpalis I'm happy to share a new minimal repro (internally) with our updated packages if anyone wants to take a look at the generated code to identify more optimization opportunities, and I'm also more than happy to try out any preview builds from the 8.0.0 branch and report back updated size diffs if you also wanted to use the Store as another reference point to track the impact of the new size saving work going into the next System.Text.Json release 🙂 |
/cc @lewing @kotlarmilos |
Tested again after enabling trimming for all our JSON models (we had to previous keep metadata there due to Newtonsoft.Json, but that's no longer needed now that we're using the System.Text.Json source generators), updated results:
Delta: +6.291 KB (+10.15%). So that's a nice improvement already, though still a noticeable regression when enabling the source generators. FYI @krwq we have a total of 65 generated Basically this: private JsonTypeInfo<List<T>> CreateListTypeInfo<T>(JsonSerializerOptions options, bool makeReadOnly)
{
JsonTypeInfo<List<T>>? jsonTypeInfo = null;
JsonConverter? customConverter;
if (options.Converters.Count > 0 && (customConverter = GetRuntimeProvidedCustomConverter(options, typeof(List<T>))) != null)
{
jsonTypeInfo = JsonMetadataServices.CreateValueInfo<List<T>>(options, customConverter);
}
else
{
JsonCollectionInfoValues<List<T>> info = new JsonCollectionInfoValues<List<T>>()
{
ObjectCreator = () => new List<T>(),
NumberHandling = default,
SerializeHandler = null
};
jsonTypeInfo = JsonMetadataServices.CreateListInfo<List<T>, T>(options, info);
}
if (makeReadOnly)
{
jsonTypeInfo.MakeReadOnly();
}
return jsonTypeInfo;
} |
@Sergio0694 I believe most of the immediate size concerns have been addressed right? I'm going to close this issue, but feel free to reopen if other action items still exist. |
Yup, sounds good! Still a regression (which is by design) compared to no generators, but it's much better than before 🙂 |
We're currently working on migrating all our JSON serialization from Newtonsoft.Json to System.Text.Json in the Microsoft Store, and we're hitting some issues with respect to binary size after enabling the source generators for all our types. Our data models for JSON responses the Store client handles from service are about 260, which once annotated over a context, transitively cause a very large number of types (ie.
JsonTypeInfo<T>
properties) to be generated, precisely 742 of them (this also includes eg. collection types with any of these data types as element type, etc.). That's a lot 😅When trying this out, this caused our package size to regress from 59MB to about 76MB, so that's a 17MB (~29%) increase. This is for a final package with x86, x64 and Arm64 architectures, compiled fully AOT with .NET Native, with trimming enabled. For reference, with trimming disabled we baseline around 80MB without System.Text.Json, so I'd expect the version with System.Text.Json to be around the 100MB mark. We'll have to investigate whether the tradeoff for this size regression is worth the benefits of the source generators here (specifically for us, faster performance and less memory use when deserializing responses, and reliable behavior that's trimmer-safe), but I wanted to open this issue to investigate whether the generator can also be improved to reduce the metadata increase it causes.
For context: all these models are generated in metadata only mode, so the size increase is with just the metadata support code. No serialization fast-path is generated. One thing I noticed is, we have dozens and dozens of files like this, generated due to the transitive closure of the types we have annotated:
Generated code (click to expand):
In this case, the
Create_ICollectionSomeModelType
method is effectively the same as dozens of other files, just with a different type argument. One idea for the generator to produce less code is for it to identify all these cases where the "default" path is used (ie. there's no custom converter known at compile time for this type), and just emit a shared stub just once, that the various properties can then reuse. For instance, it could be something like this:Generated code (click to expand):
And then the property would just do:
This trick can work for all "default" cases for several collection types as well (maybe for some concrete types as well, would have to check). Point is: the generator should identify all generated methods that have a high amount of shared code with others, and if possible rewrite them to be a single shared, generic method that all other consumers can invoke, instead of having their own copy.
This would likely give us very nice improvements in terms of code size, if you consider this multiplied over several dozens of types.
Let me know if we want a separate meta issue to track general size improvements for the generators, or if we just want to use this one as reference 🙂
cc. @eiriktsarpalis
Known Workarounds
Don't use the generators at all. Not desireable due to performance and trimming concerns.
Configuration
The text was updated successfully, but these errors were encountered: