Object (de)serialization support with Utf8JsonReader\Writer - entry point and options #28325
Thank you for describing your API and performance concepts.
Performance
This probably won't be true if you want to support things like ReadOnlyCollection; I would simply say "related properties and types".
This will be slow. I recommend talking to @rynowak about his work in ASP.NET Core Routing and the IL Matcher; the code is pretty much what you need (technically, matching a route and matching a property name are quite similar). This is by far the fastest approach, both in Routing and in all the experiments I did for SpanJson: https://github.com/Tornhoof/SpanJson/wiki/Technology-and-Internals#3-comparison-against-integers (which is pretty much identical to the approach in Routing). |
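The integer-comparison idea (treating short UTF-8 property names as one or two machine words instead of comparing byte-by-byte) can be sketched as follows; this is a minimal illustration, not the Routing or SpanJson implementation:

```csharp
using System;
using System.Buffers.Binary;
using System.Text;

public static class PropertyNameMatch
{
    // Precomputed at startup: the UTF-8 bytes of "message" read as a
    // little-endian integer, zero-padded to 8 bytes. For names up to 8 bytes
    // this turns the comparison into a single ulong equality check.
    static readonly ulong s_message = ReadPadded(Encoding.UTF8.GetBytes("message"));

    static ulong ReadPadded(ReadOnlySpan<byte> name)
    {
        Span<byte> padded = stackalloc byte[8];
        name.CopyTo(padded); // remaining bytes stay zero
        return BinaryPrimitives.ReadUInt64LittleEndian(padded);
    }

    public static bool IsMessage(ReadOnlySpan<byte> utf8PropertyName) =>
        utf8PropertyName.Length <= 8 && ReadPadded(utf8PropertyName) == s_message;
}
```

For longer names the same idea extends to comparing a sequence of ulongs, which is what makes it competitive with generated IL.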
Where’s the async API? |
In what scenario would you require async (de)serialize APIs? Can you please provide a sample usage? |
I'm going to be a bit obtuse here. If we're not making these APIs async then we need to have a long discussion again about ASP.NET Core usage of these APIs.

Input:

Output:

Just to throw out some more examples, here's the HttpClient JSON formatter that's used today in most applications:

To mitigate sync reading, today we buffer the entire Stream into memory or disk (depending on the size), which is unfortunate and we'd really love to avoid that. The plan was to get that benefit from the new JSON reader/writer/serializer.

We don't have a good solution for output today. You end up doing synchronous IO once you hit a buffer threshold, which ends up doing sync-over-async and leans towards causing thread pool starvation. See dotnet/aspnetcore#6397 for more details on output. ASP.NET Core is going to disable all sync IO on the request and response by default in 3.0. |
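The buffering workaround described above (asynchronously drain the request stream into memory, then run the synchronous deserializer over the buffered bytes) looks roughly like this sketch:

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public static class BufferedRead
{
    // Asynchronously buffer the whole request body so that no synchronous IO
    // is ever issued against the transport; only then hand the bytes to a
    // synchronous deserializer.
    public static async Task<byte[]> BufferAsync(Stream body)
    {
        using (var ms = new MemoryStream())
        {
            await body.CopyToAsync(ms);
            // The cost being complained about: the entire payload sits in
            // memory (or is spilled to disk) before deserialization begins.
            return ms.ToArray();
        }
    }
}
```

A truly async serializer removes the intermediate copy by reading incrementally from the stream as it deserializes.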
For the ASP.NET Core formatter you additionally need to support a bag of json properties, which is either serialized into the parent object or deserialized into the bag for RFC 7807 Error Details. See aspnet/Mvc#8529 for more details, that one uses JSON.NET's https://www.newtonsoft.com/json/help/html/T_Newtonsoft_Json_JsonExtensionDataAttribute.htm |
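For reference, the Json.NET extension-data pattern linked above looks roughly like this; `ProblemDetails` here is a stand-in type for illustration, not the actual MVC class:

```csharp
using System.Collections.Generic;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

// A simplified RFC 7807 payload: known members are typed properties, and
// everything else lands in (and is re-serialized from) the extension bag.
public class ProblemDetails
{
    public string Title { get; set; }
    public int Status { get; set; }

    [JsonExtensionData]
    public IDictionary<string, JToken> Extensions { get; set; }
}
```

Deserializing `{"title":"oops","status":500,"traceId":"abc"}` puts `traceId` into `Extensions`, and serializing writes it back out, which is the round-tripping behavior being asked for.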
cc @KrzysztofCwalina, @jkotas, @glennc, @terrajobst as an FYI (note: this is still WIP) |
My feedback in a rough stream-of-consciousness format. On JsonConverter, we will need the ability to specify the type dynamically as an argument, in addition to the generic overloads. Without this, it will really cause a lot of folks to use a workaround to call the generic version with
I'm really intrigued by how

I think it would be really neat to see a recipe guide or some side-by-side examples: "this is how you'd set the json property name with an attribute vs with code". Another natural thing that users will want from

The Async is a higher-level bugaboo (I think that's a SFW replacement for what I was thinking 😆). I still think C# and .NET do async better than any other language out there, but we're still struggling to pull together an end-to-end here for the web that actually avoids synchronous I/O at all levels.

From my perspective, I'm still excited to see all of the work that's here. I think we're solving a very critical problem and these are all steps in the right direction. |
Yes that makes sense. The API is still being defined, and this issue wasn't meant to be complete at this point.
Currently those simple settings are in JsonPropertyAttribute which can be "added" at run-time. If too verbose we may just place those right on JsonConverterSettings.
Yes, of course.
Thanks. I updated the description and implementation to use an integer-based array which is looking very promising at this point in time. |
Yes that will be added. Thanks
As it is now, JsonConverterSettings is fairly clean and only contains settings that would need to be specified at run-time like MaxDepth and DefaultBufferSize (I'll update the API doc soon with new members). The mashup is JsonPropertyAttribute for now.
Yes there is some loose coupling with the extension points like the property casing where you can provide the implementation. Other simple settings will not really be extensible in that way, but the values can be specified via attributes at the assembly, class and property level, plus run-time overrides to each of these; i.e. the values for these simple settings can be changed in many flexible ways, but not necessarily the code that uses them.
Yes I will do this in the next update.
JsonClassMaterializer is just an enum. The implementation is internal. We may not even need the enum if we decide the default is ok. Currently the default will attempt to use the most efficient mechanism (currently IL gen for get,set,ctor) but if IL gen is not supported then use standard reflection. However, the enum is there currently for someone to change the default if for example they have a single object to deserialize and don't want any additional memory overhead (and initial perf hit) of IL so they want to use standard reflection instead. |
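The trade-off described here is the classic reflection-vs-generated-code one. A minimal illustration, using expression-tree compilation as a stand-in for the raw IL generation the materializer would select:

```csharp
using System;
using System.Linq.Expressions;
using System.Reflection;

public class Poco { public int Id { get; set; } }

public static class MaterializerDemo
{
    // Standard reflection: no startup cost, slow per call.
    public static void SetSlow(Poco target, int value) =>
        typeof(Poco).GetProperty(nameof(Poco.Id)).SetValue(target, value);

    // Compiled setter: one-time compile cost (and extra memory), then
    // near-direct-call speed on every subsequent use.
    public static readonly Action<Poco, int> SetFast = BuildSetter();

    static Action<Poco, int> BuildSetter()
    {
        PropertyInfo prop = typeof(Poco).GetProperty(nameof(Poco.Id));
        ParameterExpression obj = Expression.Parameter(typeof(Poco), "o");
        ParameterExpression val = Expression.Parameter(typeof(int), "v");
        return Expression.Lambda<Action<Poco, int>>(
            Expression.Assign(Expression.Property(obj, prop), val), obj, val).Compile();
    }
}
```

For a single one-off deserialization the compile cost never pays for itself, which is the scenario the enum lets callers opt out of.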
I added the async apis along with a brief description. Currently they only support a Stream, more if necessary. I have a prototype that verifies the feasibility. |
Some quick feedback:
|
@steveharter If you care about integer tricks, you can optimize the json member writing for utf8 with the same trick. Assuming you will bake precalculated byte-arrays for the property names into the IL, you can use the same concept as for deserialization. Instead of copying the precalculated buffer to the output, just write appropriate integer values to the output.

```csharp
jsonWriter.WriteUtf8Verbatim(7306916068917079330UL /*"message*/, 14882 /*":*/);
```

At least in my SpanJson tests that was the last main difference in serialization speed between UTF16 and UTF8 as, at least for expression trees, there is a large difference if I have

That's more or less the last larger performance trick in SpanJson; now the UTF8 and UTF16 serialization speeds are the same. Deserialization of UTF8 is still slower, but that's mostly due to the utf8-to-utf16 overhead for strings, especially since @stephentoub did all those nice |
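To unpack the constants in that snippet: `7306916068917079330UL` is the eight UTF-8 bytes `"message` (opening quote plus name) read as a little-endian integer, and `14882` (0x3A22) is the two bytes `":`. Writing the integers back out reproduces the property-name prefix without a per-byte copy loop:

```csharp
using System;
using System.Buffers.Binary;
using System.Text;

public static class VerbatimWrite
{
    public static void Main()
    {
        Span<byte> dest = stackalloc byte[10];
        BinaryPrimitives.WriteUInt64LittleEndian(dest, 7306916068917079330UL); // "message
        BinaryPrimitives.WriteUInt16LittleEndian(dest.Slice(8), 14882);        // ":
        Console.WriteLine(Encoding.UTF8.GetString(dest)); // "message":
    }
}
```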
An additional thing we would need to support Azure SDK: version resilient serialization, including round-tripping. We need to be able to add a property bag field (dictionary) to deserialized types. If deserialized payload has a property that does not exist in the type, the property would be saved in the dictionary. The serializer would serialize not only actual properties of the type, but also the dictionary (so all payload round trips). cc: @schaabs |
@KrzysztofCwalina @Tornhoof's comment from here, https://github.com/dotnet/corefx/issues/34372#issuecomment-451928379, goes into this. JSON.NET supports it using |
I spent a little bit of time plugging in the deserializer in to MVC to see what comes out of it. This is based on version 0.1.1-e190205-1 of the package.
```csharp
var settings = new JsonConverterSettings();
settings.AddAttribute(modelType, new JsonCamelCasingConverterAttribute());
```

This works for the immediate properties of
|
Thanks for the feedback! The attributes can be specified at an Assembly level for all types in a given Assembly. By using design-time attributes like this it helps to isolate the settings for a particular consumer's types, assuming they belong to the same Assembly. I can revisit this and\or make it easier to specify "globally" per the settings class if we want to do that.

```csharp
settings.AddAttribute(modelType.Assembly, new JsonCamelCasingConverterAttribute());
```
Not by default (current plan). I realize this is different from Json.NET. However this helps with performance slightly and is the "right" json-spec thing to do. I haven't added the case-insensitive switch yet.
Yes that is the current plan. I will be working on it this week.
I'll look into this. @ahsonkhan has this been considered?
Dictionary is not implemented yet. The next iteration (and current PR in corefxlab) supports the Nullable types.
Yes, I plan on having the exceptions cleaned up and having them include the information regarding line number etc. The internal reader throws FormatException as well, and I'll switch to using the TryParse* methods instead to avoid that. |
Yes, and given single quote strings are not JSON RFC compliant, we decided against supporting it at the lowest layer. Adding too many of these switches/options results in perf costs for the most common cases of using the

That said, as an alternative solution, we should evaluate if this is something that occurs frequently at the higher layer (i.e.
Unfortunately, this has performance implications since it requires potentially storing the path in some resizable data structure, which ends up allocating. We have made an effort to keep the reader non-allocating with emphasis on performance, which made providing |
Similarly I've tried out the deserializer on the SignalR side, version 0.1.1-e190201-1. One of the biggest issues is that we use the

```csharp
var bytes = reader.HasValueSequence ? reader.ValueSequence.ToArray() : reader.ValueSpan;
if (reader.TokenType == JsonTokenType.String)
{
    var b = new byte[bytes.Length + 2];
    b[0] = 34; // "
    bytes.CopyTo(b.AsSpan().Slice(1, bytes.Length));
    b[bytes.Length + 1] = 34; // "
    bytes = b.AsSpan();
}
var obj = Serialization.JsonConverter.FromJson(bytes, type);
```

(Yes I know this is suboptimal, but ignore that.)

Another thing is that the

Because Json properties can be in any order we sometimes need to store a "JArray" like object to then be used later in the deserializer. For now I store the

Looking to the future where we'll be using the

On a brighter note, when I did some microbenchmarks with the

cc @ahsonkhan I probably missed some feedback from when I was talking with you. |
That wouldn't necessarily work for MVC since complex sub-properties could come from multiple different assemblies.
That sounds perfect. |
I see three ways to add global settings:
Basically options 2 and 3 prevent duplicating 10+ eventual properties from JsonPropertyAttribute. However if there is a need to use JsonCamelCasingConverterAttribute or having a different DateTime converter, etc. then option 2 looks more appealing. Even if we do option 2 we could still do option 1 for JsonPropertyAttribute for better discoverability of common options. The API here currently takes option 1, but just for JsonPropertyAttribute, not for camel-casing and others (pending API discussion). |
Is there really a case to be made that configuring a setting per-assembly is more common than configuring a setting globally? Put another way, I think there's a really strong case to be made for configuring a setting per-type, per-property and globally, and a weak case to be made for configuring settings per-assembly. Do you have use-cases in mind for that? |
Here you go:
|
That seems like you'd just apply the defaults globally though not per-assembly... |
Is the feature you are asking for controlling the deserialization order of properties? And secondarily, I suppose, specifying the serialization order, although that's not as important. |
Perhaps if you don't use types from other assemblies that have their own contract\schema. However, currently there is not a way to specify design-time attributes globally (across assemblies). You could use the run-time AddAttribute(null, myattribute) however if we decide that will work. Do you have something else in mind (config file support maybe)? |
@steveharter We need a way of storing a json array for later use in the deserializer. For example, the following two json payloads are equivalent, and the target value is used to determine what types are in the arguments array; however, when using the forward reading only

```json
{ "target": "name", "arguments": [ 1, "example", 2.3 ] }
{ "arguments": [ 1, "example", 2.3 ], "target": "name" }
```

With Newtonsoft.Json we can do:

```csharp
argumentsToken = JArray.Load(reader);
// ... later
for (var i = 0; i < length; i++)
{
    argumentsToken[i].ToObject(type, serializer);
}
```
|
Right, I'm questioning why we need the ability to specify settings per-assembly. Existing JSON serializers don't have it and I've never seen a user ask for it. Specifying settings for the application as a whole (regardless of what assembly the data types live in) is exceedingly common for a web project. |
Notes from today are here. |
IMO, the deserialization calls should implement 'TryRead'. Since the json attempted to be deserialized typically comes from external sources, serialization exceptions are expected to be a common occurrence. If we place deserialization at the edge of a protocol, then the try/catch pattern will add a noticeable amount of time to serving the request due to exception catching. |
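A Try-pattern entry point of the kind suggested here might have the following shape. This is a hypothetical sketch for discussion, not a proposed API; the stub delegates to a throwing parser only to show the calling convention, whereas a real implementation would avoid exceptions internally:

```csharp
using System;

public static class TryJson
{
    public delegate T ThrowingParser<T>(ReadOnlySpan<byte> utf8Json);

    // Hypothetical Try-style wrapper around a throwing parser. A real
    // implementation would report failure without throwing (that is the
    // whole point of the request); this only demonstrates the shape.
    public static bool TryParse<T>(ReadOnlySpan<byte> utf8Json, ThrowingParser<T> parse, out T value)
    {
        try
        {
            value = parse(utf8Json);
            return true;
        }
        catch
        {
            value = default;
            return false;
        }
    }
}
```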
It seems you require three pending features:
For (1) we have had discussions on using the existing

For (2) and (3) I am working on the explicit class (de)serialize feature where you can implement before- and after-serialize methods along with per-property callbacks, including "overflow" properties that exist in the json but not the class. Here's a mock-up using

```csharp
// Either through an attribute or code, register your callback\converter:
settings.AddConverter<MyPocoType>(typeof(MyPocoTypeConverter));
...
// Implement your callbacks (an instance of this struct is created per POCO instance):
public struct MyPocoTypeConverter : ITypeConverterOnMissingPropertyDeserialized, ITypeConverterOnDeserialized
{
    IList<JsonElement> _arguments;

    void ITypeConverterOnMissingPropertyDeserialized.OnMissingPropertyDeserialized(
        object obj,
        JsonClassInfo jsonClassInfo,
        string propertyName,
        object propertyValue,
        JsonTokenType tokenType,
        JsonSerializerOptions options)
    {
        if (propertyName == "arguments")
        {
            Debug.Assert(tokenType == JsonTokenType.Array);
            _arguments = (IList<JsonElement>)propertyValue;
        }
    }

    void ITypeConverterOnDeserialized.OnDeserialized(
        object obj,
        JsonClassInfo jsonClassInfo,
        JsonSerializerOptions options)
    {
        MyPocoType poco = (MyPocoType)obj;
        if (poco.Target == "CallMethod2")
        {
            poco.Method2(
                _arguments[1].GetString(),
                _arguments[0].GetInt32(),
                _arguments[2].GetDecimal());
        }
    }
}
```
|
@steveharter I think there is some misunderstanding of our scenario. We use the

I showed the Newtonsoft example which is what we already use for the current Json parsing, and it works perfectly. I think what you're showing assumes we just call the deserializer with the whole payload, which is definitely not what we're doing. We need interoperability with the

```csharp
// (pseudocode: mixes Utf8JsonReader with Newtonsoft's JArray to illustrate the desired interop)
if (utf8JsonReader.ValueSpan.SequenceEqual("arguments"))
{
    argumentsToken = JArray.Load(utf8JsonReader);
}
// ... later
for (var i = 0; i < argumentsToken.Size(); i++)
{
    argumentsToken[i].ToObject(type, serializer);
}
```
|
@BrennanConroy thanks for the clarification. Yes, I was assuming you wanted to use the (de)serializer for the entire scenario, at least for the flow-of-control, because the (de)serializer will allow access to the lower-level reader\writer in specific callbacks. We currently have |
Should I open a new issue for that feature? |
@BrennanConroy I'll discuss the feature offline with @bartonjs and @ahsonkhan and then reply back here.

One of the cool things about the reader (and writer) is that it supports a streaming model where the buffer doesn't have to contain all of the data. When the reader runs out of buffer for the current property, it returns false and expects to be called again with a fresh buffer (and to be passed the previous JsonReaderState object back in so it can pick up where it left off). However this complicates scenarios where more than one property must be returned, like in a "JArray" case, because every property in theory may have to ask for a fresh buffer.

The deserializer handles this scenario and returns the whole object\array when it is finished, including when it spans multiple buffer refreshes (which occurs when using the async pipe\stream APIs). However, the reader doesn't support this scenario internally (it leaves it up to the caller, like the serializer), and the document\element supports this scenario but requires all of the data up-front in basically one large buffer (for stream cases it reads to the end up-front).

So we have to decide who should support this multi-property functionality: the reader\writer, the document\element, the (de)serializer, and\or the caller. |
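The resume-with-a-fresh-buffer model described here can be exercised directly with `Utf8JsonReader` and `JsonReaderState`. A two-buffer sketch, with the split point chosen arbitrarily so the first chunk ends mid-value:

```csharp
using System;
using System.Text;
using System.Text.Json;

public static class StreamingRead
{
    public static void Main()
    {
        byte[] json = Encoding.UTF8.GetBytes("{\"name\":\"abcdef\"}");

        // First chunk deliberately ends inside the string value.
        var state = new JsonReaderState();
        var reader = new Utf8JsonReader(json.AsSpan(0, 12), isFinalBlock: false, state);
        while (reader.Read())
            Console.WriteLine(reader.TokenType);

        // Read() returned false: the reader needs more data. Resume by
        // passing the unconsumed tail plus the next chunk, along with the
        // captured state.
        long consumed = reader.BytesConsumed;
        state = reader.CurrentState;

        reader = new Utf8JsonReader(json.AsSpan((int)consumed), isFinalBlock: true, state);
        while (reader.Read())
            Console.WriteLine(reader.TokenType);
    }
}
```

The multi-property problem in the comment is exactly this loop's complexity multiplied: a "JArray"-style consumer may have to go through the refill dance once per element.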
Is it really necessary for implementers of

Could we add a method to

Or alternatively, could we have a single |
I've been working with @ahsonkhan on addressing this by either pre-populating the property name and just calling WriteValue methods, wrapping the writer somehow, or by having the Write methods in the writer detect whether the property name is empty and call WriteValue if so (unfortunately, however, json allows empty property names, so that may not work). |
@BrennanConroy update: we are working on extending |
Would it make sense to add a

It wouldn't surprise me if we would want to add more options related to reading and writing later, and having some options in |
Also it will be useful to have something like a custom Expression serializer builder
It will be useful when a class is separated from the json serializer logic |
A big gap I see in the proposal as specified is constructor handling. For a domain-driven design I think it is common to assign required values through constructor parameters so that the object is always in a valid state. Since this current proposal only looks for a constructor without parameters, the current state of deserialization is unfortunately limited to simple types like data transfer objects. Is more advanced constructor handling being considered? |
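As a point of comparison, a parameterized-constructor binder can be layered on via reflection, which is roughly what other serializers do for this case. A simplified sketch, with assumed names; parameters are matched to already-parsed members ordinal case-insensitively and error handling is omitted:

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

public class Money
{
    public decimal Amount { get; }
    public string Currency { get; }

    public Money(decimal amount, string currency)
    {
        if (currency is null) throw new ArgumentNullException(nameof(currency));
        Amount = amount;
        Currency = currency;
    }
}

public static class CtorBinder
{
    // Match already-parsed JSON members to constructor parameters by name,
    // then invoke the constructor so invariants are enforced at creation.
    public static T Create<T>(IReadOnlyDictionary<string, object> members)
    {
        ConstructorInfo ctor = typeof(T).GetConstructors()[0];
        ParameterInfo[] parameters = ctor.GetParameters();
        var args = new object[parameters.Length];
        for (int i = 0; i < parameters.Length; i++)
        {
            foreach (var kvp in members)
                if (string.Equals(kvp.Key, parameters[i].Name, StringComparison.OrdinalIgnoreCase))
                    args[i] = kvp.Value;
        }
        return (T)ctor.Invoke(args);
    }
}
```

The cost is that property order no longer maps to reading order, so all constructor arguments have to be buffered before the object can be created.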
```csharp
namespace System.Text.Json.Serialization
{
    partial class JsonSerializerOptions
    {
        public JsonClassInfo GetClassInfo<T>() => GetClassInfo(typeof(T));
    }
}
```

Update: Retracted suggestion based on comment by @khellang (https://github.com/dotnet/corefx/issues/34372#issuecomment-475721641) |
It was just covered in the review; https://www.youtube.com/watch?v=_CdV75tEsVk. TL;DR: there's no need for it and it's easy enough for you to add it yourself as an extension method. |
@steveharter Have there been any updates/consideration to using

The types in |
I believe that decision is closed. cc @terrajobst

However other feedback is whether |
Has it been considered, or is it possible, for the serializer to provide an additional "report" of deserialized properties? Consider the following POCO:

```csharp
public class Person
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public DateTime BirthDay { get; set; }
}
```

Since no property is nullable (assuming nullable reference types are enabled and considered), all properties are required by default, but if you call

```csharp
ReadOnlySpan<byte> utf8 = ...
Person person = JsonSerializer.Parse<Person>(utf8);
```

Where

```json
{
  "firstName": "John",
  "lastName": "Doe"
}
```

You'll end up with

At that point, you have no clue whether the

The problem with that is that now you have a DTO full of nullable properties, which all contain valid values (verified by validation). Every time you want to pull a value from a property, you have to either guard against null (or use

@rynowak Is this something that MVC could use? |
It sounds like what you're after here is input validation - validating what's on the wire based on metadata associated with properties. I don't think what you're proposing is going to be enough to do that in a good way - IMO if we want to have input validation support it would have to be based on callbacks built into the serializer, not layered on top. Speaking of input validation, I'm not sure why you'd want to use that to specify a behaviour difference between |
Yeah, but not any advanced validation, just present/not present.
Why would it need to be callback-based? What is my proposal lacking to do it "in a good way"?
That's not what I'm saying. For nullable properties, I think explicit |
OK thanks that helps. I think your proposal is lacking because it solves the problem at the wrong layer. It assumes that we can get a result from the serializer that ASP.NET knows how to interpret, and I don't think this goes far enough. A feature like this has to be more built in than returning a list of properties. Think about what would happen for a nested object; how would that work? |
What layer is the correct layer? The only one that knows this information is the serializer, so it has to provide this information somehow. Are you saying you want it at an even lower level? 🤔
That's what you have JSON path for 😉 |
Anyway, I'm not dead set on a list of properties. I'm totally open for alternative solutions. I just think it's a problem worth solving, cause today's "workarounds" are pretty bad. This seems (to me) like the best level to solve it, and a perfect opportunity as the APIs are being brought in. |
I'm not sure what a nice API for this could look like, though. A list of property paths would be a bit annoying since then you'd spend most of the time parsing paths. I think some callback during the binding of a single object would actually work best. Passing in the target object instance, the source JSON reader and maybe a path would probably work. And that callback could then allow you to customize the binding behavior, or it could be used for validation. |
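A hypothetical shape for the per-object binding callback floated here (all names invented for illustration, not a proposed API):

```csharp
// Invoked after each object in the payload has been populated; the consumer
// can run validation (e.g. required-property checks) or mutate the instance.
// The JSON path identifies where in the payload the object came from.
public delegate void ObjectBoundCallback(object target, string jsonPath);

public class BindingOptions
{
    public ObjectBoundCallback OnObjectBound { get; set; }
}
```

Because the callback fires per object rather than per payload, nested objects get the same treatment as the root, which addresses the nesting question raised above.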
In regards to the top-level namespace...
I think
Are you asking if the public types currently in

If so, and seeing that |
One reason for choosing

Maybe all public classes under |
In reference to @khellang's comment, I definitely think there should be a way to differentiate whether a property was supplied if it is not a nullable type. I specifically ran into this situation in a Web API 2 API for my employer. We want the RequestModel to be validated, but I also want the RequestModel to include primitive properties such as int, bool, etc.

Example: I have a request model like this:

```csharp
public class GetATMsRequestModel
{
    [Required]
    public decimal Latitude { get; set; }

    [Required]
    public decimal Longitude { get; set; }
}
```

Notice that on my decimal properties I have added the
To solve this problem I had to write a custom ContractResolver for use with newtonsoft.json, like this:

```csharp
public class RequiredPropertyContractResolver : CamelCasePropertyNamesContractResolver
{
    protected override JsonProperty CreateProperty(MemberInfo member, MemberSerialization memberSerialization)
    {
        JsonProperty property = base.CreateProperty(member, memberSerialization);

        // If we are marking up the property as 'required', we need to make sure that the JSON serialization will
        // respect that and force the JSON to contain the value we want. Setting the JsonProperty Required field
        // to Required.Always causes the serializer to throw an exception if the JSON doesn't contain the property.
        if (member.CustomAttributes.Any(a => a.AttributeType == typeof(System.ComponentModel.DataAnnotations.RequiredAttribute)))
        {
            property.Required = Required.Always;
        }

        return property;
    }
}
```

If we add the |
This is the API proposal and feature set for object (de)serialization. It covers the main API entry point (`JsonSerializer`), the options (`JsonSerializerOptions`) and the ability to ignore properties during (de)serialization.

Review process update: due to the size of this issue and since new API issues are being added for new features, the overview and forward-looking information has been moved to https://github.com/dotnet/corefx/blob/master/src/System.Text.Json/docs/SerializerProgrammingModel.md. Future API additions will have their own independent API issue created instead of re-using this issue.
Current Status
The reviewed portion of the API is in corefx master and Preview 4. It consists of the `JsonSerializer` and `JsonSerializerOptions` classes and provides no extensibility features.

There is a previous prototype at CoreFxLab in the package `System.Text.JsonLab.Serialization`. It contains non-reviewed APIs including extensibility features like using attributes to define custom value converters, property name policies (like camel-casing), etc. It is built against .NET Core 3.0 Preview 2.

The last status of the API review is shown below.
API
JsonSerializer
This static class is the main entry point.
Let's start with coding examples before the formal API is provided.
Using a simple POCO class:
To deserialize JSON bytes into a POCO instance:
To serialize an object to JSON bytes:
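The inline code samples did not survive this copy of the thread. Based on the `Parse()`/`ToString()` naming described in the next paragraph and the `JsonSerializer.Parse<Person>(utf8)` call quoted earlier in the thread, they would have looked roughly like this (preview-era names; the `Person` POCO is assumed):

```csharp
using System;
// Namespace per the preview packages discussed in this thread:
using System.Text.Json.Serialization;

public class Person
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

public static class SerializerUsage
{
    public static Person RoundTrip(ReadOnlySpan<byte> utf8Json)
    {
        // Deserialize JSON bytes into a POCO instance:
        Person person = JsonSerializer.Parse<Person>(utf8Json);

        // Serialize an object back to JSON (string-based convenience
        // overload; the UTF-8 byte-based flavors avoid transcoding):
        string json = JsonSerializer.ToString(person);

        return person;
    }
}
```

These names were later renamed before release, so treat this purely as a reconstruction of the preview-era surface being reviewed here.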
The string-based `Parse()` and `ToString()` are convenience methods for strings, but slower than using the `<byte>` flavors because UTF8 must be converted to\from UTF16.

JsonSerializerOptions
This class contains the options that are used during (de)serialization.
If an instance of `JsonSerializerOptions` is not specified when calling read\write then a default instance is used, which is immutable and private. Having a global\static instance is not a viable feature because of unintended side effects when more than one area of code changes the same settings. Having an instance-per-thread\context mechanism is possible, but will only be added pending feedback. It is expected that ASP.NET and other consumers that have non-default settings maintain their own global, thread or stack variable and pass that in on every call. ASP.NET and others may also want to read a .config file at startup in order to initialize the options instance.

An instance of this class and exposed objects will be immutable once (de)serialization has occurred. This allows the instance to be shared globally with the same settings without the worry of side effects. The immutability is also desired with a future code-generation feature. Due to the immutable nature and fine-grained control over options through Attributes, it is expected that the instance is shared across users and applications. We may provide a Clone() method pending feedback.
For performance, when a `JsonSerializerOptions` instance is used, it should be cached or re-used, especially when run-time attributes are added, because when that occurs caches are held by the instance instead of being global.
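Following that guidance, a consumer holds one options instance for the lifetime of the application and passes it on every call. A sketch using the `MaxDepth` and `DefaultBufferSize` members mentioned earlier in the thread (`Person` is a stand-in type, and the `Parse` overload name follows the preview-era surface):

```csharp
using System;
using System.Text.Json.Serialization;

public class Person { public string FirstName { get; set; } }

public static class JsonDefaults
{
    // One shared, effectively-immutable instance: caches built from run-time
    // attributes are held by this object rather than globally, so reusing it
    // is what makes those caches pay off.
    public static readonly JsonSerializerOptions Options = new JsonSerializerOptions
    {
        MaxDepth = 32,
        DefaultBufferSize = 16 * 1024,
    };

    public static Person ParsePerson(ReadOnlySpan<byte> utf8) =>
        JsonSerializer.Parse<Person>(utf8, Options);
}
```

Constructing a fresh options instance per call would discard those caches each time, which is exactly the pattern the paragraph above warns against.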