-
Notifications
You must be signed in to change notification settings - Fork 3.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement JSON serialization/deserialization via Utf8JsonReader/Utf8JsonWriter #30604
Comments
Consider caching the JSON property names as JsonEncodedText in a annotation on the property |
@maumar on the UTF16/UTF8 question (2nd point above)... Utf8JsonReader requires a UTF8 string, represented as a I think this issue should do the simple/minimal thing, and just transcode the string you get from DbDataReader into a UTF8 byte array on the heap. This isn't the most efficient thing (although most probably still significantly more efficient than the current DOM implementation), but it would work everywhere. Later, as an optimization we can skip that steps for ADO.NET providers where getting a raw UTF8 byte array is supported (Npgsql allows this; SqlClient might, if not we can raise the request). |
…JsonReader/Utf8JsonWriter Using Utf8JsonReader to read JSON data rather than caching it using DOM. This should reduce allocations significantly. Tricky part is that entity materializers are build in a way that assumes we have random access to all the data we need. This is not the case here. We read JSON data sequentially and can only do it once, and we don't know the order in which we get the data. This is somewhat problematic in case where entity takes argument in the constructor. Those could be at the very end of the JSON string, so we must read all the data before we can instantiate the object, and populate it's properties and do navigation fixup. This requires us reading all the JSON data, store them in local variables, and only when we are done reading we instantiate the entity and populate all the properties with data stored in those variables. This adds some allocations (specifically navigations). We also have to disable de-duplication logic - we can't always safely re-read the JSON string, and definitely can't start reading it from arbitrary position, so now we have to add JSON string for every aggregate projected, even if we already project it's parent. Serialization implementation (i.e. Utf8JsonWriter) is pretty straighforward. Also fix to #30993 - Query/Json: data corruption for tracking queries with nested json entities, then updating nested entities outside EF and re-querying Fix is to recognize and modify shaper in case of tracking query, so that nav expansions are not skipped when parent entity is found in Change Tracker. This is necessary to fix alongside streaming, because now we throw exception from reader (unexpected token) if we don't process the entire stream correctly. Before it would be silently ignored apart from the edge case described in the bug. Fixes #30604 Fixes #30993
…JsonReader/Utf8JsonWriter Using Utf8JsonReader to read JSON data rather than caching it using DOM. This should reduce allocations significantly. Tricky part is that entity materializers are build in a way that assumes we have random access to all the data we need. This is not the case here. We read JSON data sequentially and can only do it once, and we don't know the order in which we get the data. This is somewhat problematic in case where entity takes argument in the constructor. Those could be at the very end of the JSON string, so we must read all the data before we can instantiate the object, and populate it's properties and do navigation fixup. This requires us reading all the JSON data, store them in local variables, and only when we are done reading we instantiate the entity and populate all the properties with data stored in those variables. This adds some allocations (specifically navigations). We also have to disable de-duplication logic - we can't always safely re-read the JSON string, and definitely can't start reading it from arbitrary position, so now we have to add JSON string for every aggregate projected, even if we already project it's parent. Serialization implementation (i.e. Utf8JsonWriter) is pretty straighforward. Also fix to #30993 - Query/Json: data corruption for tracking queries with nested json entities, then updating nested entities outside EF and re-querying Fix is to recognize and modify shaper in case of tracking query, so that nav expansions are not skipped when parent entity is found in Change Tracker. This is necessary to fix alongside streaming, because now we throw exception from reader (unexpected token) if we don't process the entire stream correctly. Before it would be silently ignored apart from the edge case described in the bug. Fixes #30604 Fixes #30993
…JsonReader/Utf8JsonWriter Using Utf8JsonReader to read JSON data rather than caching it using DOM. This should reduce allocations significantly. Tricky part is that entity materializers are build in a way that assumes we have random access to all the data we need. This is not the case here. We read JSON data sequentially and can only do it once, and we don't know the order in which we get the data. This is somewhat problematic in case where entity takes argument in the constructor. Those could be at the very end of the JSON string, so we must read all the data before we can instantiate the object, and populate it's properties and do navigation fixup. This requires us reading all the JSON data, store them in local variables, and only when we are done reading we instantiate the entity and populate all the properties with data stored in those variables. This adds some allocations (specifically navigations). We also have to disable de-duplication logic - we can't always safely re-read the JSON string, and definitely can't start reading it from arbitrary position, so now we have to add JSON string for every aggregate projected, even if we already project it's parent. Serialization implementation (i.e. Utf8JsonWriter) is pretty straighforward. Also fix to #30993 - Query/Json: data corruption for tracking queries with nested json entities, then updating nested entities outside EF and re-querying Fix is to recognize and modify shaper in case of tracking query, so that nav expansions are not skipped when parent entity is found in Change Tracker. This is necessary to fix alongside streaming, because now we throw exception from reader (unexpected token) if we don't process the entire stream correctly. Before it would be silently ignored apart from the edge case described in the bug. Fixes #30604 Fixes #30993
…JsonReader/Utf8JsonWriter Using Utf8JsonReader to read JSON data rather than caching it using DOM. This should reduce allocations significantly. Tricky part is that entity materializers are build in a way that assumes we have random access to all the data we need. This is not the case here. We read JSON data sequentially and can only do it once, and we don't know the order in which we get the data. This is somewhat problematic in case where entity takes argument in the constructor. Those could be at the very end of the JSON string, so we must read all the data before we can instantiate the object, and populate it's properties and do navigation fixup. This requires us reading all the JSON data, store them in local variables, and only when we are done reading we instantiate the entity and populate all the properties with data stored in those variables. This adds some allocations (specifically navigations). We also have to disable de-duplication logic - we can't always safely re-read the JSON string, and definitely can't start reading it from arbitrary position, so now we have to add JSON string for every aggregate projected, even if we already project it's parent. Serialization implementation (i.e. Utf8JsonWriter) is pretty straighforward. Also fix to #30993 - Query/Json: data corruption for tracking queries with nested json entities, then updating nested entities outside EF and re-querying Fix is to recognize and modify shaper in case of tracking query, so that nav expansions are not skipped when parent entity is found in Change Tracker. This is necessary to fix alongside streaming, because now we throw exception from reader (unexpected token) if we don't process the entire stream correctly. Before it would be silently ignored apart from the edge case described in the bug. Fixes #30604 Fixes #30993
…JsonReader/Utf8JsonWriter Using Utf8JsonReader to read JSON data rather than caching it using DOM. This should reduce allocations significantly. Tricky part is that entity materializers are build in a way that assumes we have random access to all the data we need. This is not the case here. We read JSON data sequentially and can only do it once, and we don't know the order in which we get the data. This is somewhat problematic in case where entity takes argument in the constructor. Those could be at the very end of the JSON string, so we must read all the data before we can instantiate the object, and populate it's properties and do navigation fixup. This requires us reading all the JSON data, store them in local variables, and only when we are done reading we instantiate the entity and populate all the properties with data stored in those variables. This adds some allocations (specifically navigations). We also have to disable de-duplication logic - we can't always safely re-read the JSON string, and definitely can't start reading it from arbitrary position, so now we have to add JSON string for every aggregate projected, even if we already project it's parent. Serialization implementation (i.e. Utf8JsonWriter) is pretty straighforward. Also fix to #30993 - Query/Json: data corruption for tracking queries with nested json entities, then updating nested entities outside EF and re-querying Fix is to recognize and modify shaper in case of tracking query, so that nav expansions are not skipped when parent entity is found in Change Tracker. This is necessary to fix alongside streaming, because now we throw exception from reader (unexpected token) if we don't process the entire stream correctly. Before it would be silently ignored apart from the edge case described in the bug. Fixes #30604 Fixes #30993
…JsonReader/Utf8JsonWriter (#31160) Using Utf8JsonReader to read JSON data rather than caching it using DOM. This should reduce allocations significantly. Tricky part is that entity materializers are build in a way that assumes we have random access to all the data we need. This is not the case here. We read JSON data sequentially and can only do it once, and we don't know the order in which we get the data. This is somewhat problematic in case where entity takes argument in the constructor. Those could be at the very end of the JSON string, so we must read all the data before we can instantiate the object, and populate it's properties and do navigation fixup. This requires us reading all the JSON data, store them in local variables, and only when we are done reading we instantiate the entity and populate all the properties with data stored in those variables. This adds some allocations (specifically navigations). We also have to disable de-duplication logic - we can't always safely re-read the JSON string, and definitely can't start reading it from arbitrary position, so now we have to add JSON string for every aggregate projected, even if we already project it's parent. Serialization implementation (i.e. Utf8JsonWriter) is pretty straighforward. Also fix to #30993 - Query/Json: data corruption for tracking queries with nested json entities, then updating nested entities outside EF and re-querying Fix is to recognize and modify shaper in case of tracking query, so that nav expansions are not skipped when parent entity is found in Change Tracker. This is necessary to fix alongside streaming, because now we throw exception from reader (unexpected token) if we don't process the entire stream correctly. Before it would be silently ignored apart from the edge case described in the bug. Fixes #30604 Fixes #30993
We currently do JSON serialization/deserialization by passing through DOM. That is, when deserializing, we get a JsonDocument from S.T.Json, and then construct entity CLR types from that (serialization works similarly in the other direction). Instead, we could simply parse the JSON via Utf8JsonReader (which is what JsonSerializer uses under the hood), and directly materialize entity CLR types from that. This should substantially reduce CPU time and heap allocations.
Note that we could use code generation to compile the JSON materializer, just like we do today for regular materialization from DbDataReader. In fact, Utf8jsonreader and DbDataReader are similar, offering the same kind of low-level access to primitive int/string values.
Note: benchmark to see the perf differences
The text was updated successfully, but these errors were encountered: