Passing a Stream Utf8JsonWriter to a JsonSerializer.Serialize method results in Pending bytes being misreported #66102
Comments
Tagging subscribers to this area: @dotnet/area-system-text-json

Issue Details: see the Description and Configuration sections at the end of this thread. The original repro sample serializes a sequence of Parent/Child objects to a FileStream ("testFile.json") using ParentConverter and ChildConverter custom JsonConverter implementations, with RandomString and CreateManyTestObjects helpers generating the payload.
Actually this seems like a bug. Repro:

using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Serialization;
var options = new JsonSerializerOptions { DefaultBufferSize = 512 };
options.Converters.Add(new MyStringConverter());
using var stream = Console.OpenStandardOutput();
JsonSerializer.Serialize(stream, CreateManyTestObjects(), options);
//await JsonSerializer.SerializeAsync(stream, CreateManyTestObjects(), options); // the issue also impacts the async method
IEnumerable<string> CreateManyTestObjects()
{
int i = 0;
while (true)
{
if (++i % 10_000 == 0)
{
System.Threading.Thread.Sleep(500);
}
yield return "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
}
}
class MyStringConverter : JsonConverter<string>
{
public override string Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options) =>
throw new NotImplementedException();
public override void Write(Utf8JsonWriter writer, string value, JsonSerializerOptions options)
{
writer.WriteStringValue(value);
// JsonSerializer.Serialize(writer, value); // replace with this line to trigger bug
}
}

When passing the writer to a nested JsonSerializer.Serialize call from inside the converter (the commented-out line above), the pending bytes are no longer flushed to the stream until the entire payload has been serialized.
Just to add for anyone who might follow this -- a large enough property could cause an OutOfMemoryException when trying to resize the buffer from Utf8JsonWriter.Grow. As noted in my initial report, creating a BufferedStream implementation that flushes the writer could be a workaround to prevent the OOM. So instead of serializing directly to the stream, we would construct a Utf8JsonWriter over the buffered stream and pass the writer to the serializer instead of the stream. @eiriktsarpalis -- we're looking to Microsoft here to recommend a workaround or possibly move up the bug-fix timeline.
We will try to get this fixed for .NET 7, but it doesn't meet the bar for .NET 6 servicing.
@eiriktsarpalis would this issue also be present in an async context? We noticed your comment here and are hoping that these are two separate issues with separate fixes. It's important to stress that in both sync and async contexts we can balloon a buffer enough to cause an out-of-memory crash.
Hi @mattchidley, I believe the two issues are unrelated.
A custom contract resolver should help, since you should be able to avoid many of the custom converter usages.
I'm a little disgruntled that this case was pushed out to 8.0: #63795. @krwq, can you advise on how you think custom resolver usages would help us avoid the memory-buffering issues for custom conversion? Say we have an IAsyncEnumerable of MyType which has a string property and requires custom conversion; if we want to serialize a large string we will balloon a buffer to do so. I'm surprised this isn't higher priority considering the performance implications and the risk of application crashes.
It depends. What use case made you write a custom converter in the first place? It might be the case that you could change the contract instead (which would still use the streaming built-in converters).
@eiriktsarpalis thanks, just reading a bit more.. I might be able to replace our troublesome custom converter with a contract resolver.
Here are the official docs in case they haven't been shared in this thread already: https://learn.microsoft.com/en-us/dotnet/standard/serialization/system-text-json/custom-contracts
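For illustration only (not code from this thread), here is a minimal sketch of the contract-customization approach those docs describe, assuming .NET 7+ and a hypothetical Child type whose "Prop" property gets renamed and conditionally skipped; the point is that tweaks like these no longer require a custom converter, so the built-in, stream-friendly converters keep writing the values:

using System;
using System.Text.Json;
using System.Text.Json.Serialization.Metadata;

var options = new JsonSerializerOptions
{
    TypeInfoResolver = new DefaultJsonTypeInfoResolver
    {
        Modifiers =
        {
            static typeInfo =>
            {
                if (typeInfo.Type != typeof(Child))
                    return;

                foreach (JsonPropertyInfo property in typeInfo.Properties)
                {
                    // The kind of tweak that often motivates a custom converter:
                    // rename the property and skip it when it is empty.
                    if (property.Name == "Prop")
                    {
                        property.Name = "prop_value";
                        property.ShouldSerialize = static (obj, value) => value is string s && s.Length > 0;
                    }
                }
            }
        }
    }
};

JsonSerializer.Serialize(Console.OpenStandardOutput(), new Child { Prop = "hello" }, options);

class Child
{
    public string Prop { get; set; } = "";
}

Because no converter takes over the whole type, the serializer retains its normal ability to flush the buffer to the stream as it goes.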
@eiriktsarpalis I don't think you understand the problem fully. I took another look at your repro above and it does not wholly represent the problem... I think there is another issue here: One large property must be entirely buffered before it can be flushed. Here is your repro but adapted:
In this case we fill and resize the buffer in JsonSerializer.Write.Stream.cs WriteStreamAsync with as many characters as we can possibly generate, all the way up to an OutOfMemoryException. Please note there are no custom converters or custom JsonSerializerOptions. Am I missing something here? We came up with a custom solution which works well for us in a sync context, but in an async context we cannot clear the buffer and flush the stream because of how your internal buffer management works. WriteCore will never call a flush because it's writing to the internal buffer. This is causing some significant issues. IMO async serialization in System.Text.Json is completely broken because of this -- how can you expect people to use it?
@mattchidley I ran your latest repro code (with a tiny tweak to write to a null stream instead of stdout) and got an out-of-memory exception, so +1 to that. But if I make one additional simple change (a call to await Task.Yield() inside the async enumerator, as in the listing below), the out-of-memory exception no longer reproduces:

using System.Text.Json;
var options = new JsonSerializerOptions { };
Random random = new Random();
using var stream = Stream.Null;
await JsonSerializer.SerializeAsync(stream, CreateManyTestObjectsAsync(), options);
async IAsyncEnumerable<Child> CreateManyTestObjectsAsync()
{
while (true)
{
await Task.Yield();
yield return new Child { Prop = RandomString(50_000_000) };
}
}
string RandomString(int length)
{
const string chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
return new string(Enumerable.Repeat(chars, length)
.Select(s => s[random.Next(s.Length)]).ToArray());
}
class Child
{
public string Prop { get; set; }
}

I'm not claiming this is a fix or that you should use Task.Yield as a workaround.
@mattchidley I think what you're seeing is a result of
Surely that's an exaggeration. I get that it can cause issues when you need to serialize strings whose lengths are on the order of tens of millions of characters, but if anything that's a niche concern. In any case, #67337 is tracking a fix for that.
I appreciate the prompt responses, especially considering the time of year -- thanks for that. Based on the exchange we've had so far it seems like Microsoft doesn't think this is a problem and is not treating it like a priority. My case as well as the one you noted above were both submitted in March 2022. Some questions:
I'm not exaggerating. There are some serious implications to these issues that are going to cost lots of time and effort to work around/back out... we were sold this grand idea of System.Text.Json, but the more we use it, the more problems we encounter.
I'm sorry if you think that's the case. We do take all reported issues seriously, but please do keep in mind that our backlog is substantial so we need to prioritize our efforts according to impact and business needs. This means that certain issues might remain unresolved for longer than otherwise desired.
The buffering issue that you reported concerns large strings specifically, not large objects in general. My personal opinion is far from what somebody might consider "official", but yes, I do think that serializing strings containing tens of millions of characters isn't what one might call a common scenario. That doesn't mean we're not acknowledging the problem though, which is why we're still tracking #67337.
I would recommend avoiding use of very large strings until the issue is resolved. This can be done either by chunking your data or by extracting the raw data into a separate resource (a minimal chunking sketch follows this reply).
Rome wasn't built in a day. STJ is a young library and Microsoft is invested in continually improving it.
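To make the chunking suggestion concrete, here is a minimal sketch (the Chunking class, the 1M-character chunk size, and the PropChunks payload shape are all hypothetical); the idea is that many moderate string elements can be flushed one by one, whereas a single huge string value has to be buffered in its entirety:

using System;
using System.Collections.Generic;
using System.IO;
using System.Text.Json;

static class Chunking
{
    private const int ChunkSize = 1024 * 1024; // characters per element -- an assumption, tune as needed

    // Emits { "PropChunks": ["...", "...", ...] } instead of { "Prop": "<one enormous string>" }.
    public static void Serialize(Stream stream, string hugeValue, JsonSerializerOptions options)
    {
        JsonSerializer.Serialize(stream, new { PropChunks = Split(hugeValue) }, options);
    }

    private static IEnumerable<string> Split(string value)
    {
        for (int offset = 0; offset < value.Length; offset += ChunkSize)
        {
            int length = Math.Min(ChunkSize, value.Length - offset);
            yield return value.Substring(offset, length);
        }
    }
}

A consumer would of course need to concatenate PropChunks when reading the payload back.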
@mattchidley What workaround did you use in your sync code?
@olgolo I described it briefly above -- you could implement a Stream that flushes during Write after a certain number of bytes have been written. You could then pass this new Stream to the Utf8JsonWriter that gets used for serialization. This would give you the opportunity to flush and clear while writing.
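As a rough sketch of that general idea in a sync context (not the exact implementation described above -- here the serialization loop itself, rather than a custom Stream, checks Utf8JsonWriter.BytesPending, and the 64 KB threshold and per-element loop are assumptions):

using System.Collections.Generic;
using System.IO;
using System.Text.Json;

static class StreamingSerializer
{
    private const int FlushThreshold = 64 * 1024; // bytes -- an assumption, tune as needed

    public static void SerializeArray<T>(Stream stream, IEnumerable<T> items, JsonSerializerOptions options)
    {
        using var writer = new Utf8JsonWriter(stream);
        writer.WriteStartArray();

        foreach (T item in items)
        {
            // Each element is serialized into the writer's internal buffer...
            JsonSerializer.Serialize(writer, item, options);

            // ...and pushed out to the stream once enough bytes have accumulated,
            // so the buffer never has to hold the whole payload.
            if (writer.BytesPending >= FlushThreshold)
            {
                writer.Flush();
            }
        }

        writer.WriteEndArray();
        writer.Flush();
    }
}

Note that a single very large element still has to fit in the writer's buffer before it can be flushed, which is the separate large-string problem tracked in #67337.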
Can confirm that the original issue as described in #66102 (comment) still occurs in .NET 7. This is a serious problem that can result in unexpected memory leaks and we should try to fix it. For issues pertaining to large string serialization I would recommend upvoting and contributing to the discussion in #67337. For streaming support in user-defined converters I would recommend upvoting and contributing to the discussion in #63795.
Semi-accidentally fixed this in #102541 |
Description
When using JsonSerializer to serialize to a stream, if more than one custom converter is in play we are seeing that the internal buffer cannot be written to the stream until the entire object has been serialized into it. This can result in significant memory usage and high allocations in ArrayPool.Rent and Resize called from PooledByteBufferWriter. It seems to me like the serialization state/context gets trapped when a second custom converter is in play, both sync and async -- we lose the ability to write to the stream and clear the buffer until the entire object has been serialized. This can be extremely problematic when serializing large objects with custom converters: if we had a 5 GB object in memory, we'd effectively need a similarly large buffer just to hold its serialized representation.
I am aware of this issue here -- and understand that a workaround can be to return IAsyncEnumerable in an async context... but not much information is provided for the sync case.
My question is: in a sync context, is there a recommended approach to buffering/streaming outside of what is posted in the docs? The docs recommend creating a writer for the stream and then passing the writer to the serializer -- this is also not ideal and can cause extremely high CPU if flushing after serializing using a custom converter, especially if there's a large result with many properties requiring custom conversion. I'm currently playing around with wrapping the stream with the BCL's BufferedStream, but would really appreciate any recommendations from your side to remedy these perf issues.
Configuration
Here is a sample where you can observe the buffer being filled and not cleared/flushed until the entire payload is serialized to the buffer:
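A minimal sketch of the kind of sample described -- reconstructed from the names mentioned at the top of the thread (testFile.json, RandomString, CreateManyTestObjects, Parent/Child, ParentConverter/ChildConverter); the class bodies, property names, item count, and converter logic are assumptions rather than the original code:

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.Json;
using System.Text.Json.Serialization;

const string fileName = "testFile.json";
using var fileStream = File.Create(fileName);

var options = new JsonSerializerOptions();
options.Converters.Add(new ParentConverter());
options.Converters.Add(new ChildConverter());

// With more than one custom converter in play, watch the process memory grow:
// the internal buffer is not flushed to the stream until the whole payload is serialized.
JsonSerializer.Serialize(fileStream, CreateManyTestObjects(), options);

string RandomString(int length)
{
    const string chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
    var random = new Random();
    return new string(Enumerable.Repeat(chars, length)
        .Select(s => s[random.Next(s.Length)]).ToArray());
}

IEnumerable<Parent> CreateManyTestObjects()
{
    for (int i = 0; i < 1_000_000; i++) // item count is an assumption
    {
        yield return new Parent { Child = new Child { Prop = RandomString(100) } };
    }
}

class Parent
{
    public Child Child { get; set; } = new Child();
}

class Child
{
    public string Prop { get; set; } = "";
}

class ParentConverter : JsonConverter<Parent>
{
    public override Parent Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options) =>
        throw new NotImplementedException();

    public override void Write(Utf8JsonWriter writer, Parent value, JsonSerializerOptions options)
    {
        writer.WriteStartObject();
        writer.WritePropertyName("Child");
        // Nested call back into the serializer -- this re-entrancy is what defeats
        // intermediate flushing, per the discussion above.
        JsonSerializer.Serialize(writer, value.Child, options);
        writer.WriteEndObject();
    }
}

class ChildConverter : JsonConverter<Child>
{
    public override Child Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options) =>
        throw new NotImplementedException();

    public override void Write(Utf8JsonWriter writer, Child value, JsonSerializerOptions options)
    {
        writer.WriteStartObject();
        writer.WriteString("Prop", value.Prop);
        writer.WriteEndObject();
    }
}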