Add Utf8JsonWriter along with unit tests #34425

ahsonkhan · 2019-01-08T09:32:06Z

Implements the priority-0 APIs proposed in https://github.com/dotnet/corefx/issues/33552

~90% line and branch coverage

TODO:

Improve test coverage (including using PipeWriter and custom IBufferWriter implementations that have different semantics (such as FixedSizedBufferWriter)). Issue to improve code coverage: https://github.com/dotnet/corefx/issues/34570
Deferring the WriteNumberArray and WriteStringArray "accelerator" APIs to the future (i.e. outside of this PR) as they are not crucial and mainly there for convenience/performance.
Deferring writing directly to a span for now since we need a mechanism to grow the destination span if there is not enough space.

cc @joshfree, @KrzysztofCwalina, @bartonjs, @steveharter, @stephentoub, @GrabYourPitchforks, @davidfowl, @tannergooding, @Tornhoof


ahsonkhan · 2019-01-08T09:33:13Z

src/System.Text.Json/src/System/Text/Json/JsonConstants.cs

+
+        public const int MaximumInt64Length = 20;   // 19 + sign (i.e. -9223372036854775808)
+        public const int MaximumUInt64Length = 20;  // i.e. 18446744073709551615
+        public const int MaximumDoubleLength = 128;  // default (i.e. 'G'), using 128 (rather than say 32) to be future-proof.


@tannergooding, I set these higher than necessary (i.e. 128) as per our discussion. Please review.

Is this used for parsing or formatting?

For formatting, you only need 17 digits to guarantee roundtripping (unless the user asks for more). For parsing, you may have to track up to 768 significant digits in order to return the correctly rounded result.

Is this used for parsing or formatting?

This is for formatting only. I will add a comment (and update the names).

What should be the max required length for formatting a double using the Utf8Formatter by default, i.e. when the user doesn't pass a StandardFormat? Should it be 17 (i.e. the smallest amount that guarantees round tripping) where the implementation of Utf8Formatter is fixed to behave accordingly (breaking change?)? The Utf8JsonWriter, by default, calls the default implementation of Utf8Formatter (whatever default it sets for different .NET types). I want to make sure the max value I am using here will work for that case. Should I therefore keep it at 128? If we can guarantee 17, then it would be great if I can reduce the MaximumDoubleLength down.
OR
Do you think the Utf8JsonWriter should be formatting the JSON text in a round-trippable manner by default anyway, regardless of the default behavior of Utf8Formatter? Then I could set it to something like 17, but it would mean the writer becomes slower.
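To put rough numbers on the bounds in this thread, here is a small sketch that measures formatted lengths on .NET Core 3.0 or later. It assumes Utf8Formatter's default format for the two integer constants and uses double.TryFormat with "G17" (17 significant digits, the round-trip minimum mentioned above) for doubles. FormattingBounds and its helpers are hypothetical names, not part of the PR.

```csharp
using System;
using System.Buffers.Text;
using System.Globalization;

public static class FormattingBounds
{
    // Values copied from the JsonConstants snippet under review.
    public const int MaximumInt64Length = 20;
    public const int MaximumUInt64Length = 20;

    // UTF-8 length of an Int64 written with Utf8Formatter's default format.
    public static int Int64Length(long value)
    {
        Span<byte> buffer = stackalloc byte[MaximumInt64Length];
        if (!Utf8Formatter.TryFormat(value, buffer, out int written))
        {
            throw new InvalidOperationException("MaximumInt64Length is too small.");
        }
        return written;
    }

    // UTF-8 length of a UInt64 written with Utf8Formatter's default format.
    public static int UInt64Length(ulong value)
    {
        Span<byte> buffer = stackalloc byte[MaximumUInt64Length];
        if (!Utf8Formatter.TryFormat(value, buffer, out int written))
        {
            throw new InvalidOperationException("MaximumUInt64Length is too small.");
        }
        return written;
    }

    // Length of a double written with 17 significant digits ("G17").
    // The output is pure ASCII, so the char count equals the UTF-8 byte count.
    public static int DoubleG17Length(double value)
    {
        Span<char> buffer = stackalloc char[128];
        if (!value.TryFormat(buffer, out int written, "G17", CultureInfo.InvariantCulture))
        {
            throw new InvalidOperationException("128 chars is too small.");
        }
        return written;
    }
}
```

Under those assumptions, Int64 and UInt64 each top out at 20 bytes, matching the constants, and "G17" output tops out at 24 characters (a sign, 17 digits, a decimal point, an 'E', an exponent sign, and three exponent digits), so 128 leaves a wide margin. What the default (no StandardFormat) double output looks like depends on the runtime version, which is the open question above.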

src/System.Text.Json/src/System/Text/Json/Writer/JsonWriterHelper.cs

ahsonkhan · 2019-01-08T09:36:21Z

src/System.Text.Json/src/System/Text/Json/Writer/Utf8JsonWriter.cs

+                // would result in an inconsistent state. Therefore, calling Flush before getting the current state.
+                if (_buffered != 0)
+                {
+                    Flush();


Should we Flush here on the user's behalf or change this to a method and throw an exception?

@KrzysztofCwalina earlier was talking about Flush vs FlushAsync and such. If the required usage is to call Flush(Async) before being able to snap the state, it seems like a method is the way to go. It also avoids a heisenbug where the debugger ends up calling Flush for you every time you hit F10.

We end up losing parity with the Utf8JsonReader.CurrentState property if we change it to a method, which is why I was flushing for the caller. Are we fine with it diverging here by making it a method that throws?

There's still time to change the reader version to a method, too. But a property having the side effect of calling Flush seems to me to violate property rules, so this one needs to change.

Properties behaving like properties is more important than "property on the reader => property on the writer".
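For comparison, the Utf8JsonWriter that eventually shipped in .NET Core 3.0 went the explicit route: it exposes BytesPending and BytesCommitted and only advances the underlying IBufferWriter<byte> when Flush is called (or when it needs more space, or on Dispose). The sketch below uses that shipped API, not the PR's CurrentState, to show pending bytes staying pending until the explicit call. The FlushDemo name is hypothetical.

```csharp
using System;
using System.Buffers;
using System.Text;
using System.Text.Json;

public static class FlushDemo
{
    // Writes a small object and reports the pending/committed byte counts around Flush.
    public static (int PendingBeforeFlush, int CommittedBeforeFlush, int PendingAfterFlush, string Json) WriteAndFlush()
    {
        var output = new ArrayBufferWriter<byte>();
        int pendingBefore, committedBefore, pendingAfter;

        using (var writer = new Utf8JsonWriter(output))
        {
            writer.WriteStartObject();
            writer.WriteString("name", "value");
            writer.WriteEndObject();

            // The JSON sits in the writer's buffer; nothing has been advanced on the IBufferWriter yet.
            pendingBefore = writer.BytesPending;
            committedBefore = output.WrittenCount;

            // Committing is an explicit method call, not a side effect of reading a property.
            writer.Flush();
            pendingAfter = writer.BytesPending;
        }

        return (pendingBefore, committedBefore, pendingAfter, Encoding.UTF8.GetString(output.WrittenSpan));
    }
}
```

Making the commit point a method keeps reads of state side-effect free, which is the property-semantics argument made above.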

src/System.Text.Json/src/System/Text/Json/Writer/JsonWriterHelper.cs

src/System.Runtime.Serialization.Formatters/tests/BinaryFormatterTestData.cs

Tornhoof · 2019-01-08T09:44:58Z

src/System.Text.Json/src/System/Text/Json/Writer/JsonWriterHelper.cs

+        /// <param name="bytesConsumed">On exit, contains the number of bytes that were consumed from the <paramref name="source"/>.</param>
+        /// <param name="bytesWritten">On exit, contains the number of bytes written to <paramref name="destination"/></param>
+        /// <returns>A <see cref="OperationStatus"/> value representing the state of the conversion.</returns>
+        public unsafe static OperationStatus ToUtf8(ReadOnlySpan<byte> source, Span<byte> destination, out int bytesConsumed, out int bytesWritten)


Is this method really necessary compared to the normal Encoder methods? If yes, maybe move it to a separate file? This is a very specific implementation with important comments, and it should not be changed individually without good cause; putting it into a different file might help make sure it is never touched.

Nit: source => utf16source, destination => utf8destination, or something like that. In 99.999% of the rest of corefx, ReadOnlySpan<byte> in the context of text means UTF8, so the fact that the source here is UTF16 is surprising.

But I also agree with @Tornhoof: it'd be really good if this could use the existing UTF8 encoder in some way.

This should really be done as code reuse (API or shared source), not code copying.

This should really be done as code reuse (API or shared source), not code copying.

I agree but this is a point-in-time concern.
I brought in this API (along with the supporting helpers) from corefxlab (https://github.com/dotnet/corefxlab/blob/0abb70efee949cff78a6b485f022dcd2e7896644/src/System.Text.Primitives/System/Text/Encoders/Utf16.cs#L108) to remove the dependency on https://github.com/dotnet/corefx/issues/34094. Ideally, once the Utf8 class is made public, we would replace this implementation (along with having to convert char -> byte while keeping it as UTF-16). I need the operation status based transcoding.

I will add a comment to replace this in the (near?) future.

I need the operation status based transcoding.

Per #34425 (comment), do we really?

do we really?

I need to be able to do partial writes to the destination buffer since it is possible that the destination is too small for the given input.

I can't do something like the following since it would throw if _buffer was too small.

int written = Encoding.UTF8.GetBytes(escapedValue, _buffer.Slice(idx));

I would have to use the pointer overload and calculate how many characters will fit into the leftover destination buffer, up front.

Maybe something like:

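    fixed (char* charPtr = temp1)
    fixed (byte* bytePtr = destination)
    {
        // A single UTF-16 char never needs more than 3 UTF-8 bytes, so this many chars always fit.
        // Math.Min keeps the count within the source; a surrogate pair split at the boundary
        // would be replaced with U+FFFD by the default UTF-8 encoding.
        int charCount = Math.Min(temp1.Length, destination.Length / 3);
        int written = Encoding.UTF8.GetBytes(charPtr, charCount, bytePtr, destination.Length);
    }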
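Since this discussion, .NET Core 3.0 added a public OperationStatus-based transcoder, System.Text.Unicode.Utf8.FromUtf16, which reports how many chars were read and how many bytes were written when the destination fills up. The sketch below, with hypothetical helper names, shows that partial-write behavior and a chunked loop built on it; it is not the JsonWriterHelper code.

```csharp
using System;
using System.Buffers;
using System.IO;
using System.Text.Unicode;

public static class ChunkedTranscoding
{
    // One transcoding step: reports how far Utf8.FromUtf16 got before the destination filled up.
    public static (OperationStatus Status, int CharsRead, int BytesWritten) FirstChunk(string text, int destinationSize)
    {
        Span<byte> destination = new byte[destinationSize];
        OperationStatus status = Utf8.FromUtf16(text, destination, out int charsRead, out int bytesWritten);
        return (status, charsRead, bytesWritten);
    }

    // Transcodes all of 'source' through a fixed-size scratch buffer, copying each partial result out.
    public static byte[] ToUtf8InChunks(ReadOnlySpan<char> source, int chunkSize)
    {
        var result = new MemoryStream();
        Span<byte> scratch = new byte[chunkSize];

        while (true)
        {
            // Invalid surrogates are replaced (the default), so the statuses here are Done or DestinationTooSmall.
            OperationStatus status = Utf8.FromUtf16(source, scratch, out int charsRead, out int bytesWritten);
            result.Write(scratch.Slice(0, bytesWritten));
            source = source.Slice(charsRead);

            if (status == OperationStatus.Done)
            {
                return result.ToArray();
            }

            if (charsRead == 0)
            {
                // The next scalar value needs more bytes than the scratch buffer has.
                throw new InvalidOperationException($"Chunk size {chunkSize} is too small to make progress.");
            }
        }
    }
}
```

Because Utf8.FromUtf16 never writes part of a scalar value, a chunk size of at least 4 bytes always makes progress; smaller chunks can stall on a 4-byte sequence, which the loop reports as an error.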

These implementations have diverged, given all the intrinsics work put into the CoreLib version, and the comments are outdated: https://github.com/dotnet/runtime/blob/eeef4b14c5a137c2f665a9be4f30e4383d1038da/src/libraries/System.Text.Json/src/System/Text/Json/Writer/JsonWriterHelper.Transcoding.cs#L28

@eiriktsarpalis can it be replaced now or is it permanent?
btw OperationStatus can be replaced with value tuple throughout this project (even for .net framework).

Not sure, could you create a new issue please?

dotnet/runtime#75779

src/System.Text.Json/src/System/Text/Json/Writer/Utf8JsonWriter.WriteProperties.DateTime.cs

src/System.Text.Json/src/System/Text/Json/Writer/Utf8JsonWriter.WriteProperties.String.cs

Tornhoof

Are all the type-specific Write partial classes (one per supported type) auto-generated or manually created?

src/System.Text.Json/src/System/Text/Json/Writer/UnicodeScalar.cs

src/System.Text.Json/src/System/Text/Json/Writer/Utf8JsonWriter.WriteProperties.DateTime.cs

src/System.Text.Json/src/System/Text/Json/Writer/Utf8JsonWriter.WriteProperties.Guid.cs

src/System.Text.Json/src/System/Text/Json/Writer/Utf8JsonWriter.WriteProperties.Helpers.cs

ahsonkhan · 2019-01-08T10:45:04Z

src/System.Text.Json/src/System/Text/Json/Writer/JsonWriterHelper.Escaping.cs

+        // and exclude characters that need to be escaped by adding a backslash: '\n', '\r', '\t', '\\', '/', '\b', '\f'
+        //
+        // non-zero = allowed, 0 = disallowed
+        private static ReadOnlySpan<byte> AllowList => new byte[byte.MaxValue + 1] {


@GrabYourPitchforks, please review the escaping logic here. Note that this is a temporary workaround and we should consider replacing this implementation with a publicly shipping API once we have something suitable.
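The lookup-table approach can be sketched as follows. The allow rule here is an assumption for illustration only (printable ASCII except '"' and '\'), and the sketch treats every non-ASCII byte as needing escaping; the PR's AllowList may differ (its comment also lists '/'). EscapingSketch is a hypothetical name.

```csharp
using System;

public static class EscapingSketch
{
    // 256-entry table indexed by byte value: true = the byte can be written as-is.
    // ASSUMPTION: only printable ASCII other than '"' and '\\' is allowed.
    private static readonly bool[] s_allowList = BuildAllowList();

    private static bool[] BuildAllowList()
    {
        var table = new bool[byte.MaxValue + 1];
        for (int b = 0x20; b < 0x7F; b++)
        {
            table[b] = true;
        }
        table['"'] = false;
        table['\\'] = false;
        return table;
    }

    // Returns the index of the first byte that needs escaping, or -1 if none does.
    // Control characters (< 0x20) and all bytes >= 0x7F are rejected by the table above.
    public static int IndexOfFirstByteToEscape(ReadOnlySpan<byte> utf8Value)
    {
        for (int i = 0; i < utf8Value.Length; i++)
        {
            if (!s_allowList[utf8Value[i]])
            {
                return i;
            }
        }
        return -1;
    }
}
```

Indexing a 256-entry table by the byte value replaces several comparisons with a single load. The PR declares its table as a ReadOnlySpan<byte> property over a constant byte array, a pattern the C# compiler can turn into a direct reference to static data instead of a per-call allocation; a static readonly bool[] is used here only to keep the sketch short.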

jkotas · 2019-01-08T12:15:34Z

src/System.Text.Json/src/System/Text/Json/Writer/JsonWriterHelper.cs

+        }
+
+        [MethodImpl(MethodImplOptions.AggressiveInlining)]
+        public static void ValidateProperty(ref ReadOnlySpan<byte> propertyName)


There are a lot of places that pass Span as ref. It makes the code very hard to reason about because you can never tell whether the methods change the argument. Could you please change the Spans to be passed as in where the value is not changed by the callee and passing by reference is just used as an optimization?

and passing by reference is just used as an optimization?

Is it actually a meaningful optimization? We don't do it in pretty much any other place we pass around spans, and if it was meaningful, I'd expect it to be something the JIT should do instead of forcing the developer.

I had the thought that in was probably meant, too; then wondered if it was because the compiler makes defensive copies when Slice is called, making in an anti-optimization.
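The readability concern and the defensive-copy question can be seen side by side in this sketch (hypothetical names). On the latter point: ReadOnlySpan<T> is declared as a readonly ref struct, so calling Slice or Length on an in parameter does not cause the compiler to emit a defensive copy.

```csharp
using System;

public static class SpanPassingSketch
{
    // By ref: the callee may reassign the caller's span. Here it "consumes" a leading
    // quote by slicing the caller's own variable, which is invisible at the call site
    // apart from the 'ref' keyword.
    public static void SkipQuoteByRef(ref ReadOnlySpan<byte> value)
    {
        if (!value.IsEmpty && value[0] == (byte)'"')
        {
            value = value.Slice(1);
        }
    }

    // By in: a read-only reference; 'value = ...' here would be a compile error.
    public static int LengthWithoutQuoteIn(in ReadOnlySpan<byte> value)
    {
        return !value.IsEmpty && value[0] == (byte)'"' ? value.Slice(1).Length : value.Length;
    }

    // By value: the callee gets its own copy of the (reference, length) pair.
    public static int LengthWithoutQuote(ReadOnlySpan<byte> value)
    {
        if (!value.IsEmpty && value[0] == (byte)'"')
        {
            value = value.Slice(1); // only the local copy changes
        }
        return value.Length;
    }
}
```

Only the ref version changes what the caller sees afterwards, which is why reviewers preferred in or by-value where no mutation is intended.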

Is it actually a meaningful optimization?

In certain cases, it improves performance by 10-20% (in or ref rather than pass by value)

In certain cases, it improves performance by 10-20% (in or ref rather than pass by value)

This call site makes a 10-20% difference? On what test? You can prove me wrong, but I find that really hard to believe. And if it's true, to me that suggests a codegen problem that should be fixed rather than contorting our code.

I could believe that there are a few methods where for one reason or another the extra word argument pushes it over some threshold in some optimization, maybe some argument that could have been passed in a register is now passed on the stack, or some such thing. But I'm very skeptical that it then translates into benefiting from passing around all of these by ref or in.

I don't like this practice for a few reasons:

It causes complications when trying to do things like stackalloc spans due to compiler errors when passing around multiple ref structs, some by ref. And in many cases, the compiler is flagging very real problems; as Jeremy highlighted, the hack to use unsafe to stackalloc a pointer and create a span from it in order to silence the compiler error actually resulted in leaking garbage pointers up the stack.

We do not believe customers should have to write code like this, and applying such patterns across a codebase (and one that we're distributing in a source package no less) will suggest to folks "this is how you write code with spans", when it's not actually what we recommend.

If it's really, truly necessary to get good performance, then that suggests we have much deeper rooted issues that should be addressed at their core rather than working around it in this way.

I still believe it is in our collective best interest to pass small structs

Follow up question:
Should we pass by value even if the span is being passed around in a deep call graph?

For example, something like this:

WriteStringValue(ReadOnlySpan<char> value, bool suppressEscaping = false) => WriteStringSuppressFalse(in ReadOnlySpan<char> value) => WriteStringByOptions(in ReadOnlySpan<char> value) => WriteStringMinimized(in ReadOnlySpan<char> escapedValue) => WriteStringValue(in ReadOnlySpan<char> escapedValue, ref int idx)

Similarly, when we have 2 spans as arguments?

This is an example of a call stack where it would be ok to use some amount of AggressiveInlining to reduce its depth.

For example, WriteStringSuppressFalse has just a single call site, so it would be ok to inline it into the caller, either by using AggressiveInlining or by not having the separate method at all and inlining the code manually.

so it would be ok to inline it into the caller, either by using AggressiveInlining or by not having the separate method at all and inlining the code manually.

Makes sense. I don't want to remove the separate method since it is easier to read what the code is doing when it's split up into smaller methods. I will keep the number of call sites in mind and mark methods with AggressiveInlining if they have a single reference (and I notice a performance improvement when doing so).

https://github.com/dotnet/corefx/issues/34568

In general, yes.

Whether in actually helps depends on a lot of factors: what each level does with the span, the relative frequencies of operations on the span, the impact of other code in each method, whether the intermediate layers get inlined, the target ABI, and so on. And the perf characteristics will change over time as the jit improves handling of structs in general.

If, as your example implies, you simply pass the span down to a single callee, then in and by-value (implicit by-ref on Windows) will generate identical code on Windows x64, as the JIT can recognize that in the apparent by-value case no copy is needed.
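A sketch of the pattern the thread converges on, assuming a toy writer with hypothetical names (TinyStringWriter, WriteStringMinimized) and no escaping or buffer growth: the span is passed by value at every level, and the helper with a single call site stays a separate method but is marked AggressiveInlining.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Text;

public sealed class TinyStringWriter
{
    // Fixed-size buffer: this sketch does not grow it, so large inputs would throw.
    private readonly byte[] _buffer = new byte[256];
    private int _index;

    // Public entry point: the span is passed by value all the way down.
    public void WriteStringValue(ReadOnlySpan<char> value)
    {
        WriteStringMinimized(value);
    }

    // Single call site, so it is a reasonable AggressiveInlining candidate while
    // still keeping the code split into small, readable methods.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    private void WriteStringMinimized(ReadOnlySpan<char> value)
    {
        _buffer[_index++] = (byte)'"';
        int written = Encoding.UTF8.GetBytes(value, _buffer.AsSpan(_index));
        _index += written;
        _buffer[_index++] = (byte)'"';
    }

    public string GetText() => Encoding.UTF8.GetString(_buffer, 0, _index);
}
```

Whether the attribute pays off should be measured, as noted above; the JIT may already inline a body this small on its own.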

src/System.Text.Json/src/System/Text/Json/Writer/JsonWriterHelper.cs

src/System.Text.Json/src/System/Text/Json/Writer/UnicodeScalar.cs

ahsonkhan · 2019-01-12T11:49:55Z

Any other feedback? I would like to merge this as soon as possible, unless there is blocking feedback. I have addressed (and resolved) all the actionable feedback that's within the scope of this PR.

At the very minimum, sending a stackalloc span up to a caller via ref-overwrite needs to be fixed. Making sure there's a red X for that

@bartonjs, I am no longer passing spans by ref, so this issue has been resolved.

ahsonkhan · 2019-01-14T02:17:20Z

@dotnet-bot test Windows x64 Debug Build

src/System.Text.Json/src/System/Text/Json/Writer/Utf8JsonWriter.WriteProperties.DateTime.cs


src/System.Text.Json/ref/System.Text.Json.cs

bartonjs

LGTM, preferably with ReadOnlySpan<byte> method parameters (things in the input role that ideally would have been char8/utf8string) given the utf8 prefix as recently discussed :smile:
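The utf8 prefix convention is visible in the writer API that later shipped in .NET Core 3.0, for example WriteString(ReadOnlySpan<byte> utf8PropertyName, ReadOnlySpan<byte> utf8Value). The sketch below (hypothetical class name) calls that overload and also shows a misuse surfacing as InvalidOperationException, in line with the "Remove JsonWriterException and use InvalidOperationException instead" commit in this PR.

```csharp
using System;
using System.Buffers;
using System.Text;
using System.Text.Json;

public static class Utf8ParameterSketch
{
    // Writes {"id":"42"} through the UTF-8 (ReadOnlySpan<byte>) overload of WriteString.
    public static string WriteWithUtf8Overload()
    {
        ReadOnlySpan<byte> utf8PropertyName = Encoding.UTF8.GetBytes("id");
        ReadOnlySpan<byte> utf8Value = Encoding.UTF8.GetBytes("42");

        var output = new ArrayBufferWriter<byte>();
        using (var writer = new Utf8JsonWriter(output))
        {
            writer.WriteStartObject();
            writer.WriteString(utf8PropertyName, utf8Value);
            writer.WriteEndObject();
        } // Dispose flushes the pending bytes.

        return Encoding.UTF8.GetString(output.WrittenSpan);
    }

    // With validation on (the default), ending an object that was never started throws.
    public static bool MismatchedEndThrows()
    {
        var output = new ArrayBufferWriter<byte>();
        using (var writer = new Utf8JsonWriter(output))
        {
            try
            {
                writer.WriteEndObject();
                return false;
            }
            catch (InvalidOperationException)
            {
                return true;
            }
        }
    }
}
```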


ahsonkhan · 2019-01-15T08:03:42Z

@dotnet-bot test Windows x64 Debug Build
@dotnet-bot test UWP NETNative x86 Release Build

ahsonkhan · 2019-01-15T21:37:06Z

@dotnet-bot test Windows x64 Debug Build

* Move JsonReader related files to the Reader directory.
* Update S.T.Json ref to include new JsonWriter APIs.
* Port initial JsonWriter files from corefxlab.
* Auto-generate System.Text.Json ref to update the ordering for consistency.
* Fixed some TODOs, delete dead code, and formatting cleanup.
* Make use of self-descriptive constants where possible.
* Fix leftover TODOs and update throw helper/exception messages.
* Add xml comments/docs to the public surface area.
* Add JsonWriter unit tests.
* Change GetCurrentState back to a property and explicitly Flush on the caller's behalf instead of throwing.
* Save current depth as part of state and update tests.
* Fix constant names, account for quotes in bytes needed, and add fixed buffer writer tests.
* Fix inconsistent use of braces by adding them for single line ifs.
* Update parameter name to exclude encoding.
* Remove JsonWriterException and use InvalidOperationException instead.
* Use Rune and Utf8Formatter/TryFormat in more places and remove UnicodeScalar.
* Fix nits, typos, and reorder field assignment and method calls.
* Pass spans by in (or by value) instead of by ref.
* Update comments and remove unnecessary test.
* Remove some aggressive inlining and pass spans by value rather than in.
* Update comments, don't compute bytes needed, and use if instead of loop before advancing.
* Reduce code bloat by removing duplicate calls to ValidateX.
* Add details on how .NET types are formatted to comments.
* Reduce code duplication when writing values.
* Change the StandardFormat used for DateTime(Offset) to 'O'.
* Refactor calculating the maximum escaped length to a helper.
* Remove unnecessary checks and rename locals to be more descriptive.
* Rename suppressEscaping to escape and flip default from false to true.
* Comment cleanup, add debug.asserts, and move transcoding helpers to a separate file.
* Increase the decimal max size to account for sign and add tests.
* Rename ROS<byte> property name and value params to include utf8 in the name.
* Remove redundant code where idx is set to 0 unnecessarily.
* Remove dead code (don't escape forward slash) and make tests culture invariant.

Commit migrated from dotnet/corefx@f84927d
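One of the changes listed above switches DateTime(Offset) formatting to the round-trip 'O' standard format. A minimal sketch of what that format produces through Utf8Formatter follows; the RoundTripDateFormat name is hypothetical, and since the writer may post-process its output, treat this as an illustration of 'O' rather than of the writer's exact bytes.

```csharp
using System;
using System.Buffers;
using System.Buffers.Text;
using System.Text;

public static class RoundTripDateFormat
{
    // Formats a DateTime as UTF-8 using the round-trip ('O') standard format.
    public static string FormatO(DateTime value)
    {
        Span<byte> buffer = stackalloc byte[64];
        if (!Utf8Formatter.TryFormat(value, buffer, out int written, new StandardFormat('O')))
        {
            throw new InvalidOperationException("Buffer too small.");
        }
        return Encoding.UTF8.GetString(buffer.Slice(0, written));
    }
}
```

For a UTC DateTime such as new DateTime(2019, 1, 15, 22, 42, 0, DateTimeKind.Utc), 'O' keeps all seven fractional digits and a trailing 'Z', which is what makes the text round-trippable.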

ahsonkhan added 10 commits January 7, 2019 02:12

Move JsonReader related files to the Reader directory.

aeb2103

Update S.T.Json ref to include new JsonWriter APIs.

76a459b

Port initial JsonWriter files from corefxlab.

f1eae6f

Auto-generate System.Text.Json ref to update the ordering for consistency.

0e1a283

Fixed some TODOs, delete dead code, and formatting cleanup.

fd8bd6d

Make use of self-descriptive constants where possible.

39fc64d

Fix leftover TODOs and update throw helper/exception messages.

41b7b2c

Add xml comments/docs to the public surface area.

7988444

Add JsonWriter unit tests.

8276f4c

Change GetCurrentState back to a property and explicitly Flush on the caller's behalf instead of throwing.

9bec478

ahsonkhan added the area-System.Text.Json label Jan 8, 2019

ahsonkhan self-assigned this Jan 8, 2019

ahsonkhan requested review from stephentoub, steveharter and bartonjs January 8, 2019 09:32

ahsonkhan commented Jan 8, 2019

src/System.Text.Json/src/System/Text/Json/Writer/JsonWriterHelper.cs

ahsonkhan commented Jan 8, 2019

Tornhoof reviewed Jan 8, 2019

src/System.Text.Json/src/System/Text/Json/Writer/JsonWriterHelper.cs

ahsonkhan commented Jan 8, 2019

src/System.Runtime.Serialization.Formatters/tests/BinaryFormatterTestData.cs

Tornhoof reviewed Jan 8, 2019

src/System.Text.Json/src/System/Text/Json/Writer/Utf8JsonWriter.WriteProperties.DateTime.cs

Tornhoof reviewed Jan 8, 2019

src/System.Text.Json/src/System/Text/Json/Writer/Utf8JsonWriter.WriteProperties.String.cs

Save current depth as part of state and update tests.

5385574

Tornhoof reviewed Jan 8, 2019

ahsonkhan commented Jan 8, 2019

jkotas reviewed Jan 8, 2019

src/System.Text.Json/src/System/Text/Json/Writer/JsonWriterHelper.cs

jkotas reviewed Jan 8, 2019

src/System.Text.Json/src/System/Text/Json/Writer/JsonWriterHelper.cs

jkotas reviewed Jan 8, 2019

src/System.Text.Json/src/System/Text/Json/Writer/UnicodeScalar.cs

ahsonkhan added 6 commits January 11, 2019 22:12

Reduce code bloat by removing duplicate calls to ValidateX.

644118c

Add details on how .NET types are formatted to comments.

baa4931

Reduce code duplication when writing values.

2b91e6c

Change the StandardFormat used for DateTime(Offset) to 'O'

6c43954

Refactor calculating the maximum escaped length to a helper.

eb33db6

Remove unnecessary checks and rename locals to be more descriptive.

171f6c3

Rename suppressEscaping to escape and flip default from false to true.

a14a57e

bartonjs reviewed Jan 14, 2019

src/System.Text.Json/src/System/Text/Json/Writer/Utf8JsonWriter.WriteProperties.DateTime.cs

bartonjs reviewed Jan 15, 2019

src/System.Text.Json/src/System/Text/Json/Writer/Utf8JsonWriter.WriteProperties.DateTime.cs

ahsonkhan added 2 commits January 14, 2019 17:22

Merge branch 'master' of https://github.com/dotnet/corefx into AddJsonWriter

712502b

Comment cleanup, add debug.asserts, and move transcoding helpers to a separate file.

3bc364c

bartonjs reviewed Jan 15, 2019

src/System.Text.Json/ref/System.Text.Json.cs

bartonjs approved these changes Jan 15, 2019

ahsonkhan added 4 commits January 14, 2019 18:21

Increase the decimal max size to account for sign and add tests.

56382db

Rename ROS<byte> property name and value params to include utf8 in the name.

2900bc5

Remove redundant code where idx is set to 0 unnecessarily.

6f41164

Remove dead code (don't escape forward slash) and make tests culture invariant.

c3c83e4

ahsonkhan merged commit f84927d into dotnet:master Jan 15, 2019

ahsonkhan deleted the AddJsonWriter branch January 15, 2019 22:42

ahsonkhan mentioned this pull request Jan 23, 2019

Productize JsonReader and JsonWriter dotnet/corefxlab#2167

Closed

karelz added this to the 3.0 milestone Mar 18, 2019

kasperk81 mentioned this pull request Sep 16, 2022

JsonWriterHelper.Transcoding.cs replacement dotnet/runtime#75779

Closed

ahsonkhan commented Jan 8, 2019

ahsonkhan Jan 8, 2019

tannergooding Jan 8, 2019

ahsonkhan Jan 9, 2019

ahsonkhan Jan 8, 2019

bartonjs Jan 8, 2019

ahsonkhan Jan 9, 2019

bartonjs Jan 12, 2019

Tornhoof Jan 8, 2019

stephentoub Jan 8, 2019

stephentoub Jan 8, 2019

bartonjs Jan 8, 2019

ahsonkhan Jan 9, 2019

stephentoub Jan 9, 2019

ahsonkhan Jan 12, 2019

kasperk81 Sep 16, 2022

eiriktsarpalis Sep 16, 2022

kasperk81 Sep 16, 2022

Tornhoof left a comment

ahsonkhan Jan 8, 2019

jkotas Jan 8, 2019

stephentoub Jan 8, 2019

bartonjs Jan 8, 2019

ahsonkhan Jan 12, 2019

stephentoub Jan 12, 2019

ahsonkhan Jan 12, 2019

jkotas Jan 12, 2019

ahsonkhan Jan 12, 2019

ahsonkhan Jan 12, 2019

AndyAyersMS Jan 14, 2019

ahsonkhan commented Jan 12, 2019

ahsonkhan commented Jan 14, 2019

bartonjs left a comment

ahsonkhan commented Jan 15, 2019

ahsonkhan commented Jan 15, 2019