Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new System.Net.Http.Json project/namespace #33459

Merged
merged 32 commits into from
Mar 26, 2020
Merged

Conversation

jozkee
Copy link
Member

@jozkee jozkee commented Mar 11, 2020

Fixes #32937

This PR includes the necessary setup to generate the NuGet package for this project.

Once this is in, our CI will start to generate the preview package requested by the Blazor team.
cc @mkArtakMSFT @terrajobst

@jozkee jozkee added this to the 5.0 milestone Mar 11, 2020
@jozkee jozkee requested review from steveharter, layomia, joperezr and a team March 11, 2020 01:24
@jozkee jozkee self-assigned this Mar 11, 2020
@Dotnet-GitSync-Bot
Copy link
Collaborator

Note regarding the new-api-needs-documentation label:

This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, to please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.

Copy link
Member Author

@jozkee jozkee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Open questions.


if (mediaType != JsonContent.JsonMediaType &&
mediaType != MediaTypeNames.Text.Plain)
{
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per the spec, we allow application/json and text/plain only.

If Content-Type is null we assume it as "application/octet-stream" in accordance with section 7.2.1 of the HTTP spec.
This is how Formatting API works as well.

And at the same time, it is contrary to how HttpContent.ReadAsStringAsync works (it allows null content-type and tries to read the content using UTF-8 for that case).

IMO, null Content-Type should default to application/json; charset=utf-8.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And at the same time, it is contrary to how HttpContent.ReadAsStringAsync works (it allows null content-type and tries to read the content using UTF-8 for that case).

I know that JSON does have some rules about BOMs (I believe writing them is forbidden, reading them is left up to implementer?). We should make sure we're consistent with that.

Copy link
Member

@rynowak rynowak Mar 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The web is a pretty wild place 👍. There are specs but there are also lots of implementations that don't conform in subtle ways, so this is good advice (and similar to what we do on the server side with JSON).


I would expect this to also support suffix content types. application/<something>+json - https://tools.ietf.org/html/rfc6839

In use by places like github: https://developer.github.com/v3/media/ - it's a newer spec, but lots of more serious and large scale kinds of API design will prefer this over application/json.

We might be able to get away with ignoring text/json which had some usage long ago, but isn't to spec.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little bit surprised about text/plain. I'm not sure I've seen that in use for JSON.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't remember why I put it in the spec; I think it was based on feedback of someone because I wouldn't have been smart enough to put either of these mime types in. If you believe text/plain is wrong let's remove it. Do we consider adding additional mime types in the future as a breaking change? IOW, would people commonly catch exceptions can do something different if a given mime type isn't accepted/produced by the server?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on the support for suffix content type. As of now, you are unable to read problem details responses that MVC produces for error responses. It has a application/problem+json content type.

Copy link
Member Author

@jozkee jozkee Mar 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I get from this is that we need to white-list the following:

  • application/json
  • application/<something>+json
  • text/plain

@rynowak @pranavkm Is that correct or is there any other Content-Type you want to add?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds right to me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, this looks fine. ASP.NET Core has two implementations for calculating media-type subsets:

The first has fewer code dependencies and might be easier to easier to copy to S.T.Json if you'd like to use it as an implementation.

<Import Project="..\Directory.Build.props" />
<PropertyGroup>
<StrongNameKeyId>Microsoft</StrongNameKeyId>
<IsNETCoreApp>true</IsNETCoreApp>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to ship this inbox as well? I thought this was only goin to be OOB. Assuming that is still the case then you’ll have to remove this line

Copy link
Member

@terrajobst terrajobst Mar 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in .NET 5 we probably want this in box (would be my guess) but clearly not for .NET Core 3.1 anymore

@@ -1035,6 +1035,15 @@
"System.Net.HttpListener"
]
},
{
"Name": "System.Net.Http.Json",
"Description": "Provides extension methods for System.Net.Http.HttpClient and System.Net.Http.HttpContent that focus on ease of use of JSON as the data interchange format for HTTP clients (pending approval).",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cc @terrajobst to validate what should go here

Copy link
Member

@terrajobst terrajobst Mar 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say:

Suggested change
"Description": "Provides extension methods for System.Net.Http.HttpClient and System.Net.Http.HttpContent that focus on ease of use of JSON as the data interchange format for HTTP clients (pending approval).",
"Description": "Provides extension methods for System.Net.Http.HttpClient and System.Net.Http.HttpContent that perform automatic serialization and deserialization using System.Text.Json.",

Copy link
Member

@joperezr joperezr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a couple of comments to address, but other than that infrastructure related stuff looks good.

Copy link
Contributor

@scalablecory scalablecory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First pass done. There are some possible optimization opportunities, but it looks good overall.

throw new NotSupportedException("The provided ContentType is not supported; the supported types are 'application/json' and 'text/plain'.");
}

// Code taken from https://source.dot.net/#System.Net.Http/System/Net/Http/HttpContent.cs,047409be2a4d70a8
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please try to move this code into the Common directory so we can avoid duplicating it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That code has semantics of determining the encoding out of the content.
Do we want that to apply here as well?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this code was removed from implementation, so original comment no longer applies.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is still there, I just moved it to a separate method called GetEncoding, see here. Sorry for not making that clear.

Compared with the logic used to determine the encoding in HttpContent.ReadAsStringAsync (see here), which is very similar to GetEncoding method in this class.

Do we want to apply the same rules here? More precisely, do we want to use the BOM to determine the encoding when is not present in the response? Is it safe? cc @GrabYourPitchforks

@rynowak has any BOM usage shown up in users requests/responses?

Copy link
Member

@rynowak rynowak Mar 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see a problem with the logic in GetEncoding - but it probably does more than we need. There's a few issues at play.

Something I learned when drilling into the question (and would love for @GrabYourPitchforks and/or @Tratcher your weigh in on)...

The RFC for application/json actually mandates encoding sniffing. https://www.ietf.org/rfc/rfc4627.txt (section 3).

JSON text SHALL be encoded in Unicode. The default encoding is
UTF-8.

Since the first two characters of a JSON text will always be ASCII
characters [RFC0020], it is possible to determine whether an octet
stream is UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking
at the pattern of nulls in the first four octets.

      00 00 00 xx  UTF-32BE
      00 xx 00 xx  UTF-16BE
      xx 00 00 00  UTF-32LE
      xx 00 xx 00  UTF-16LE
      xx xx xx xx  UTF-8

This seems a little bit terrifying... For the record, server-side ASP.NET Core does not implement this. We use the CharSet to get the encoding, and the fallback to UTF8 without BOM.

There are many things that get written in RFCs and not doing in practice. It's pretty common to see JSON with a CharSet even though it's not to spec. So I guess - my preference would be to just use UTF8 when not otherwise specified.

I don't think we need detection based on BOM because it's not common to have a BOM, and it's not common to have non-UTF8. If users are blocked by this, they can work around it by reading the content to a string first.

I tested out what happens you submit UTF8 text with and without a BOM to an ASP.NET Core server. Today we happily accept the content without BOM.

has any BOM usage shown up in users requests/responses?

I can think of an easy way to hit a case with a BOM:

  • Create a file on disk saved with a UTF8 BOM, call it pokemon.json
  • Set up a webserver to serve that file

Your webserver will likely assign application/json to the response based on the file extension (ASP.NET Core and IIS both do). Your webserver will likely pass through the entire file contents including the BOM.

I don't have any data about usage info - because I'd need to answer for pretty much every server on the 'net.

Server-side ASP.NET Core does not write a BOM when producing JSON - and we tolerate a BOM while reading JSON - so there's little user evidence about whether this is common.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rynowak the latest json RFC (there have been a few updates) states:

JSON text exchanged between systems that are not part of a closed ecosystem MUST be encoded using UTF-8 [RFC3629].

Previous specifications of JSON have not required the use of UTF-8 when transmitting JSON text. However, the vast majority of JSON-based software implementations have chosen to use the UTF-8 encoding, to the extent that it is the only encoding that achieves interoperability.

Implementations MUST NOT add a byte order mark (U+FEFF) to the beginning of a networked-transmitted JSON text. In the interests of interoperability, implementations that parse JSON texts MAY ignore the presence of a byte order mark rather than treating it as an error.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rynowak I didn't interpret RFC 4627, Sec. 3, to mandate encoding sniffing. It simply says "SHALL be encoded in Unicode." It gives a technique for implementers to sniff the encoding if they desire, but there's no text here precluding an implementation from using a sidecar mechanism (such as the Content-Type header) to determine the proper encoding to use. I think ASP.NET's behavior is fine. :)

Re: byte-order marks, I'm impartial as to whether we support them. In my experience I've not seen web sites specify both a charset and a BOM in the response. Even in the "reading files from disk" scenario, most web servers realize that they're acting as a pass-through and shouldn't make judgment calls about what the proper charset of the content should be.

If we do support byte-order marks, we shouldn't sniff the BOM itself. Instead, we should continue to rely on the incoming charset, stripping out the BOM only if: (a) it matches the selected encoding's BOM; and (b) it occurs at the beginning of the payload.

But honestly I wouldn't add this behavior unless somebody's asking for it. It's just going to complicate matters.

=> Create(value, CreateMediaType(mediaType ?? throw new ArgumentNullException(nameof(mediaType))), options);

public static JsonContent Create<T>(T value, MediaTypeHeaderValue mediaType, JsonSerializerOptions? options = null)
=> new JsonContent(JsonSerializer.SerializeToUtf8Bytes(value, options), typeof(T), value, mediaType);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't currently have an HttpContent class with a factory method. That doesn't mean we can't, but I'm interested to hear what people think about a JsonContent<T> class instead.

@stephentoub
Copy link
Member

This hasn't been API reviewed yet, has it?

@stephentoub stephentoub added the NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) label Mar 11, 2020
@ericstj
Copy link
Member

ericstj commented Mar 11, 2020

I thought this was API reviewed. I know we've had a couple API review meetings and discussed it. @terrajobst

@stephentoub
Copy link
Member

Immo's comment on that issue suggests it hasn't:
"We did a brief run in a smaller group; at this point we should go ahead with the implementation first to see what issues we encounter and do the API review after with the complete API set and all modifications we made since then."
and the issue is still marked as being a suggestion rather than approved.

<Import Project="$([MSBuild]::GetPathOfFileAbove(Directory.Build.props))" />
<ItemGroup>
<ProjectReference Include="..\ref\System.Net.Http.Json.csproj">
<SupportedFramework>net461;netcoreapp2.0;uap10.0.16299;$(AllXamarinFrameworks)</SupportedFramework>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll still want an asset for .NET 5, right? Some of the base types involved here have new virtual methods in .NET 5 we'll want to override such that when building for / running on .NET 5, things work as well as possible.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we don't list the latest framework, we list the oldest framework we support and I believe with netcoreapp2.0 should be enough because of the compatibility, but given the new TFM, not sure in this case, @ericstj ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, this list is fine as it is. The tfm for the ref assembly targets ns2.0 so this support table is correct. As Santi points out, by specifying nca2.0 here it will ensure to valudate from that version and all the newer ones as well.

}

public static JsonContent Create<T>(T value, JsonSerializerOptions? options = null)
=> Create(value, CreateMediaType(JsonMediaType), options);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implementation is going to box the T if it's a value type. Is that impactful? The API as it's currently defined can't avoid that, but simply tweaks to the API could, such as if the Value property were made virtual.

{
public static System.Threading.Tasks.Task<object?> GetFromJsonAsync(this System.Net.Http.HttpClient client, string requestUri, System.Type type, System.Text.Json.JsonSerializerOptions? options = null, System.Threading.CancellationToken cancellationToken = default(System.Threading.CancellationToken)) { throw null; }
public static System.Threading.Tasks.Task<object?> GetFromJsonAsync(this System.Net.Http.HttpClient client, System.Uri requestUri, System.Type type, System.Text.Json.JsonSerializerOptions? options = null, System.Threading.CancellationToken cancellationToken = default(System.Threading.CancellationToken)) { throw null; }
public static System.Threading.Tasks.Task<T> GetFromJsonAsync<T>(this System.Net.Http.HttpClient client, string requestUri, System.Text.Json.JsonSerializerOptions? options = null, System.Threading.CancellationToken cancellationToken = default(System.Threading.CancellationToken)) { throw null; }
Copy link
Member

@stephentoub stephentoub Mar 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This design means that if I want to supply a cancellation token, I always either need to pass a JsonSerializerOptions or I need to name the argument, e.g. cancellationToken: cancellationToken?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems unfortunate indeed. We could inject an overload that just takes and a non-optional cancellation token.

Copy link
Contributor

@scalablecory scalablecory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only small things. looks good once outstanding comments are addressed.

@jozkee
Copy link
Member Author

jozkee commented Mar 24, 2020

@stephentoub I think I already addressed all of the comments; is there anything else you want me to change?

private static readonly int OverflowBufferSize = Encoding.UTF8.GetMaxByteCount(1); // The most number of bytes used to represent a single UTF char

internal const int MaxByteBufferSize = 4096;
internal const int MaxCharBufferSize = 3 * MaxByteBufferSize;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where did the constant 3 come from?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think from this thread dotnet/aspnetcore#8362 (comment)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In UTF-8, 1 char worst-case expands to 3 bytes. 1 byte does not expand to 3 chars.

Copy link
Member Author

@jozkee jozkee Mar 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed offline, It does not make sense to expect a char buffer that is 3X the size of byte buffer, should we remove this all together and just use sourceEncoding.GetMaxCharCount(MaxByteBufferSize)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to call GetMaxCharCount if you want. Otherwise you can just re-use MaxByteBufferSize for simplicity. Either solution should work.

if (_charBuffer.Count == 0)
{
int bytesRead = await ReadInputChars(cancellationToken).ConfigureAwait(false);
shouldFlushEncoder = bytesRead == 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this flag is being set correctly. It's possible that if ReadInputChars returns 0, both _byteBuffer and _charBuffer are non-empty. This means that there's still data remaining to be decoded in the source buffer, so we shouldn't set flush = true on the encoder.

This goes back to my question at the top of this file - where did the constants come from? I'm having trouble reasoning about how those affect the logic in ReadInputChars.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed offline, we should flush when no more bytes are read from the underlying stream and _byteBuffer.Count == 0

// If the destination buffer is smaller than GetMaxByteCount(1), we avoid encoding to the destination and we use the overflow buffer instead.
if (readBuffer.Count > OverflowBufferSize || _charBuffer.Count == 0)
{
_encoder.Convert(_charBuffer.Array!, _charBuffer.Offset, _charBuffer.Count, readBuffer.Array!, readBuffer.Offset, readBuffer.Count,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could fail if _charBuffer is empty, _encoder contains some leftover state from the previous call, flush = true, and readBuffer is too small to contain the transcoded result. At that point the Convert routine won't be able to make any forward progress at all, and it will throw.

The fix would be to make sure that the destination buffer to which Convert writes is always large enough to make at least some forward progress. I think using your overflow buffer would work well here.

internal sealed class TranscodingWriteStream : Stream
{
internal const int MaxCharBufferSize = 4096;
internal const int MaxByteBufferSize = 4 * MaxCharBufferSize;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where did the constant 4 come from?

Copy link
Member Author

@jozkee jozkee Mar 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This decision comes from dotnet/aspnetcore#12142 (comment).
EDIT: Disregard this comment, I didn't notice that the linked comment is talking about the OverflowBuffer, not the MaxByteBufferSize.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should at minimum be a comment explaining why the constant was chosen. Keep in mind that the max chars-to-bytes expansion for UTF-8 is 3 (not 4) bytes per char. But clearly we're not transcoding to UTF-8 (since the input was already UTF-8), so using any UTF-8 specific constant would not be correct.

It's perfectly ok if we say something like "this is a reasonable guess for a worst-case expansion for a typical encoding". But IMO there needs to be a comment that says something. :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.
Added the following comment:

Upper bound that limits the byte buffer size to prevent an encoding that has a very poor worst-case scenario.

Now that makes me question if we should limit the char buffer on the Read side for the same reason.

_charsDecoded += charsDecoded;
bufferSegment = bufferSegment.Slice(bytesDecoded);

if (!decoderCompleted)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to put this inside an if block. The decoder will always make as much progress as possible, so it's optimal to call WriteBufferAsync unconditionally.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite understand what the original author was trying to achieve by skipping WriteBufferAsync on the last decoding but I assume that if we remove this condition, we no longer need to call WriteBufferAsync on FinalWriteAsync.

cc @pranavkm

@jozkee jozkee removed the NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) label Mar 25, 2020
Copy link
Member

@GrabYourPitchforks GrabYourPitchforks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All my remaining comments were nits / code cleanup. The logic LGTM. Thanks for driving this! :)

Copy link
Member

@stephentoub stephentoub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a bunch of additional comments, but otherwise assuming .NET 5-specific work will be done subsequently (e.g. all of the overrides we've discussed), LGTM.

@jozkee
Copy link
Member Author

jozkee commented Mar 26, 2020

CI issues seem to be unrelated.

@jozkee
Copy link
Member Author

jozkee commented Mar 26, 2020

I will proceed to merge this.

Next steps:

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add extension methods for HttpClient that allow serialization from/to JSON