Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.
/ corefx Public archive

[release/3.0] Fix BytesConsumed and error messages when reading JSON payloads within a multi-segment ReadOnlySequence via Utf8JsonReader #40422

Merged
merged 2 commits into from
Aug 20, 2019

Conversation

ahsonkhan
Copy link
Member

@ahsonkhan ahsonkhan commented Aug 19, 2019

Ports #40303 and #40349 to 3.0

Addresses https://github.com/dotnet/corefx/issues/39974 for valid JSON and invalid JSON (where exception message is consistent/accurate)

cc @steveharter, @ericstj, @danmosemsft, @ahsonkhan @eerhardt, @Anipik, @wtgodbe, @bartonjs, @stephentoub, @GSPP

Description

For processing valid JSON input:
We missed updating the bytes consumed in one instance when parsing numbers within JSON that is contained within a multi-segment ReadOnlySequence<byte>. Updating other instances of consumed to be updated correctly as well. Additionally, we are now consistently recovering the necessary reader state when the user passes in incomplete payload so they can continue with more data on subsequent reads.

Also fix setting up the initial positions during the ctor when the first segment happens to be empty.

For processing invalid JSON input:
There were a few places where the exception message and the values we returned as part of usability/diagnostics was inconsistent (or incorrect) when the user passed-in multi-segment data. Also, even though we don't provide guarantees on the reader state being recoverable after an error, certain properties like Line Number and Position In Line are still useful. This change also makes sure to avoid incorrectly increasing line number when seeing escaped new line characters within quoted strings.

Customer Impact

The bug was customer-reported as part of testing various JSON payloads (both valid/invalid) and making sure the behavior is consistent.

When polling Utf8JsonReader.BytesConsumed, the user will now see a consistent result regardless of which input source contained their data (whether it was a span, or a multi-segment sequence where the number being parsed straddled a segment boundary). For example, after reading "2e2", that was split into three segments, we reported BytesConsumed as 2 instead of 3.

When the user observes the exception for invalid JSON (or programmatically polls the Line Number/Position properties), the message and values are accurate in many of the edge cases now. This helps with end-user-experience and diagnostics.

Regression?

No.

Risk

Low. Significant test cases were added and this only affects an edge use case on multi-segment buffers where the user is relying on the BytesConsumed property (low usage). The other changes involve the exception path for invalid JSON. The only risk is that we are updating code that was previously prone to off-by-one errors, but we have significant tests for all the various number inputs.

…adOnlySequence via Utf8JsonReader (dotnet#40303)

* Add tests for reading numbers when data is within a multi-segment
ReadOnlySequence.

* Test rename and add debug.assert.

* Test clean up - remove unnecessary/expensive Debug.Assert.

* Test and code cleanup.

* Address feedback - add stronger test condition for token length.
…ion message is accurate (dotnet#40349)

* Add tests for reading numbers when data is within a multi-segment
ReadOnlySequence.

* Test rename and add debug.assert.

* Test clean up - remove unnecessary/expensive Debug.Assert.

* Test and code cleanup.

* Address feedback - add stronger test condition for token length.

* Revert state on incomplete read and on error. Make sure exception
message is accurate.

* Fix byte position in line for invalid JSON and update tests.

* Address feedback - pass by in and move private struct to end.

* Fix string consumption and add various tests.

* Fix off-by-one error when calculating position.

* Test clean up - make sure multi-segment tests are in the right file.

* Update string parsing - do not increment line number for escaped newline
chars within quoted strings.

* Remove unnecessary custom state struct for string and re-use the same
one for both string and numbers.

* Update tests based on PR feedback.

* Create helper to get start sequence position from roll back state.
@ahsonkhan ahsonkhan changed the title [release/3.0] Fix BytesConsumed when reading JSON numbers within a multi-segment ReadOnlySequence via Utf8JsonReader [release/3.0] Fix BytesConsumed and error messages when reading JSON payloads within a multi-segment ReadOnlySequence via Utf8JsonReader Aug 20, 2019
@ahsonkhan ahsonkhan merged commit 91a30d4 into dotnet:release/3.0 Aug 20, 2019
@ahsonkhan ahsonkhan deleted the FixBytesConsumedMultiSegment branch August 20, 2019 05:51
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant