Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Utf8JsonReader.BytesConsumed incorrect for input "2e2" in multi-segment mode #30462

Closed
GSPP opened this issue Aug 2, 2019 · 9 comments
Closed
Assignees
Milestone

Comments

@GSPP
Copy link

GSPP commented Aug 2, 2019

When parsing "2e2", BytesConsumed seems to be smaller by 1 than it should be. The number itself is being parsed correctly. "[2e2]" also is being parsed correctly and again, BytesConsumed is incorrect.

Single-segment mode is correct.

Another interesting case to test is 2e+2.

static class ReproProgram
{
    public static void Run()
    {
        IEnumerable<Memory<byte>> GetMemories(byte[] array)
        {
            for (int i = 0; i < array.Length; i++)
                yield return array.AsMemory(i, 1);
        }

        var array = Encoding.UTF8.GetBytes("2e2");
        var jsonReader = new Utf8JsonReader(ReadOnlyBufferSegment.Create(GetMemories(array)));

        while (jsonReader.Read())
        {
            Console.WriteLine($"{jsonReader.TokenType} {jsonReader.BytesConsumed} {jsonReader.GetDouble()}");
        }

        Console.WriteLine($"Input length: {array.Length}, BytesConsumed: {jsonReader.BytesConsumed}");
    }

    class ReadOnlyBufferSegment : ReadOnlySequenceSegment<byte>
    {
        public static ReadOnlySequence<byte> Create(IEnumerable<Memory<byte>> buffers)
        {
            ReadOnlyBufferSegment segment = null;
            ReadOnlyBufferSegment first = null;
            foreach (Memory<byte> buffer in buffers)
            {
                var newSegment = new ReadOnlyBufferSegment()
                {
                    Memory = buffer,
                };

                if (segment != null)
                {
                    segment.Next = newSegment;
                    newSegment.RunningIndex = segment.RunningIndex + segment.Memory.Length;
                }
                else
                {
                    first = newSegment;
                }

                segment = newSegment;
            }

            if (first == null)
            {
                first = segment = new ReadOnlyBufferSegment();
            }

            return new ReadOnlySequence<byte>(first, 0, segment, segment.Memory.Length);
        }
    }
}

Number 2 200
Input length: 3, BytesConsumed: 2

This is 3.0.0-preview7-27912-14.

@GSPP
Copy link
Author

GSPP commented Aug 2, 2019

If you compare the parsing of 0{ in single-segment mode with parsing in multi-segment mode BytesConsumed is different again. Here, the JSON is invalid and an exception is thrown (which is correct).

Are you interested in fixing error cases as well? If yes I can report them, too. I can see that the tests attempt to validate situations like this e.g. in TestSingleStringsMultiSegmentByOne.


For {"\"" there is a discrepancy in the exception message. BytePositionInLine is reported differently.

BytesConsumed: 5, CurrentDepth: 1, TokenStartIndex: 1, TokenType: String, Exception: Expected a value, but instead reached end of data. LineNumber: 0 | BytePositionInLine: 5.
BytesConsumed: 5, CurrentDepth: 1, TokenStartIndex: 1, TokenType: String, Exception: Expected a value, but instead reached end of data. LineNumber: 0 | BytePositionInLine: 6.


Another discrepancy:

0e+1b
BytesConsumed: 0, CurrentDepth: 0, TokenStartIndex: 0, TokenType: None, Exception: 'e' is an invalid end of a number. Expected a delimiter. LineNumber: 0 | BytePositionInLine: 4.
BytesConsumed: 3, CurrentDepth: 0, TokenStartIndex: 0, TokenType: None, Exception: 'b' is an invalid end of a number. Expected a delimiter. LineNumber: 0 | BytePositionInLine: 4.

Here, it says

... 'e' is an invalid end of a number
... 'b' is an invalid end of a number


Here is a discrepancy in TokenStartIndex:

1e+0}
BytesConsumed: 4, CurrentDepth: 0, TokenStartIndex: 4, TokenType: Number, Exception: '}' is invalid after a single JSON value. Expected end of data. LineNumber: 0 | BytePositionInLine: 4.
BytesConsumed: 3, CurrentDepth: 0, TokenStartIndex: 3, TokenType: Number, Exception: '}' is invalid after a single JSON value. Expected end of data. LineNumber: 0 | BytePositionInLine: 4.


I'm doing some exhaustive testing with Utf8JsonReader. I am generating all possible strings to a certain length. For each string, I am testing certain invariants. For example, I am comparing the parsing results of the single and multi segment situations.

It would help my testing a lot if you could fix all the discrepancies in the messages. Maybe, I can find more serious bugs that way. Currently, the logs are polluted from all the differences in the error messages and in the positions.

@ahsonkhan ahsonkhan self-assigned this Aug 2, 2019
@ahsonkhan
Copy link
Member

@GSPP - thanks for the detailed report and validating with interesting test cases. I'll look into why some of these discrepancies exist. I am super interested in seeing what your additional testing effort uncovers :)

Also, cc @layomia

@GSPP
Copy link
Author

GSPP commented Aug 3, 2019

OK, please let me know when a new nightly is out with fixes. I have paused my efforts because the logs are currently not legible.

@GSPP
Copy link
Author

GSPP commented Aug 9, 2019

I think I have more issues to report but the logs are too polluted to work efficiently on this. I think we'll need 2-3 cycles of fixing and testing. Can we be in time to get the fixes into 3.0?

@ahsonkhan
Copy link
Member

ahsonkhan commented Aug 14, 2019

Are you interested in fixing error cases as well? If yes I can report them, too.

Yes, that would be great. We should fix those.

Edit: For the specific case where BytesConsumed is reporting different values after an exception (for invalid JSON), I don't think that is something I'd prioritize fixing because there is no guarantee that the reader is in any consistent state after an invalid json is parsed (and that includes BytesConsumed/TokenStartIndex). The bytes position in line, and the exception message should be correct though.

OK, please let me know when a new nightly is out with fixes. I have paused my efforts because the logs are currently not legible.

Can you share the whole log within a gist or some other format? It might be easier to fix many of them in one shot, especially if the root cause is identical. I have just fixed the first issue (for valid JSON): dotnet/corefx#40303

I think we'll need 2-3 cycles of fixing and testing. Can we be in time to get the fixes into 3.0?

Depends on the severity of the issue, but we should definitely fix anything that could affect correctness within user code. The mutli-segment code paths within the Utf8JsonReader don't have 100% test coverage.

@ahsonkhan
Copy link
Member

Fixed in dotnet/corefx#40303 and dotnet/corefx#40349 for master.

Fixed in dotnet/corefx#40422 release/3.0.

@GSPP - if you observe other inconsistencies or bugs within the Utf8JsonReader as part of your testing, please file a new issue. Thanks for finding these!

@GSPP
Copy link
Author

GSPP commented Aug 23, 2019

I installed the latest 5.0 master SDK (5.0.0-alpha1.19422.13) and tested the System.Text.Json.dll from it. I used Process Explorer to confirm that this DLL is actually loaded. The fix does not seem to be in there.

In there a build that I can download to test the fix?

@ahsonkhan
Copy link
Member

Looks like the latest SDK is referencing a version of the corefx libraries from a few days old (~Aug 9) and hence doesn't contain the fix:
https://github.com/dotnet/core-setup/blob/10bdf83b80772ddd94eb838d24edd0fb3babb5c2/eng/Version.Details.xml#L14

You could explicitly add a reference to the latest S.T.Json NuGet package (even with the SDK version you currently have) and test the fix:

  <ItemGroup>
    <PackageReference Include="System.Text.Json" Version="5.0.0-alpha1.19423.7" />
  </ItemGroup>

Otherwise, you'd have to wait for a new build of the SDK that contains the latest fix. Looks like master is up-to-date, but the build being produced isn't latest:
https://github.com/dotnet/core-setup/blob/66e183e466589cb661cec1a76991f1a5bbd7cc56/eng/Version.Details.xml#L12-L15

@GSPP
Copy link
Author

GSPP commented Aug 24, 2019

Thanks for that tip! These prerelease versions are not on the official NuGet site but I added the custom NuGet.config described in the Wiki. Is this where these prerelease packages come from?

I can confirm the fix. I will open a new issue with more findings.

@msftgits msftgits transferred this issue from dotnet/corefx Feb 1, 2020
@msftgits msftgits added this to the 3.0 milestone Feb 1, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 12, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

3 participants