throwInvalidUnicodeCharacter loops forever if input ends with part of unicode char #109

Closed
daviderenger opened this issue Nov 12, 2023 · 1 comment
Labels
bug Something isn't working

Comments

@daviderenger

This happened to me fairly often when parsing a streamed result from ChatGPT that ends with half an emoji, i.e. a truncated unicode escape. It is easy to reproduce with:

const testString = '{"s \\ud';
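A minimal sketch of why such an input can hang, assuming the original loop tests `text[end]` against `/\w/` without checking the input length (the thread only confirms that the loop lacks a stop condition). The helper `scanWordChars` and its `maxSteps` cap are hypothetical and exist only so the demo terminates:

```typescript
const sample = '{"s \\ud'; // 7 characters, ends in a partial \uXXXX escape

// Indexing past the end yields undefined at runtime; String() mirrors the
// coercion that RegExp.prototype.test applies to its argument.
const matchesPastEnd = /\w/.test(String(sample[sample.length]));

// Hypothetical unbounded scan, capped at maxSteps so this demo terminates.
function scanWordChars(input: string, start: number, maxSteps: number): number {
  let end = start + 2; // skip the backslash and the "u"
  let steps = 0;
  while (steps < maxSteps && /\w/.test(String(input[end]))) {
    end++;
    steps++;
  }
  return end;
}

const runawayEnd = scanWordChars(sample, sample.indexOf("\\"), 1000);
console.log(matchesPastEnd, runawayEnd > sample.length); // prints: true true
```

Because `undefined` is coerced to the string "undefined", which contains word characters, the scan never sees a non-matching character once it runs off the end of the input.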

Suggested solution:

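function throwInvalidUnicodeCharacter(start: number) {
  const maxUnicodeLength = 6; // length of a full escape sequence: \uXXXX
  let end = start + 2; // skip the backslash and the "u"

  // Stop at the end of the input: text[end] is undefined there, and
  // /\w/.test(undefined) is true because undefined is coerced to "undefined".
  while (end < text.length && end - start < maxUnicodeLength && /\w/.test(text[end])) {
    end++;
  }

  const chars = text.slice(start, end);
  throw new JSONRepairError(`Invalid unicode character "${chars}"`, i);
}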

Or something similar.
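To try the bounded scan outside the library, here is a self-contained sketch. It assumes the scan only needs the input text and the index of the backslash. The names `invalidUnicodeSlice` and `MAX_UNICODE_ESCAPE_LENGTH` are hypothetical and not part of jsonrepair:

```typescript
const MAX_UNICODE_ESCAPE_LENGTH = 6; // a full escape is "\uXXXX"

// Returns the text of the (possibly partial) escape that starts at `start`.
function invalidUnicodeSlice(text: string, start: number): string {
  let end = start + 2; // skip the backslash and the "u"
  while (
    end < text.length && // never read past the end of the input
    end - start < MAX_UNICODE_ESCAPE_LENGTH && // never report more than 6 characters
    /\w/.test(text.charAt(end))
  ) {
    end++;
  }
  return text.slice(start, end);
}

// Truncated input: the scan stops at the end of the string.
const truncated = invalidUnicodeSlice('{"s \\ud', 4);
// Malformed but complete escape: the scan stops after 6 characters.
const malformed = invalidUnicodeSlice('{"s":"\\u26zz"}', 6);

console.log(truncated); // prints: \ud
console.log(malformed); // prints: \u26zz
```

Both bounds matter: the length check handles truncated input, and the 6-character cap keeps the reported slice from swallowing word characters that follow a malformed escape.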

@josdejong
Owner

Thanks for reporting. That while loop is indeed missing a check to stop after 6 characters.

I've fixed this in [email protected]: a truncated unicode character at the end of the input is now repaired by removing the partial character.
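The actual change is not shown in this thread. As a rough illustration of the idea only, here is a sketch that drops a partial `\uXXXX` escape from the end of a string. The function `stripTruncatedUnicodeEscape` is hypothetical and is not the code in jsonrepair:

```typescript
// Removes a trailing, incomplete \uXXXX escape (fewer than 4 hex digits).
function stripTruncatedUnicodeEscape(text: string): string {
  // A run of backslashes, then "u", then 0 to 3 hex digits, at the end of the input.
  const match = /(\\+)u[0-9a-fA-F]{0,3}$/.exec(text);
  if (match === null) return text;

  const backslashes = match[1] ?? "";
  // An even run means the last backslash is itself escaped, so the "u"
  // that follows is plain text and not the start of a unicode escape.
  if (backslashes.length % 2 === 0) return text;

  // Cut at the backslash that starts the partial escape.
  const escapeStart = match.index + backslashes.length - 1;
  return text.slice(0, escapeStart);
}

console.log(stripTruncatedUnicodeEscape('{"s \\ud')); // prints: {"s
console.log(stripTruncatedUnicodeEscape('{"s":"ok"}')); // prints: {"s":"ok"}
```

The remaining text would then go through the usual repair steps. This sketch only handles the escape itself and makes no attempt to deal with a lone surrogate left behind by a half-cut emoji.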

@josdejong josdejong added the bug Something isn't working label Nov 12, 2023