Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve repairing missing end quotes of strings containing delimiters like comma or single quote #102

Closed
stouch opened this issue Oct 9, 2023 · 8 comments
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@stouch
Copy link

stouch commented Oct 9, 2023

For example, in playground (https://josdejong.github.io/jsonrepair/) :

{
  "message": "This test isn't successful

or

{
  "message": "This test is not successful, right?

both fail.
(strings with comma or simple-quote, without ending double quote)

My personal fixes are the followings :

  const lastDoubleQuote = uglyJson.trim().lastIndexOf('"');
  const lastSimpleQuote = uglyJson.trim().lastIndexOf("'");
  if (lastSimpleQuote > lastDoubleQuote) {
    // When the JSON string ends with a text containing a simple quote, without an ending double-quote
    //  jsonrepeair fails ..
    cleanJson = `${cleanJson}"`; // .. So we prepare the ugly JSON with an ending double-quote
  }
  const lastComma = cleanJson.trim().lastIndexOf(',');
  if (lastComma > lastDoubleQuote + 1) { // The + 1 is to avoid to catch a property separator comma `", "`
    // When the JSON string ends with a text containing a comma, without an ending double-quote
    //  jsonrepeair explodes.
    cleanJson = `${cleanJson}"`; // So we prepare the ugly JSON with an ending double-quote
  }
  cleanJson = jsonrepair(cleanJson);
@josdejong
Copy link
Owner

Thanks for reporting, it would be nice to improve on this.

The simplest logic to repair a missing quote is to add a missing quote at the end of the text. However, this can completely mess up a JSON document if only a single value somewhere at the beginning of a document is missing an end quote (see #97). Therefore, in case of a missing end quote, jsonrepair searches for the most logic place where the quote should have been, like right before a comma, or at a newline (see #101). So we should see if we can tune that logic further to cater for this case too.

@josdejong josdejong changed the title String with simple quote makes the json repair explode Improve repairing missing end quotes of strings containing delimiters like comma or single quote Oct 9, 2023
@josdejong josdejong added enhancement New feature or request help wanted Extra attention is needed labels Oct 9, 2023
@josdejong
Copy link
Owner

I was thinking about this. Right now, the logic checks for "any" delimiter to find the place where the end quote is missing. We can improve this by providing the internal function parseString with information on which delimiter is expected after the string. When parsing an object key, we expect : right after the string, when parsing an object value, we expect , or }, when parsing an array, we expect , or ]. That context information is necessary to decide on how to repair a string by finding the right place where the quote should have been.

@bboyle
Copy link

bboyle commented Oct 12, 2023

ran into an issue today where string values contained unescaped quotes, like so:

  "key": "apple "bee" carrot"

rather than

  "key": "apple \"bee\" carrot"

not sure if that's a similar issue with quote delimiters or completely opposite. thoughts? what sort of help would you like with this?

@josdejong
Copy link
Owner

That is a nice one. Right now, the library works locally: whilst parsing and encountering a missing delimiter like a comma or quote, it will repair the document locally. Your example probably requires looking at the document as a whole. I think in general, a document contains one type of error like missing quotes, a truncated document, or in your case having unescaped quotes in strings. To solve that for real, I think jsonrepair should go though a number of scenarios, applying only one type of repair at a time.

It would be interesting to think that through, though I don't think your case is trivial to solve. For example, how would one interpret the following case:

[
  "apple ","bee"," carrot"
]

Is this one string containing 4 unescaped quotes? Or are these just three valid strings? Maybe this is just a hypothetical case though, in your example it is possible to see which quote is the "real" end quote from the context.

@bboyle
Copy link

bboyle commented Oct 13, 2023

great question. without knowing the author's intent impossible to be sure what they wanted, but if I consider "what is valid JSON?" then your sample is already valid and thus jsonrepair should not alter it. the example I gave does not parse, so needs repair of some kind… turning it into "key": ["apple", "bee", "carrot"] would also make it valid… to me that feels like a bigger changes vs escaping the quotes, but again, impossible to be sure without knowing intent. I think the best a lib like jsonrepair can do is make it valid, and do so in a consistent way — possibly the behaviour could be toggled with config settings to enable/disable or favor certain rules? I think you get to pick how the library works :)

@josdejong
Copy link
Owner

Yes indeed, if the JSON is already valid, we're done directly. When not, the library somehow needs to go through a couple of "most logical" scenarios to identify what is wrong an how to fix it, looking at the whole of the document. It will be interesting to that out 🤔.

@josdejong
Copy link
Owner

The issue with repairing strings that miss an end quote and contain single quote is fixed via bc46250.

I managed to let the library repair unescaped double quotes inside a string too 😎, see #116. Published in v3.6.0.

@bboyle
Copy link

bboyle commented Feb 13, 2024

nice! :) thanks @josdejong!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

3 participants