Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug(parser): quoted strings should not allow interior adjacent spaces #636

Merged
merged 1 commit into from
Jun 24, 2020

Conversation

gdamore
Copy link
Collaborator

@gdamore gdamore commented Jun 24, 2020

This doesn't catch all the ways that quoted strings should be treated
as constrained quoted text, but does fix this non-pathological test,
with fixed test cases.

Fixes #622

This doesn't catch all the ways that quoted strings should be treated
as constrained quoted text, but does fix this non-pathological test,
with fixed test cases.

Also added quoted_strings_test for xhtml5 (forgot to add this earlier).

Fixes bytesparadise#622
@codecov
Copy link

codecov bot commented Jun 24, 2020

Codecov Report

Merging #636 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master     #636   +/-   ##
=======================================
  Coverage   86.09%   86.09%           
=======================================
  Files          70       70           
  Lines        4661     4661           
=======================================
  Hits         4013     4013           
  Misses        399      399           
  Partials      249      249           

Copy link
Member

@xcoulon xcoulon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good. Just a few rules in the grammar that can be updated (to be consistent with the rest of the grammar)

return types.NewQuotedString(types.SingleQuote, elements.([]interface{}))
}

SingleQuotedStringElements <- elements:(SingleQuotedStringElement+) {
return types.NewInlineElements(elements)
}

SingleQuoteStringStart <- "'`" ![ \t\r\n]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you should be able to replace ![ \r\n\t] with [^ \r\n\t]

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No I think that is wrong. I don’t want to consume the token. This leaves the next tokens available in the input stream without consuming them.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is important for nesting quotes text for example.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah right! I think that &[^ \r\n\t] would work in that case then. I like the & syntax because it really says that it looks forward in the content without consuming the next character.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the two syntaxes are basically the same. Using ! should convey that as well as & with one fewer characters. I guess it’s all what you’re used to.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I can try and refactor that later if you don't want to make the change here. It's not a big deal indeed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually there might be a difference - what happens with EOF. Though that difference won’t matter in this particular case it might in others.

More precisely the ! syntax says that the element is NOT followed by one of those characters whereas the & says that it IS followed by a character that is not in that list.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I believe it's fine here, but it matters for delimited blocks for which the ending delimiter can be omitted when the block is at the end of the document. (Asciidoc is permissive on that regards.)

@@ -431,7 +435,11 @@ DoubleQuotedStringElement <- element:(
return element, nil
}

DoubleQuotedStringFallbackCharacter <- ([^\r\n`] / "`" !"\"") {
DoubleQuoteStringStart <- "\"`" ![ \t\r\n]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same reply.

@xcoulon xcoulon merged commit fc8a97e into bytesparadise:master Jun 24, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Smart quotes wrong when adjacent interior space present
2 participants