\8 in sloppy non-template strings remains unspecified #2039

Closed
rkirsling opened this issue Jun 9, 2020 · 9 comments · Fixed by #2054

Comments

@rkirsling
Member

Currently, JSC bans \8 (and \9) in templates and strict strings but SM/V8/XS/Ch all allow it.

At first I thought this was simply due to a lack of tests, so I created tc39/test262#2654, but it turns out this has never been specified at all (though it was planned for ES7).

I still view the tests above as the goal, because:

  • We definitely should not leave this unspecified.
  • 11.8.4 and 11.8.6 are clear about disallowing B.1.2 for templates and strict strings.
  • B.1.1 has a non-octal production, so it makes sense for B.1.2 to follow suit.

Assuming no one is deeply opposed to this, I'll make a PR and present it at the next meeting.

@michaelficarra
Member

I don't like calling things like this "unspecified". There are source texts which must be accepted by a conforming implementation, source texts which must be rejected by a conforming implementation (due to Forbidden Extensions or Early Errors, maybe other reasons), and source texts which may be accepted by a conforming implementation but are not required to be.

The example of \8 and \9 in strings falls into the last case. It can be considered an allowed language extension. If it is necessary for a web browser to implement this language extension to be compatible with web content, we should require it in Annex B. If it is not, we should consider adding it to the Forbidden Extensions because it is terrible.

@rkirsling
Member Author

Hmm, sorry, I wasn't seeing an issue with having "unspecified" and "non-forbidden" be synonymous.

I don't think there's a web compatibility concern for templates and strict strings, but are you saying that we should try to ban these even in sloppy non-template strings? I figured it'd be the least surprising to have non-octals allowed just when legacy octals are allowed, but we could certainly consider going further.

@michaelficarra
Member

Yeah, if we can get rid of them, we should. I don't care about consistency with numbers. I don't know if implementations are going to be willing to go through the effort of figuring out whether these are necessary for web compatibility or not, though.

@avp

avp commented Jun 10, 2020

To clarify: the unspecified behavior discussion is only relevant for non-template strings, right?

11.8.6 does say that "A conforming implementation must not use the extended definition of EscapeSequence described in B.1.2 when parsing a TemplateCharacter." This should mean that Annex B discussion isn't relevant for template strings and `\8` should be an error due to the NotEscapeSequence behavior.

FWIW, Hermes allows \8 in strict string literals, but specifically rejects \8 in untagged template literals due to this.

@michaelficarra
Member

michaelficarra commented Jun 10, 2020

@avp Yes I believe that is correct.

edit: To be very clear, I just mean untagged templates. Tagged templates have much more freedom with what can follow \.
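
A minimal sketch of the tagged/untagged distinction, assuming an engine that implements the ES2018 template literal revision. `probe` is a hypothetical helper introduced only for this illustration. The untagged result depends on the engine and its version, so no output is claimed for it.

```javascript
// A tag function still sees the raw source text of the escape:
// String.raw returns the backslash and the digit unchanged.
const raw = String.raw`\8`;
console.log(raw.length, raw); // 2 characters: a backslash, then 8

// Hypothetical helper: compile `body` as a sloppy function body and report
// either the returned value or the name of the error that was thrown.
function probe(body) {
  try {
    return { ok: true, value: new Function(body)() };
  } catch (e) {
    return { ok: false, error: e.name };
  }
}

// Untagged template containing \8. An engine that follows 11.8.6 is expected
// to reject this, but the outcome is engine-dependent.
console.log(probe("return `\\8`;"));
```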

@claudepache
Contributor

B.1.2 says “The syntax and semantics of 11.8.4 is extended as follows, etc.”, except that it is not extended, it is patched.

So, while you’re at it, you could make the spec less schizophrenic, i.e., merge non-annex-b and annex-b grammars and semantics, and add early error rules for non-strict mode.

@rkirsling
Member Author

Hmm, so upon investigating further, there are two points worth noting:

  1. Currently all engines agree that '\8' is '8' and not '\x08' in sloppy mode, in spite of no spec:

    λ eshost -sx "print('\7'.charCodeAt(0), '\8'.charCodeAt(0));"
    #### ch, jsc, sm, v8, xs
    7 56
    
  2. From a cursory look at some HTTP Archive data, @syg and I noticed a number of pages using \9 when writing CSS into a page with document.write. 🤔 For instance, background: #000\9; or width: auto\9;.

    Now, \x09 is the same as \t...but given (1), \9 is the same as just 9. So it seems like these pages are rendering in spite of malformed CSS, but then they'd fail to render at all if '\9' were illegal.

It's hard to know for sure just how many pages would break due to (2), but I think it may be wisest to just spec (1) as web reality. 😓
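
Point (1) can be reproduced without eshost. This is a rough sketch, not part of the original report. The `sloppy` helper is hypothetical. It relies on the fact that a Function-constructor body is sloppy unless it opts in with "use strict". The strict-mode probe at the end has no claimed output because it varies by engine and version.

```javascript
// Evaluate an expression as sloppy-mode source text, regardless of whether the
// surrounding file is a script or a module.
const sloppy = (expr) => new Function(`return ${expr};`)();

console.log(sloppy("'\\7'.charCodeAt(0)")); // 7  (legacy octal escape)
console.log(sloppy("'\\8'.charCodeAt(0)")); // 56 (the character '8')
console.log(sloppy("'\\8' === '8'"));       // true
console.log(sloppy("'\\8' === '\\x08'"));   // false

// Strict string literal containing \8: accepted or rejected depending on the
// engine, so the result is only printed here.
let strictResult;
try {
  strictResult = new Function("'use strict'; return '\\8';")();
} catch (e) {
  strictResult = e.name;
}
console.log(strictResult);
```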

@mathiasbynens
Member

mathiasbynens commented Jun 12, 2020

I can elaborate on the Web usage of \9 in CSS: it’s a “CSS hack” that targets IE9 and older. See https://mathiasbynens.be/notes/safe-css-hacks#css-hacks for details.

    .foo {
      color: red;
      color: green\9; /* IE9 and older */
    }

The intention of the document.write examples you found was likely to write code like that to the document, except they forgot to escape the \ character in the string literal.
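
A small sketch of that missing escape, using width: auto\9; from the examples above. The `sloppy` helper is hypothetical and evaluates its argument as sloppy-mode source text via the Function constructor.

```javascript
// Evaluate an expression as sloppy-mode source text.
const sloppy = (expr) => new Function(`return ${expr};`)();

// Source text 'width: auto\9;' : the backslash is consumed by the string literal.
const unescaped = sloppy("'width: auto\\9;'");
// Source text 'width: auto\\9;' : the backslash reaches the CSS.
const escaped = sloppy("'width: auto\\\\9;'");

console.log(unescaped); // width: auto9;
console.log(escaped);   // width: auto\9;
```

Only the second string carries the hack through to the style sheet. The first produces the malformed declaration described earlier in the thread.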

@syg
Copy link
Contributor

syg commented Jun 12, 2020

Wow.
