From 3ec8d70ec6c2fae174203da2b78c9aaf4a19bf6c Mon Sep 17 00:00:00 2001
From: Domenic Denicola
Date: Thu, 5 May 2022 15:46:38 -0400
Subject: [PATCH 01/11] Allow more characters in element/attribute names and prefixes

Closes #849.
---
 dom.bs | 96 +++++++++++++++++++++++++++++++++++++---------------------
 1 file changed, 62 insertions(+), 34 deletions(-)

diff --git a/dom.bs b/dom.bs
index 4f4a767a1..bca341ce1 100644
--- a/dom.bs
+++ b/dom.bs
@@ -12,12 +12,12 @@ Indent: 1
 urlPrefix: https://www.w3.org/TR/xml/#NT-
- type: type
-  text: Name; url: Name
+ type: dfn
+  text: Name; url: Name; for: XML
   text: Char; url: Char
   text: PubidChar; url: PubidChar
 urlPrefix: https://www.w3.org/TR/xml-names/#NT-
- type: type
+ type: dfn
   text: QName; url: QName
 url: https://w3c.github.io/DOM-Parsing/#dfn-createcontextualfragment-fragment
  type: method; text: createContextualFragment(); for: Range
@@ -209,20 +209,43 @@ against a set, run these steps:
 added.
 
 
-

Namespaces

+

Name validation

-

To validate a qualifiedName, throw an -"{{InvalidCharacterError!!exception}}" {{DOMException}} if qualifiedName does not match -the QName production. +A [=string=] |string| is a valid base-case DOM API name if it does not contain +[=ASCII whitespace=], U+0000 NULL, U+002F (/), or U+003E (>). -

To validate and extract a namespace and qualifiedName, -run these steps: +

This concept is used to validate [=attribute=] [=Attr/local names=] and +[=Attr/qualified names=], as well as both [=Element/namespace prefix|element=] and +[=Attr/namespace prefix|attribute=] namespace prefixes, in the cases where elements and attributes +are constructed with DOM APIs. Note that the prefixes additionally cannot contain U+003A (:) by +construction since they always result from the [=validate and extract=] algorithm. + +A [=string=] |string| is a valid DOM API element local name if it matches the following +[=DOMAPIElementName=] +EBNF production. The notation used here is as +defined in XML. [[!XML]] + +

+DOMAPIElementName             ::= HTMLParserCompatibleName | BeyondHTMLParserName
+
+HTMLParserCompatibleName      ::= [a-zA-Z] [^#x00#x09#x0A#0xCx0D#x20/>]*
+
+BeyondHTMLParserName          ::= BeyondHTMLParserNameStartChar (BeyondHTMLParserNameChar)*
+BeyondHTMLParserNameStartChar ::= ":" | "_" | [#x80-#x10FFFF]
+BeyondHTMLParserNameChar      ::= BeyondHTMLParserNameStartChar | [a-zA-Z] | "-" | "." | [0-9]
+
+ +

This concept is used to validate [=/element=] [=Element/local names=], when +constructed by DOM APIs. The intention is to allow any name that is possible to construct using the +HTML parser, plus some additional possibilities. For those additional possibilities, the ASCII range +is restricted for historical reasons, but beyond ASCII anything is allowed. + + +

To validate and extract a namespace and qualifiedName:

  1. If namespace is the empty string, then set it to null. -

  2. Validate qualifiedName. -

  3. Let prefix be null.

  4. Let localName be qualifiedName. @@ -230,6 +253,12 @@ run these steps:

  5. If qualifiedName contains a U+003A (:), then strictly split the string on it and set prefix to the part before and localName to the part after. +

  6. If prefix is not a [=valid base-case DOM API name=], then [=throw=] an + {{InvalidCharacterError}}" {{DOMException}}. + +

  7. If localName is not a [=valid DOM API element local name=], then [=throw=] an + "{{InvalidCharacterError}}" {{DOMException}}. +

  8. If prefix is non-null and namespace is null, then throw a "{{NamespaceError!!exception}}" {{DOMException}}. @@ -5076,8 +5105,8 @@ method steps are to return the list of elements with class names classNa document is an HTML document or document's content type is "application/xhtml+xml"; otherwise null. -

    If localName does not match the Name production an - "{{InvalidCharacterError!!exception}}" {{DOMException}} will be thrown. +

    If localName is not a valid DOM API element local name an + "{{InvalidCharacterError}}" {{DOMException}} will be thrown.

    When supplied, options's {{ElementCreationOptions/is}} can be used to create a customized built-in element. @@ -5090,8 +5119,9 @@ method steps are to return the list of elements with class names classNa qualifiedName or null. Its local name will be everything after U+003A (:) in qualifiedName or qualifiedName. -

    If qualifiedName does not match the QName production an - "{{InvalidCharacterError!!exception}}" {{DOMException}} will be thrown. +

    If qualifiedName is not a (possibly-prefixed) + valid DOM API element local name an "{{InvalidCharacterError}}" {{DOMException}} will be + thrown.

    If one of the following conditions is true a "{{NamespaceError!!exception}}" {{DOMException}} will be thrown: @@ -5135,8 +5165,7 @@ method steps are to return the list of elements with class names classNa node whose target is target and data is data. - If target does not match the - Name production an + If target is not a valid base-case DOM API name an "{{InvalidCharacterError!!exception}}" {{DOMException}} will be thrown. If data contains "?>" an "{{InvalidCharacterError!!exception}}" {{DOMException}} will be thrown. @@ -5153,7 +5182,7 @@ method steps are to return the list of elements with class names classNa method steps are:

      -
    1. If localName does not match the Name production, then +

    2. If localName is not a valid DOM API element local name, then throw an "{{InvalidCharacterError!!exception}}" {{DOMException}}.

    3. If this is an HTML document, then set localName to @@ -5210,7 +5239,7 @@ to return a new {{Text}} node whose data i and node document is this.

      No check is performed that data consists of -characters that match the Char production. +characters that match the Char production.

      The createCDATASection(data) method steps are: @@ -5231,7 +5260,7 @@ to return a new {{Comment}} node whose datanode document is this.

      No check is performed that data consists of -characters that match the Char production +characters that match the Char production or that it contains two adjacent hyphens or ends with a hyphen.

      The @@ -5239,9 +5268,9 @@ or that it contains two adjacent hyphens or ends with a hyphen. method steps are:

        -
      1. If target does not match the +
      2. If target is does not match the - Name production, + [=XML/Name=] production, then throw an "{{InvalidCharacterError!!exception}}" {{DOMException}}.
      3. If data contains the string @@ -5258,7 +5287,7 @@ method steps are:

        No check is performed that target contains "xml" or ":", or that data contains characters that match the -Char production. +Char production.


        @@ -5359,7 +5388,7 @@ these steps: steps are:
          -
        1. If localName does not match the Name production in XML, +

        2. If localName is not a valid base-case DOM API name, then throw an "{{InvalidCharacterError!!exception}}" {{DOMException}}.

        3. If this is an HTML document, then set localName to @@ -5521,10 +5550,8 @@ interface DOMImplementation { Returns a doctype, with the given qualifiedName, publicId, and systemId. If qualifiedName does not - match the Name production, an - "{{InvalidCharacterError!!exception}}" {{DOMException}} is thrown, and if it does not match the - QName production, a - "{{NamespaceError!!exception}}" {{DOMException}} is thrown. + match the QName production, an + "{{InvalidCharacterError!!exception}}" {{DOMException}} is thrown.
          doc = document . {{Document/implementation}} . createDocument(namespace, qualifiedName [, doctype = null]) @@ -5557,7 +5584,8 @@ interface DOMImplementation { method steps are:
            -
          1. Validate qualifiedName. +

          2. If qualifiedName does not match the QName production, + then throw an "{{InvalidCharacterError}}" {{DOMException}}.

          3. Return a new doctype, with qualifiedName as its name, publicId as its public ID, and systemId @@ -5566,7 +5594,7 @@ method steps are:

          No check is performed that publicId code points match the -PubidChar production or that systemId does not contain both a +PubidChar production or that systemId does not contain both a '"' and a "'".

          The @@ -6573,8 +6601,8 @@ method steps are: method steps are:

            -
          1. If qualifiedName does not match the Name production in - XML, then throw an "{{InvalidCharacterError!!exception}}" {{DOMException}}. +

          2. If qualifiedName is not a valid base-case DOM API name, then throw + an "{{InvalidCharacterError!!exception}}" {{DOMException}}.

          3. If this is in the HTML namespace and its node document is an HTML document, then set qualifiedName to qualifiedName in @@ -6632,8 +6660,8 @@ steps are: method steps are:

              -
            1. If qualifiedName does not match the Name production in - XML, then throw an "{{InvalidCharacterError!!exception}}" {{DOMException}}. +

            2. If qualifiedName is not a valid base-case DOM API name, then throw + an "{{InvalidCharacterError!!exception}}" {{DOMException}}.

 3. If this is in the HTML namespace and its node document is an HTML document, then set qualifiedName to qualifiedName in

From 4f5efc06faf73857b54443b8d4d40e15fa946aee Mon Sep 17 00:00:00 2001
From: Domenic Denicola
Date: Mon, 16 May 2022 17:48:27 -0700
Subject: [PATCH 02/11] Review comments

---
 dom.bs | 35 +++++++++++++++++------------------
 1 file changed, 17 insertions(+), 18 deletions(-)

diff --git a/dom.bs b/dom.bs
index bca341ce1..f0d213ba9 100644
--- a/dom.bs
+++ b/dom.bs
@@ -211,7 +211,7 @@ added.

              Name validation

              -A [=string=] |string| is a valid base-case DOM API name if it does not contain +A [=string=] |string| is a valid base-case name if it does not contain [=ASCII whitespace=], U+0000 NULL, U+002F (/), or U+003E (>).

              This concept is used to validate [=attribute=] [=Attr/local names=] and @@ -220,13 +220,13 @@ A [=string=] |string| is a valid base-case DOM API name if it does no are constructed with DOM APIs. Note that the prefixes additionally cannot contain U+003A (:) by construction since they always result from the [=validate and extract=] algorithm. -A [=string=] |string| is a valid DOM API element local name if it matches the following -[=DOMAPIElementName=] +A [=string=] |string| is a valid element local name if it matches the following +[=ElementName=] EBNF production. The notation used here is as defined in XML. [[!XML]]

              -DOMAPIElementName             ::= HTMLParserCompatibleName | BeyondHTMLParserName
              +ElementName                   ::= HTMLParserCompatibleName | BeyondHTMLParserName
               
               HTMLParserCompatibleName      ::= [a-zA-Z] [^#x00#x09#x0A#0xCx0D#x20/>]*
               
              @@ -253,10 +253,10 @@ is restricted for historical reasons, but beyond ASCII anything is allowed.
                
            4. If qualifiedName contains a U+003A (:), then strictly split the string on it and set prefix to the part before and localName to the part after. -

            5. If prefix is not a [=valid base-case DOM API name=], then [=throw=] an +

            6. If prefix is not a [=valid base-case name=], then [=throw=] an {{InvalidCharacterError}}" {{DOMException}}. -

            7. If localName is not a [=valid DOM API element local name=], then [=throw=] an +

            8. If localName is not a [=valid element local name=], then [=throw=] an "{{InvalidCharacterError}}" {{DOMException}}.

            9. If prefix is non-null and namespace is null, then throw a @@ -5105,7 +5105,7 @@ method steps are to return the list of elements with class names classNa document is an HTML document or document's content type is "application/xhtml+xml"; otherwise null. -

              If localName is not a valid DOM API element local name an +

              If localName is not a valid element local name an "{{InvalidCharacterError}}" {{DOMException}} will be thrown.

              When supplied, options's {{ElementCreationOptions/is}} can be used to create a @@ -5120,7 +5120,7 @@ method steps are to return the list of elements with class names classNa U+003A (:) in qualifiedName or qualifiedName.

              If qualifiedName is not a (possibly-prefixed) - valid DOM API element local name an "{{InvalidCharacterError}}" {{DOMException}} will be + valid element local name an "{{InvalidCharacterError}}" {{DOMException}} will be thrown.

              If one of the following conditions is true a "{{NamespaceError!!exception}}" {{DOMException}} @@ -5165,7 +5165,7 @@ method steps are to return the list of elements with class names classNa node whose target is target and data is data. - If target is not a valid base-case DOM API name an + If target is not a valid base-case name an "{{InvalidCharacterError!!exception}}" {{DOMException}} will be thrown. If data contains "?>" an "{{InvalidCharacterError!!exception}}" {{DOMException}} will be thrown. @@ -5182,7 +5182,7 @@ method steps are to return the list of elements with class names classNa method steps are:

                -
              1. If localName is not a valid DOM API element local name, then +

              2. If localName is not a valid element local name, then throw an "{{InvalidCharacterError!!exception}}" {{DOMException}}.

              3. If this is an HTML document, then set localName to @@ -5268,8 +5268,7 @@ or that it contains two adjacent hyphens or ends with a hyphen. method steps are:

                  -
                1. If target is does not match the - +
                2. If target does not match the [=XML/Name=] production, then throw an "{{InvalidCharacterError!!exception}}" {{DOMException}}. @@ -5388,8 +5387,8 @@ these steps: steps are:
                    -
                  1. If localName is not a valid base-case DOM API name, - then throw an "{{InvalidCharacterError!!exception}}" {{DOMException}}. +

                  2. If localName is not a valid base-case name, then + throw an "{{InvalidCharacterError!!exception}}" {{DOMException}}.

                  3. If this is an HTML document, then set localName to localName in ASCII lowercase. @@ -6601,8 +6600,8 @@ method steps are: method steps are:
                      -
                    1. If qualifiedName is not a valid base-case DOM API name, then throw - an "{{InvalidCharacterError!!exception}}" {{DOMException}}. +

                    2. If qualifiedName is not a valid base-case name, then throw an + "{{InvalidCharacterError!!exception}}" {{DOMException}}.

                    3. If this is in the HTML namespace and its node document is an HTML document, then set qualifiedName to qualifiedName in @@ -6660,8 +6659,8 @@ steps are: method steps are:

                        -
                      1. If qualifiedName is not a valid base-case DOM API name, then throw - an "{{InvalidCharacterError!!exception}}" {{DOMException}}. +

                      2. If qualifiedName is not a valid base-case name, then throw an + "{{InvalidCharacterError!!exception}}" {{DOMException}}.

 3. If this is in the HTML namespace and its node document is an HTML document, then set qualifiedName to qualifiedName in

From 57afe005a5f06805f868272ee5ebdec8709b946f Mon Sep 17 00:00:00 2001
From: Domenic Denicola
Date: Mon, 16 May 2022 17:52:53 -0700
Subject: [PATCH 03/11] Code review comments

---
 dom.bs | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/dom.bs b/dom.bs
index f0d213ba9..5384a4cdb 100644
--- a/dom.bs
+++ b/dom.bs
@@ -211,7 +211,7 @@ added.

                        Name validation

                        -A [=string=] |string| is a valid base-case name if it does not contain +A [=string=] |string| is a valid basic name if it does not contain [=ASCII whitespace=], U+0000 NULL, U+002F (/), or U+003E (>).

                        This concept is used to validate [=attribute=] [=Attr/local names=] and @@ -221,12 +221,12 @@ are constructed with DOM APIs. Note that the prefixes additionally cannot contai construction since they always result from the [=validate and extract=] algorithm. A [=string=] |string| is a valid element local name if it matches the following -[=ElementName=] +[=ValidElementLocalName=] EBNF production. The notation used here is as defined in XML. [[!XML]]

                        -ElementName                   ::= HTMLParserCompatibleName | BeyondHTMLParserName
                        +ValidElementLocalName         ::= HTMLParserCompatibleName | BeyondHTMLParserName
                         
                         HTMLParserCompatibleName      ::= [a-zA-Z] [^#x00#x09#x0A#0xCx0D#x20/>]*
                         
                        @@ -253,8 +253,8 @@ is restricted for historical reasons, but beyond ASCII anything is allowed.
                          
                      4. If qualifiedName contains a U+003A (:), then strictly split the string on it and set prefix to the part before and localName to the part after. -

                      5. If prefix is not a [=valid base-case name=], then [=throw=] an - {{InvalidCharacterError}}" {{DOMException}}. +

                      6. If prefix is not a [=valid basic name=], then [=throw=] an + "{{InvalidCharacterError}}" {{DOMException}}.

                      7. If localName is not a [=valid element local name=], then [=throw=] an "{{InvalidCharacterError}}" {{DOMException}}. @@ -5165,7 +5165,7 @@ method steps are to return the list of elements with class names classNa node whose target is target and data is data. - If target is not a valid base-case name an + If target is not a valid basic name an "{{InvalidCharacterError!!exception}}" {{DOMException}} will be thrown. If data contains "?>" an "{{InvalidCharacterError!!exception}}" {{DOMException}} will be thrown. @@ -5387,7 +5387,7 @@ these steps: steps are:

                          -
                        1. If localName is not a valid base-case name, then +

                        2. If localName is not a valid basic name, then throw an "{{InvalidCharacterError!!exception}}" {{DOMException}}.

                        3. If this is an HTML document, then set localName to @@ -6600,7 +6600,7 @@ method steps are: method steps are:
                            -
                          1. If qualifiedName is not a valid base-case name, then throw an +

                          2. If qualifiedName is not a valid basic name, then throw an "{{InvalidCharacterError!!exception}}" {{DOMException}}.

                          3. If this is in the HTML namespace and its node document is an @@ -6659,7 +6659,7 @@ steps are: method steps are:

                              -
                            1. If qualifiedName is not a valid base-case name, then throw an +

                            2. If qualifiedName is not a valid basic name, then throw an "{{InvalidCharacterError!!exception}}" {{DOMException}}.

 3. If this is in the HTML namespace and its node document is an

From 93633752b31be00b7e8d4d29f064b2511c60a7dd Mon Sep 17 00:00:00 2001
From: Domenic Denicola
Date: Mon, 16 May 2022 17:57:33 -0700
Subject: [PATCH 04/11] Add note about =

---
 dom.bs | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/dom.bs b/dom.bs
index 5384a4cdb..3c687d085 100644
--- a/dom.bs
+++ b/dom.bs
@@ -214,11 +214,17 @@ added.
 A [=string=] |string| is a valid basic name if it does not contain
 [=ASCII whitespace=], U+0000 NULL, U+002F (/), or U+003E (>).
-

                              This concept is used to validate [=attribute=] [=Attr/local names=] and -[=Attr/qualified names=], as well as both [=Element/namespace prefix|element=] and -[=Attr/namespace prefix|attribute=] namespace prefixes, in the cases where elements and attributes -are constructed with DOM APIs. Note that the prefixes additionally cannot contain U+003A (:) by -construction since they always result from the [=validate and extract=] algorithm. +

                              +

                              This concept is used to validate [=attribute=] [=Attr/local names=] and + [=Attr/qualified names=], as well as both [=Element/namespace prefix|element=] and + [=Attr/namespace prefix|attribute=] namespace prefixes, in the cases where elements and attributes + are constructed with DOM APIs. Note that the prefixes additionally cannot contain U+003A (:) by + construction since they always result from the [=validate and extract=] algorithm. + +

                              Note that this means [=attribute=] [=Attr/local names=] for attributes constructed with DOM APIs + can contain U+003D (=) in positions beyond the initial character, which is not possible when using + the HTML parser. +

 A [=string=] |string| is a valid element local name if it matches the following [=ValidElementLocalName=]

From 738fce9076cdd61a13774b818f879c477ddf5c03 Mon Sep 17 00:00:00 2001
From: Domenic Denicola
Date: Wed, 18 May 2022 09:45:55 -0700
Subject: [PATCH 05/11] Oops

---
 dom.bs | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/dom.bs b/dom.bs
index 3c687d085..3ae87b1ab 100644
--- a/dom.bs
+++ b/dom.bs
@@ -5171,7 +5171,7 @@ method steps are to return the list of elements with class names classNa
 node whose target is target and data is data.
- If target is not a valid basic name an
+ If target does not match the [=XML/Name=] production an
 "{{InvalidCharacterError!!exception}}" {{DOMException}} will be thrown.
 If data contains "?>" an "{{InvalidCharacterError!!exception}}" {{DOMException}} will be thrown.

From 79cae38eb9285f0f56781294ed77d2d9f28e899b Mon Sep 17 00:00:00 2001
From: Domenic Denicola
Date: Mon, 6 Jun 2022 17:27:23 -0400
Subject: [PATCH 06/11] Seems reasonable

---
 dom.bs | 72 ++++++++++++++++++++++++++++++++++++----------------------
 1 file changed, 45 insertions(+), 27 deletions(-)

diff --git a/dom.bs b/dom.bs
index 3ae87b1ab..d345392d7 100644
--- a/dom.bs
+++ b/dom.bs
@@ -211,22 +211,13 @@ added.

                              Name validation

                              -A [=string=] |string| is a valid basic name if it does not contain -[=ASCII whitespace=], U+0000 NULL, U+002F (/), or U+003E (>). +A [=string=] is a valid namespace prefix if it does not contain [=ASCII whitespace=], +U+0000 NULL, U+002F (/), or U+003E (>). -
                              -

                              This concept is used to validate [=attribute=] [=Attr/local names=] and - [=Attr/qualified names=], as well as both [=Element/namespace prefix|element=] and - [=Attr/namespace prefix|attribute=] namespace prefixes, in the cases where elements and attributes - are constructed with DOM APIs. Note that the prefixes additionally cannot contain U+003A (:) by - construction since they always result from the [=validate and extract=] algorithm. - -

                              Note that this means [=attribute=] [=Attr/local names=] for attributes constructed with DOM APIs - can contain U+003D (=) in positions beyond the initial character, which is not possible when using - the HTML parser. -

                              +A [=string=] is a valid attribute local name if it does not contain [=ASCII whitespace=], +U+0000 NULL, U+002F (/), U+003D (=), or U+003E (>). -A [=string=] |string| is a valid element local name if it matches the following +A [=string=] is a valid element local name if it matches the following [=ValidElementLocalName=] EBNF production. The notation used here is as defined in XML. [[!XML]] @@ -234,7 +225,7 @@ defined in XML. [[!XML]]
                               ValidElementLocalName         ::= HTMLParserCompatibleName | BeyondHTMLParserName
                               
                              -HTMLParserCompatibleName      ::= [a-zA-Z] [^#x00#x09#x0A#0xCx0D#x20/>]*
                              +HTMLParserCompatibleName      ::= [a-zA-Z] [^#x00#x09#x0A#0x0Cx0D#x20/>]*
                               
                               BeyondHTMLParserName          ::= BeyondHTMLParserNameStartChar (BeyondHTMLParserNameChar)*
                               BeyondHTMLParserNameStartChar ::= ":" | "_" | [#x80-#x10FFFF]
                              @@ -246,8 +237,19 @@ constructed by DOM APIs. The intention is to allow any name that is possible to
                               HTML parser, plus some additional possibilities. For those additional possibilities, the ASCII range
                               is restricted for historical reasons, but beyond ASCII anything is allowed.
                               
                              +
                              + An equivalent EBNF is the following: + +
                              + ValidElementLocalName          ::= ValidElementLocalNameStartChar (ValidElementLocalNameChar)*
                              + ValidElementLocalNameStartChar ::= [a-zA-Z] | ":" | "_" | [#x80-#x10FFFF]
                              + ValidElementLocalNameChar      ::= [^#x00#x09#x0A#0x0Cx0D#x20/>]
                              + 
                              +
                              -

                              To validate and extract a namespace and qualifiedName: + +

                              To validate and extract a namespace and qualifiedName, given a +context:

                              1. If namespace is the empty string, then set it to null. @@ -259,11 +261,14 @@ is restricted for historical reasons, but beyond ASCII anything is allowed.

                              2. If qualifiedName contains a U+003A (:), then strictly split the string on it and set prefix to the part before and localName to the part after. -

                              3. If prefix is not a [=valid basic name=], then [=throw=] an +

                              4. If prefix is not a [=valid namespace prefix=], then [=throw=] an "{{InvalidCharacterError}}" {{DOMException}}. -

                              5. If localName is not a [=valid element local name=], then [=throw=] an - "{{InvalidCharacterError}}" {{DOMException}}. +

                              6. If context is "attribute" and localName is not a + [=valid attribute local name=], then [=throw=] an "{{InvalidCharacterError}}" {{DOMException}}. + +

                              7. If context is "element" and localName is not a + [=valid element local name=], then [=throw=] an "{{InvalidCharacterError}}" {{DOMException}}.

                              8. If prefix is non-null and namespace is null, then throw a "{{NamespaceError!!exception}}" {{DOMException}}. @@ -5214,7 +5219,8 @@ method steps are:
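Reviewer sketch (not part of the patch): the validate-and-extract steps visible in the hunk above could be modeled roughly as follows. Everything here is hypothetical illustration: `validateAndExtract`, `NameChecks`, and `demoChecks` are made-up names, a plain `Error` stands in for `DOMException`, and the three name predicates are passed in as parameters because their definitions change across this patch series. Two assumptions go beyond the diff text: the qualified name is split at its first U+003A (:), and a null prefix is not subjected to the prefix check. Steps outside the hunk are not modeled.

```typescript
// Illustrative model only; names and error type are assumptions, not spec API.
type Context = "element" | "attribute";

interface NameChecks {
  prefix: (s: string) => boolean;
  attributeLocalName: (s: string) => boolean;
  elementLocalName: (s: string) => boolean;
}

interface Extracted {
  namespace: string | null;
  prefix: string | null;
  localName: string;
}

function validateAndExtract(
  namespaceArg: string | null,
  qualifiedName: string,
  context: Context,
  checks: NameChecks
): Extracted {
  // Empty-string namespace is normalized to null.
  const ns = namespaceArg === "" ? null : namespaceArg;
  let prefix: string | null = null;
  let localName = qualifiedName;

  // Assumption: split at the first colon; the rest is the local name.
  const colon = qualifiedName.indexOf(":");
  if (colon !== -1) {
    prefix = qualifiedName.slice(0, colon);
    localName = qualifiedName.slice(colon + 1);
  }

  // Assumption: only a non-null prefix is validated.
  if (prefix !== null && !checks.prefix(prefix)) {
    throw new Error("InvalidCharacterError");
  }
  if (context === "attribute" && !checks.attributeLocalName(localName)) {
    throw new Error("InvalidCharacterError");
  }
  if (context === "element" && !checks.elementLocalName(localName)) {
    throw new Error("InvalidCharacterError");
  }
  if (prefix !== null && ns === null) {
    throw new Error("NamespaceError");
  }
  return { namespace: ns, prefix: prefix, localName: localName };
}

// Simplified stand-in predicates, for demonstration only.
const forbidden = /[\t\n\f\r \0\/>]/;
const demoChecks: NameChecks = {
  prefix: (s) => !forbidden.test(s),
  attributeLocalName: (s) => !forbidden.test(s) && s.indexOf("=") === -1,
  elementLocalName: (s) => /^[a-zA-Z:_\u0080-\uFFFF]/.test(s) && !forbidden.test(s),
};
```

With these stand-ins, `validateAndExtract("http://www.w3.org/2000/svg", "svg:rect", "element", demoChecks)` yields prefix `"svg"` and local name `"rect"`, while the same qualified name with a null namespace throws the stand-in `NamespaceError`.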

                                1. Let namespace, prefix, and localName be the result of - passing namespace and qualifiedName to validate and extract. + [=validate and extract|validating and extracting=] namespace and + qualifiedName given "element".

                                2. Let is be null. @@ -5393,7 +5399,7 @@ these steps: steps are:

                                    -
                                  1. If localName is not a valid basic name, then +

                                  2. If localName is not a valid attribute local name, then throw an "{{InvalidCharacterError!!exception}}" {{DOMException}}.

                                  3. If this is an HTML document, then set localName to @@ -5409,7 +5415,8 @@ method steps are:
                                    1. Let namespace, prefix, and localName be the result of - passing namespace and qualifiedName to validate and extract. + [=validate and extract|validating and extracting=] namespace and + qualifiedName given "attribute".

                                    2. Return a new attribute whose namespace is namespace, namespace prefix is prefix, local name is @@ -6606,8 +6613,14 @@ method steps are: method steps are:

                                        -
                                      1. If qualifiedName is not a valid basic name, then throw an - "{{InvalidCharacterError!!exception}}" {{DOMException}}. +

                                      2. +

                                        If qualifiedName is not a valid attribute local name, then throw an + "{{InvalidCharacterError!!exception}}" {{DOMException}}. + +

                                        Despite the parameter naming, + qualifiedName is only used as a [=Attr/qualified name=] if an [=attribute=] already + exists with that qualified name. Otherwise, it is used as the [=Attr/local name=] of the new + attribute. We only need to validate it for the latter case.

                                      3. If this is in the HTML namespace and its node document is an HTML document, then set qualifiedName to qualifiedName in @@ -6632,7 +6645,8 @@ method steps are:
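Reviewer sketch (not part of the patch): the note added to the setAttribute() steps in the preceding hunk says that qualifiedName is matched against existing qualified names, and otherwise becomes the local name of a new attribute. A toy model of that behavior, assuming a simplified attribute record and omitting validation and HTML lowercasing, might look like this. `ToyAttr`, `toyQualifiedName`, and `toySetAttribute` are hypothetical names, not DOM API.

```typescript
// Toy model only; not the DOM's data structures.
interface ToyAttr {
  prefix: string | null;
  localName: string;
  value: string;
}

// Qualified name: the local name, or prefix + ":" + local name when prefixed.
function toyQualifiedName(a: ToyAttr): string {
  return a.prefix === null ? a.localName : a.prefix + ":" + a.localName;
}

function toySetAttribute(attrs: ToyAttr[], qualifiedName: string, value: string): void {
  // Existing attribute: the argument is compared as a qualified name.
  for (const a of attrs) {
    if (toyQualifiedName(a) === qualifiedName) {
      a.value = value;
      return;
    }
  }
  // No match: the whole argument becomes the local name, with a null prefix.
  attrs.push({ prefix: null, localName: qualifiedName, value: value });
}
```

In this model, setting `"a:b"` on an element with no matching attribute creates an attribute whose local name is the full string `"a:b"`, which is why validating the argument as a local name covers the creation path.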

                                        1. Let namespace, prefix, and localName be the result of - passing namespace and qualifiedName to validate and extract. + [=validate and extract|validating and extracting=] namespace and + qualifiedName given "element".

                                        2. Set an attribute value for this using localName, value, and also prefix and namespace. @@ -6665,8 +6679,12 @@ steps are: method steps are:

                                            -
                                          1. If qualifiedName is not a valid basic name, then throw an - "{{InvalidCharacterError!!exception}}" {{DOMException}}. +

                                          2. +

                                            If qualifiedName is not a valid attribute local name, then throw an + "{{InvalidCharacterError!!exception}}" {{DOMException}}. + +

                                            See the discussion above about why + we validate it as a local name, instead of a qualified name.

 3. If this is in the HTML namespace and its node document is an HTML document, then set qualifiedName to qualifiedName in

From aaab1b5c0167c9f21af5707aeff066bcbfb8fca8 Mon Sep 17 00:00:00 2001
From: Domenic Denicola
Date: Tue, 7 Jun 2022 11:52:06 -0400
Subject: [PATCH 07/11] EBNF bad

---
 dom.bs | 38 ++++++++++----------------------------
 1 file changed, 10 insertions(+), 28 deletions(-)

diff --git a/dom.bs b/dom.bs
index d345392d7..21f146630 100644
--- a/dom.bs
+++ b/dom.bs
@@ -211,43 +211,25 @@ added.

                                            Name validation

                                            -A [=string=] is a valid namespace prefix if it does not contain [=ASCII whitespace=], -U+0000 NULL, U+002F (/), or U+003E (>). +A [=string=] is a valid namespace prefix if its [=string/length=] is at least 1 and it +does not contain [=ASCII whitespace=], U+0000 NULL, U+002F (/), or U+003E (>). -A [=string=] is a valid attribute local name if it does not contain [=ASCII whitespace=], -U+0000 NULL, U+002F (/), U+003D (=), or U+003E (>). +A [=string=] is a valid attribute local name if its [=string/length=] is at least 1 and +it does not contain [=ASCII whitespace=], U+0000 NULL, U+002F (/), U+003D (=), or U+003E (>). -A [=string=] is a valid element local name if it matches the following -[=ValidElementLocalName=] -EBNF production. The notation used here is as -defined in XML. [[!XML]] +A [=string=] is a valid element local name if all of the following conditions hold: -
                                            -ValidElementLocalName         ::= HTMLParserCompatibleName | BeyondHTMLParserName
                                            -
                                            -HTMLParserCompatibleName      ::= [a-zA-Z] [^#x00#x09#x0A#0x0Cx0D#x20/>]*
                                            -
                                            -BeyondHTMLParserName          ::= BeyondHTMLParserNameStartChar (BeyondHTMLParserNameChar)*
                                            -BeyondHTMLParserNameStartChar ::= ":" | "_" | [#x80-#x10FFFF]
                                            -BeyondHTMLParserNameChar      ::= BeyondHTMLParserNameStartChar | [a-zA-Z] | "-" | "." | [0-9]
                                            -
+* its [=string/length=] is at least 1;
+* its first [=code point=] is an [=ASCII alpha=], U+003A (:), U+005F (_), or in the range U+0080 to
+  U+10FFFF inclusive; and
+* its subsequent [=code points=], if any, are not [=ASCII whitespace=], U+0000 NULL, U+002F (/), or
+  U+003E (>).

                                            This concept is used to validate [=/element=] [=Element/local names=], when constructed by DOM APIs. The intention is to allow any name that is possible to construct using the HTML parser, plus some additional possibilities. For those additional possibilities, the ASCII range is restricted for historical reasons, but beyond ASCII anything is allowed. -

                                            - An equivalent EBNF is the following: - -
                                            - ValidElementLocalName          ::= ValidElementLocalNameStartChar (ValidElementLocalNameChar)*
                                            - ValidElementLocalNameStartChar ::= [a-zA-Z] | ":" | "_" | [#x80-#x10FFFF]
                                            - ValidElementLocalNameChar      ::= [^#x00#x09#x0A#0x0Cx0D#x20/>]
                                            - 
                                            -
                                            - -

To validate and extract a namespace and qualifiedName, given a context:

From 8566a753f45819b072bdcc9526fd3e2de14a6097 Mon Sep 17 00:00:00 2001
From: Domenic Denicola
Date: Tue, 7 Jun 2022 12:26:33 -0400
Subject: [PATCH 08/11] Code review comments

---
 dom.bs | 20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/dom.bs b/dom.bs
index 21f146630..7469826c5 100644
--- a/dom.bs
+++ b/dom.bs
@@ -211,19 +211,21 @@ added.

                                            Name validation

                                            -A [=string=] is a valid namespace prefix if its [=string/length=] is at least 1 and it +

                                            A [=string=] is a valid namespace prefix if its [=string/length=] is at least 1 and it does not contain [=ASCII whitespace=], U+0000 NULL, U+002F (/), or U+003E (>). -A [=string=] is a valid attribute local name if its [=string/length=] is at least 1 and -it does not contain [=ASCII whitespace=], U+0000 NULL, U+002F (/), U+003D (=), or U+003E (>). +

                                            A [=string=] is a valid attribute local name if its [=string/length=] is at least 1 +and it does not contain [=ASCII whitespace=], U+0000 NULL, U+002F (/), U+003D (=), or U+003E (>). -A [=string=] is a valid element local name if all of the following conditions hold: +

                                            A [=string=] is a valid element local name if all of the following conditions hold: -* its [=string/length=] is at least 1; -* its first [=code point=] is an [=ASCII alpha=], U+003A (:), U+005F (_), or in the range U+0080 to - U+10FFFF inclusive; and -* its subsequent [=code points=], if any, are not [=ASCII whitespace=], U+0000 NULL, U+002F (/), or - U+003E (>). +

                                              +
                                            • its [=string/length=] is at least 1; +
                                            • its first [=code point=] is an [=ASCII alpha=], U+003A (:), U+005F (_), or in the range U+0080 + to U+10FFFF, inclusive; and +
                                            • its subsequent [=code points=], if any, are not [=ASCII whitespace=], U+0000 NULL, U+002F (/), + or U+003E (>). +
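The two simpler definitions in this hunk (valid namespace prefix and valid attribute local name) can be sketched in JavaScript as follows. This is only an illustration of the prose, not part of the patch: the function names and the `containsForbidden` helper are hypothetical, and ASCII whitespace is assumed to be the Infra set (TAB, LF, FF, CR, SPACE).

```javascript
// ASCII whitespace as assumed from the Infra Standard: TAB, LF, FF, CR, SPACE.
const ASCII_WHITESPACE = new Set(["\t", "\n", "\f", "\r", " "]);

// Hypothetical helper: true if `s` contains ASCII whitespace or any
// character listed in `forbidden`.
function containsForbidden(s, forbidden) {
  for (const ch of s) {
    if (ASCII_WHITESPACE.has(ch) || forbidden.has(ch)) return true;
  }
  return false;
}

// Length at least 1, and no ASCII whitespace, U+0000 NULL, "/", or ">".
function isValidNamespacePrefix(s) {
  return s.length >= 1 && !containsForbidden(s, new Set(["\0", "/", ">"]));
}

// Same as above, but "=" is additionally excluded.
function isValidAttributeLocalName(s) {
  return s.length >= 1 && !containsForbidden(s, new Set(["\0", "/", "=", ">"]));
}
```

The only difference between the two checks is U+003D (=), which is rejected in attribute local names but not in prefixes.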

This concept is used to validate [=/element=] [=Element/local names=], when constructed by DOM APIs. The intention is to allow any name that is possible to construct using the

From acb56a72f775910067a65a099f86d58caf966b09 Mon Sep 17 00:00:00 2001
From: Domenic Denicola
Date: Tue, 7 Jun 2022 13:23:39 -0400
Subject: [PATCH 09/11] Branching

---
 dom.bs | 47 ++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 36 insertions(+), 11 deletions(-)

diff --git a/dom.bs b/dom.bs
index 7469826c5..5a7855fcc 100644
--- a/dom.bs
+++ b/dom.bs
@@ -217,20 +217,45 @@ does not contain [=ASCII whitespace=], U+0000 NULL, U+002F (/), or U+003E (>).

                                            A [=string=] is a valid attribute local name if its [=string/length=] is at least 1 and it does not contain [=ASCII whitespace=], U+0000 NULL, U+002F (/), U+003D (=), or U+003E (>). -

                                            A [=string=] is a valid element local name if all of the following conditions hold: +

                                            A [=string=] |name| is a valid element local name if the following steps return true: -

                                              -
                                            • its [=string/length=] is at least 1; -
                                            • its first [=code point=] is an [=ASCII alpha=], U+003A (:), U+005F (_), or in the range U+0080 - to U+10FFFF, inclusive; and -
                                            • its subsequent [=code points=], if any, are not [=ASCII whitespace=], U+0000 NULL, U+002F (/), - or U+003E (>). -
                                            +
                                              +
                                            1. If |name|'s [=string/length=] is 0, then return false. + +

                                            2. +

                                              If |name|'s first [=code point=] is an [=ASCII alpha=], then: + +

                                                +
                                              1. If |name| contains [=ASCII whitespace=], U+0000 NULL, U+002F (/), or U+003E (>), then + return false. + +

                                              2. Return true. +

                                              + +
                                            3. If |name|'s first [=code point=] is not U+003A (:), U+005F (_), or in the range U+0080 + to U+10FFFF, inclusive, then return false. -

                                              This concept is used to validate [=/element=] [=Element/local names=], when +

                                            4. If |name|'s subsequent [=code points=], if any, are not [=ASCII alphas=], [=ASCII digits=], + U+002D (-), U+002E (.), U+003A (:), U+005F (_), or in the range U+0080 to U+10FFFF, inclusive, then + return false. + +

                                            5. Return true. +

                                            + +

                                            This concept is used to validate [=/element=] [=Element/local names=], when constructed by DOM APIs. The intention is to allow any name that is possible to construct using the -HTML parser, plus some additional possibilities. For those additional possibilities, the ASCII range -is restricted for historical reasons, but beyond ASCII anything is allowed. +HTML parser (the branch where the first [=code point=] is an [=ASCII alpha=]), plus some additional +possibilities. For those additional possibilities, the ASCII range is restricted for historical +reasons, but beyond ASCII anything is allowed. + +

                                            +

                                            The following JavaScript-compatible regular expression is an implementation of the above + definition: + +

                                            +  /^(?:[A-Za-z][^\0\t\n\f\r\u0020/>]*)|(?:[:_\u0080-][A-Za-z-.:_\u0080-]*)$/u
                                            + 
                                            +
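As a cross-check on the branching steps for "valid element local name", here is a minimal step-by-step sketch in JavaScript. It is an illustration under assumptions rather than spec text: the function names are hypothetical, and the string is iterated by code point with `Array.from`, which is one possible reading of "code point" for a JavaScript string.

```javascript
function isAsciiAlpha(cp) {
  return (cp >= 0x41 && cp <= 0x5a) || (cp >= 0x61 && cp <= 0x7a);
}

function isAsciiDigit(cp) {
  return cp >= 0x30 && cp <= 0x39;
}

// Hypothetical name; mirrors steps 1-5 of the algorithm in this patch.
function isValidElementLocalName(name) {
  // Step 1: the empty string is never valid.
  if (name.length === 0) return false;

  const codePoints = Array.from(name, (ch) => ch.codePointAt(0));
  const first = codePoints[0];

  // Step 2: HTML-parser-compatible branch. Only ASCII whitespace,
  // U+0000 NULL, "/" and ">" are excluded anywhere in the name.
  if (isAsciiAlpha(first)) {
    const forbidden = new Set([0x00, 0x09, 0x0a, 0x0c, 0x0d, 0x20, 0x2f, 0x3e]);
    return !codePoints.some((cp) => forbidden.has(cp));
  }

  // Step 3: otherwise the first code point must be ":", "_", or >= U+0080.
  if (first !== 0x3a && first !== 0x5f && first < 0x80) return false;

  // Step 4: every subsequent code point must be an ASCII alpha, ASCII digit,
  // "-", ".", ":", "_", or >= U+0080.
  for (const cp of codePoints.slice(1)) {
    const allowed =
      isAsciiAlpha(cp) || isAsciiDigit(cp) ||
      cp === 0x2d || cp === 0x2e || cp === 0x3a || cp === 0x5f ||
      cp >= 0x80;
    if (!allowed) return false;
  }

  // Step 5.
  return true;
}
```

The two branches differ in strictness: `"a!"` is accepted because it starts with an ASCII alpha, whereas `"_a!"` is rejected because the second branch restricts the ASCII range.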

To validate and extract a namespace and qualifiedName, given a context:

From 0d5d94593406e432cebc838d5c7a5394305f03e5 Mon Sep 17 00:00:00 2001
From: Domenic Denicola
Date: Tue, 7 Jun 2022 13:29:58 -0400
Subject: [PATCH 10/11] Strings seem to be zero-based per Infra

---
 dom.bs | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/dom.bs b/dom.bs
index 5a7855fcc..1a37d003d 100644
--- a/dom.bs
+++ b/dom.bs
@@ -223,7 +223,7 @@ and it does not contain [=ASCII whitespace=], U+0000 NULL, U+002F (/), U+003D (=

                                          4. If |name|'s [=string/length=] is 0, then return false.

                                          5. -

                                            If |name|'s first [=code point=] is an [=ASCII alpha=], then: +

                                            If |name|'s 0th [=code point=] is an [=ASCII alpha=], then:

                                            1. If |name| contains [=ASCII whitespace=], U+0000 NULL, U+002F (/), or U+003E (>), then @@ -232,7 +232,7 @@ and it does not contain [=ASCII whitespace=], U+0000 NULL, U+002F (/), U+003D (=

                                            2. Return true.

                                            -
                                          6. If |name|'s first [=code point=] is not U+003A (:), U+005F (_), or in the range U+0080 +

                                          7. If |name|'s 0th [=code point=] is not U+003A (:), U+005F (_), or in the range U+0080 to U+10FFFF, inclusive, then return false.

8. If |name|'s subsequent [=code points=], if any, are not [=ASCII alphas=], [=ASCII digits=],

From bea65ca34ef5e489f6d86c71c9b57eb6d910045f Mon Sep 17 00:00:00 2001
From: Domenic Denicola
Date: Wed, 8 Jun 2022 11:22:37 -0400
Subject: [PATCH 11/11] Fix missing end of range and ASCII digits

---
 dom.bs | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/dom.bs b/dom.bs
index 1a37d003d..1643aad64 100644
--- a/dom.bs
+++ b/dom.bs
@@ -253,7 +253,7 @@ reasons, but beyond ASCII anything is allowed.
  definition:

                                            -  /^(?:[A-Za-z][^\0\t\n\f\r\u0020/>]*)|(?:[:_\u0080-][A-Za-z-.:_\u0080-]*)$/u
                                            +  /^(?:[A-Za-z][^\0\t\n\f\r\u0020/>]*)|(?:[:_\u0080-\u{10FFFF}][A-Za-z0-9-.:_\u0080-\u{10FFFF}]*)$/u
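One point that may be worth checking on the regular expression above: in JavaScript, `|` binds more loosely than `^` and `$`, so a pattern shaped like `^(?:A)|(?:B)$` anchors the first alternative only at the start and the second only at the end. The sketch below compares the pattern as written with a variant that wraps both alternatives in a single group. The names `asWritten` and `anchored` are hypothetical, and the variant is a suggestion rather than part of the patch.

```javascript
// The expression from the patch, copied verbatim.
const asWritten =
  /^(?:[A-Za-z][^\0\t\n\f\r\u0020/>]*)|(?:[:_\u0080-\u{10FFFF}][A-Za-z0-9-.:_\u0080-\u{10FFFF}]*)$/u;

// Same two alternatives, but inside one group so that ^ and $ apply to both.
const anchored =
  /^(?:[A-Za-z][^\0\t\n\f\r\u0020/>]*|[:_\u0080-\u{10FFFF}][A-Za-z0-9-.:_\u0080-\u{10FFFF}]*)$/u;

// "a b" contains a space, so the step-wise definition rejects it.
console.log(asWritten.test("a b")); // true: "^[A-Za-z]..." matches the prefix "a"
console.log(anchored.test("a b"));  // false

// "1_x" starts with a digit, so the step-wise definition rejects it.
console.log(asWritten.test("1_x")); // true: the second alternative matches "_x" at the end
console.log(anchored.test("1_x"));  // false

// Names accepted by the step-wise definition pass with both patterns.
console.log(asWritten.test("my-element"), anchored.test("my-element")); // true true
```

With `RegExp.prototype.test`, which searches for a match anywhere in the input, only the grouped variant agrees with the step-wise definition on these inputs.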