RFC 003
2017/05/06 - Approved (1.3) - @flyx @ingydotnet @perlpunk #YS17
The 1.2 spec productions for a YAML anchor allow too many characters in an anchor name. Specifically, it's a bad idea to allow YAML syntax characters in an anchor name.
These are the 1.2 productions:
c-ns-anchor-property ::= “&” ns-anchor-name
ns-anchor-name ::= ns-anchor-char+
ns-anchor-char ::= ns-char - c-flow-indicator
ns-char ::= nb-char - s-white
nb-char ::= c-printable - b-char - c-byte-order-mark
c-printable ::= #x9 | #xA | #xD | [#x20-#x7E] /* 8 bit */
| #x85 | [#xA0-#xD7FF] | [#xE000-#xFFFD] /* 16 bit */
| [#x10000-#x10FFFF] /* 32 bit */
b-char ::= b-line-feed | b-carriage-return
b-line-feed ::= #xA /* LF */
b-carriage-return ::= #xD /* CR */
c-byte-order-mark ::= #xFEFF
s-white ::= s-space | s-tab
s-space ::= #x20 /* SP */
s-tab ::= #x9 /* TAB */
c-flow-indicator ::= “,” | “[” | “]” | “{” | “}”
This reduces to:
c-ns-anchor-property ::= “&” ns-anchor-name
ns-anchor-name ::= ns-anchor-char+
ns-anchor-char ::= ( [#x21-#x7E] | #x85 | [#xA0-#xD7FF]
| [#xE000-#xFFFD] | [#x10000-#x10FFFF] )
- c-flow-indicator - c-byte-order-mark
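To make the breadth of the 1.2 rule concrete, here is a minimal Python sketch that applies the productions listed above to a single character. The function name `is_12_anchor_char` is hypothetical and is not part of any YAML library.

```python
# Flow indicators excluded by: ns-anchor-char ::= ns-char - c-flow-indicator
FLOW_INDICATORS = set(",[]{}")


def is_12_anchor_char(ch: str) -> bool:
    """Return True if `ch` is an ns-anchor-char under the 1.2 productions."""
    cp = ord(ch)
    # c-printable
    printable = (
        cp in (0x9, 0xA, 0xD, 0x85)
        or 0x20 <= cp <= 0x7E
        or 0xA0 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )
    if not printable:
        return False
    # nb-char ::= c-printable - b-char - c-byte-order-mark
    if cp in (0xA, 0xD, 0xFEFF):
        return False
    # ns-char ::= nb-char - s-white
    if cp in (0x20, 0x9):
        return False
    return ch not in FLOW_INDICATORS


# Characters that 1.2 accepts in an anchor name, including syntax characters.
for ch in "!#:*&\x85\xa0":
    print(f"U+{ord(ch):04X}", is_12_anchor_char(ch))
```

Every character in that loop is accepted, which is the problem this RFC describes.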
Certainly #x85 and #xA0 do not belong as they are whitespace.
Neither do '!', '#', ':', '*', '&' and many other punctuation characters.
Anchor names don't need to be that expressive and shouldn't look like they mean
something else to YAML.
Anchor names should effectively be all the Unicode "word" characters.
This change should probably occur in two phases: once in 1.3 and once in 1.5 or beyond.
In 1.3 we should make this as strict as possible, using just the ASCII word
characters [A-Za-z0-9_]. libyaml currently supports these and also the '-'
character, so we should probably use that set for 1.3.
It is doubtful that anyone currently uses characters outside this set, mostly because libyaml doesn't support them.
A-Z | a-z | 0-9 | '-' | '_'
NOTE: Leading, trailing and multiple consecutive dashes ('-') are not allowed.
Also, each anchor name must contain at least one number or letter.
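The 1.3 rules above could be checked along these lines. This is a sketch only: the function name `is_valid_13_anchor_name` is hypothetical, and it assumes the dash restrictions apply to '-' only, not to '_', as the note is worded.

```python
import re

# Allowed character set for 1.3: A-Z | a-z | 0-9 | '-' | '_'
ALLOWED_13 = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_13_anchor_name(name: str) -> bool:
    """Check an anchor name (without the leading '&') against the 1.3 proposal."""
    # Non-empty and made only of allowed characters.
    if not ALLOWED_13.fullmatch(name):
        return False
    # No leading, trailing, or consecutive dashes.
    if name.startswith("-") or name.endswith("-") or "--" in name:
        return False
    # At least one letter or number (rules out names like "_" or "___").
    return any(c.isalnum() for c in name)


for candidate in ["base-config", "a1", "-lead", "trail-", "a--b", "___", "a:b"]:
    print(candidate, is_valid_13_anchor_name(candidate))
```

Only `base-config` and `a1` pass; the others each break one of the three rules.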
Modern languages should embrace Unicode, and thus we can open this up to word characters beyond ASCII.
\p{Letter} | \p{Number} | '-' | '_'
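Python's standard `re` module does not support `\p{...}` classes, so the sketch below approximates the later-phase set with `unicodedata` general categories instead: `L*` for `\p{Letter}` and `N*` for `\p{Number}`. The function names are hypothetical. The code checks the character set only, because the text does not say whether the 1.3 dash rules carry over to this phase.

```python
import unicodedata


def is_unicode_anchor_char(ch: str) -> bool:
    """True for \\p{Letter} | \\p{Number} | '-' | '_' (by general category)."""
    if ch in "-_":
        return True
    # General categories starting with "L" are letters, with "N" numbers.
    return unicodedata.category(ch)[0] in ("L", "N")


def uses_only_unicode_anchor_chars(name: str) -> bool:
    """True if `name` is non-empty and every character is in the proposed set."""
    return bool(name) and all(is_unicode_anchor_char(c) for c in name)


for candidate in ["caf\u00e9", "\u540d\u524d_1", "a\u00a0b", "a\u0085b", "a!b"]:
    print(repr(candidate), uses_only_unicode_anchor_chars(candidate))
```

Under this definition #x85 (category `Cc`) and #xA0 (category `Zs`) are rejected, while accented and non-Latin letters are accepted.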