
RFC 003

Ingy döt Net edited this page May 1, 2017 · 13 revisions

RFC-3 -- Characters that can appear in an anchor

Tests: ...

The 1.2 spec productions for a YAML anchor allow too many characters in an anchor name. Specifically, it's a bad idea to allow YAML syntax characters in an anchor name.

These are the 1.2 productions:

c-ns-anchor-property ::= "&" ns-anchor-name
ns-anchor-name ::= ns-anchor-char+
ns-anchor-char ::= ns-char - c-flow-indicator
ns-char ::= nb-char - s-white
nb-char ::= c-printable - b-char - c-byte-order-mark
c-printable ::= #x9 | #xA | #xD | [#x20-#x7E]   /* 8 bit */
    | #x85 | [#xA0-#xD7FF] | [#xE000-#xFFFD]    /* 16 bit */
    | [#x10000-#x10FFFF]                        /* 32 bit */
b-char ::= b-line-feed | b-carriage-return
b-line-feed ::= #xA         /* LF */
b-carriage-return ::= #xD   /* CR */
c-byte-order-mark ::= #xFEFF
s-white ::= s-space | s-tab
s-space ::= #x20    /* SP */
s-tab ::= #x9       /* TAB */
c-flow-indicator ::= "," | "[" | "]" | "{" | "}"

This reduces to:

c-ns-anchor-property ::= "&" ns-anchor-name
ns-anchor-name ::= ns-anchor-char+
ns-anchor-char ::= ( [#x21-#x7E] - c-flow-indicator ) | #x85 | [#xA0-#xD7FF]
                 | [#xE000-#xFEFE] | [#xFF00-#xFFFD] | [#x10000-#x10FFFF]

#x85 (NEL, next line) and #xA0 (no-break space) certainly do not belong, as they are whitespace.

Neither do !, #, :, *, & and many other punctuation characters. Anchor names don't need to be that expressive and shouldn't look like they mean something else to YAML.
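To make the permissiveness concrete, here is a minimal Python sketch that follows the 1.2 productions step by step. The function names (`is_c_printable`, `is_ns_anchor_char_12`, `is_anchor_name_12`) are made up for illustration and are not taken from any YAML library.

```python
# Sketch of the YAML 1.2 ns-anchor-char production chain.
# All names here are hypothetical, for illustration only.

FLOW_INDICATORS = set(",[]{}")  # c-flow-indicator


def is_c_printable(cp: int) -> bool:
    """c-printable, as listed in the 1.2 productions."""
    return (
        cp in (0x9, 0xA, 0xD, 0x85)
        or 0x20 <= cp <= 0x7E
        or 0xA0 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def is_ns_anchor_char_12(ch: str) -> bool:
    """ns-anchor-char ::= ns-char - c-flow-indicator."""
    cp = ord(ch)
    if not is_c_printable(cp):
        return False
    if cp in (0xA, 0xD):    # minus b-char
        return False
    if cp == 0xFEFF:        # minus c-byte-order-mark
        return False
    if cp in (0x20, 0x9):   # minus s-white
        return False
    return ch not in FLOW_INDICATORS


def is_anchor_name_12(name: str) -> bool:
    """ns-anchor-name ::= ns-anchor-char+ (at least one character)."""
    return len(name) > 0 and all(is_ns_anchor_char_12(c) for c in name)


print(is_anchor_name_12("a!#:*&"))   # True: indicator characters are accepted
print(is_anchor_name_12("a\x85b"))   # True: NEL is accepted
print(is_anchor_name_12("a,b"))      # False: flow indicators are excluded
```

Only the five flow indicators, the space, the tab, line breaks and the byte order mark are rejected; everything else printable passes.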

Anchor names should effectively be limited to the Unicode "word" characters. This could mean (using Perl regex semantics):

\p{Letter} | \p{Number} | '-' | '_' | '/' | '.'
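As a rough illustration of what this class would accept, here is a Python sketch. It assumes that `\p{Letter}` and `\p{Number}` correspond to the Unicode general categories beginning with `L` and `N`, and it relies on whatever Unicode version the Python build's `unicodedata` module ships. The function names are hypothetical.

```python
import unicodedata

# The four extra punctuation characters from the proposed class.
EXTRA_CHARS = set("-_/.")


def is_proposed_anchor_char(ch: str) -> bool:
    """True if ch is a Letter (L*), a Number (N*), or one of - _ / ."""
    category = unicodedata.category(ch)  # e.g. "Lu", "Nd", "Po", "Zs"
    return category[0] in ("L", "N") or ch in EXTRA_CHARS


def is_proposed_anchor_name(name: str) -> bool:
    """True if name is non-empty and every character is allowed."""
    return len(name) > 0 and all(is_proposed_anchor_char(c) for c in name)


print(is_proposed_anchor_name("a.1"))      # True
print(is_proposed_anchor_name("a!b"))      # False: "!" is punctuation
print(is_proposed_anchor_name("a\xa0b"))   # False: no-break space
```

Under this sketch, names such as `foo-bar_baz/1` pass, while anything containing a YAML indicator or whitespace is rejected.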

1.3 Rationale

This change should be made in YAML 1.3. Most real-world anchor names use only [A-Za-z0-9_], so we should tighten this down now.

To Do

  • Verify assumptions
  • Determine all code points in the Perl char classes above.
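One possible way to approach the second item is to enumerate the code points programmatically. The sketch below assumes `\p{Letter}` and `\p{Number}` map to the general categories starting with `L` and `N`. The result depends on the Unicode version bundled with the Python build, so it may differ from what a given Perl reports. `letter_number_ranges` is a hypothetical helper name.

```python
import sys
import unicodedata


def letter_number_ranges():
    """Return inclusive (start, end) ranges of code points in L* or N*."""
    ranges = []
    start = None
    for cp in range(sys.maxunicode + 1):
        allowed = unicodedata.category(chr(cp))[0] in ("L", "N")
        if allowed and start is None:
            start = cp                      # open a new range
        elif not allowed and start is not None:
            ranges.append((start, cp - 1))  # close the current range
            start = None
    if start is not None:
        ranges.append((start, sys.maxunicode))
    return ranges


ranges = letter_number_ranges()
total = sum(end - start + 1 for start, end in ranges)
print(f"Unicode {unicodedata.unidata_version}: "
      f"{len(ranges)} ranges, {total} code points")
for start, end in ranges[:3]:
    print(f"[#x{start:X}-#x{end:X}]")
```

The ranges are printed in the same `[#x..-#x..]` notation as the productions, so they could be pasted into a candidate replacement for ns-anchor-char once the assumptions are verified.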

Comments

@perlpunk: would like to add the dot. &a.1

@ingydotnet: I think that's ok. added.
