
RFC 003

Ingy döt Net edited this page May 6, 2017 · 13 revisions

RFC-3 -- Characters that can appear in anchor

Tests: ...

The 1.2 spec productions for a YAML anchor allow too many characters in an anchor name. Specifically, it's a bad idea to allow YAML syntax characters in an anchor name.

These are the 1.2 productions:

c-ns-anchor-property ::= “&” ns-anchor-name
ns-anchor-name ::= ns-anchor-char+
ns-anchor-char ::= ns-char - c-flow-indicator
ns-char ::= nb-char - s-white
nb-char ::= c-printable - b-char - c-byte-order-mark
c-printable ::= #x9 | #xA | #xD | [#x20-#x7E]   /* 8 bit */
    | #x85 | [#xA0-#xD7FF] | [#xE000-#xFFFD]    /* 16 bit */
    | [#x10000-#x10FFFF]                        /* 32 bit */
b-char ::= b-line-feed | b-carriage-return
b-line-feed ::= #xA         /* LF */
b-carriage-return ::= #xD   /* CR */
c-byte-order-mark ::= #xFEFF
s-white ::= s-space | s-tab
s-space ::= #x20    /* SP */
s-tab ::= #x9       /* TAB */
c-flow-indicator ::= “,” | “[” | “]” | “{” | “}”

This reduces to:

c-ns-anchor-property ::= “&” ns-anchor-name
ns-anchor-name ::= ns-anchor-char+
ns-anchor-char ::= ( [#x21-#x7E] | #x85 | [#xA0-#xD7FF]
                   | [#xE000-#xFEFE] | [#xFF00-#xFFFD] | [#x10000-#x10FFFF] )
                 - ( “,” | “[” | “]” | “{” | “}” )

Certainly #x85 (next line, NEL) and #xA0 (no-break space) do not belong, as they are whitespace.

Neither do !, #, :, *, & and many other punctuation characters. Anchor names don't need to be that expressive and shouldn't look like they mean something else to YAML.
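To make the problem concrete, here is a minimal Python sketch of the 1.2 ns-anchor-char check, written directly from the productions listed above. The function names are illustrative only and do not come from any YAML library.

```python
# Sketch of the YAML 1.2 ns-anchor-char production.
# Function names are hypothetical, chosen for this example.
FLOW_INDICATORS = set(",[]{}")


def is_c_printable(ch: str) -> bool:
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD, 0x85)
        or 0x20 <= cp <= 0x7E
        or 0xA0 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def is_ns_anchor_char_1_2(ch: str) -> bool:
    # nb-char: c-printable minus LF, CR and the byte order mark
    if not is_c_printable(ch) or ch in "\n\r\ufeff":
        return False
    # ns-char: nb-char minus space and tab
    if ch in " \t":
        return False
    # ns-anchor-char: ns-char minus the flow indicators
    return ch not in FLOW_INDICATORS


def is_anchor_name_1_2(name: str) -> bool:
    # ns-anchor-name requires at least one character
    return len(name) > 0 and all(is_ns_anchor_char_1_2(c) for c in name)


# Names that look like other YAML syntax, or that contain whitespace-like characters
for name in ["a!b", "x#y", "k:v", "*star", "&&", "nel\x85", "nbsp\xa0"]:
    print(repr(name), is_anchor_name_1_2(name))
```

Under this reading of the productions, every name in the loop is accepted, including the ones containing #x85 and #xA0.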

Anchor names should effectively be limited to the Unicode "word" characters.

This change should probably occur in two phases: once in 1.3 and once in 1.5 or beyond.

1.3 Plan

In 1.3 we should make this as strict as possible, using just the ASCII word characters [A-Za-z0-9]. libyaml currently supports these and also the '-' and '_' characters, so we should probably use that set for 1.3.

It is doubtful that anyone in the world currently uses characters outside this set, mostly because libyaml doesn't support them.

A-Z | a-z | 0-9 | '-' | '_'
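As a sketch, the 1.3 set could be checked with a single regular expression in Python. The pattern follows the set exactly as listed above; the names `ANCHOR_NAME_1_3` and `is_anchor_name_1_3` are hypothetical and are not part of libyaml or any other parser.

```python
import re

# Assumed 1.3 anchor-name pattern: ASCII letters, digits, '-' and '_'.
# The '-' is placed last in the class so it is taken literally.
ANCHOR_NAME_1_3 = re.compile(r"[A-Za-z0-9_-]+")


def is_anchor_name_1_3(name: str) -> bool:
    # fullmatch ensures every character of the name is in the allowed set
    return ANCHOR_NAME_1_3.fullmatch(name) is not None


print(is_anchor_name_1_3("base-config_1"))
print(is_anchor_name_1_3("k:v"))
```

A name such as `base-config_1` passes, while names containing YAML syntax characters such as `k:v` or `*star` are rejected.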

1.5 Plan

Modern languages should embrace Unicode, and thus we can open this up to word characters beyond ASCII.

\p{Letter} | \p{Number} | '-' | '_'
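Python's standard `re` module has no `\p{...}` classes, so the following sketch approximates the 1.5 set with `unicodedata` general categories instead (categories beginning with `L` for letters and `N` for numbers). The function names are illustrative assumptions.

```python
import unicodedata


def is_anchor_char_1_5(ch: str) -> bool:
    if ch in "-_":
        return True
    # General categories starting with 'L' are letters, 'N' are numbers
    return unicodedata.category(ch)[0] in ("L", "N")


def is_anchor_name_1_5(name: str) -> bool:
    # An anchor name needs at least one character
    return len(name) > 0 and all(is_anchor_char_1_5(c) for c in name)


for name in ["caf\u00e9", "\u65e5\u672c\u8a9e", "a!b", "nbsp\u00a0", "e\u0301"]:
    print(repr(name), is_anchor_name_1_5(name))
```

One consequence worth noting: a decomposed form such as `e` followed by U+0301 (combining acute accent) is rejected, because combining marks are in neither the Letter nor the Number category. Whether marks should be allowed, or names normalized first, is left open here.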

Comments

@perlpunk: would like to add the dot. &a.1

@ingydotnet: I think that's ok. added.
