PROPOSAL: alternative syntax and separate name space for Definitions #307
Comments
In general I approve of this. I've been working with data where I wanted composition on hidden fields and the closedness of definitions was a hassle to deal with, so I am happy to see them stay in the language. As for (regular) hidden fields, it might be a good idea to choose another sigil. Many C-family languages allow _ to start identifiers, and I can see it being confusing to someone just picking up the language that the fields they added are not being emitted. # is used as a comment marker in a lot of languages (notably YAML). It might be a good idea to explicitly detect when people use it that way, so the errors can be friendlier. |
I generally like this idea as well. One thing I like about Golang is the ability to see visibility of constructs by just looking at the first character. This would be a good measure (the ease with determining scope/visibility, open/closed) for the choice of denotation or syntax. I don't have any strong opinions on the exact format, just the ease with which I can comprehend code. |
I like the '#' character for referring to definitions. It looks odd, though, using the same character for establishing (defining?) these definitions. Have you considered using a separate token such as |
@verdverm Yes the proposed approach was, perhaps not surprisingly, inspired by exactly this property of Go. It is harder to use the casing approach of CUE, just because it is not always feasible to have the strict casing guidelines as in Go. Hence the choice for something like @seh: yes I definitely considered using a separate token. However, this will add some significant complexity to the language. Syntactically, one would need a separate production for the Whatever the choice is, though, given the nested nature of CUE, the rule would have to be the same for top-level as nested fields. Personally I don't think having nested Anyway, you hit on exactly the biggest issue of this proposal: the ugliness of
pros (compared to
So the question is whether people think these benefits weigh up to the ugliness. I think a big mitigating factor is that |
This sounds reasonable 👍
No (I didn't realize this was supported)
Yes. It's usually something I can work around though. The best benefit I can think of is the |
One thought on how to make the syntax a little nicer and unified: How about
|
@rudolph9: interesting thought. This would require extra syntax in the language, unfortunately. The But it is definitely a neat thought. Comments from others welcome in this regard. |
I like this proposal. |
I like the proposal as is, especially if it is simpler in both code and effort for the maintainers. Definitions definitely need to stand out more, the I am ok with |
Done |
This issue has been migrated to cue-lang/cue#307. For more details about CUE's migration to a new home, please see cue-lang/cue#1078. |
Proposal: Alternative Syntax for Definitions
Definitions are currently marked using the `::` syntax. This document proposes an alternative syntax. The purpose is to separate the name space of definitions from that of regular fields.
Background
CUE currently uses `::` to mark a field as a definition. The field name is a regular identifier, meaning that definitions and regular fields occupy the same name space. This can pose a problem in mapping certain languages to CUE, where similar constructs live in a separate namespace.
The most notable case causing issues is JSON Schema. JSON Schema allows a `definitions` (or `$defs`) section to introduce definitions. Logically, such definitions would have to be mapped at the same level as the top-level fields in CUE. This does not pose a problem as long as a convention is followed where fields are camel case and definitions are upper case. Unfortunately, this is often not the case.
A similar issue exists, theoretically, for OpenAPI, Protocol Buffers, and Go. Conventions are more strictly followed for these languages, however, so the issue is not a problem in practice.
Several secondary issues, like simplification of export rules, are addressed by this proposal.
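To make the JSON Schema clash concrete, here is a minimal sketch. The schema, the name `address`, and the resulting CUE are hypothetical illustrations, not the output of any actual importer; the CUE uses the `#` prefix proposed in this document.

```cue
// Hypothetical JSON Schema input in which a property and a definition
// share the same name:
//
//   {
//     "properties": {"address": {"$ref": "#/$defs/address"}},
//     "$defs":      {"address": {"type": "string"}}
//   }
//
// With `::`, both would map to the label `address` in the same struct
// and collide. With a separate namespace for definitions they can coexist:
#address: string    // from "$defs"
address?: #address  // from "properties" (optional, as in JSON Schema)
```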
Overview
In this document we propose lexically distinguishing identifiers for definitions from regular fields by requiring a `#` prefix that is part of the identifier of such a definition, thereby implicitly creating a separate namespace for definitions. This would replace the `::` notation.
Before:
After:
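As an illustrative sketch of the change (the names `Person` and `alice` are hypothetical and not taken from the proposal), the legacy form is shown in comments and the proposed form as live CUE:

```cue
// Before (legacy `::` notation; definition and field share one namespace):
//
//   Person :: {
//       name: string
//   }
//   alice: Person & {name: "Alice"}

// After (the `#` is part of the identifier, at the declaration and at
// every reference):
#Person: {
	name: string
}
alice: #Person & {name: "Alice"}
```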
Regular fields can still have a name starting with a `#` when the name is enclosed in quotes, as in `"#foo": value`. The proposed mechanism is thus analogous to the now legacy, but still supported, construct of hidden fields (`_foo`).
In fact, with this proposal the reintroduction of hidden fields becomes an option.
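A small sketch of how the three kinds of labels would sit side by side, assuming the semantics described in this proposal (all names are hypothetical):

```cue
#foo:   int     // definition: lives in the definition namespace, not emitted as data
"#foo": "data"  // regular field whose name merely starts with '#': emitted as data
_foo:   1       // hidden field: not emitted as data

bar: #foo & 3 // the unquoted reference resolves to the definition, not the quoted field
```

Because the quoted label is an ordinary string, it does not clash with the definition of the same spelling.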
Details
Use of `#` in regular fields
Regular fields that need to start with a `#` can use aliases:
Export rules
The spec currently defines exporting rules for fields. These are not implemented. Part of the difficulty is exactly that definitions and regular fields occupy the same namespace: the spec has different rules for the two cases, but since they share the same namespace, it is not always obvious which rules should apply where.
With the two distinct namespaces, the following exporting rules would be feasible:
… `#`) are exported.
… `#` is a Unicode uppercase letter (Unicode class "Lu").
This still relies on casing, which may still not generally work when CUE is automatically translated from other languages. Given the conventions of existing languages, it generally seems to give a desirable outcome, however. A more aggressive exporting policy, for instance exporting all identifiers starting with non-lowercase letters or even exporting any definition not starting with `#_`, may be in order.
Interaction with hidden fields
The notation of the proposed change is analogous to that of hidden fields (no longer part of the spec, but still implemented in search of a good guideline for alternatives). The implementation of hidden fields is somewhat complicated. It is also a syntactically very different construct for something that is almost identical to definitions. It was therefore decided that we need to phase out hidden fields.
With the current proposal, hidden fields become merely a slight variant of definitions: with `#Foo: { … }` the struct will be closed whereas for `_Foo: { … }` it won't. In both cases the field will not be part of data output.
API
The current CUE API already distinguishes between regular fields and definitions. With proposed changes, bringing hidden fields in line with definitions, the same API could now be used to look up hidden fields. Other than that, the lookup API would not have to change.
The AST API could be simplified by removing the token type. There would be a long transition period, however, to support the old representation.
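If the same lookup API serves both definitions and hidden fields, the difference that remains between them is semantic: closedness. A minimal sketch of that difference, using hypothetical names:

```cue
#Point: {x: int, y: int} // definition: the struct is implicitly closed
_point: {x: int, y: int} // hidden field: the struct stays open

// Allowed: an open struct accepts the additional field z.
open: _point & {x: 1, y: 2, z: 3}

// Rejected if uncommented: z is not allowed in the closed struct #Point.
// closed: #Point & {x: 1, y: 2, z: 3}
```

Neither `#Point` nor `_point` appears in the data output; only `open` does.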
Discoverability
With the proposed change, it becomes easier to explain all different field types within a single table:
`foo: x`: regular field
`$foo: x`: also a regular field, often interpreted as some meta field by the user. The `$` has no meaning to CUE itself.
`#foo: x`: Definition: not part of the output when converted to data. Structs are implicitly closed. Can be used to define a complete definition of a type.
`_foo: x`: Hidden field: like definitions, but structs are not implicitly closed. Can be used to define partial values that are not complete types. (TBD)
`"foo": x`: using double quotes any valid JSON string can be a field name for a regular field, including `"#foo"` and `"_foo"`.
All of these are just identifiers. Today, `a._foo` and `a.$foo` are valid references. The advantage of this syntax is that the `_` and `$` signal to the users what kind of value is referenced. With `#`-style identifiers this benefit is extended to definitions as well. For orthogonality, `a."foo"` should be allowed as a valid reference (see The Query Proposal).
Transition
Firstly, the proposal justifies the reintroduction of hidden fields. So any pain caused by the introduction of `#`-style definitions could be offset by no longer needing to transition off of hidden fields.
A transition phase could allow both the `::`-style and `#`-style identifiers to coexist and mean the same thing. A transition period could consist of the following steps:
… `::`-style definitions to `#` form, including all references.
… `cue fmt` to rewrite old style definitions to new (where possible).
… `::`.
… `token.ISA`.
In order to move definitions to their own namespace early on, it is important for parsed CUE to be rewritten to `#`-style identifiers before each compile. This means that representations of references will exhibit the `#`-style identifiers even for code that uses the then legacy double colons.
In the strictest implementation, definitions may only have `#`-style identifiers and cannot have free-form strings like regular fields can. This introduces the following incompatibilities:
… `"\(expr)" :: value` or `[expr] :: value` are no longer possible.
… `-l` command line flag would no longer accept fields of the form `"\(foo)"::`.
A workaround for the first limitation is to move the generated definitions inside a map of a static definition (everything can be solved with another level of indirection). For instance:
can be rewritten as
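A sketch of this rewrite, assuming a hypothetical list `names` that drives the generated definitions and a hypothetical static definition `#Defs` that holds them:

```cue
names: ["Foo", "Bar"]

// Legacy form, with a dynamically named definition (no longer possible
// under the strictest implementation):
//
//   for _, n in names {
//       "\(n)" :: {name: n}
//   }

// Workaround: generate regular fields inside one static definition.
// Everything nested in #Defs is still closed and kept out of the data output.
#Defs: {
	for _, n in names {
		"\(n)": {name: n}
	}
}

// The generated entries are reached through the extra level of indirection.
foo: #Defs.Foo
```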
The flag issue can be solved by allowing some annotation to indicate a field is a definition specific to this flag (perhaps even supporting the then legacy `::`). See also Extensions.
Extensions
The transition section discusses several limitations imposed by the proposal. If need be, the language could be extended to allow for "dynamic" definitions, for instance of the form `#(expr): value`, where `expr` needs to evaluate to a valid identifier. See The Query Proposal for more details.
Discussion
Comparison
Precedence
The use of `#foo` for definitions has some analogy in JSON Schema, where the same notation is allowed for anchors to refer to schema in the `$ref` field.
Use of double colon
The use of double colon was derived from Haskell. It also has parallels with Jsonnet, where it is used to mean almost the same thing. The "almost" can also lead to confusion here. Note that Jsonnet also has a `:::`. This proposal will remove any pressure for CUE to follow suit.
Alternatives
An alternative way to deal with separate namespaces, keeping `::`, is to have another selector operator specifically for definitions. For instance,
As CUE is lexically scoped, there is no ambiguity which of the two namespaces is meant and there is no need for any special marker using the first reference. The special operator is only needed for the selector, `Bar`.
In other words, introducing separate namespaces without distinguishing identifiers lexically introduces a similar amount of clutter, but leads to less clarity. The users will have to learn a new symbol (`.:`) and will be confronted more often with the difference between a reference and a selector. Distinguishing fields from definitions lexically results in more symmetry between the two (`#Foo.#Bar`), which in turn seems to lead to a more intuitive reading.
Feedback wanted
… `"\(foo)" :: bar` or `[foo] :: bar`)?