WIT Syntax: Structured Annotations #58

Open
theduke opened this issue Jun 29, 2022 · 13 comments

@theduke

theduke commented Jun 29, 2022

Many tools might want to add additional metadata to WIT declarations that modify code generation behaviour.

A concrete example would be the async option of wit-bindgen-wasmtime, which marks functions as async and currently has to be specified in a macro or on the command line.

I realize that that is only temporary until the component model gains async support, but there are many other use cases.

Examples:

  • Deprecating types/fields/variants, allowing generators to produce respective annotations in languages that support those
  • Customizing ownership semantics (e.g., don't implement Clone in Rust), or making functions consume a value
  • Rust: deriving additional traits on generated types
    (like Eq, Hash, serde::Serialize, ...)
  • Controlling the visibility of items
    Some languages like Rust make this easy with wrapper modules that selectively re-export, but that's not the case for others like Python, JavaScript, etc., where you might want to keep the generated types private (or protected in, e.g., Java) and provide custom wrappers
  • Customizing field types
  • ...

If there is no standard way to declare these, generators will always need some custom metadata layer for customization, which seems suboptimal.

Prior Art

Protobuf

Protobuf has custom options, which is a particularly fancy system that allows defining well-typed options that are even restricted to specific scopes.

import "google/protobuf/descriptor.proto";

extend google.protobuf.MessageOptions {
  optional string my_option = 51234;
}

message MyMessage {
  option (my_option) = "Hello world!";
}

Cap'n Proto

Cap'n Proto also has a well-typed annotation system.

annotation foo(struct, enum) :Text;
# Declare an annotation 'foo' which applies to struct and enum types.

struct MyType $foo("bar") {
  # Apply 'foo' to MyType.

  # ...
}

Proposal

I'd personally love a well-typed annotation system inspired by the above, but I also understand if that is currently not appreciated / too complex.

I'd be happy to come up with a concrete proposal and implement it in wit-bindgen, but I wanted to get some opinions first.

An alternative would be untyped annotations that can be attached to a set of AST items (type declarations, fields, variants, functions, ...) and allow all valid tokens within delimiters.

For example:

@rust(derive = ["PartialEq", "Eq"])
record r {
    @deprecated
    a: string,
    @deprecated(reason = "use b2 instead")
    b: u64,
    b2: i64,
}

@deprecated
f = func()
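
To make the untyped alternative a bit more concrete, here is a minimal sketch in Rust of how a generator might hold such annotations after parsing. Everything here is an assumption for illustration: the names `Annotation`, `AnnotationValue`, `arg` and `deprecation_reason` are hypothetical and are not part of wit-bindgen or of any WIT specification.

```rust
/// A value inside an annotation's argument list (hypothetical model).
#[derive(Debug, Clone, PartialEq)]
pub enum AnnotationValue {
    Str(String),
    Int(i64),
    List(Vec<AnnotationValue>),
}

/// One annotation, e.g. `@deprecated(reason = "use b2 instead")`.
#[derive(Debug, Clone, PartialEq)]
pub struct Annotation {
    pub name: String,
    /// Named arguments in source order; empty for a bare `@deprecated`.
    pub args: Vec<(String, AnnotationValue)>,
}

impl Annotation {
    /// Look up a named argument, if present.
    pub fn arg(&self, key: &str) -> Option<&AnnotationValue> {
        self.args
            .iter()
            .find(|(k, _)| k.as_str() == key)
            .map(|(_, v)| v)
    }
}

/// Returns the deprecation reason, if the item carries a `deprecated`
/// annotation with a string `reason` argument.
pub fn deprecation_reason(annotations: &[Annotation]) -> Option<&str> {
    annotations
        .iter()
        .filter(|a| a.name == "deprecated")
        .find_map(|a| match a.arg("reason") {
            Some(AnnotationValue::Str(s)) => Some(s.as_str()),
            _ => None,
        })
}
```

Because the annotations are untyped, each generator has to decide for itself what to do with an argument of an unexpected kind; in this sketch a non-string `reason` is simply ignored.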
@lukewagner
Member

I think you're right we need to add some form of annotations to wit. Incidentally, there's also a custom annotations proposal in core wasm which has a vaguely similar motivation.

Just to see if we're in the same part of the design space here: do you imagine these annotations existing solely as inputs to wasm code generators, thereby influencing codegen but not being directly interpreted by wasm engines as part of the runtime semantics of a component?

As for how much or little structure to put into the annotation syntax: that's a great question and I don't myself have a great intuition about what's the right answer here. On the one hand, I guess there's plenty of precedent in C#, C++, Rust, etc where they just define an expression syntax but not much beyond that in terms of scoping or validation (iiuc, or maybe they do?). If we went this route, I guess we could do likewise, defining syntax for literal values of all the interface value types (which could be reused for default values -- or maybe default values are just specified via attributes?). Going beyond that as, e.g., Cap'n Proto has done with explicit declarations and validation looks pretty neat; I guess I could see this becoming useful at a certain scale of wit and attribute usage. Does anyone have any direct experience with this or a related more-structured/typed annotation system?

@Pauan

Pauan commented Jun 30, 2022

@lukewagner On the one hand, I guess there's plenty of precedent in C#, C++, Rust, etc where they just define an expression syntax but not much beyond that in terms of scoping or validation (iiuc, or maybe they do?).

You can see the syntax definition for Rust attributes here.

Basically these are the valid syntaxes for attributes:

#[foo]
#[foo = expr]
#[foo(tokens)]

In this case expr is an expression and tokens is an arbitrary number of tokens. The only restriction on tokens is that brackets must match. So these are invalid, because they have mismatched brackets:

#[foo([)]
#[foo(])]
#[foo([})]

But as long as the brackets match, any token is allowed:

#[foo(some = { "yes" => 5 + 10 }, [$x; T::bar])]

The tokens don't even need to form valid expressions, because attributes operate on raw syntax tokens.

This is absurdly flexible: it means that every attribute gets to define its own syntax, essentially creating a sub-language. That flexibility is probably overkill for Wasm.

Also note that it's possible to have multiple different attributes on the same item:

#[foo]
#[bar = 5]
#[qux(some ? fancy => syntax)]
struct SomeStruct {
    ...
}

The attributes are parsed one at a time, top-to-bottom.
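
The bracket-matching rule described above can be sketched as a small stack-based check. This is only an illustration and not how rustc implements it: the function name `brackets_balanced` is made up, and the check works on raw characters, so a bracket inside a string literal would be counted as well (a real lexer would skip those).

```rust
/// Returns true if every bracket in `tokens` is closed by a bracket of the
/// same kind, in the right order. Works on characters, not real tokens.
pub fn brackets_balanced(tokens: &str) -> bool {
    let mut stack: Vec<char> = Vec::new();
    for c in tokens.chars() {
        match c {
            // Remember every opening bracket.
            '(' | '[' | '{' => stack.push(c),
            // A closing bracket must match the most recent opener.
            ')' | ']' | '}' => {
                let expected = match c {
                    ')' => '(',
                    ']' => '[',
                    _ => '{',
                };
                if stack.pop() != Some(expected) {
                    return false;
                }
            }
            // Any other character is allowed.
            _ => {}
        }
    }
    // Leftover openers mean something was never closed.
    stack.is_empty()
}
```

Applied to the contents between the outer parentheses of the examples above, `[`, `]` and `[}` are rejected, while `some = { "yes" => 5 + 10 }, [$x; T::bar]` is accepted.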

@stevelr

stevelr commented Jul 28, 2022

Tossing another example into the mix: Smithy was designed to solve many of the same problems. AWS uses it to define interfaces for 250+ web services, and code generators produce SDKs for all the languages and platforms they support (a large cross product). The annotation feature in Smithy is called "Traits": https://awslabs.github.io/smithy/1.0/spec/core/model.html#traits.

Some annotations can be used at runtime; for example, the @sensitive annotation on a field can be used by a logging library to omit logging that field. Others, like @required, are used during code generation (e.g., to determine whether the field is wrapped with Option<> in Rust). (Obviously, to be useful at runtime you need a language-neutral schema model that can be loaded; Smithy has a spec for that.) It's easy for anyone to define new traits, each with its own 'schema' of parameters.
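
As a rough illustration of the runtime use mentioned above, a logging helper could consult the loaded schema and skip fields marked sensitive. This is a sketch under assumptions, not Smithy's or rust-atelier's API: the `Field` struct and `render_for_log` function are hypothetical, and the `sensitive` flag stands in for whatever the schema model reports for the @sensitive trait.

```rust
/// A field value paired with schema metadata (hypothetical runtime model).
pub struct Field<'a> {
    pub name: &'a str,
    pub value: &'a str,
    /// True when the schema marks this field as sensitive.
    pub sensitive: bool,
}

/// Render fields as `name=value` pairs for a log line,
/// omitting every field that is marked sensitive.
pub fn render_for_log(fields: &[Field<'_>]) -> String {
    fields
        .iter()
        .filter(|f| !f.sensitive)
        .map(|f| format!("{}={}", f.name, f.value))
        .collect::<Vec<_>>()
        .join(" ")
}
```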

Most of the tooling developed by AWS for Smithy is written in Java; however, there's a pure Rust library, https://github.com/johnstonskj/rust-atelier, that has a full parser, AST, and other tools.

I am not affiliated with AWS, but I have used the rust-atelier crates to build code generators for WebAssembly SDKs.

@theduke
Author

theduke commented Nov 10, 2022

So, what's the best way to move this forward?

A concrete proposal?

@lukewagner
Member

Yes, a PR to this repo making a specific proposal we can discuss would be welcome. I'm imagining a PR would add the proposed syntax to WIT.md, the custom section binary format to Binary.md, and then perhaps an Annotations.md that described how it worked, with examples.

@badeend
Contributor

badeend commented Dec 9, 2022

Another use for annotations could be to instruct code generators to emit specialized types, rather than their low-level representation:

#[type-hint("wasi:snapshot1/timestamp")]
type timestamp = u64;

#[type-hint("wasi:snapshot1/local-date-time")]
type local-date-time = u64;

#[type-hint("wasi:snapshot1/duration")]
type duration = u64;

#[type-hint("wasi:snapshot1/time-zone")]
type iana-time-zone = string;



export localize-date: func(utc: timestamp, tz: iana-time-zone) -> local-date-time;

As far as the component-model is concerned, the localize-date function would be of type func(u64, string) -> u64, but generators that are aware of the well-known "wasi:..." type hints could emit code like localize_date(Instant, TimeZone) -> LocalDateTime, etc.
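
A generator that understands these hints might resolve them with a simple lookup and fall back to the underlying type otherwise. The sketch below assumes exactly that; the function name `rust_type_for` is hypothetical, and the Rust type names it returns are placeholders taken from the example signature above, not real wit-bindgen output.

```rust
/// Pick the Rust type name a generator could emit for a WIT type.
/// `hint` is the optional type-hint string; `underlying` is the Rust name
/// of the low-level representation (for example "u64" or "String").
pub fn rust_type_for(hint: Option<&str>, underlying: &str) -> String {
    match hint {
        Some("wasi:snapshot1/timestamp") => "Instant".to_string(),
        Some("wasi:snapshot1/local-date-time") => "LocalDateTime".to_string(),
        Some("wasi:snapshot1/duration") => "Duration".to_string(),
        Some("wasi:snapshot1/time-zone") => "TimeZone".to_string(),
        // Unknown or missing hint: keep the low-level representation.
        _ => underlying.to_string(),
    }
}
```

Generators that do not know a hint can ignore it without affecting correctness, since the component-level type is unchanged.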

@badeend
Contributor

badeend commented Dec 9, 2022

My previous post relates to user-defined types, but I guess the same can be done for the built-in value type specializations. Suppose the flags type didn't exist in ComponentModel1.0 and were to be added as part of ComponentModel2.0. Then the following ComponentModel2.0 (WIT) syntax:

flags my-flags {
    lego,
    marvel-superhero,
    supervillan,
}

could be lowered into the ComponentModel1.0-compatible form:

#[flags]
record my-flags {
    lego: bool,
    marvel-superhero: bool,
    supervillan: bool,
}

At first glance, it looks like this can be done for every specialized value type currently defined (tuple, flags, enum, union, option, result, string).

However, whereas the user-defined type hints of my previous post are only used at code-generation time, for this latter use case the wasm runtime would need special knowledge of these annotations if, for example, custom subtyping rules are desired.
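
To illustrate why the two forms carry the same information, here is a small Rust sketch that converts between the record-of-bools form and a packed bitmask. The bit layout (one bit per field, in declaration order, starting at bit 0) is an assumption made for this example, and the `MyFlags` type with its `to_bits` and `from_bits` methods is hypothetical.

```rust
/// The lowered, record-of-bools form of `my-flags` from the example above.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub struct MyFlags {
    pub lego: bool,
    pub marvel_superhero: bool,
    pub supervillan: bool,
}

impl MyFlags {
    /// Pack into a bitmask, one bit per field in declaration order.
    pub fn to_bits(self) -> u8 {
        (self.lego as u8)
            | ((self.marvel_superhero as u8) << 1)
            | ((self.supervillan as u8) << 2)
    }

    /// Unpack from a bitmask; bits above the third are ignored.
    pub fn from_bits(bits: u8) -> Self {
        MyFlags {
            lego: (bits & 0b001) != 0,
            marvel_superhero: (bits & 0b010) != 0,
            supervillan: (bits & 0b100) != 0,
        }
    }
}
```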

@badeend
Contributor

badeend commented Dec 11, 2022

In the context of WASI, annotations could also be used to hint to the host how to resolve imports and to specify which permissions each import requires, potentially replacing external manifests.

@env("HOME") // Hints to the host to resolve this import with the value of the HOME environment variable.
import home-dir: string;

// Signals to the host that the component intends to use this socket factory to set up UDP connections to Cloudflare's DNS service.
@firewall(outbound = "udp:1.1.1.1:53", reason = "To resolve domain names.")
import sockets: "https://github.com/WebAssembly/wasi-sockets/spec.wit#socket-factory";

// Instruct the host to provide a filesystem with two directories mounted inside it:
@mount("/tmp", kind = FsMount::Temp)
@mount("/app-user-data", access = FsAccess::ReadWrite, reason = "To store your precious photos.")
import fs: "https://github.com/WebAssembly/wasi-filesystem/spec.wit#fs";


export main: func(@from-command-line args: list<string>) -> unit; // Note the `@from-command-line`

@esoterra
Contributor

Unfortunately, if we make Structured Annotations encode as Custom Sections, then

  1. we'll have to come up with a way to associate annotations with their subjects using indices,
  2. most tools won't understand them (e.g. a component-to-component optimizer) and may break those indices, and
  3. this data can't easily be preserved when composing components together.

In a way, what we want from Structured Annotations seems to be similar to what is being discussed for the "URL" that is attached to interfaces/functions/etc. in Components to identify e.g. that an interface is referring to a given WASI capability. It's information that we're, in many cases, going to want to supply to the runtime so that it can provide us the correct implementation of our imports or correctly interpret our exports.

This raises an interesting question: what if Structured Annotations are just syntactic sugar for encoding data in the "URL" (which might become more of a structured text field than specifically a URL) on each import? We would have to come up with a way to encode annotations in this field, but it would have the benefits that

  1. annotation data will be clearly associated with their subjects by appearing next to them in the binary,
  2. tools will already have to understand this field and can avoid breaking it as they operate on components, and
  3. the annotation data is easily preserved by simply keeping this field and not modifying it.

@lukewagner
Member

Good points! Agreed on the problems with custom sections. I like the observation that arbitrary annotations can already be stuffed into the URL field of an externname and thus you can think of this feature as trying to provide a better developer experience for expressing annotations than forcing everyone to do their own ad hoc URL mangling.

In general, we can observe an emerging pattern in the component model where we've been taking semantic data that could have been stuffed into an import/export name string and factoring it out into separate specialized name sub-fields. We started by splitting out the developer-facing part of a name from the tooling-/runtime-facing URL and more-recently we're talking about splitting major/minor versions out of the URL into separate u32 immediates. A consistent goal here is letting the URL be a black box (only allowing equality) so that it can be a simple GUID-like (but readable) identifier that you look up in, e.g., a fixed list of interfaces that your tool/host/registry knows about. Thus, splitting out another name subfield for storing "structured annotations" can be similarly motivated.
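
For a sense of what the ad hoc URL mangling mentioned above could look like, here is a minimal sketch that appends annotations to the URL as a query string and splits them off again. This encoding is purely an assumption for illustration and has not been proposed in this thread; the function names `mangle` and `demangle` are hypothetical, and keys and values are restricted to ASCII alphanumerics plus `-`, `_` and `.` so that no escaping is needed.

```rust
/// Append annotations to an identifier URL as `?key=value&key=value`.
/// Returns None if the URL already contains `?` or if any key or value
/// is empty or uses characters outside the restricted set.
pub fn mangle(url: &str, annotations: &[(&str, &str)]) -> Option<String> {
    let ok = |s: &str| {
        !s.is_empty() && s.chars().all(|c| c.is_ascii_alphanumeric() || "-_.".contains(c))
    };
    if url.contains('?') || !annotations.iter().all(|&(k, v)| ok(k) && ok(v)) {
        return None;
    }
    if annotations.is_empty() {
        return Some(url.to_string());
    }
    let pairs: Vec<String> = annotations
        .iter()
        .map(|&(k, v)| format!("{}={}", k, v))
        .collect();
    Some(format!("{}?{}", url, pairs.join("&")))
}

/// Split a mangled string back into the bare URL and its annotations.
/// Pairs without `=` are dropped.
pub fn demangle(mangled: &str) -> (&str, Vec<(&str, &str)>) {
    match mangled.split_once('?') {
        None => (mangled, Vec::new()),
        Some((url, query)) => (
            url,
            query.split('&').filter_map(|p| p.split_once('=')).collect(),
        ),
    }
}
```

Note that the mangled string no longer compares equal to the bare URL, so every consumer has to demangle before looking the interface up. That is the cost a separate subfield avoids if the URL is meant to stay a black box that only supports equality.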

@esoterra
Contributor

That sounds good to me, how structured do you think this subfield would need to be?
I imagine we could have anything on the scale from a raw string or bytes to e.g. a structured encoding of the full set of value types. Which direction are you leaning?

@lukewagner
Member

Good question. One requirement we seem to be converging on is that Wit should be co-expressive with component types so that we can render an arbitrary component's type as Wit and also do a rough roundtrip. So if the Wit syntax has complex structured annotations, then probably so should what goes in a component.

The harder question then is whether to encode the structured annotation via a mini binary-format grammar or via some grammar layered on top of a string, as we've done with name. The nice thing about the latter approach is that it lets you simply use the underlying string when you don't care about the internal structure (e.g., for logging, debugging, dictionary keys, ...). But for the string grammar of structured annotations, would we use the Wit structured annotation syntax suggested above, or does that feel too complex to bake into the component model? If not the Wit syntax, then what... S-Expressions ;-) ? If we go the mini-binary-grammar route, then it seems rather complex to have to pass around a whole recursive tree data structure everywhere we want to pass around an externname.

But another route might be to pare down structured annotations to the bare minimum that meets our requirements so that we don't even have to ask how to encode complex tree structures in externname. Thinking about the requirements, it seems like there are two rather different scenarios:

  1. annotations that are mostly independent of the semantics of the import/export they are annotating and are instead only meant to be meaningful to a particular language or host (i.e., as in most of the earlier discussions)
  2. annotations that logically extend the URL, adding a data payload that is passed to the host to complement the URL and whose meaning is defined by the URL (i.e., as in your previous comment and, e.g., the "parameterized SQL query as a function import" use case).

Scenario 1 is more open-ended and makes me vaguely worried about composability. As long as we say "you can always strip these sorts of annotations", then I suppose that makes them fine. But then this makes me think that this use of annotations really does belong in a new kind of "annotations" custom section; they're not really part of the "name".

In Scenario 2, since the interpretation of the annotation is implied by the URL, a single raw string subfield of externname seems sufficient. But also, this sort of use case feels less like a "structured annotation" and more like a "payload" (of the URL). I think this scenario seems more central to the Component Model because this payload is meant to apply to all producers/consumers (that understand the interface identified by the URL) and it's not strippable. So I think perhaps this use case should be considered separately from the generic "structured annotations" use case at the root of this issue.

@vados-cosmonic
Contributor

vados-cosmonic commented Sep 27, 2023

Note that support for custom derives has landed in wit-bindgen thanks to @thomastaylor312 : bytecodealliance/wit-bindgen#678

Not quite full annotation support, but at the very least covers the custom derive use case!

7 participants