"Dependent types" is a term I shy away from because (a) there are many flavours of thing, and not everyone agrees on which flavours do count as "dependent types" or don't; (b) some of those flavours are "truly scary" in the sense of introducing undecidability, etc. That said, I would agree that the Hugr type system is in the space of things that some people sometimes call "dependent types" ;-), but it avoids any of the scary issues around decidability because, as you note, all "evaluation" is pure substitution with strong normalization (no infinite loops etc.); and then we have specific, well-constrained "backdoors" (e.g. binary …).

Indeed, I think all of your examples written in hypothetical syntax can be written in Rust using structs such as `PolyFuncType`.
That the system can be represented by a single thing called … So if the question here is whether we should use this simpler combined enum for hugr-model, I'm neutral on that (which is to say, OK if you prefer!). If you are arguing we should switch over to something more like the flat representation in Tierkreis, then so far I'm not keen myself: to me the rules seem much less obvious. This might be overcome with documentation, but then, confusion with the present system might be overcome with documentation too. (FWIW, I suspect that the difference is that for the "simple single struct" realization, documentation is a description of what is allowed, whereas for the explicit-in-Rust-type-system realization, documentation is "where do I find X"...)
---
I have been meditating on the type system and the role of `hugr-core`'s `Type`, `TypeArg` and `TypeParam`.

We are essentially implementing two programming languages:
one for runtime, which is described by the computation graph,
and a static meta language that is encoded into types, type args, operation definitions and type definitions.
We are essentially using a very restricted form of dependent types.
This sounds scary at first, but the language is restricted enough that the result is quite simple.
In this note I want to explore how hugr would look from that perspective.
## Operation Parameters via Dependent Functions
An operation without static parameters or child nodes is a value in the meta language and has type `(op INS OUTS)`, where `INS` and `OUTS` are lists of types for the operation's input and output ports.

When the operation has a parameter, it is described by a function in the meta language that takes the parameter as an argument and produces a value of type `(op INS OUTS)`. Here the types of the input and output ports may depend on the value of the parameter; therefore the function in the meta language is a dependent function.

For example, the type of an operation which takes a statically sized vector of qubits and reverses it could be expressed in some hypothetical syntax as follows:
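One way this might look; the notation here is purely illustrative, following the `(op INS OUTS)` convention from above, with `!n` as the static size parameter:

```
(reverse : (fn (!n : nat)
               (op [(vec qubit !n)] [(vec qubit !n)])))
```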
Here `!n` is the first argument to the dependent function, which represents the statically known size of the qubit vector.

If we wanted to make this reverse operation generic in the type of elements in the vector, we can take the element type as another parameter of type `type`:
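A sketch of the generic version, in the same kind of illustrative, hypothetical notation, with `!a` as the element type parameter:

```
(reverse : (fn (!a : type) (!n : nat)
               (op [(vec !a !n)] [(vec !a !n)])))
```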
## Nesting is Parameterisation over Operations
We could therefore give a conditional operation the following type:
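A possible shape for this type, again in a purely illustrative notation; the names `!ins`, `!outs`, `!then` and `!else` follow the prose below, and the way the boolean input is prepended is an assumption:

```
(cond : (fn (!ins : (list type)) (!outs : (list type))
            (!then : (op !ins !outs))
            (!else : (op !ins !outs))
            (op (cons bool !ins) !outs)))
```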
The `!then` and `!else` parameters are the two branches of the conditional, which are passed in as child nodes of the conditional operation. The input and output ports of the two branches have the same types, given by the type lists `!ins` and `!outs`. The result is an operation with the same input and output ports, except for an additional boolean input that controls which branch is taken by the conditional.

Note that in this example the type of the `!then` and `!else` parameters depends on the values of the earlier `!ins` and `!outs` parameters.

Another example of a nested operation would be an operation that prepares a vector of statically known size by running a subgraph multiple times:
## Data Types are Values
The type `qubit` of qubits is itself a value of type `type`. This allows us to pass in `qubit` as a type parameter for the repeated preparation operation above.

Type constructors such as `vec` become meta language functions that return a type. We interpret the type of `vec` as follows: given any type `!a` and any natural `!n`, we obtain a type `(vec !a !n)` of vectors that contain `!n` many values of type `!a`.
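Concretely, `vec` could be declared like this in a hypothetical notation (illustrative only):

```
(vec : (fn (!a : type) (!n : nat) type))
```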
As another example, an extension might want to define a tensor type with explicit dimensions such as `(tensor f32 [512 128 128])`. The `tensor` constructor would itself be described by a type in the meta language. We therefore reuse the same mechanism for extensions defining custom types and operations.
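For instance, a hypothetical declaration for such a `tensor` constructor, taking an element type and a list of dimensions:

```
(tensor : (fn (!a : type) (!dims : (list nat)) type))
```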
## Aren't Dependent Types hard?
The meta language functions are uninterpreted functions, so type checking does not need any normalisation step: you do not need to "run" programs in the meta language, apart from unwrapping aliases. In particular, a value of type `u64` really is just an integer and not a compile time program that evaluates to an integer.

## The Data Type
Dependent types mix the value and type language.
We can therefore consolidate meta language values and types into a single Rust data type:
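A minimal sketch of what such a combined type could look like. The variant names are reconstructions based on the ones referenced later in this post (`ListType`, `NatType`, `Nat`, `Product`, `Apply`), not the actual hugr definition:

```rust
/// A sketch of a single combined term type for meta language values and
/// types. Variant names are hypothetical reconstructions.
#[derive(Debug, Clone, PartialEq)]
enum Term {
    /// The type of types.
    Type,
    /// The type of natural numbers.
    NatType,
    /// A natural number literal, e.g. a statically known vector size.
    Nat(u64),
    /// The type of lists whose elements have the given type.
    ListType(Box<Term>),
    /// A list of terms.
    List(Vec<Term>),
    /// A product of the types in the given list.
    Product(Vec<Term>),
    /// A variable, referenced by index.
    Var(usize),
    /// Application of a named meta language function to arguments.
    Apply(String, Vec<Term>),
}
```

Note that with this encoding both `qubit`-like types and values such as `42` are plain `Term`s, which is exactly the blurring of values and types discussed above.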
Terms do not contain any binders. Instead we have a non-recursive type of meta language functions:
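One possible shape for such a binder-free representation (names hypothetical; a minimal stand-in for the term type is included so the sketch is self-contained): a function is a flat list of parameter types together with a body, and `Var(i)` in the body refers to the i-th parameter.

```rust
/// Minimal stand-in for the combined term type, so this sketch compiles.
#[derive(Debug, Clone, PartialEq)]
enum Term {
    NatType,
    Var(usize),
    ListType(Box<Term>),
}

/// Hypothetical non-recursive function type: because `Term` itself has no
/// binders, all binding happens here, at the top level.
#[derive(Debug, Clone, PartialEq)]
struct TermFunc {
    /// Types of the parameters, in order; `Term::Var(i)` refers to the i-th.
    params: Vec<Term>,
    /// The body of the function, which may mention the parameters.
    body: Term,
}
```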
## A Case against More Sophisticated Encodings
There is something weird about the `Term` type above: we know that a `ListType` must always take a term that is a type. So `Term::ListType(Box::new(Term::NatType))` is valid, while `Term::ListType(Box::new(Term::Nat(42)))` is not. Similarly, we know that `Product` must always contain a list.
In the spirit of making invalid terms unrepresentable, it is very tempting to make this visible in the Rust type for `Term`, as is done in the current implementation of types in `hugr-core`. I want to argue against doing this, at least on the boundary of the system, for the sake of simplicity.
It took me quite some time to understand the type system of `hugr-core`, and my attempts to create a serialisation format for it have been bogged down by frustration with the sheer number of structs, enums and indirections involved. I am really sympathetic to the motivation behind this, but I think it's ultimately not pragmatic.
In the absence of structural invariants, we need type checking in order to verify that terms are well-formed. However, type checking is required anyway: we need to check that the items of a list have the same type, that `Term::Apply` nodes are passed arguments of the correct type, that a variable is not used in two positions with a different type, etc.
For the sake of this discussion, I want to mostly ignore efficiency concerns in the encoding, such as deduplication, hash consing, sharing, tabling, etc. Those things are important, but should be discussed separately. However, I want to note that a simple encoding of terms like the one above makes it much easier to experiment with different implementation strategies. Do we want sharing with reference counting? Just replace `Box<...>` with `Arc<...>` and `Vec<...>` with `Arc<[...]>`. Do we want to implement sharing via a table? Just replace each occurrence of `Term` with an index into the table. Then cached information can be attached to the indices as well (compilers are databases), instead of weaving it into the types themselves.
## Inferred Parameters
Since types are just values, we have been able to blur the line between a type parameter of a polymorphic operation
and a value parameter of a parametric operation. As a consequence, type parameters are explicit.
For a compiler IR this might be okay; for example, MLIR needs all types of polymorphic operations to be specified explicitly at use sites. If we want to avoid this, we could provide a way to designate some parameters as implicit,
allowing use sites to omit them.
## Constraints
**Work in progress.** Type constraints can be integrated very nicely into this system, and can replace the current `TypeBound` system. I will expand on this shortly, but wanted to get this idea out early to gather some comments.