feat: introduce compound (parameterizable) extension types and variations (#196)

* feat: introduce compound (parameterizable) extension types and variations
jvanstraten authored Jul 4, 2022
1 parent 1080f06 commit a79eb07
Showing 4 changed files with 137 additions and 2 deletions.
4 changes: 4 additions & 0 deletions proto/substrait/algebra.proto
@@ -507,6 +507,10 @@ message Expression {
// points to a type_anchor defined in this plan
uint32 type_reference = 1;

// The parameters to be bound to the type class, if the type class is
// parameterizable.
repeated Type.Parameter type_parameters = 3;

// the value of the literal, serialized using some type-specific
// protobuf message
google.protobuf.Any value = 2;
20 changes: 20 additions & 0 deletions proto/substrait/type.proto
@@ -3,6 +3,8 @@ syntax = "proto3";

package substrait;

import "google/protobuf/empty.proto";

option csharp_namespace = "Substrait.Protobuf";
option go_package = "github.com/substrait-io/substrait-go/proto";
option java_multiple_files = true;
@@ -180,6 +182,24 @@ message Type {
    uint32 type_reference = 1;
    uint32 type_variation_reference = 2;
    Nullability nullability = 3;
    repeated Parameter type_parameters = 4;
  }

  message Parameter {
    oneof parameter {
      // Explicitly null/unspecified parameter, to select the default value (if
      // any).
      google.protobuf.Empty null = 1;

      // Data type parameters, like the i32 in LIST<i32>.
      Type data_type = 2;

      // Value parameters, like the 10 in VARCHAR<10>.
      bool boolean = 3;
      int64 integer = 4;
      string enum = 5;
      string string = 6;
    }
  }
}

79 changes: 78 additions & 1 deletion site/docs/types/type_classes.md
@@ -44,7 +44,7 @@ Compound type classes are type classes that need to be configured by means of a

## User-Defined Types

-User-defined type classes can be created using a combination of pre-defined types. User-defined types are defined as part of [simple extensions](../extensions/index.md#simple-extensions). An extension can declare an arbitrary number of user defined extension types. Initially, user defined types must be simple types (although they can be constructed of a number of inner compound and simple types).
+User-defined type classes can be created using a combination of pre-defined types. User-defined types are defined as part of [simple extensions](../extensions/index.md#simple-extensions). An extension can declare an arbitrary number of user defined extension types.

A YAML example of an extension type is below:

@@ -58,3 +58,80 @@ A YAML example of an extension type is below:
This declares a new type (namespaced to the associated YAML file) called "point". This type is composed of two `i32` values named longitude and latitude. Once a type has been declared, it can be used in function declarations. [TBD: should field references be allowed to dereference the components of a user defined type?]

Literals for user-defined types are represented using protobuf [Any](https://developers.google.com/protocol-buffers/docs/proto3#any) messages.

### Compound User-Defined Types

User-defined types may be turned into compound types by requiring parameters to be passed to them. The supported "meta-types" for parameters are data types (like those used in `LIST`, `MAP`, and `STRUCT`), booleans, integers, enumerations, and strings. Using parameters, we could redefine "point" with different types of coordinates. For example:

```yaml
name: point
parameters:
  - name: T
    description: |
      The type used for the longitude and latitude
      components of the point.
    type: dataType
```

or:

```yaml
name: point
parameters:
  - name: coordinate_type
    type: enumeration
    options:
      - integer
      - double
```

or:

```yaml
name: point
parameters:
  - name: LONG
    type: dataType
  - name: LAT
    type: dataType
```

We can't specify the internal structure in this case, because there is currently no support for derived types in the structure.

The allowed range can be limited for integer parameters. For example:

```yaml
name: vector
parameters:
  - name: T
    type: dataType
  - name: dimensions
    type: integer
    min: 2
    max: 3
```

This specifies a vector that can be either 2- or 3-dimensional. Note, however, that it is not currently possible to put constraints on data type, string, or (technically) boolean parameters.
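
As a rough illustration of the range constraint, the following Python sketch checks an integer parameter value against optional `min` and `max` bounds, treating both as inclusive as the schema comments state. The function name `check_integer_parameter` is hypothetical and not part of Substrait; this is only one plausible reading of the rule, not a reference implementation.

```python
from typing import Optional


def check_integer_parameter(
    value: int, minimum: Optional[int] = None, maximum: Optional[int] = None
) -> bool:
    """Return True if value is an integer within the inclusive bounds."""
    # Booleans are a separate metatype, so reject them even though
    # Python treats bool as a subclass of int.
    if isinstance(value, bool) or not isinstance(value, int):
        return False
    if minimum is not None and value < minimum:
        return False
    if maximum is not None and value > maximum:
        return False
    return True


# The "dimensions" parameter of the vector type: min 2, max 3.
print(check_integer_parameter(2, minimum=2, maximum=3))
print(check_integer_parameter(4, minimum=2, maximum=3))
```

With the bounds from the vector definition, 2 and 3 are accepted while 1 and 4 are rejected; omitting both bounds accepts any integer.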

Similar to function arguments, the last parameter may be specified to be variadic, allowing it to be specified one or more times instead of only once. For example:

```yaml
name: union
parameters:
  - name: T
    type: dataType
variadic: true
```

This defines a type that can be parameterized with one or more other data types, for example `union<i32, i64>` but also `union<bool>`. Zero or more is also possible, by making the last parameter optional:

```yaml
name: tuple
parameters:
  - name: T
    type: dataType
    optional: true
variadic: true
```

This would also allow for `tuple<>`, to define a zero-tuple.
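
The interaction between `optional` and `variadic` can be sketched in Python as a check on how many parameters a use of the type may bind. This assumes that trailing optional parameters may simply be omitted and that a variadic type places no upper limit on repetitions of its last parameter. The helper `accepts_parameter_count` is hypothetical; it only counts parameters and ignores metatype checks and null-skipping.

```python
from typing import Any, Dict, List


def accepts_parameter_count(
    param_defs: List[Dict[str, Any]], variadic: bool, count: int
) -> bool:
    """Check whether `count` bound parameters fit a compound type definition."""
    # Trailing optional parameters may be omitted, which lowers the minimum.
    minimum = len(param_defs)
    while minimum > 0 and param_defs[minimum - 1].get("optional", False):
        minimum -= 1
    if count < minimum:
        return False
    # A variadic type lets its last parameter repeat, so there is no maximum.
    if variadic and param_defs:
        return True
    return count <= len(param_defs)


union = [{"name": "T", "type": "dataType"}]
tuple_ = [{"name": "T", "type": "dataType", "optional": True}]

print(accepts_parameter_count(union, True, 2))   # union<i32, i64>
print(accepts_parameter_count(union, True, 0))   # union<>
print(accepts_parameter_count(tuple_, True, 0))  # tuple<>
```

Under these assumptions `union<i32, i64>` and `tuple<>` are accepted, while `union<>` is rejected because its only parameter is not optional.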
36 changes: 35 additions & 1 deletion text/simple_extensions_schema.yaml
@@ -16,6 +16,10 @@ properties:
type: object
additionalProperties:
$ref: "#/$defs/type"
parameters: # parameter list for compound types
$ref: "#/$defs/type_param_defs"
variadic: # when set, last parameter may be specified one or more times
type: boolean
type_variations:
type: array
minItems: 1
@@ -25,7 +29,7 @@ properties:
required: [parent, name]
properties:
parent:
type: string
$ref: "#/$defs/type"
name:
type: string
description:
@@ -47,6 +51,36 @@ $defs:
    oneOf:
      - type: string
      - type: object
  type_param_defs: # an array of compound type parameter definitions
    type: array
    items:
      type: object
      required: [type]
      properties:
        name: # name of the parameter (for documentation only)
          type: string
        description: # description (for documentation only)
          type: string
        type: # expected metatype for the parameter
          type: string
          enum:
            - dataType
            - boolean
            - integer
            - enumeration
            - string
        min: # for integers, the minimum supported value (inclusive)
          type: number
        max: # for integers, the maximum supported value (inclusive)
          type: number
        options: # for enums, the list of supported values
          type: array
          minItems: 1
          uniqueItems: true
          items:
            type: string
        optional: # when set to true, the parameter may be omitted at the end or skipped using null
          type: boolean
  arguments: # an array of arguments
    type: array
    items: