Download from TypeDB Package Repository:
Pull the Docker image:
docker pull typedb/typedb:3.0.0
Announcement
This last year of rewriting TypeDB 2.x into TypeDB 3.0 has been an adventure of learning, iterating, and lots of fun for our whole team.
We're thrilled to say we've achieved our first tranche of goals:
- Initial testing shows TypeDB 3.0's performance is comparable to or surpasses MongoDB in transactional workloads (a 3-5x performance gain over TypeDB 2.x!)
- TypeDB 3.0 leverages its new Rust codebase to greatly increase correctness, reduce memory footprint, and improve performance
- It features a simplified architecture that opens the door for adding new features and optimisations quickly
Not only that, but we've upgraded TypeQL to version 3.0 to directly address long-standing requests:
- Rules have been replaced with Functions: functions are more flexible, easier to reason about, and much more familiar to programmers. Functions are like subqueries you can re-use and invoke whenever you want. We think you'll love them!
- Query pipelining: combined with more powerful expressions and functions, you can now build read, write, or transformation pipelines that do everything you need without coming back to the client. This is a triple threat: more readable, more maintainable, and more performant.
- Data constraints: we welcome the arrival of Cardinality, value Range, and value Enumeration restrictions, among others. These have been our top requested features for over a year, and now they're at your fingertips! TypeDB will now automatically validate that your data has the exact connectivity and shape you require.
There's so much more than this, and we hope you'll dig into TypeDB 3.0 to try it out.
Top 10 TypeDB 3.0 Features
- Goodbye sessions, hello transactions: TypeDB 3.0 eliminates sessions and simplifies your interactions to three types of transactions. Use `read`, `write`, or `schema` transactions when reading data, modifying data, or exclusively modifying schema & data, respectively. Transactions are still ACID up to snapshot isolation, and support concurrent reads and writes.
- Standard return types - Rows or Documents: TypeDB 3.0 simplifies answers to be either Rows or Documents. Rows will feel familiar to SQL users, though they still contain our traditional Concept data types. Documents can be structured and built using the enhanced `fetch` clause.
- Enhanced schema language: We've streamlined `define` and `undefine`, and introduced `redefine` to modify schemas - while restructuring the definition language to be shorter, more consistent, and more understandable:
TypeDB 2.x:
define
person sub entity,
owns age,
plays friendship:friend;
friendship sub relation,
relates friend;
age sub attribute, value string;
abstract-friendship sub relation,
relates friend, abstract;
undefine
abstract-friendship relates friend, abstract; # does this undefine the `abstract`, the `friend` role, or the `abstract-friendship`?
TypeDB 3.x:
define
entity person,
owns age,
plays friendship:friend;
relation friendship,
relates friend;
attribute age, value string;
relation abstract-friendship,
relates friend @abstract;
entity child, owns age;
entity dog;
undefine
@abstract from abstract-friendship relates friend; # clearly removes @abstract
owns age from child; # clearly removes only the age ownership from the child type
dog; # clearly undefines the dog type
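Where `define` adds schema elements and `undefine` removes them, `redefine` replaces an existing definition in place. A minimal sketch, assuming `redefine` statements mirror the `define` syntax shown above (not an official sample):
redefine
attribute age, value integer; # sketch: replace age's previous value type (string) with integer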
- Query pipelines: Pipelines are best illustrated by example. Let's say we want to assign a tax credit to a person called Bill, corresponding to how many children he has.
In TypeDB 2.x, we have to split this into multiple queries that do redundant work, and perform multiple network round trips. In addition, we have to split some of our logic between the database and the application, which damages maintainability!
# count how many children Bill has
children = tx.query().get_aggregate('match $p isa person, has email "[email protected]"; (child: $child, parent: $p) isa parentship; get $child; count;').resolve()
# compute tax credit
tax_credit = 1000*children;
# assign tax credit
tx.query().insert(f'match $p isa person, has email "[email protected]"; insert $p has tax_credit {tax_credit};')
In TypeDB 3.0, this is streamlined into one query pipeline:
tx.query("""
match $p isa person, has email "[email protected]"; parentship (child: $child, parent: $p);
reduce $count = count($child) within $p;
match let $tax_credit = 1000 * $count;
insert $p has tax_credit == $tax_credit;
""")
This is a simple pipeline: we could continue to chain operations, such as inserts and deletes, to build complex transformations.
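For instance, one hypothetical extension of the pipeline above (the `review` relation and auditor email are invented for illustration) chains a further match and insert stage after the credit is assigned:
match $p isa person, has email "[email protected]"; parentship (child: $child, parent: $p);
reduce $count = count($child) within $p;
match let $tax_credit = 1000 * $count;
insert $p has tax_credit == $tax_credit;
match $auditor isa person, has email "[email protected]";
insert review (subject: $p, reviewer: $auditor);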
- Reduce and aggregation operations: As the previous example illustrates, what used to be a `get; count;` operation is now a `reduce` clause! This is a much more expressive way of aggregating values, and it allows aggregated values to be used in later operations.
- Enhanced Fetch: `fetch` clauses are now structured exactly like the JSON documents they return:
match
$p isa person, has email $email, has age $age; $email == "[email protected]";
fetch {
"email": $email,
"tax-identifier": $p.tax_id,
"names": [ $p.name ],
"age_next_year": $age + 1,
"total_salary": (
match
$p has salary $salary;
return sum($salary);
),
"children_ages": [
match
parentship (child: $child, parent: $p);
fetch {
"age": $child.age,
};
],
"all_attributes": { $p.* }
};
This will return a stream of JSON documents that look like this:
{
"email": "[email protected]",
"tax-identifier": "123-45-6789",
"names": [
"bill",
"billstone"
],
"age_next_year": 51,
"total_salary": 50000
"children_ages": [
{ "age": 10 },
{ "age": 12 }
],
"all_attributes": {
"age": 50,
"email": "[email protected]",
"name": [
"bill",
"billstone"
],
"salary": [
10000,
40000
],
"tax_id": "123-45-6789"
}
}
- Functions: functions take a set of arguments, and return either a stream or a tuple of concepts. Functions can contain any read pipeline!
define
fun mean_salary($p: person) -> double:
match $p has salary $salary;
return mean($salary);
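Calling such a function from a query could then look like this (a hypothetical sketch; we assume single-return function calls are bound with `let`, as described in the next item, and `mean_salary` is the function defined above):
match
$p isa person;
let $avg = mean_salary($p);
$avg > 3000.0;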
- Let-bindings and inline expressions: expression results are now bound to a new variable (using the same `$` syntax - no more `?` variables!) with the `let` keyword. If you don't need to name a result, you can inline the expression:
match
$p has salary $annual_salary;
let $monthly_salary = $annual_salary / 12;
parentship (parent: $p, child: $child);
$child has age > (10 + 8);
- Data constraints: TypeDB 3.0 ships with built-in controls for cardinality, values, ranges, and abstractness, best shown by example:
define
entity person,
owns email, # NEW: unless specified, ownerships have a default cardinality of @card(0..1) (0 or 1 attributes)
owns name @card(0..), # specifically relaxed cardinality of 0 to infinity
owns tax_id @card(1..1), # require exactly 1 tax_id to be owned
owns age,
owns gender,
plays parentship:parent, # NEW: unless specified, played roles have an implicit @card(0..) (any number of connections allowed)
plays parentship:child;
relation parentship,
relates child, # NEW: unless specified, relation roles have a default cardinality of @card(0..1) (0 or 1 role players)
relates parent @card(1..2); # require 1 or 2 parents for each parentship relation
attribute age, value integer @range(0..); # require all ages to be of values equal to or greater than 0
attribute gender, value string @values("male", "female", "other"); # restrict the domain of values to an exact set
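For instance, with the schema above, giving one person two tax_id attributes would violate `@card(1..1)` and be rejected (a hypothetical illustration, not from the announcement):
insert
$p isa person,
    has tax_id "123-45-6789",
    has tax_id "987-65-4321"; # error: violates @card(1..1) on person owns tax_id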
- New built-in value types: TypeDB 3.0 ships with a larger set of value types:
  - `integer` (renamed from `long` in 2.x)
  - `double`
  - `boolean`
  - `string`
  - NEW: `date`, representing a date without a time
  - `datetime`
  - NEW: `datetime-tz`, representing a datetime with a timezone
  - NEW: `duration`
  - NEW: fixed-decimal numbers
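Putting the new value types to work in a schema could look like this (a hypothetical sketch; we assume `decimal` is the keyword for fixed-decimal values):
define
attribute date-of-birth, value date;
attribute last-login, value datetime-tz;
attribute meeting-length, value duration;
attribute account-balance, value decimal;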
Aggregated release note
New Features
- User authentication and network encryption
User authentication
Bring user authentication and user management functionality into TypeDB Core through the following changes:
- Implement user management functionality, handling user creation, retrieval, update, and deletion.
- Implement the authentication protocol: TypeDB Core drivers are now required to supply credentials when making a connection.
System database
Implement a system database: a special database that the server will use to store various system-related information. The primary motivation is to store database server users for authentication purposes, but in the future it can be extended to store other system-related information.
Network encryption
Bring in network encryption (a.k.a. in-flight encryption), which can be configured by the user through the CLI, supporting all possible scenarios:
- Unencrypted mode, where communication is performed in plain text
- Encrypted mode, where communication is encrypted using a TLS certificate. Additionally, a custom CA certificate may be specified if the certificate was not generated by a well-known certificate authority such as Let's Encrypt or Cloudflare.
- Propagate TypeQL syntax updates
We mirror the changes from typedb/typeql#383, which is composed of 3 changes:
- We now require that all assignments are preceded by a `let` keyword:
match $p isa person, has monthly-salary $monthly; let $annual_salary = $monthly * 12;
- We rename `long` to `integer`, to use a more familiar term for integer values (`long` is very C-like!):
define attribute age, value integer;
This is still defined as a 64-bit signed integer in TypeDB's backend.
- We introduce a consistent constructor syntax for instances:
insert
$p isa person;                  # create entity
$f isa friendship (friend: $p); # create relation with players
$a isa age 20;                  # create attribute with value
Previously, we would write this as:
insert
$p isa person;
$f (friend: $p) isa friendship;
$a isa age; $a == 20;
Note that the anonymous relation constructor is now written like this:
insert friendship (friend: $p);
instead of:
insert (friend: $p) isa friendship;
- Relation index
We implement the application of the relation index, which is an optimised storage-layer index that allows skipping through a relation from one player to another player, avoiding one lookup operation and giving the query planner the choice to produce different sorted intersection points. This change could improve performance on traversals across relations by up to 50%.
- Beam planner
We replace the greedy planner with a simple beam search, with beam width dependent on the size of the query graph, up to a maximum of 128 partial plan candidates. This allows us to properly leverage intersection capabilities of the executor.
We also simplify the individual planner vertex cost calculations.
- IID constraint
We add support for IID constraints in `match` clauses. Example:
match $x iid 0x1E001A0000000012345678;
- Introduce naive retries for suspended function calls
Introduces retries for suspended function calls.
Functions may suspend to break cycles of recursion. Restoring suspend points is essential to the completeness of recursive functions, and thus the correctness of negations which call them.
- Is constraint
We implement support for the `is` constraint in queries:
match
$x isa company; $y isa person; $z isa person;
not { $y is $z; }; # <--- require $y and $z to be different concepts
$r1 isa employment, links ($x, $y);
$r2 isa employment, links ($x, $z);
select $x, $y, $z;
- Implement tabling machinery for cyclic functions
Introduces the machinery needed to support finding the fixed point of cyclic function calls. Cyclic functions now run, and return an incomplete set of results followed by an error. It is possible that the planner chooses a plan.
- Disjunction support
We introduce support for disjunctions in queries:
match $person isa person; { $person has name $_; } or { $person has age $_; };
- Introduce periodic fsync to guarantee commits are persisted to disk
Commits guarantee that changes have been persisted to disk (using `fsync`) before changes are visible and the commit is acknowledged. The sync is done periodically, and committing threads must wait for a signal that the sync is complete. This avoids the performance penalty of each commit doing its own sync, while guaranteeing that committed data is not lost in the event of an OS crash.
- Introduce 3.0 server diagnostics
Diagnostics
We introduce a diagnostics package to collect metrics related to server usage. There are multiple potential clients of this package, a couple of which are initially implemented:
- A web endpoint for pulling data (e.g. by Prometheus) is bound to port 4104 by default and exposes diagnostics formatted for Prometheus (`http://localhost:4104/diagnostics?format=prometheus`) or as JSON (`http://localhost:4104/diagnostics?format=JSON`) (done);
- A push HTTP client for sending diagnostics data to the TypeDB Diagnostics Service (done);
- PostHog (will be added in the near future).
If reporting is turned off, we report a single batch of minimal diagnostics information after 1 hour of runtime, just to mark that a server has been booted up. No user information is shared.
If reporting is turned on, the reporting happens every hour, following the rules set in 2.x.
If the development mode is active (details can be found below), no reporting is executed, and the reporting config flag is ignored.
Configuration
We add new CLI flags for configuration:
- `--diagnostics.monitoring.port 4014` for configuring the monitoring server's port;
- `--diagnostics.monitoring.enable false` for disabling and enabling the monitoring server (true by default, thus called `enable` and requiring a boolean value);
- `--diagnostics.reporting.metrics false` for disabling and enabling the reporting of collected load and usage metrics (true by default, ignored in development mode).
An additional switch for the development mode has also been added in order to use TypeDB's release binaries in CI.
Development mode
We reintroduce the development mode. As in 2.x, this mode is the default for local bazel (& cargo) builds and snapshots. It's turned off in published releases but can be explicitly switched on through the CLI, as mentioned in the Configuration section.
WARNING: the development mode is used by TypeDB's developers and is not intended for general use. Be aware that its usage can affect the server's performance and stability.
- Introduce PostHog diagnostics and correct server shutdown
Diagnostics
We add PostHog diagnostics event reporting to gather usage data about users' journeys through TypeDB and enhance the user experience by connecting it with other metrics collected in PostHog.
The reporting configuration remains the same: the `--diagnostics.reporting.metrics` flag controls both the old (Service diagnostics) and the new (PostHog) reporting, letting a user disable everything in one action. There are two groups of events sent:
- server usage: hourly heartbeats with deployment/server IDs and the version of the server, which can optionally contain information about actions performed without connection to a specific database (opened connection, created user, etc.);
- database usage: an optional event (not sent if there is nothing to send), similar to the optional server usage action metrics, but regarding a specific database (created database, opened transaction, executed query).
Server shutdown
The server now shuts down correctly on a `CTRL-C` signal, awaiting all the background tasks and printing pretty and informative messages:
Running TypeDB CE 3.0.0-alpha-10.
Ready!
^C
Received a CTRL-C signal.
Shutting down...
Exited.
Additionally, we add the ability to send diagnostics (of both types) on server shutdown (if enabled).
- Query execution analyser
We implement a useful debugging feature: a query analyser. This is similar to Postgres's `Explain Analyze`, which produces both the query plan and details about the data that flowed through the plan, along with the time spent at each step within it. Example output:
Query profile[measurements_enabled=true]
-----
Stage or Pattern [id=0] - Match
  0. Sorted Iterator Intersection [bound_vars=[], output_size=1, sort_by=p0]
     [p0 isa ITEM] filter [] with (outputs=p0, )
     ==> batches: 158, rows: 10000, micros: 6407
  1. Sorted Iterator Intersection [bound_vars=[p0], output_size=2, sort_by=p1]
     Reverse[p1 rp p0 (role: __$2__)] filter [] with (inputs=p0, outputs=p1, checks=__$2__, )
     ==> batches: 854, rows: 39967, micros: 75716
-----
Stage or Pattern [id=1] - Reduce
  0. Reduction
     ==> batches: 1, rows: 10000, micros: 116035
-----
Stage or Pattern [id=2] - Insert
  0. Put attribute
     ==> batches: 10000, rows: 10000, micros: 5890
  1. Put has
     ==> batches: 10000, rows: 10000, micros: 54264
When disabled, profiling is a no-op (no strings are created, locks taken, or times measured), though there is still some cost associated with cloning the Arcs containing the profiling data structures.
To enable query profiling, the easiest way (for now) is to enable the TRACE logging level for the `executor` package, currently configured in `//common/logger/logger.rs`:
.add_directive(LevelFilter::INFO.into())
// add:
// .add_directive("executor=trace".parse().unwrap())
Alternatively, just set the `enable` boolean to `true` in the `QueryProfile::new()` constructor.
- Implement query executable cache
We implement a cache (somewhat arbitrarily limited to 100 entries) for compiled executable queries, along with cache maintenance when statistics change significantly or the schema is updated.
Query execution without any cache hits still looks like this:
Parsing -> Translation (to intermediate representation) -> Annotation -> Compilation -> Execution
However, with a cache hit, we now have:
Parsing -> Translation ---Cache--> Execution
skipping the annotation and compilation/planning phases, which take significant time.
Note that schema transactions don't have a query executable cache, since keeping the cache in-sync when schema operations run can be error prone.
The query cache is a structural cache, which means it will ignore all Parameters in the query: variable names, constants and values, and fetch document keys. Most production systems run a limited set of query structures, only varying values and terms, making a structural cache like this highly effective!
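For example (an illustrative pair of queries, not taken from the release notes): these two queries are structurally identical and differ only in a constant, so the second one reuses the cached executable of the first:
match $p isa person, has email "[email protected]"; select $p;
match $p isa person, has email "[email protected]"; select $p;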
- Stabilise fetch and introduce fetch functions
We introduce function invocation in `fetch` queries. Fetch can call already existing functions:
match $p isa person;
fetch {
  "names": [ get_names($p) ],
  "age": get_age($p)
};
and also use local function blocks:
match $p isa person;
fetch {
  "names": [
    match $p has name $n;
    return { $n };
  ],
  "age": (
    match $p has age $a;
    return first $a;
  )
};
In the examples above, results collected in concept document lists (`"names": [ ... ]`) represent streams (purposely multiple answers), while single (optional) results (`"age": ...`) represent a single concept document leaf and do not require (although they allow) any wrapping.
Moreover, we stabilise attribute fetching, allowing you to expect a specific concept document structure based on your schema. If the attribute type is owned with default or key cardinality (`@card(0..1)` or `@card(1..1)`), meaning that there can be at most one attribute of this type, it can be fetched as a single leaf, while other cardinalities force a list representation and keep your system safe and consistent. For example, the query
match
$p isa person;
fetch {
$p.*
};
can automatically produce a document like
{
"names": [ "Linus", "Torvalds" ],
"age": 54
}
with
define
entity person,
owns name @card(1..),
owns age @card(1..1);
Additionally, fetch now returns attributes of all subtypes (e.g. `x sub name; y sub name;`) for `$p.name`, just like regular `match` queries such as `match $p has name $n;`.
With this, feel free to construct your fetch statements the way you want:
match
$p isa person, has name $name;
fetch {
"info": {
"name": {
"from entity": [ $p.person-name ],
"from var": $name,
},
"optional age": $p.age,
"rating": calculate_rating($p)
}
};
and expect a consistent structure in the results:
[
{
"info": {
"name": {
"from entity": [ "Name1", "Surname1" ],
"from var": "Name1"
},
"optional age": 25,
"rating": 19.5
}
},
{
"info": {
"name": {
"from entity": [ "Name1", "Surname1" ],
"from var": "Surname1"
},
"optional age": 25,
"rating": 19.5
}
},
{
"info": {
"name": {
"from entity": [ "Bob" ],
"from var": "Bob"
},
"optional age": null
"rating": 28.783
}
}
]
- Introduce 3.0 undefine queries
We introduce `undefine` queries with the new "targeted" syntax to enable safe and transparent schema concept undefinition. If you want to undefine a whole type, you can just say:
undefine person;
If you want to undefine a capability from somewhere, use `undefine <capability> from <somewhere>`.
It works consistently with `owns` (`person` is preserved, only the `owns` is undefined):
undefine owns name from person;
annotations of `owns` (`person owns name` is preserved, only `@regex(...)` is undefined; you don't need to specify the regex's argument):
undefine @regex from person owns name;
and any other capability, even specialisation:
undefine as parent from fathership relates father;
Want to undefine multiple concepts in one go? Worry not!
undefine
person;
relates employee from employment;
@regex from name;
@values from email;
The error messages in all definition queries (`define`, `redefine`, and `undefine`) were enhanced, and the respective `query/language` BDD tests were introduced in the CI along with `concept` unit tests. Additional fixes:
- answers of `match` capability queries like `match $x owns $y`, `match $x relates $y`, and `match $x plays $y` now include transitive capabilities;
- `define` lets you write multiple declarations of an undefined type, specifying its kind anywhere, even in the last mention of this type;
- failures in schema queries in schema transactions no longer lead to freezes of subsequently opened transactions;
- no more crashes on deletion of long string attributes with existing `has` edges;
- Introduce 3.0 `value` and `as` match constraints
We introduce the `value` match constraint for querying for attribute types with specific value types:
match $lo value long;
We introduce the `as` match constraint for querying for role type specialisation:
match $relation relates $role as parentship:child;
which is equivalent to:
match $relation relates $role; $role sub parentship:child;
but is useful for shortening your queries, keeping them similar to `define` definitions, and for use with anonymous variables:
match $relation relates $_ as parentship:child;
- Basic streaming functions
Implement function execution for non-recursive functions which return streams only. Recursive functions & non-stream functions will throw 'unimplemented'. These can currently only be used from the preamble. Sample:
with fun get_ages($p_arg: person) -> { age }:
match $p_arg has age $age_return;
return {$age_return};
match $p isa person; $z in get_ages($p);
- Expression support
We add expression execution to match queries:
match
$person_1 isa person, has age $age_1;
$person_2 isa person, has age $age_2;
$age_2 == $age_1 + 2;
- Fetch execution
We implement fetch execution, given an executable pipeline that may contain a Fetch terminal stage.
Note: we have commented out `match-return` subqueries, fetching of expressions (`{ "val": $x + 1 }`), and fetching of function outputs (`"val": mean_salary($person)`), as these require function-body evaluation under the hood, which is not yet implemented.
- Negation
We add support for negations in match queries:
match $x isa person, has name $a; not { $a == "Jeff"; };
Note: unlike in 2.x, TypeDB 3.x supports nesting negations, which leads to a form of "for all" operator:
match
$super isa set; $sub isa set;
not {                                                        # all elements of $sub are also elements of $super:
  (item: $element, set: $sub) isa set-membership;            # there is no element in $sub...
  not { (item: $element, set: $super) isa set-membership; }; # ... that isn't also an element of $super
};
- Add support for ISO timezones; finalise datetime & duration support
- 3.0 Define and Redefine
We add `define` and the first version of `redefine` queries to TypeDB 3.0, excluding functions and aliases.
- Implement query pipelines
We implement query pipelines in two versions: Read and Write pipelines. In this iteration we centralise the IR translation, type annotation, and compilation passes into a single top-level driver package: `//query`.
We also implement some larger refactors of the BlockContext, introducing new TranslationContext, PipelineContext, and VariableRegistry types. The `//compiler` package is now split into `//compiler/match|insert|delete`, with the intent that each one has an entry point to `compile()` a block of TypeQL IR into a `Program` for each type, along with possibly a set of type inference or type checkers to go with each type of program to be compiled.
Note: lots of refactoring is still to follow; merging early to unblock waiting work.
Code Refactors
- Partition data into Keyspaces and configure RocksDB
We split the data in TypeDB into 5 RocksDB databases (internally known as Keyspaces).
In addition, we configure RocksDB caches, index and filter block pinning, bloom filters, compression at various levels, and prefix extractors.
- Optimize statistics update routine
We improve the performance of `Statistics::may_synchronise` by reducing the amount of storage accesses:
- use the `reinsert` flag rather than `known_to_exist`, as the former takes into account the state of the storage at time of commit,
- load a context of 8 commits from the WAL to increase the chances that concurrent writes can be resolved without checking storage,
- check loaded commit data before accessing storage in case there is enough information to determine the delta.
- Fix greedy planner to sort on correct vars
Currently, the greedy planner picks a plan in the form of an ordering of constraints and variables. An added constraint produces variables if it relates those variables and they haven't been added to the plan at an earlier stage.
The greedy planner picks constraints based on their "minimal" cost for a directional lookup, which may sort on any one of the variables that the constraint relates. However, it does not record which variable the lookup should be sorted on. This means that, as it stands, produced variables are added to the ordered plan in random order after their constraint.
This PR ensures that we sort on the variable appropriate for the chosen direction by adding it first.
- Functions have their own parameter registry
Fixes a bug where functions were passed the parameter registry of the entry pattern.
- WAL files start at SequenceNumber::MIN
Sequenced records in the WAL are numbered starting from 1. Unsequenced records use the number of the last sequenced record in the WAL.
A problem arises when an unsequenced record is written before any sequenced records (e.g. server restart after WAL had been deleted). In that case the index of the unsequenced record is forced to be zero. However, the WAL shard file still believes it starts at 1, and when the record header does not match that, the shard is believed to be corrupted.
That is not an issue when sharding happens during an unsequenced write, as the new shard filename uses the record sequence number, not the next expected sequence number. That does however mean that during a scan through the WAL, if a shard file starts at the desired sequence number, the previous shard must be scanned as well. (This is already the implemented behaviour.)
- Extensible user credential representation in system database
Create an extensible representation of user credentials, so that it can be extended with concrete types down the line. Currently, password is the only concrete type, but it can later be extended to tokens, SSH keypairs, and so on.
- Add more typing information into key prefixes to optimise storage lookups
We incorporate more typing information into the storage lookup operations, making them longer and more specific and reducing the range of data that must be retrieved for any traversal operation. This should make future RocksDB bloom filters more effective, and reduce the amount of data that is read even when bloom filters are not available for the prefix. Overall, we expect this to allow TypeDB to scale better, particularly on larger-than-memory workloads.
- Simplify attribute encoding
We remove separate prefixes for attribute instances and instead introduce a value type prefix after the attribute type encoding.
Encoding before:
[String-Attribute-Instance][<type-id>][<value>][<length>]
After:
[Attribute-Instance][<type-id>][String][<value>][<length>]
- Exclude console from docker image
Excludes TypeDB Console from the Docker image distribution.
- Improved query profile formatting and cleaner beam search
Printing and log tracing
- Improved Display and Debug prints
- Log tracing for the query profiler now prints variables (instead of positions)
- Added log tracing for query planning
Query planning
- Support for relation indexes
- Hash checks during planning: trivial permutations of the "same" plan are ignored ("same" means it produces the same variables at the same cost)
- Make the planner more deterministic: trivial plan elements are now decided deterministically, freeing the planner to consider only the more "heavy-weight" choices
- Add heuristic weighting that penalises larger intermediate results more than before and rewards producing results early
- Various bug fixes (joins in lowering)
- Add server address CLI argument
Make the server address configurable through a command-line argument.
- Iterator pools
Optimise storage-layer iterator creation and deletion using iterator pools, one per transaction. This helps improve benchmark performance by 2-8%.
- Guarantee transaction causality
Under concurrent write workloads, causality can appear to be violated. For example:
- open tx1, tx2
- start tx1 commit (validation begins)
- start tx2 commit (validation begins and ends, no conflict with tx1!)
- open tx3
- end tx1 commit (validation finishes)
When we open `tx3`, we end up with a snapshot that is actually from before `tx1`, even though `tx2` has committed: because we don't know the status of `tx1` yet, and in our current simplified model transactions are assigned a linear commit order decided at WAL write time, the read watermark remains before `tx1` until it finishes validating. In this scenario, a client that commits a transaction can actually end up opening its next transaction on a snapshot from before the last commit that successfully returned.
After this change, when opening a transaction, TypeDB waits until the currently pending transactions all finish, guaranteeing we see the latest possible data version. The assumptions are that 1) validation generally takes a small amount of time and 2) it is fully concurrent, so the wait time should be very small and only occurs under large, concurrently committing transactions.
- Frugal type seeder
Optimises parts of the type seeder to avoid unnecessary effort.
- Disable variables of category value being narrow-able to attribute
Disable variables of category value being narrowed to attribute:
$name = "some name"; $person has name == $name;  # Fine
$name == "some name"; $person has name $name;    # Also fine
$name = "some name"; $person has name $name;     # Disallowed
- Commit generated Cargo.tomls + Cargo.lock
We commit the generated cargo manifests so that TypeDB can be built as a normal cargo project without using Bazel.
Integration and unit tests can be run normally, and rust-analyzer behaves correctly. Behaviour tests currently cannot be run without Bazel, as the tests expect to find the feature files in `bazel-typedb/external/typedb_behaviour`.
In addition, if during development the developer needs to depend on a local clone of a dependency (e.g. making parallel changes to typeql), the Cargo.toml needs to be temporarily adjusted by hand to point to the path dependency.
- Fix disjunction inputs when lowering
The disjunction compiler explicitly accepts a list of variables which are bound when the disjunction is evaluated. This also fixes a bug where input variables used in downstream steps would not be copied over.
- Fixes early termination of type-inference pruning when disjunctions change internally
- Add database name validation for database creation
We add validation of names for created databases. Now, all database names must be valid TypeQL identifiers.
- Fetch annotation, compilation, and executables
We implement Fetch annotation, compilation, and executable building, rearchitecting the rest of the compiler to allow for tree-shaped nesting of queries (e.g. functions or fetch sub-pipelines).
- Function compilation
Fill in compilation for functions.
- Fetch III
We implement further refactoring, which pulls Fetch into Annotations and Executables, without implementing any sub-methods yet.
- Refactor pipeline annotations
We implement the next step of Fetch implementation, which allows us to embed Annotated pipelines into Fetch sub-queries and into functions. We comment out the code paths related to type inference for functions, since functions are now enhanced to include pipelines.
- Fetch part I
We implement the initial architecture and refactoring required for the Fetch implementation, including new IR data structures and translation steps.
- Require implementation
Implement the `require` clause:
match ... require $x, $y, $z;
This will filter the match output stream to ensure that the variables `$x`, `$y`, `$z` are all non-empty (if they are optional).
- Added aggregate returns
- Let match stages accept inputs
We now support `match` stages in the middle of a pipeline, which enables queries like this:
insert $org isa organisation;
match $p isa person, has age 10;
insert (group: $org, member: $p) isa membership;
- TypeDB 3 - Specification
Added specification of the TypeDB 3.0 database system behaviour.
- Fix within-transaction query concurrency
We update the in-transaction query behaviour to result in predictable and well-defined behaviour. In any applicable transaction, the following rules hold:
- Read queries are able to run concurrently and lazily, without limitation
- Write queries always execute eagerly, but answers are sent back lazily
- Schema queries and write queries interrupt all running read queries and finished write queries that are lazily sending answers
As a user, we see the following: reads after writes or schema queries are just fine:
let transaction = transaction(...);
let writes_iter = transaction.query("insert $x isa person;").await.unwrap().into_rows();
let reads_iter = transaction.query("match $x isa person;").await.unwrap().into_rows();
// both are readable:
writes_iter.next().await.unwrap();
reads_iter.next().await.unwrap();
In the other order, we will get an interrupt:
let transaction = transaction(...);
let reads_iter = transaction.query("match $x isa person;").await.unwrap().into_rows();
let writes_iter = transaction.query("insert $x isa person;").await.unwrap().into_rows();
// only writes are still available
writes_iter.next().await.unwrap();
assert!(reads_iter.next().await.is_err());
- Include console in 3.0 distribution
Includes Console in the 3.0 distribution.
- Introduce reduce stages for pipelines
Introduce reduce stages for pipelines to enable grouped aggregates.
- 3.0 Add query type to the query rows header. Implement connection/database bdd steps
We update the server to serve the updated protocol and return the type of the executed query as part of the query row stream answer. We also fix the server's database manipulation code and implement additional BDD steps to pass the `connection/database` BDD tests.
- Rework constraints and capabilities
Our old approach to annotations and capabilities wasn't correct. We considered types and capabilities to inherit annotations from their supertypes and overridden capabilities, drawing a parallel between subtyping and overriding. However, none of these considerations were correct:
- Thinking of annotations as constraints for types creates a conceptual mess, where one annotation can play the role of two other annotations (`@key` as `@unique` and `@card(1..1)`), where we can't understand what a default cardinality is, where we struggled to set default annotations, and where we have to introduce strange validation methods like "inheritable_together", "inheritable_below", etc.
- In reality, capabilities should not inherit anything from their overrides. They should simply comply with the constraints of the capabilities of their super interface types, and you don't need to override anything. This means that the "inheritance" of annotations/constraints for capabilities depends on each specific object type, depending on whether it has a capability for the super interface type or not.
This PR addresses this general issue, additionally refactoring most of the existing validations in the Concept API.
- Implement grpc Database endpoints
We implement the GRPC endpoints for database manipulation, and convert two more relevant errors into `typedb_error!()`s.
s. -
Finish GRPC transaction service and migrate more error types
Finish implementing the transaction service, including transaction commit/close/rollback, single write query/many read query execution, and interrupt mechanisms.
We also migrate several error message types from being the previous
impl Error
variants to thetypedb_error!
macro. -
- Type statement planning & execution
- Clean up duplicate test setup and implement GRPC Query streaming pt 1
Refactor tests that were all redeclaring storage and manager setup and loading, into various reusable `<package>/tests/test_utils_<package>.rs` files.
We also extend the implementation of the GRPC service for concept answers, and implement the query streaming architecture:
- spawn a dedicated thread to execute a query, which writes messages to submit into a blocking MPSC queue with default capacity 32 (or whatever the user's transaction query prefetch configuration is).
- spawn a task to take messages from the MPSC queue and insert them into the GRPC output sender (which is another MPSC queue). This is meant to submit a batch of messages at a time, and next time the stream is polled, another tokio task is created to submit the next batch.
- Precompute size is controlled by the size of the blocking MPSC queue.
- Add TypeQLMayError for query BDDs
We add `TypeQLMayError`, a specification of `MayError` for `query` BDDs. It separates expected failures into `Logic` and `Parsing`.
Firstly, it's needed for a better 3.0 test rewrite, so we don't miss any incorrectly translated queries.
Secondly, the absence of at least this kind of failure reason specification has already caused trouble in 3.x: for example, there is a 2.x test in `define.feature` which fails because its `typeql` statement is not correct, but we expect it to fail for a logical reason (not sub):
Scenario: defining a less strict annotation on an inherited ownership throws
  Then typeql define; throws exception
    """
    define child, owns email @unique;
    """
This can be easily changed to specific error code checks in the future.
Usage example:
#[apply(generic_step)]
#[step(expr = r"typeql define{typeql_may_error}")]
async fn typeql_define(context: &mut Context, may_error: params::TypeQLMayError, step: &Step) {
    let query_parsed = typeql::parse_query(step.docstring.as_ref().unwrap().as_str());
    if may_error.check_parsing(&query_parsed).is_some() {
        // If we expect the error, it is unwrapped
        return;
    }
    let typeql_define = query_parsed.unwrap().into_schema();
    with_schema_tx!(context, |tx| {
        let result = QueryManager::new().execute_schema(
            Arc::get_mut(&mut tx.snapshot).unwrap(),
            &tx.type_manager,
            &tx.thing_manager,
            typeql_define,
        );
        may_error.check_logic(&result);
    });
}
- Partially implement GRPC services & standardise error infrastructure
We flesh out the architecture of the new 3.0 Protocol, the Transaction service, and database transaction exclusivity between write and schema transactions, and introduce the TypeDBError enum which exposes the APIs required to generate stack traces and error messages. We also implement full database cleanup on deletion, which has dramatically sped up the runtime of BDD tests!
- Add data (thing) validation
We add data validation for schema migration (changes to the schema) and data modifications. Moreover, we add a number of additional operation-time validations of data for schema migration which previously only existed for transaction commits, aiming to maximise the synchronisation of the schema and data of a database. There is also a change in the cleanup behaviour for relations and attributes: now, nothing is hidden from queries until commit time, since we delete and hide data only at commit time.
We refactor locking for things in `thing_manager`, making sure that not only do objects exist when we add edges to them, but so do their interfaces and the other objects connected to them through inheritance, e.g. for `@unique` and `@card` constraints. The `@card` constraint, in its turn, is now checked both at operation time for the schema (for all operations with a potential effect on data cardinality) and at commit time for the data (in case something has been missed in the more complex operation-time checks).
Changes to previously existing checks:
- Validations for types when an attribute type implicitly loses the @independent annotation or a relation type implicitly gains the @cascade annotation (which can lead to data deletion) now throw errors only if there are existing instances of these types (including subtypes).
- We accidentally made `create {root} type` idempotent, which is incorrect. Now, if you try to create an existing entity type, it is an error.
We also add the whole `concept` BDD test package to the CI, as it is considered complete. From now on, we aim not to break the existing tests too much with new merges.
- Isa instruction executors
- Refactor errors & struct names in Function, IR & Compile packages
- Fix type-seeder from label constraint for scoped labels
- Implement inserts
Implements the translation, compilation & execution of insert & delete clauses.
- Rename roleplayer edge to links + plan
We rename the role player edge to be referred to as a "links" edge, to avoid confusion with the `RolePlayer` struct, which refers to a player in a relation bundled with the role type tag.
- Introduce IR for schema constraints
Introduce `Owns`, `Relates` & `Plays` constraints into the IR & implement type-inference traits for them.
- Include type annotations into compiled instructions
- Partially implement define queries
Implement basic functionality of `define` queries.
- RolePlayer edge executors
- Add TypeCache to Database; implement statistics update
- Implement expression IR & evaluation
Implements translation from TypeQL, IR & evaluation of expressions.
- Initial Define query
We implement a basic define query execution and fix a couple of Isa executor bugs, using the latest TypeQL.
- Implement simple HasReverse test and fix bugs
- Purge root types
`entity`, `relation`, and `attribute` are promoted to keywords (à la `struct`) and no longer have corresponding root types. That means they are no longer part of the type hierarchy.
- HasReverse executor and refactoring
Implement the Has reverse executor and refactor the Has executor into methods for reuse. We also implement higher-kinded traits for `Thing`s, and refactor the `get_instances`/`get_instances_in` set of APIs in the ThingManager.
- Tuple based iterators
We rearchitect the traversal architecture to allow more extensibility, reuse, and speed of development. Critically, we standardise all the specific instruction iterators (`HasIterator`, ...) into a new intermediate `Tuple` representation in Tuple iterators. This allows reuse and composition of the code to advance, skip forward, count, and record answers from Tuple iterators.
We also rename `Position` to `VariablePosition`, and simplify the translators from TypeQL into IR.
- Gherkin parser override: create feature per scenario
Workaround for cucumber-rs/cucumber#331. The wrapper creates a new feature for each scenario, which sidesteps the runner issue.
On `//tests/behaviour/concept/type:test_owns_annotations`:
Before:
[Summary]
1 feature
702 scenarios (702 passed)
41957 steps (41957 passed)
test test ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 2013.41s
After:
[Summary]
702 features
702 scenarios (702 passed)
41957 steps (41957 passed)
test test ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 165.86s
- Finish database reset function to speed up behaviour tests
We add the `connection reset database` BDD command, which works just like `create database` if the database is absent, and otherwise resets the database (removing all the data from it, making it look like a new database).
This function is expected to be used in BDDs to speed up their execution, especially while the `cucumber` crate works quite slowly with huge files. It is important to note that this function should not be used in:
- production (at least with the current non-thread-friendly implementation: can we somehow hide it?)
- `connection/database` BDD tests
- Traversal architecture for enumerated, counted, checked, and bound vars
We implement new traversal behaviour: we enumerate all answers for named variables from a match, non-distinctly. However, when we encounter anonymous variables, we treat these as existence checks that should not be enumerated.
These requirements translate into four types of variable modes to be handled: enumerated, counted, checked, and bound.
- Enumerated: these are 'selected' variables for which all answers must be emitted from a step. These could be anonymous variables, introduced by the compilation step as intermediates and used in further steps, or user-written named variables that should be outputted.
Example:
match $x has name $a; select $x, $a;
// --> 1 x [ p1, n1 ]
//     1 x [ p1, n2 ]
- Counted: these are user-named but not selected variables that can be left un-emitted as an optimisation and multiplied into the answer Multiplicity instead.
Example:
match $x has name $a; select $x; // --> 2 x [ p1 ]
- Checked: these are anonymous, user-written variables which should not impact the output Multiplicity or quantity:
match $x has name $_; // --> 1 x [ p1 ]
This way, we can introduce anonymous variables freely without worrying about affecting the user's output:
match $x isa person; -> match $x isa $_0; $_0 label person;
- Bound: these variables are bound from previous computation.
The overall execution algorithm is now:
- Each `step` produces a `batch`, which the next Step takes in as input.
- In a `step`, one or more instruction iterators/checks agree on one output `Row`.
- Once a row is agreed, we must move the iterators forward. Here, we behave differently depending on what kind of mode each Variable is in: sometimes we have to compute row multiplicity, and sometimes we have to check for a value and skip forward.
- We record the row into the output `batch` and are ready to go back to step 2.
- Implement function manager & type inference
Implements just enough to store & retrieve functions, compile a query with a preamble function, and run type inference on it.
- Implement database reset, which deletes all storage keys, ID allocators, WAL, IsolationManager... back to a fully empty state
- Implement basic database delete
- Schema commit time validations
We add schema commit time validations to ensure that the schema is fully correct after a series of operations. These checks mostly concentrate on things that are ignored in the operation-time validations (to let users modify their schemas more easily), like:
- verify that no types/edges conflict with their supertypes'/overridden edges' declarations (we usually don't check subtypes when we change a type in operation-time validations, with minor exceptions), e.g. annotations and ordering
- verify that capabilities are still correctly overridden after a series of subtyping moves (e.g. we can only override the supertype interface type)
- verify that cardinality is correct for all levels of inheritance (see details)
We add new annotations, `@values` and `@range`, and refactor the usage of cardinality at the `concept` level.
Now, when you call `get_cardinality` on a `Capability` (`owns`, `plays`, `relates`), you get the capability's constraint, which is either an explicitly set annotation or a default value. Thus, all capabilities have `cardinality` constraints, and these constraints can be modified by setting `@card` annotations.
We also change the parsing of annotations to align with the design (`x..y` instead of `x, y`). Additionally, most of the checks for `fails` in tests no longer accept `ConceptReadError`s, so as not to miss typos and serious inner issues of the system!