Download from TypeDB Package Repository:
Pull the Docker image:
docker pull typedb/typedb:3.0.0
Announcement
This last year of rewriting TypeDB 2.x into TypeDB 3.0 has been an adventure of learning, iterating, and lots of fun for our whole team.
We're thrilled to say we've achieved our first tranche of goals:
- Initial testing shows TypeDB 3.0's performance is comparable to or surpasses MongoDB in transactional workloads (a 3-5x performance gain over TypeDB 2.x!)
- TypeDB 3.0 leverages its new Rust codebase to greatly increase correctness, reduce memory footprint, and improve performance
- It features a simplified architecture that opens the door for adding new features and optimisations quickly
Not only that, but we've upgraded TypeQL to version 3.0 to directly address long-standing requests:
- Rules have been replaced with Functions: functions are more flexible, easier to reason about, and much more familiar to programmers. Functions are like subqueries you can re-use and invoke whenever you want. We think you'll love them!
- Query pipelining: combined with more powerful expressions and functions, you can now build read, write, or transformation pipelines that do everything you need without coming back to the client. This is a triple threat: more readable, more maintainable, and more performant.
- Data constraints: we welcome the arrival of Cardinality, value Range, and value Enumeration restrictions, among others. These have been our top requested features for over a year, and now they're at your fingertips! TypeDB will now automatically validate that your data has the exact connectivity and shape you require.
There's so much more than this, and we hope you'll dig into TypeDB 3.0 to try it out.
Top 10 TypeDB 3.0 Features
- Goodbye sessions, hello transactions: TypeDB 3.0 eliminates sessions and simplifies your interactions to three types of transactions. Use `read`, `write`, or `schema` transactions when reading data, modifying data, or exclusively modifying schema & data, respectively. Transactions are still ACID up to snapshot isolation, and support concurrent reads and writes.
- Standard return types - Rows or Documents: TypeDB 3.0 simplifies answers to be either Rows or Documents. Rows will feel familiar to SQL users, though they still contain our traditional Concept data types. Documents can be structured and built using the enhanced `fetch` clause.
- Enhanced schema language: We've streamlined `define` and `undefine`, and introduced `redefine` to modify schemas - while restructuring the definition language to be shorter, more consistent, and more understandable:
TypeDB 2.x:
define
person sub entity,
owns age,
plays friendship:friend;
friendship sub relation,
relates friend;
age sub attribute, value string;
abstract-friendship sub relation,
relates friend, abstract;
undefine
abstract-friendship relates friend, abstract; # does this undefine the `abstract`, the `friend` role, or the `abstract-friendship`?
TypeDB 3.x:
define
entity person,
owns age,
plays friendship:friend;
relation friendship,
relates friend;
attribute age, value string;
relation abstract-friendship,
relates friend @abstract;
entity child, owns age;
entity dog;
undefine
@abstract from abstract-friendship relates friend; # clearly removes @abstract
owns age from child; # clearly removes only the age ownership from the child type
dog; # clearly undefines the dog type
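Where `define` adds schema elements and `undefine` removes them, `redefine` replaces an existing definition in place. A minimal sketch, assuming `redefine` statements mirror the `define` syntax shown above (not an official sample):
redefine
attribute age, value integer; # sketch: replace age's previous value type (string) with integer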
- Query pipelines: Pipelines are best illustrated by example. Let's say we want to assign a tax credit to a person called Bill, corresponding to how many children he has.
In TypeDB 2.x, we have to split this into multiple queries that do redundant work, and perform multiple network round trips. In addition, we have to split some of our logic between the database and the application, which damages maintainability!
# count how many children Bill has
children = tx.query().get_aggregate('match $p isa person, has email "[email protected]"; (child: $child, parent: $p) isa parentship; get $child; count;').resolve()
# compute tax credit
tax_credit = 1000*children;
# assign tax credit
tx.query().insert(f'match $p isa person, has email "[email protected]"; insert $p has tax_credit {tax_credit};')
In TypeDB 3.0, this is streamlined into one query pipeline:
tx.query("""
match $p isa person, has email "[email protected]"; parentship (child: $child, parent: $p);
reduce $count = count($child) within $p;
match let $tax_credit = 1000 * $count;
insert $p has tax_credit == $tax_credit;
""")
This is a simple pipeline: we could continue to chain operations, such as inserts and deletes, to build complex transformations.
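For instance, one hypothetical extension of the pipeline above (the `review` relation and auditor email are invented for illustration) chains a further match and insert stage after the credit is assigned:
match $p isa person, has email "[email protected]"; parentship (child: $child, parent: $p);
reduce $count = count($child) within $p;
match let $tax_credit = 1000 * $count;
insert $p has tax_credit == $tax_credit;
match $auditor isa person, has email "[email protected]";
insert review (subject: $p, reviewer: $auditor);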
- Reduce and aggregation operations: As the previous example illustrates, what used to be a `get; count;` operation is now a `reduce` clause! This is a much more expressive way of aggregating values, and it allows aggregated values to be used in later operations.
- Enhanced Fetch: `fetch` clauses are now structured exactly like the JSON documents they return:
match
$p isa person, has email $email, has age $age; $email == "[email protected]";
fetch {
"email": $email,
"tax-identifier": $p.tax_id,
"names": [ $p.name ],
"age_next_year": $age + 1,
"total_salary": (
match
$p has salary $salary;
return sum($salary);
),
"children_ages": [
match
parentship (child: $child, parent: $p);
fetch {
"age": $child.age,
};
],
"all_attributes": { $p.* }
};
This will return a stream of JSON documents that look like this:
{
"email": "[email protected]",
"tax-identifier": "123-45-6789",
"names": [
"bill",
"billstone"
],
"age_next_year": 51,
"total_salary": 50000
"children_ages": [
{ "age": 10 },
{ "age": 12 }
],
"all_attributes": {
"age": 50,
"email": "[email protected]",
"name": [
"bill",
"billstone"
],
"salary": [
10000,
40000
],
"tax_id": "123-45-6789"
}
}
- Functions: functions take a set of arguments, and return either a stream or a tuple of concepts. Functions can contain any read pipeline!
define
fun mean_salary($p: person) -> double:
match $p has salary $salary;
return mean($salary);
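Calling such a function from a query could then look like this (a hypothetical sketch; we assume single-return function calls are bound with `let`, as described in the next item, and `mean_salary` is the function defined above):
match
$p isa person;
let $avg = mean_salary($p);
$avg > 3000.0;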
- Let-bindings and inline expressions: expression results are now bound to a new variable (using the same `$` syntax - no more `?` variables!) with the `let` keyword. If you don't need to name a result, you can inline the expression:
match
$p has salary $annual_salary;
let $monthly_salary = $annual_salary / 12;
parentship (parent: $p, child: $child);
$child has age > (10 + 8);
- Data constraints: TypeDB 3.0 ships with built-in controls for cardinality, values, ranges, and abstractness, best shown by example:
define
entity person,
owns email, # NEW: unless specified, ownerships have a default cardinality of @card(0..1) (0 or 1 attributes)
owns name @card(0..), # specifically relaxed cardinality of 0 to infinity
owns tax_id @card(1..1), # require exactly 1 tax_id to be owned
owns age,
owns gender,
plays parentship:parent, # NEW: unless specified, played roles have an implicit @card(0..) (any number of connections allowed)
plays parentship:child;
relation parentship,
relates child, # NEW: unless specified, relation roles have a default cardinality of @card(0..1) (0 or 1 role players)
relates parent @card(1..2); # require 1 or 2 parents for each parentship relation
attribute age, value integer @range(0..); # require all ages to be of values equal to or greater than 0
attribute gender, value string @values("male", "female", "other"); # restrict the domain of values to an exact set
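For instance, with the schema above, giving one person two tax_id attributes would violate `@card(1..1)` and be rejected (a hypothetical illustration, not from the announcement):
insert
$p isa person,
    has tax_id "123-45-6789",
    has tax_id "987-65-4321"; # error: violates @card(1..1) on person owns tax_id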
- New built-in value types: TypeDB 3.0 ships with a larger set of value types:
  - `integer` (renamed from `long` in 2.x)
  - `double`
  - `boolean`
  - `string`
  - NEW: `date`, representing a date without a time
  - `datetime`
  - NEW: `datetime-tz`, representing a datetime with a timezone
  - NEW: `duration`
  - NEW: fixed-decimal numbers
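Putting the new value types to work in a schema could look like this (a hypothetical sketch; we assume `decimal` is the keyword for fixed-decimal values):
define
attribute date-of-birth, value date;
attribute last-login, value datetime-tz;
attribute meeting-length, value duration;
attribute account-balance, value decimal;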
Aggregated release note
New Features
- User authentication and network encryption
User authentication
Bring user authentication and user management functionality into TypeDB Core through the following changes:
- Implement user management functionality, handling user creation, retrieval, update, and deletion.
- Implement the authentication protocol: TypeDB Core drivers are now required to supply credentials when making a connection.
System database
Implement a system database: a special database that the server will use to store various system-related information. The primary motivation is to store database server users for authentication purposes, but in the future it can be extended to store other system-related information.
Network encryption
Bring in network encryption (a.k.a. in-flight encryption), which can be configured by the user through the CLI, supporting all possible scenarios:
- Unencrypted mode, where communication is performed in plain text
- Encrypted mode, where communication is encrypted using a TLS certificate. Additionally, a custom CA certificate may be specified if the certificate was not generated by a well-known certificate authority such as Let's Encrypt or Cloudflare.
- Propagate TypeQL syntax updates
We mirror the changes from typedb/typeql#383, which is composed of 3 changes:
- We now require that all assignments are preceded by a `let` keyword:
match $p isa person, has monthly-salary $monthly; let $annual_salary = $monthly * 12;
- We rename `long` to `integer`, to use a more familiar term for integer values (`long` is very C-like!):
define attribute age, value integer;
This is still defined as a 64-bit signed integer in TypeDB's backend.
- We introduce a consistent constructor syntax for instances:
insert
$p isa person;                  # create entity
$f isa friendship (friend: $p); # create relation with players
$a isa age 20;                  # create attribute with value
Previously, we would write this as:
insert
$p isa person;
$f (friend: $p) isa friendship;
$a isa age; $a == 20;
Note that the anonymous relation constructor is now written like this:
insert friendship (friend: $p);
instead of:
insert (friend: $p) isa friendship;
- Relation index
We implement the application of the relation index, which is an optimised storage-layer index that allows skipping through a relation from one player to another player, avoiding one lookup operation and giving the query planner the choice to produce different sorted intersection points. This change could improve performance on traversals across relations by up to 50%.
- Beam planner
We replace the greedy planner with a simple beam search, with beam width dependent on the size of the query graph, up to a maximum of 128 partial plan candidates. This allows us to properly leverage intersection capabilities of the executor.
We also simplify the individual planner vertex cost calculations.
- IID constraint
We add support for IID constraints in `match` clauses. Example:
match $x iid 0x1E001A0000000012345678;
- Introduce naive retries for suspended function calls
Introduces retries for suspended function calls.
Functions may suspend to break cycles of recursion. Restoring suspend points is essential to the completeness of recursive functions, and thus the correctness of negations which call them.
- Is constraint
We implement support for the `is` constraint in queries:
match
$x isa company; $y isa person; $z isa person;
not { $y is $z; }; # <--- require $y and $z to be different concepts
$r1 isa employment, links ($x, $y);
$r2 isa employment, links ($x, $z);
select $x, $y, $z;
- Implement tabling machinery for cyclic functions
Introduces the machinery needed to support finding the fixed point of cyclic function calls. Cyclic functions now run, and return an incomplete set of results followed by an error. It is possible that the planner chooses a plan.
- Disjunction support
We introduce support for disjunctions in queries:
match $person isa person; { $person has name $_; } or { $person has age $_; };
- Introduce periodic fsync to guarantee commits are persisted to disk
Commits guarantee that changes have been persisted to disk (using `fsync`) before changes are visible and the commit is acknowledged. The sync is done periodically, and committing threads must wait for a signal that the sync is complete. This avoids the performance penalty of each commit doing its own sync, while guaranteeing that committed data is not lost in the event of an OS crash.
- Introduce 3.0 server diagnostics
Diagnostics
We introduce a diagnostics package to collect metrics related to server usage. There are multiple potential clients of this package, a couple of which are initially implemented:
- A web endpoint for pulling data (e.g. by Prometheus) is bound to port 4104 by default and exposes diagnostics formatted for Prometheus (`http://localhost:4104/diagnostics?format=prometheus`) or as JSON (`http://localhost:4104/diagnostics?format=JSON`) (done);
- A push HTTP client for sending diagnostics data to the TypeDB Diagnostics Service (done);
- PostHog (will be added in the near future).
If reporting is turned off, we report a single batch of minimal diagnostics information after 1 hour of runtime, just to mark that a server has been booted up. No user information is shared.
If reporting is turned on, the reporting happens every hour, following the rules set in 2.x.
If the development mode is active (details can be found below), no reporting is executed, and the reporting config flag is ignored.
Configuration
We add new CLI flags for configuration:
- `--diagnostics.monitoring.port 4014` for configuring the monitoring server's port;
- `--diagnostics.monitoring.enable false` for disabling and enabling the monitoring server (true by default, thus called `enable` and requiring a boolean value);
- `--diagnostics.reporting.metrics false` for disabling and enabling the reporting of collected load and usage metrics (true by default, ignored in development mode).
An additional switch for the development mode has also been added in order to use TypeDB's release binaries in CI.
Development mode
We reintroduce the development mode. As in 2.x, this mode is the default for local bazel (& cargo) builds and snapshots. It's turned off in published releases but can be explicitly switched on through the CLI, as mentioned in the Configuration section.
WARNING: the development mode is used by TypeDB's developers and is not intended for general use. Be aware that its usage can affect the server's performance and stability.
- Introduce PostHog diagnostics and correct server shutdown
Diagnostics
We add PostHog diagnostics event reporting to gather usage data about users' journeys through TypeDB and enhance the user experience by connecting it with other metrics collected in PostHog.
The reporting configuration remains the same: the `--diagnostics.reporting.metrics` flag controls both the old (Service diagnostics) and the new (PostHog) reporting, letting a user disable everything in one action. There are two groups of events sent:
- server usage: hourly heartbeats with deployment/server IDs and the version of the server, which can optionally contain information about actions performed without connection to a specific database (opened connection, created user, etc.);
- database usage: an optional event (not sent if there is nothing to send), similar to the optional server usage action metrics, but regarding a specific database (created database, opened transaction, executed query).
Server shutdown
The server now shuts down correctly on a `CTRL-C` signal, awaiting all the background tasks and printing pretty and informative messages:
Running TypeDB CE 3.0.0-alpha-10.
Ready!
^C
Received a CTRL-C signal.
Shutting down...
Exited.
Additionally, we add the ability to send diagnostics (of both types) on server shutdown (if enabled).
- Query execution analyser
We implement a useful debugging feature: a query analyser. This is similar to Postgres's `Explain Analyze`, which produces both the query plan and details about the data that flowed through the plan, along with the time spent at each step within it. Example output:
Query profile[measurements_enabled=true]
-----
Stage or Pattern [id=0] - Match
  0. Sorted Iterator Intersection [bound_vars=[], output_size=1, sort_by=p0]
     [p0 isa ITEM] filter [] with (outputs=p0, )
     ==> batches: 158, rows: 10000, micros: 6407
  1. Sorted Iterator Intersection [bound_vars=[p0], output_size=2, sort_by=p1]
     Reverse[p1 rp p0 (role: __$2__)] filter [] with (inputs=p0, outputs=p1, checks=__$2__, )
     ==> batches: 854, rows: 39967, micros: 75716
-----
Stage or Pattern [id=1] - Reduce
  0. Reduction
     ==> batches: 1, rows: 10000, micros: 116035
-----
Stage or Pattern [id=2] - Insert
  0. Put attribute
     ==> batches: 10000, rows: 10000, micros: 5890
  1. Put has
     ==> batches: 10000, rows: 10000, micros: 54264
When disabled, profiling is a no-op (no strings are created, locks taken, or times measured), though there is still some cost associated with cloning the Arcs containing the profiling data structures.
To enable query profiling, the easiest way (for now) is to enable the TRACE logging level for the `executor` package, currently configured in `//common/logger/logger.rs`:
.add_directive(LevelFilter::INFO.into())
// add:
// .add_directive("executor=trace".parse().unwrap())
Alternatively, just set the `enable` boolean to `true` in the `QueryProfile::new()` constructor.
- Implement query executable cache
We implement a cache (somewhat arbitrarily limited to 100 entries) for compiled executable queries, along with cache maintenance when statistics change significantly or the schema is updated.
Query execution without any cache hits still looks like this:
Parsing -> Translation (to intermediate representation) -> Annotation -> Compilation -> Execution
However, with a cache hit, we now have:
Parsing -> Translation ---Cache--> Execution
skipping the annotation and compilation/planning phases, which take significant time.
Note that schema transactions don't have a query executable cache, since keeping the cache in-sync when schema operations run can be error prone.
The query cache is a structural cache, which means it will ignore all Parameters in the query: variable names, constants and values, and fetch document keys. Most production systems run a limited set of query structures, only varying values and terms, making a structural cache like this highly effective!
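For example (an illustrative pair of queries, not taken from the release notes): these two queries are structurally identical and differ only in a constant, so the second one reuses the cached executable of the first:
match $p isa person, has email "[email protected]"; select $p;
match $p isa person, has email "[email protected]"; select $p;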
- Stabilise fetch and introduce fetch functions
We introduce function invocation in `fetch` queries. Fetch can call already existing functions:
match $p isa person;
fetch {
  "names": [ get_names($p) ],
  "age": get_age($p)
};
and also use local function blocks:
match $p isa person;
fetch {
  "names": [
    match $p has name $n;
    return { $n };
  ],
  "age": (
    match $p has age $a;
    return first $a;
  )
};
In the examples above, results collected in concept document lists (`"names": [ ... ]`) represent streams (purposely multiple answers), while single (optional) results (`"age": ...`) represent a single concept document leaf and do not require (although they allow) any wrapping.
Moreover, we stabilise attribute fetching, allowing you to expect a specific concept document structure based on your schema. If the attribute type is owned with default or key cardinality (`@card(0..1)` or `@card(1..1)`), meaning that there can be at most one attribute of this type, it can be fetched as a single leaf, while other cardinalities force a list representation and keep your system safe and consistent. For example, the query
match
$p isa person;
fetch {
$p.*
};
can automatically produce a document like
{
"names": [ "Linus", "Torvalds" ],
"age": 54
}
with
define
entity person,
owns name @card(1..),
owns age @card(1..1);
Additionally, fetch now returns attributes of all subtypes (e.g. `x sub name; y sub name;`) for `$p.name`, just like regular `match` queries such as `match $p has name $n;`.
With this, feel free to construct your fetch statements the way you want:
match
$p isa person, has name $name;
fetch {
"info": {
"name": {
"from entity": [ $p.person-name ],
"from var": $name,
},
"optional age": $p.age,
"rating": calculate_rating($p)
}
};
and expect a consistent structure in the results:
[
{
"info": {
"name": {
"from entity": [ "Name1", "Surname1" ],
"from var": "Name1"
},
"optional age": 25,
"rating": 19.5
}
},
{
"info": {
"name": {
"from entity": [ "Name1", "Surname1" ],
"from var": "Surname1"
},
"optional age": 25,
"rating": 19.5
}
},
{
"info": {
"name": {
"from entity": [ "Bob" ],
"from var": "Bob"
},
"optional age": null
"rating": 28.783
}
}
]
- Introduce 3.0 undefine queries
We introduce `undefine` queries with the new "targeted" syntax to enable safe and transparent schema concept undefinition. If you want to undefine a whole type, you can just say:
undefine person;
If you want to undefine a capability from somewhere, use `undefine <capability> from <somewhere>`.
It works consistently with `owns` (`person` is preserved, only the `owns` is undefined):
undefine owns name from person;
annotations of `owns` (`person owns name` is preserved, only `@regex(...)` is undefined; you don't need to specify the regex's argument):
undefine @regex from person owns name;
and any other capability, even specialisation:
undefine as parent from fathership relates father;
Want to undefine multiple concepts in one go? Worry not!
undefine
person;
relates employee from employment;
@regex from name;
@values from email;
The error messages in all definition queries (`define`, `redefine`, and `undefine`) were enhanced, and the respective `query/language` BDD tests were introduced in the CI along with `concept` unit tests. Additional fixes:
- answers of `match` capability queries like `match $x owns $y`, `match $x relates $y`, and `match $x plays $y` now include transitive capabilities;
- `define` lets you write multiple declarations of an undefined type, specifying its kind anywhere, even in the last mention of this type;
- failures in schema queries in schema transactions no longer lead to freezes of subsequently opened transactions;
- no more crashes on deletion of long string attributes with existing `has` edges;
- Introduce 3.0 `value` and `as` match constraints
We introduce the `value` match constraint for querying for attribute types with specific value types:
match $lo value long;
We introduce the `as` match constraint for querying for role type specialisation:
match $relation relates $role as parentship:child;
which is equivalent to:
match $relation relates $role; $role sub parentship:child;
but is useful for shortening your queries, keeping them similar to `define` definitions, and for use with anonymous variables:
match $relation relates $_ as parentship:child;
- Basic streaming functions
Implement function execution for non-recursive functions which return streams only. Recursive functions & non-stream functions will throw 'unimplemented'. These can currently only be used from the preamble. Sample:
with fun get_ages($p_arg: person) -> { age }:
match $p_arg has age $age_return;
return {$age_return};
match $p isa person; $z in get_ages($p);
- Expression support
We add expression execution to match queries:
match
$person_1 isa person, has age $age_1;
$person_2 isa person, has age $age_2;
$age_2 == $age_1 + 2;
- Fetch execution
We implement fetch execution, given an executable pipeline that may contain a Fetch terminal stage.
Note: we have commented out `match-return` subqueries, fetching of expressions (`{ "val": $x + 1 }`), and fetching of function outputs (`"val": mean_salary($person)`), as these require function-body evaluation under the hood, which is not yet implemented.
- Negation
We add support for negations in match queries:
match $x isa person, has name $a; not { $a == "Jeff"; };
Note: unlike in 2.x, TypeDB 3.x supports nesting negations, which leads to a form of "for all" operator:
match
$super isa set; $sub isa set;
not {                                                        # all elements of $sub are also elements of $super:
  (item: $element, set: $sub) isa set-membership;            # there is no element in $sub...
  not { (item: $element, set: $super) isa set-membership; }; # ... that isn't also an element of $super
};
- Add support for ISO timezones; finalise datetime & duration support
- 3.0 Define and Redefine
We add `define` and the first version of `redefine` queries to TypeDB 3.0, excluding functions and aliases.
- Implement query pipelines
We implement query pipelines in two versions: Read and Write pipelines. In this iteration we centralise the IR translation, type annotation, and compilation passes into a single top-level driver package: `//query`.
We also implement some larger refactors of the BlockContext, introducing new TranslationContext, PipelineContext, and VariableRegistry types. The `//compiler` package is now split into `//compiler/match|insert|delete`, with the intent that each one has an entry point to `compile()` a block of TypeQL IR into a `Program` for each type, along with possibly a set of type inference or type checkers to go with each type of program to be compiled.
Note: lots of refactoring is still to follow; merging early to unblock waiting work.
Code Refactors
- Partition data into Keyspaces and configure RocksDB
We split the data in TypeDB into 5 RocksDB databases (internally known as Keyspaces).
In addition, we configure RocksDB caches, index and filter block pinning, bloom filters, compression at various levels, and prefix extractors.
- Optimize statistics update routine
We improve the performance of `Statistics::may_synchronise` by reducing the amount of storage accesses:
- use the `reinsert` flag rather than `known_to_exist`, as the former takes into account the state of the storage at time of commit,
- load a context of 8 commits from the WAL to increase the chances that concurrent writes can be resolved without checking storage,
- check loaded commit data before accessing storage in case there is enough information to determine the delta.
- Fix greedy planner to sort on correct vars
Currently, the greedy planner picks a plan in the form of an ordering of constraints and variables. An added constraint produces variables if it relates those variables and they haven't been added to the plan at an earlier stage.
The greedy planner picks constraints based on their "minimal" cost for a directional lookup, which may sort on any one of the variables that the constraint relates. However, it does not record which variable the lookup should be sorted on. This means that, as it stands, produced variables are added to the ordered plan in random order after their constraint.
This PR ensures that we sort on the variable appropriate for the chosen direction by adding it first.
- Functions have their own parameter registry
Fixes a bug where functions were passed the parameter registry of the entry pattern.
- WAL files start at SequenceNumber::MIN
Sequenced records in the WAL are numbered starting from 1. Unsequenced records use the number of the last sequenced record in the WAL.
A problem arises when an unsequenced record is written before any sequenced records (e.g. server restart after WAL had been deleted). In that case the index of the unsequenced record is forced to be zero. However, the WAL shard file still believes it starts at 1, and when the record header does not match that, the shard is believed to be corrupted.
That is not an issue when sharding happens during an unsequenced write, as the new shard filename uses the record sequence number, not the next expected sequence number. That does however mean that during a scan through the WAL, if a shard file starts at the desired sequence number, the previous shard must be scanned as well. (This is already the implemented behaviour.)
- Extensible user credential representation in system database
Create an extensible representation of user credentials, so that it can be extended with concrete types down the line. Currently, password is the only concrete type, but it can later be extended to tokens, SSH keypairs, and so on.
- Add more typing information into key prefixes to optimise storage lookups
We incorporate more typing information into the storage lookup operations, making them longer and more specific and reducing the range of data that must be retrieved for any traversal operation. This should make future RocksDB bloom filters more effective, and reduce the amount of data that is read even when bloom filters are not available for the prefix. Overall, we expect this to allow TypeDB to scale better, particularly on larger-than-memory workloads.
- Simplify attribute encoding
We remove separate prefixes for attribute instances and instead introduce a value type prefix after the attribute type encoding.
Encoding before:
[String-Attribute-Instance][<type-id>][<value>][<length>]
After:
[Attribute-Instance][<type-id>][String][<value>][<length>]
- Exclude console from docker image
Excludes TypeDB Console from the Docker image distribution.
- Improved query profile formatting and cleaner beam search
Printing and log tracing
- Improved Display and Debug prints
- Log tracing for the query profiler now prints variables (instead of positions)
- Added log tracing for query planning
Query planning
- Support for relation indexes
- Hash checks during planning: trivial permutations of the "same" plan are ignored ("same" means it produces the same variables at the same cost)
- Make the planner more deterministic: trivial plan elements are now decided deterministically, freeing the planner to consider only the more "heavy-weight" choices
- Add heuristic weighting that penalises larger intermediate results more than before and rewards producing results early
- Various bug fixes (joins in lowering)
- Add server address CLI argument
Make the server address configurable through a command-line argument.
- Iterator pools
Optimise storage-layer iterator creation and deletion using iterator pools, one per transaction. This helps improve benchmark performance by 2-8%.
- Guarantee transaction causality
Under concurrent write workloads, causality can appear to be violated. For example:
- open tx1, tx2
- start tx1 commit (validation begins)
- start tx2 commit (validation begins and ends, no conflict with tx1!)
- open tx3
- end tx1 commit (validation finishes)
When we open `tx3`, we end up with a snapshot that is actually from before `tx1`, even though `tx2` has committed: because we don't know the status of `tx1` yet, and in our current simplified model transactions are assigned a linear commit order decided at WAL write time, the read watermark remains before `tx1` until it finishes validating. In this scenario, a client that commits a transaction can actually end up opening its next transaction on a snapshot from before the last commit that successfully returned.
After this change, when opening a transaction, TypeDB waits until the currently pending transactions all finish, guaranteeing we see the latest possible data version. The assumptions are that 1) validation generally takes a small amount of time and 2) it is fully concurrent, so the wait time should be very small and only occurs under large, concurrently committing transactions.
- Frugal type seeder
Optimises parts of the type seeder to avoid unnecessary effort.
- Disable variables of category value being narrow-able to attribute
Disable variables of category value being narrowed to attribute:
$name = "some name"; $person has name == $name;  # Fine
$name == "some name"; $person has name $name;    # Also fine
$name = "some name"; $person has name $name;     # Disallowed
- Commit generated Cargo.tomls + Cargo.lock
We commit the generated cargo manifests so that TypeDB can be built as a normal cargo project without using Bazel.
Integration and unit tests can be run normally, and rust-analyzer behaves correctly. Behaviour tests currently cannot be run without Bazel, as the tests expect to find the feature files in `bazel-typedb/external/typedb_behaviour`.
In addition, if during development the developer needs to depend on a local clone of a dependency (e.g. making parallel changes to typeql), the Cargo.toml needs to be temporarily adjusted by hand to point to the path dependency.
- Fix disjunction inputs when lowering
The disjunction compiler explicitly accepts a list of variables which are bound when the disjunction is evaluated. This also fixes a bug where input variables used in downstream steps would not be copied over.
- Fixes early termination of type-inference pruning when disjunctions change internally
- Add database name validation for database creation
We add validation of names for created databases. Now, all database names must be valid TypeQL identifiers.
- Fetch annotation, compilation, and executables
We implement Fetch annotation, compilation, and executable building, rearchitecting the rest of the compiler to allow for tree-shaped nesting of queries (e.g. functions or fetch sub-pipelines).
- Function compilation
Fill in compilation for functions.
- Fetch III
We implement further refactoring, which pulls Fetch into Annotations and Executables, without implementing any sub-methods yet.
- Refactor pipeline annotations
We implement the next step of Fetch implementation, which allows us to embed Annotated pipelines into Fetch sub-queries and into functions. We comment out the code paths related to type inference for functions, since functions are now enhanced to include pipelines.
- Fetch part I
We implement the initial architecture and refactoring required for the Fetch implementation, including new IR data structures and translation steps.
- Require implementation
Implement the `require` clause:
match ... require $x, $y, $z;
This will filter the match output stream to ensure that the variables `$x`, `$y`, `$z` are all non-empty (if they are optional).
- Added aggregate returns
- Let match stages accept inputs
We now support `match` stages in the middle of a pipeline, which enables queries like this:
insert $org isa organisation;
match $p isa person, has age 10;
insert (group: $org, member: $p) isa membership;
- TypeDB 3 - Specification
Added specification of the TypeDB 3.0 database system behaviour.
- Fix within-transaction query concurrency
We update the in-transaction query behaviour to result in predictable and well-defined behaviour. In any applicable transaction, the following rules hold:
- Read queries are able to run concurrently and lazily, without limitation
- Write queries always execute eagerly, but answers are sent back lazily
- Schema queries and write queries interrupt all running read queries and finished write queries that are lazily sending answers
As a user, we see the following: reads after writes or schema queries are just fine:
let transaction = transaction(...);
let writes_iter = transaction.query("insert $x isa person;").await.unwrap().into_rows();
let reads_iter = transaction.query("match $x isa person;").await.unwrap().into_rows();
// both are readable:
writes_iter.next().await.unwrap();
reads_iter.next().await.unwrap();
In the other order, we will get an interrupt:
let transaction = transaction(...);
let reads_iter = transaction.query("match $x isa person;").await.unwrap().into_rows();
let writes_iter = transaction.query("insert $x isa person;").await.unwrap().into_rows();
// only writes are still available
writes_iter.next().await.unwrap();
assert!(reads_iter.next().await.is_err());
- Include console in 3.0 distribution
Includes Console in the 3.0 distribution.
- Introduce reduce stages for pipelines
Introduce reduce stages for pipelines to enable grouped aggregates.
- 3.0 Add query type to the query rows header. Implement connection/database bdd steps
We update the server to serve the updated protocol and return the type of the executed query as part of the query row stream answer. We also fix the server's database manipulation code and implement additional BDD steps to pass the `connection/database` BDD tests.
- Rework constraints and capabilities
Our old approach to annotations and capabilities wasn't correct. We considered types and capabilities to inherit annotations from their supertypes and overridden capabilities, drawing a parallel between subtyping and overriding. However, none of these considerations were correct:
- Thinking of annotations as constraints for types creates a conceptual mess, where one annotation can play the role of two other annotations (`@key` as `@unique` and `@card(1..1)`), where we can't understand what a default cardinality is, where we struggled to set default annotations, and where we have to introduce strange validation methods like "inheritable_together", "inheritable_below", etc.
- In reality, capabilities should not inherit anything from their overrides. They should simply comply with the constraints of the capabilities of their super interface types, and you don't need to override anything. This means that the "inheritance" of annotations/constraints for capabilities depends on each specific object type, depending on whether it has a capability for the super interface type or not.
This PR addresses this general issue, additionally refactoring most of the existing validations in the Concept API.
- Implement grpc Database endpoints
We implement the GRPC endpoints for database manipulation, and convert two more relevant errors into `typedb_error!()`s.
s. -
Finish GRPC transaction service and migrate more error types
Finish implementing the transaction service, including transaction commit/close/rollback, single write query/many read query execution, and interrupt mechanisms.
We also migrate several error message types from being the previous
impl Error
variants to thetypedb_error!
macro. -
- Type statement planning & execution
- Clean up duplicate test setup and implement GRPC Query streaming pt 1
Refactor tests that were all redeclaring storage and manager setup and loading, into various reusable `<package>/tests/test_utils_<package>.rs` files.
We also extend the implementation of the GRPC service for concept answers, and implement the query streaming architecture:
- spawn a dedicated thread to execute a query, which writes messages to submit into a blocking MPSC queue with default capacity 32 (or whatever the user's transaction query prefetch configuration is).
- spawn a task to take messages from the MPSC queue and insert them into the GRPC output sender (which is another MPSC queue). This is meant to submit a batch of messages at a time, and next time the stream is polled, another tokio task is created to submit the next batch.
- Precompute size is controlled by the size of the blocking MPSC queue.
- Add TypeQLMayError for query BDDs
We add `TypeQLMayError`, a specification of `MayError` for `query` BDDs. It separates expected failures into `Logic` and `Parsing`.
Firstly, it's needed for a better 3.0 test rewrite, so we don't miss any incorrectly translated queries.
Secondly, the absence of at least this kind of failure reason specification has already caused trouble in 3.x: for example, there is a 2.x test in `define.feature` which fails because its `typeql` statement is not correct, but we expect it to fail for a logical reason (not sub):
Scenario: defining a less strict annotation on an inherited ownership throws
  Then typeql define; throws exception
    """
    define child, owns email @unique;
    """
This can be easily changed to specific error code checks in the future.
Usage example:
#[apply(generic_step)]
#[step(expr = r"typeql define{typeql_may_error}")]
async fn typeql_define(context: &mut Context, may_error: params::TypeQLMayError, step: &Step) {
    let query_parsed = typeql::parse_query(step.docstring.as_ref().unwrap().as_str());
    if may_error.check_parsing(&query_parsed).is_some() {
        // If we expect the error, it is unwrapped
        return;
    }
    let typeql_define = query_parsed.unwrap().into_schema();
    with_schema_tx!(context, |tx| {
        let result = QueryManager::new().execute_schema(
            Arc::get_mut(&mut tx.snapshot).unwrap(),
            &tx.type_manager,
            &tx.thing_manager,
            typeql_define,
        );
        may_error.check_logic(&result);
    });
}
- Partially implement GRPC services & standardise error infrastructure
We flesh out the architecture of the new 3.0 Protocol, the Transaction service, and database transaction exclusivity between write and schema transactions, and introduce the TypeDBError enum which exposes the APIs required to generate stack traces and error messages. We also implement full database cleanup on deletion, which has dramatically sped up the runtime of BDD tests!
- Add data (thing) validation
We add data validation for schema migration (changes to the schema) and data modifications. Moreover, we add a number of additional operation-time validations of data for schema migration which previously only existed for transaction commits, aiming to maximise the synchronisation of the schema and data of a database. There is also a change in the cleanup behaviour for relations and attributes: now, nothing is hidden from queries until commit time, since we delete and hide data only at commit time.
We refactor locking for things in `thing_manager`, making sure that not only do objects exist when we add edges to them, but so do their interfaces and the other objects connected to them through inheritance, e.g. for `@unique` and `@card` constraints. The `@card` constraint, in its turn, is now checked both at operation time for the schema (for all operations with a potential effect on data cardinality) and at commit time for the data (in case something has been missed in the more complex operation-time checks).
Changes to previously existing checks:
- Validations for types when an attribute type implicitly loses the @independent annotation or a relation type implicitly gains the @cascade annotation (which can lead to data deletion) now throw errors only if there are existing instances of these types (including subtypes).
- We accidentally made `create {root} type` idempotent, which is incorrect. Now, if you try to create an existing entity type, it is an error.
We also add the whole `concept` BDD test package to the CI, as it is considered complete. From now on, we aim not to break the existing tests too much with new merges.
- Isa instruction executors
- Refactor errors & struct names in Function, IR & Compile packages
- Fix type-seeder from label constraint for scoped labels
- Implement inserts
Implements the translation, compilation & execution of insert & delete clauses.
- Rename roleplayer edge to links + plan
We rename the role player edge to be referred to as a "links" edge, to avoid confusion with the `RolePlayer` struct, which refers to a player in a relation bundled with the role type tag.
- Introduce IR for schema constraints
Introduce `Owns`, `Relates` & `Plays` constraints into the IR & implement type-inference traits for them.
- Include type annotations into compiled instructions
- Partially implement define queries
Implement basic functionality of `define` queries.
- RolePlayer edge executors
- Add TypeCache to Database; implement statistics update
- Implement expression IR & evaluation
Implements translation from TypeQL, IR & evaluation of expressions.
- Initial Define query
We implement a basic define query execution and fix a couple of Isa executor bugs, using the latest TypeQL.
- Implement simple HasReverse test and fix bugs
- Purge root types
`entity`, `relation`, and `attribute` are promoted to keywords (à la `struct`) and no longer have corresponding root types. That means they are no longer part of the type hierarchy.
- HasReverse executor and refactoring
Implement the Has reverse executor and refactor the Has executor into methods for reuse. We also implement higher-kinded traits for `Thing`s, and refactor the `get_instances`/`get_instances_in` set of APIs in the ThingManager.
- Tuple based iterators
We rearchitect the traversal architecture to allow more extensibility, reuse, and speed of development. Critically, we standardise all the specific instruction iterators (`HasIterator`, ...) into a new intermediate `Tuple` representation in Tuple iterators. This allows reuse and composition of the code to advance, skip forward, count, and record answers from Tuple iterators.
We also rename `Position` to `VariablePosition`, and simplify the translators from TypeQL into IR.
- Gherkin parser override: create feature per scenario
Workaround for cucumber-rs/cucumber#331. The wrapper creates a new feature for each scenario, which sidesteps the runner issue.
On `//tests/behaviour/concept/type:test_owns_annotations`:
Before:
[Summary]
1 feature
702 scenarios (702 passed)
41957 steps (41957 passed)
test test ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 2013.41s
After:
[Summary]
702 features
702 scenarios (702 passed)
41957 steps (41957 passed)
test test ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 165.86s
- Finish database reset function to speed up behaviour tests
We add the `connection reset database` BDD command, which works just like `create database` if the database is absent, and otherwise resets the database (removing all the data from it, making it look like a new database).
This function is expected to be used in BDDs to speed up their execution, especially while the `cucumber` crate works quite slowly with huge files. It is important to note that this function should not be used in:
- production (at least with the current non-thread-friendly implementation: can we somehow hide it?)
- `connection/database` BDD tests
- Traversal architecture for enumerated, counted, checked, and bound vars
We implement new traversal behaviour: we enumerate all answers for named variables from a match, non-distinctly. However, when we encounter anonymous variables, we treat these as existence checks that should not be enumerated.
These requirements translate into four types of variable modes to be handled: enumerated, counted, checked, and bound.
- Enumerated: these are 'selected' variables for which all answers must be emitted from a step. These could be anonymous variables, introduced by the compilation step as intermediates and used in further steps, or user-written named variables that should be outputted.
Example:
match $x has name $a; select $x, $a;
// --> 1 x [ p1, n1 ]
//     1 x [ p1, n2 ]
- Counted: these are user-named but not selected variables that can be left un-emitted as an optimisation and multiplied into the answer Multiplicity instead.
Example:
match $x has name $a; select $x; // --> 2 x [ p1 ]
- Checked: these are anonymous, user-written variables which should not impact the output Multiplicity or quantity:
match $x has name $_; // --> 1 x [ p1 ]
This way, we can introduce anonymous variables freely without worrying about affecting the user's output:
match $x isa person; -> match $x isa $_0; $_0 label person;
- Bound: these variables are bound from previous computation.
The overall execution algorithm is now:
- Each `step` produces a `batch`, which the next Step takes in as input.
- In a `step`, one or more instruction iterators/checks agree on one output `Row`.
- Once a row is agreed, we must move the iterators forward. Here, we behave differently depending on what kind of mode each Variable is in: sometimes we have to compute row multiplicity, and sometimes we have to check for a value and skip forward.
- We record the row into the output `batch` and are ready to go back to step 2.
- Implement function manager & type inference
Implements just enough to store & retrieve functions, compile a query with a preamble function, and run type inference on it.
- Implement database reset, which deletes all storage keys, ID allocators, WAL, IsolationManager... back to a fully empty state
- Implement basic database delete
- Schema commit time validations
We add schema commit time validations to ensure that the schema is fully correct after a series of operations. These checks mostly concentrate on things that are ignored in the operation-time validations (to let users modify their schemas more easily), like:
- verify that no types/edges conflict with their supertypes'/overridden edges' declarations (we usually don't check subtypes when we change a type in operation-time validations, with minor exceptions), e.g. annotations and ordering
- verify that capabilities are still correctly overridden after a series of subtyping moves (e.g. we can only override the supertype interface type)
- verify that cardinality is correct for all levels of inheritance (see details)
We add new annotations, `@values` and `@range`, and refactor the usage of cardinality at the `concept` level.
Now, when you call `get_cardinality` on a `Capability` (`owns`, `plays`, `relates`), you get the capability's constraint, which is either an explicitly set annotation or a default value. Thus, all capabilities have `cardinality` constraints, and these constraints can be modified by setting `@card` annotations.
We also change the parsing of annotations to align with the design (`x..y` instead of `x, y`). Additionally, most of the checks for `fails` in tests no longer accept `ConceptReadError`s, so as not to miss typos and serious inner issues of the system!