Rework CIP
- Clear separation between additive and replacing semantics
- Additive semantics for nesting with {}
- Replacing semantics for flat composition
- Use THEN for discard cardinality
- Use WITH|RETURN|YIELD NOTHING for discard fields
boggle committed Oct 16, 2017
1 parent cc176e8 commit 2921112
Showing 1 changed file with 111 additions and 89 deletions.
@@ -1,4 +1,4 @@
= CIP2016-06-22 - Nested subqueries
= CIP2016-06-22 - Nested, updating, and chained subqueries
:numbered:
:toc:
:toc-placement: macro
@@ -9,7 +9,7 @@
[abstract]
.Abstract
--
This CIP proposes the incorporation of nested subqueries into Cypher.
This CIP proposes the incorporation of nested, updating, and chained subqueries into Cypher.
--

toc::[]
@@ -21,162 +21,165 @@ Subqueries - i.e. queries within queries - are a powerful and expressive feature

* Increased query expressivity
* Better query construction and readability
* Easier query composition and reuse
* Easier composition of simple query pipelines
* Post-processing results from multiple queries as a single unit
* Performing a sequence of multiple write commands for each record

== Background

This CIP may be viewed in light of the EXISTS CIP, the Scalar Subqueries and List Subqueries CIP, and the Map Projection CIP, all of which propose variants of subqueries.
In contrast, this CIP focusses on subqueries operating at a clause level while the EXISTS CIP and Map Projection CIP propose subqueries operating at an expression level.
This CIP may be viewed in light of CIPs for query combinators and set operations, `EXISTS`, scalar subqueries, and list subqueries.

== Proposal

Nested subqueries are self-contained Cypher queries that are usually run within the scope of an outer Cypher query.
Subqueries are self-contained Cypher queries that are usually run within the scope of an outer, containing Cypher query.

This proposal suggests the introduction of new nested subquery constructs to Cypher.
This proposal suggests the introduction of new subquery constructs to Cypher.

* Read-only nested simple subqueries of the form `{ ... RETURN ... }`
* Read-only nested chained subqueries of the form `THEN { ... RETURN ... }`
* Read-only nested optional subqueries of the form `OPTIONAL { ... RETURN ... }`
* Read-only nested mandatory subqueries of the form `MANDATORY { ... RETURN ... }`
* Read/Write nested simple updating subqueries of the form `DO { ... }` (inner query not ending with `RETURN`)
* Read/Write nested conditionally-updating subqueries of the form `DO [WHEN cond THEN { ... }]+ [ELSE { ... }] END` (inner queries not ending with `RETURN`)
* Read-only nested subqueries
** Read-only nested regular subqueries of the form `MATCH { <reading-query> }`
** Read-only nested optional subqueries of the form `OPTIONAL MATCH { <reading-query> }`
** Read-only nested mandatory subqueries of the form `MANDATORY MATCH { <reading-query> }`
* Read/Write updating subqueries
** Read/Write simple updating subqueries of the form `DO { <updating-query> }` (inner query not ending with `RETURN`)
** Read/Write conditionally-updating subqueries of the form `DO [WHEN <predicate> THEN { <updating-query> }]+ [ELSE { <updating-query> }] END` (inner queries not ending with `RETURN`)
* Chained subqueries
** Chained data-dependent subqueries of the form `<query> <with-clause> <query>`, obtained by extending the `WITH` projection clause. Additionally, this CIP proposes new shorthand syntax for starting a query with `WITH`, so that a query can be composed with external inputs.
** Chained data-independent subqueries, obtained by introducing the new `THEN` clause, which discards all variables in scope as well as the cardinality of all input records. Additionally, this CIP proposes new shorthand syntax, `WITH|RETURN|YIELD NOTHING`, for discarding all variables in scope without discarding the cardinality of input records.

A nested simple subquery consists of an inner query in curly braces.
We additionally propose removing the `FOREACH` clause from the current language (it is rendered obsolete by the introduction of `DO`).

All other nested subquery constructs are introduced with a keyword in conjunction with an inner query in curly braces.
Subquery constructs are always introduced with one or more keywords in conjunction with an inner query in curly braces.

Nested subqueries may be correlated - i.e. the inner query may use variables from the outer query - or uncorrelated.
Subqueries may be correlated - i.e. the inner query may use variables from the outer query - or uncorrelated.

Nested subqueries can be contained within other nested subqueries at an arbitrary (but finite) depth.
Subqueries can be contained within other subqueries at an arbitrary (but finite) depth.

Read/Write nested subqueries cannot be contained within other read-only nested subqueries.
Read/Write subqueries cannot be contained within other read-only subqueries.

Finally, this CIP proposes new shorthand syntax for starting a query with `WHERE`, along with the ability to specify that no fields are to be returned through the introduction of `WITH -`, `RETURN -`, and `YIELD -`.

=== Read-only nested subqueries

**1. Read-only nested simple subqueries**
Conceptually, a nested subquery is evaluated for each incoming input record and may produce an arbitrary number of output records.

We propose the addition of read-only nested simple subqueries as a new form of read-only Cypher query.
==== Read-only nested regular subqueries

A nested read-only simple subquery is denoted using the following syntax: `{ <inner-query> }`.
We propose the addition of read-only nested regular subqueries as a new form of read-only Cypher query.

The inner query can be any complete read-only Cypher query.
A read-only nested regular subquery is denoted using the following syntax: `MATCH { <inner-query> }`.

A nested read-only simple subquery may only be used as a primary clause, i.e. as a
The inner query can be any complete read-only Cypher query.

* top-level Cypher query,
* inner query of another nested subquery,
* inner query of another expression-level subquery (such as a pattern comprehension, or an `EXISTS` subquery),
* argument query to `UNION` and similar clause-level binary operators
==== Read-only nested optional subqueries

A nested read-only simple subquery may not be used as a secondary clause after a preceding primary clause.
(However, a nested read-only chained subquery may be used in this case.)
We propose extending the `OPTIONAL MATCH` clause to express read-only nested optional subqueries.

A read-only nested optional subquery is denoted by the following syntax: `OPTIONAL MATCH { <inner-query> }`.

**2. Read-only nested chained subqueries**
==== Read-only nested mandatory subqueries

We propose the addition of read-only nested chained subqueries for using nested subqueries in a similar position as a secondary clause.
This is called _subquery chaining_.
We propose extending the `MANDATORY MATCH` clause to express read-only nested mandatory subqueries.

After a chain of clauses that together form a query, a new nested chained subquery may be introduced as a secondary clause using the `THEN` keyword followed by an inner query in curly braces, i.e. it is denoted using the following syntax: `... THEN { <inner-query> }`.
`THEN` is a query combinator and more details may be found in the Query Combinator CIP.
A read-only nested mandatory subquery is denoted by the following syntax: `MANDATORY MATCH { <inner-query> }`.

==== Semantics

**3. Read-only nested optional subqueries**
The nested subquery will be provided with all variables visible in the outer query as subquery input.

We propose the addition of a new `OPTIONAL` clause for expressing read-only nested optional subqueries.
All records returned by the final `RETURN` clause of the subquery will be augmented with the variable bindings of the initial input record from the outer query to form the output records of the subquery.
No other variable bindings will be added to the output records.
If an incoming variable is either discarded or shadowed within the subquery, an error will be raised if the subquery returns that variable to the outer query.

A read-only nested optional subquery is denoted by the following syntax: `OPTIONAL { <inner-query> }`.
Finally, the result records of the different forms of nested subqueries are formed as follows:

* The result records of a read-only regular subquery are just the output records.
* The result records of a read-only optional subquery are all the output records (if there is at least one output record), or a single record with the same fields as the output records where all newly introduced variable bindings are set to `NULL`.
* The result records of a read-only mandatory subquery are just the output records. However, if the set of output records is empty, an error is raised in the same way as regular `MANDATORY MATCH`.
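
As a hedged illustration of the result-record rules above, consider the following sketch of the optional case. The labels `Person` and `City`, the relationship type `LIVES_IN`, and the property names are hypothetical and chosen only for this example.

[source, cypher]
----
MATCH (p:Person)
OPTIONAL MATCH {
  // correlated: p is provided by the outer query
  MATCH (p)-[:LIVES_IN]->(c:City)
  RETURN c.name AS city
}
RETURN p.name AS person, city
----

Under these rules, a person with no `LIVES_IN` relationship would still produce one result record, with `city` set to `NULL`. With `MATCH { ... }` in place of `OPTIONAL MATCH { ... }`, that person would produce no record, and with `MANDATORY MATCH { ... }` an error would be raised.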

**4. Read-only nested mandatory subqueries**
Nested subqueries interact with write clauses in the same way as `MATCH` does.

We propose the addition of a new `MANDATORY` clause for expressing read-only nested mandatory subqueries.

A read-only nested mandatory subquery is denoted by the following syntax: `MANDATORY { <inner-query> }`.
=== Read/Write updating subqueries

Updating subqueries never change the cardinality; i.e. the inner update query is run for each incoming input record.

**4. Read/Write nested simple updating subqueries**
==== Read/Write simple updating subqueries

We propose the addition of a new `DO` clause for expressing read/write nested simple updating subqueries that _do not return any data_.
We propose the addition of a new `DO` clause for expressing read/write simple updating subqueries that _do not return any data_ from the inner query.

A read/write nested simple updating subquery is denoted by the following syntax: `DO { <inner-update-query> }`.
A read/write simple updating subquery is denoted by the following syntax: `DO { <inner-update-query> }`.

Any updating Cypher query from which the trailing final `RETURN` clause has been omitted may be used as an inner update query.

We additionally propose removing the `FOREACH` clause from the current language as it is rendered obsolete by the introduction of `DO`.

A query may end with a `DO` subquery in the same way that a query can currently end with any update clause.
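
A minimal sketch of a simple `DO` subquery follows, assuming hypothetical `Order` and `Item` labels and a list property `skus`. It expresses the kind of per-record update loop that currently calls for `FOREACH`.

[source, cypher]
----
MATCH (o:Order {id: $orderId})
DO {
  // inner updating query: no trailing RETURN
  UNWIND o.skus AS sku
  MERGE (i:Item {sku: sku})
  CREATE (o)-[:CONTAINS]->(i)
}
RETURN o.id AS id
----

Because updating subqueries never change the cardinality, the final `RETURN` would see one record per matched `Order`, regardless of how many items the inner query creates.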

**5. Read/Write nested conditionally-updating subqueries**
==== Read/Write conditionally-updating subqueries

We propose the addition of a second form of the `DO` clause for expressing read/write nested conditionally-updating subqueries that _do not return any data_.
We propose the addition of a new conditional `DO` clause for expressing read/write conditionally-updating subqueries that _do not return any data_ from the inner query.

A read/write nested conditionally-updating subquery is denoted by the following syntax:
A read/write conditionally-updating subquery is denoted by the following syntax:

```
DO
[WHEN <cond> THEN <inner-update-query>]+
[WHEN <predicate> THEN <inner-update-query>]+
[ELSE <inner-update-query>]
END
```


Evaluation proceeds as follows:

* Semantically, the `WHEN` conditions are tested in the order given, and the inner updating query is executed for only the first condition that evaluates to `true`.
* If no given `WHEN` condition evaluates to `true` and an `ELSE` branch is provided, the inner updating query of the `ELSE` branch is executed.
* If no given `WHEN` condition evaluates to `true` and no `ELSE` branch is provided, no updates will be executed.
* Semantically, the `WHEN` predicates are tested in the order given, and the inner updating query is executed for only the first predicate that evaluates to `true`.
* If no given `WHEN` predicate evaluates to `true` and an `ELSE` branch is provided, the inner updating query of the `ELSE` branch is executed.
* If no given `WHEN` predicate evaluates to `true` and no `ELSE` branch is provided, no updates will be executed.
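
The evaluation order above can be sketched with a hypothetical `User` label and `age` property. The curly braces around each branch follow the form given in the proposal overview; this is an illustration under those assumptions, not a normative example.

[source, cypher]
----
MATCH (u:User)
DO
  WHEN u.age IS NULL THEN { SET u:Unverified }
  WHEN u.age < 18 THEN { SET u:Minor }
  ELSE { SET u:Adult }
END
----

For a user with `age = 12`, only the second branch would run: the first predicate does not evaluate to `true`, and the `ELSE` branch is skipped because an earlier predicate already matched.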

A query may end with a conditional `DO` subquery in the same way that a query can currently end with any update clause.

**6. Shorthand syntax**

We propose the addition of a new clause `WHERE <cond> <subclauses>` as a shorthand syntax for `WITH * WHERE <cond> THEN { <subclauses> }`.
The idea is for this to be used exclusively as a primary clause; for example, as the first clause of a nested subquery.
=== Chained subqueries

We propose the addition of a new projection clauses of the form `WITH -` and `RETURN -`, which will retain the input cardinality but project no result fields.
This allows for *only* checking the cardinality in a read-only nested mandatory subquery.
==== Chained data-dependent subqueries

We propose the addition of a new subclause to `CALL` of the form `YIELD -`, which will retain the output cardinality of a call but project no result fields.
This allows for *only* checking the cardinality in an `EXISTS` subquery.
We propose extending the `WITH` projection clause to sequentially compose arbitrary queries to form a chained data-dependent subquery without resorting to nesting and indentation (e.g. as a shorthand syntax for post-UNION processing).

Chained data-dependent subqueries have the following general form `<Q1> WITH ... <Q2>`.

=== Semantic clarification
Both `<Q1>` and `<Q2>` are arbitrary, complete Cypher queries.

**1. Read-only nested subqueries**
Conceptually, the query `<Q2>` is evaluated for each incoming input record from the query `<Q1>` and may produce an arbitrary number of result records.
In other words, the query `<Q2>` will be provided with all variables returned by the query `<Q1>` as input variable bindings.

Conceptually, a nested subquery is evaluated for each incoming record and may produce an arbitrary number of result records.
Furthermore, this CIP proposes allowing a leading `WITH` to project variables from expressions that refer to unbound variables from the preceding scope (or query).
This set of referenced, unbound variables of such a leading `WITH` is understood to implicitly declare the input variables required for the query to execute.

The rules regarding variable scoping are detailed as follows:
Note:: This mechanism allows composing a Cypher query with inputs that have been constructed programmatically.
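
As an illustration only, the following sketch shows a query fragment that starts with `WITH`. The variable `userName` is not bound anywhere in the fragment, so under this proposal it would be treated as an input that the surrounding context must supply. The `User` and `Tweet` labels and the `AUTHORED` relationship type are assumptions made for the example.

[source, cypher]
----
// userName is unbound here and thus declares a required input
WITH userName AS name
MATCH (u:User {name: name})<-[:AUTHORED]-(t:Tweet)
RETURN t.id AS tweetId
----

Such a fragment could presumably be placed after any `<Q1>` that returns a `userName` field, giving a chained query of the form `<Q1> WITH ... <Q2>`.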

* All incoming variables remain in scope throughout the whole subquery.
* When evaluating the subquery, any new variable bindings introduced by the final `RETURN` clause will augment the variable bindings of the initial record.
* It is valid (though redundant) if incoming variables from the outer scope are passed on explicitly by any projection clause of the subquery (including the final `RETURN`).
* Nested subqueries therefore cannot shadow variables present in the outer scope, and thus behave in the same way as `UNWIND` and `CALL` with regard to the introduction of new variable bindings.
* Any other variable bindings that are introduced temporarily in the subquery will not be visible to the outer scope.
==== Chained data-independent subqueries

Subqueries interact with write clauses in the same way as `MATCH` does.
We propose introducing the `THEN` projection clause to sequentially compose two arbitrary subqueries to form a chained data-independent subquery without resorting to nesting and indentation.

Chained data-independent subqueries have the following general form `<Q1> THEN <Q2>`.

**2. Read/Write subqueries**
Both `<Q1>` and `<Q2>` are arbitrary, complete Cypher queries.
No variables and no input records are passed from `<Q1>` to `<Q2>`.
Instead `<Q2>` is executed in a standalone fashion after the execution of `<Q1>` has finished.

Execution of a `DO` subquery does not change the cardinality; i.e. the inner update query is run for each incoming record.
Furthermore, this CIP proposes allowing queries to start with a leading `THEN` for discarding all variables in scope as well as the cardinality of all input records provided by the surrounding execution environment.

Any input record is always passed on to the clause succeeding the `DO` subquery, irrespective of whether it was eligible for processing by any inner update query.
Note:: This mechanism allows guaranteed execution of `<Q2>` irrespective of the number of records produced by `<Q1>`.

A `DO` clause that uses `WHEN` sub-clause is called a _conditional DO_.
Note:: In general, `<Q1>` is expected to be an updating query and it is recommended that implementations generate a warning if this is not the case (to inform the user that `<Q1>` is essentially superfluous).
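
A hedged sketch of the data-independent form follows: the first query is an updating query, and the second runs once afterwards. The `Session` label and `expired` property are hypothetical.

[source, cypher]
----
// <Q1>: updating query, may touch any number of records
MATCH (s:Session)
WHERE s.expired
DETACH DELETE s
THEN
// <Q2>: sees no variables and no records from <Q1>
MATCH (s:Session)
RETURN count(s) AS remaining
----

Even if `<Q1>` matches no sessions at all, `<Q2>` would still be executed once and return a single record.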

A query may end with a `DO` subquery in the same way that a query can currently end with any update clause.
==== Discarding variables in scope

Finally, this CIP proposes new shorthand syntax for discarding all variables in scope without discarding the cardinality of input records using `WITH|RETURN|YIELD NOTHING`.
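
For example, the following sketch uses `RETURN NOTHING` so that a mandatory subquery checks only that at least one match exists, without adding fields to the outer scope. The `Person` and `Conference` labels and the `ATTENDS` relationship type are assumptions chosen to resemble the conference example of this CIP.

[source, cypher]
----
MATCH (p:Person {name: 'Petra'})
MATCH (conf:Conference {name: $conf})
MANDATORY MATCH {
  // only the existence of a match matters; no fields are projected
  MATCH (p)-[:ATTENDS]->(conf)
  RETURN NOTHING
}
RETURN p.name AS name, conf.name AS conference
----

Since `NOTHING` discards variables but not cardinality, the outer query would presumably see one record per match produced by the subquery.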

=== Examples

**1. Read-only nested simple and chained subqueries**
==== Read-only nested regular subqueries

Post-UNION processing:
[source, cypher]
----
{
MATCH {
// authored tweets
MATCH (me:User {name: 'Alice'})-[:FOLLOWS]->(user:User),
(user)<-[:AUTHORED]-(tweet:Tweet)
@@ -197,7 +200,7 @@ Uncorrelated nested subquery:
[source, cypher]
----
MATCH (f:Farm {id: $farmId})
THEN {
MATCH {
MATCH (u:User {id: $userId})-[:LIKES]->(b:Brand),
(b)-[:PRODUCES]->(p:Lawnmower)
RETURN b.name AS name, p.code AS code
@@ -214,7 +217,7 @@ Correlated nested subquery:
[source, cypher]
----
MATCH (f:Farm {id: $farmId})-[:IS_IN]->(country:Country)
THEN {
MATCH {
MATCH (u:User {id: $userId})-[:LIKES]->(b:Brand),
(b)-[:PRODUCES]->(p:Lawnmower)
RETURN b.name AS name, p.code AS code
@@ -233,7 +236,7 @@ Filtered and correlated nested subquery:
----
MATCH (f:Farm)-[:IS_IN]->(country:Country)
WHERE country.name IN $countryNames
THEN {
MATCH {
MATCH (u:User {id: $userId})-[:LIKES]->(b:Brand),
(b)-[:PRODUCES]->(p:Lawnmower)
RETURN b AS brand, p.code AS code
@@ -253,9 +256,9 @@ Doubly-nested subquery:
[source, cypher]
----
MATCH (f:Farm {id: $farmId})
THEN {
MATCH {
MATCH (c:Customer)-[:BUYS_FOOD_AT]->(f)
THEN {
MATCH {
MATCH (c)-[:RETWEETS]->(t:Tweet)<-[:TWEETED_BY]-(f)
RETURN c, count(*) AS count
UNION
@@ -271,23 +274,23 @@ THEN {
RETURN f.name AS name, type, sum(endorsement) AS endorsement
----

**2. Read-only nested optional match and mandatory subqueries**
==== Read-only nested optional and mandatory subqueries

This proposal also provides nested subquery forms of `OPTIONAL MATCH` and `MANDATORY MATCH`:

[source, cypher]
----
MANDATORY MATCH (p:Person {name: 'Petra'})
MANDATORY MATCH (conf:Conference {name: $conf})
MANDATORY {
WHERE conf.impact > 5
MANDATORY MATCH {
WITH * WHERE conf.impact > 5
MATCH (p)-[:ATTENDS]->(conf)
RETURN conf
UNION
MATCH (p)-[:LIVES_IN]->(:City)<-[:IN]-(conf)
RETURN conf
}
OPTIONAL {
OPTIONAL MATCH {
MATCH (p)-[:KNOWS]->(a:Attendee)-[:PUBLISHED_AT]->(conf)
RETURN a.name AS name
UNION
@@ -298,7 +301,7 @@ RETURN name
----


**3. Read/Write nested simple and conditionally-updating subqueries**
==== Read/Write simple updating and conditionally-updating subqueries

We illustrate these by means of an 'old' version of the query, in which `FOREACH` is used, followed by the 'new' version, using `DO`.

@@ -376,12 +379,31 @@ DO WHEN x % 2 = 1 THEN {
END
----

==== Chained subqueries

Combining nested and chained subqueries:
[source, cypher]
----
MATCH (x)-[:IN]->(:Category {name: "A"})
WITH x LIMIT 5
MATCH (x)-[:FROM]-(c :City)
RETURN x, c
UNION
MATCH (x)-[:IN]->(:Category {name: "A"})
WITH x LIMIT 10
MATCH (x)-[:FROM]-(c :City)
// This finishes the right arm of the UNION
RETURN x, c
// This applies to the whole UNION
WITH x.name AS name ORDER BY x.age
// Only name is in scope after the WITH projection
RETURN name LIMIT 10
----

=== Interaction with existing features

Apart from the suggested deprecation of the `FOREACH` clause, nested read-only, write-only and read-write subqueries do not interact directly with any existing features.

=== Alternatives
== Alternatives

Alternative syntax has been considered during the production of this document:
