You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Cypher pattern matching assumes relationship uniqueness: A relationship can only be matched once per instance of a pattern. This has been criticized occasionally in research papers and by graph analytics practitioners as being a somewhat arbitrary decision that makes it hard to express homomorphic and (node-)isomorphic matching.
This CIR (cypher improvement request) invites proposals to address this, e.g. via a syntactic extension for overriding default uniqueness.
Background
Pattern matching in Cypher can be described as filtering a candidate set of possible matches that are formed by computing the cross product of possible values for all pattern variables.
Candidate filtering only keeps matches for which all relevant predicates evaluate to true and whose relationships are matched in accordance with relationship direction and uniqueness requirements as specified by the relevant patterns.
It would certainly be desirable to further elaborate the exact full semantics of pattern matching at some point but for the purpose of this CIR only the semantics of pattern uniqueness is covered below.
Pattern variable
A variable that is used to name a part of a pattern is called a pattern variable. Currently, there are four kinds of pattern variables:
Node pattern variables, e.g. n in (n)
Single relationship pattern variables, e.g. r in ()-[r]->()
Variable length relationship pattern variables, e.g. r in ()-[r*..4]-()
Named path variables in named path patterns, e.g. p in p=()-[*]-()<-(x)
A pattern variable may have already been bound in an outer scope or it may be newly introduced by the syntactic element that contains the pattern.
Uniqueness scope
Every MATCH clause forms a uniqueness scope for all pattern variables used to name any of the patterns in the clause. The uniqueness scope does not consider variables from predicates in the associated WHERE clause, not even if those predicates are implied by patterns (e.g. patterns with literal maps whose value expressions reference pattern variables introduced in earlier parts of the query).
Similarly, other constructs that contain patterns (like pattern comprehensions) form a uniqueness scope for all pattern variables used to name any of their patterns.
Furthermore - and without any loss of generality - it is assumed in the following that all unnamed parts of patterns are named using artificially generated pattern variables, i.e. unnamed patterns are treated as syntactic sugar and everything is considered to be named.
Entity consumption
We say that a given concrete entity (i.e. a node or a relationship) is consumed ("used up") in a candidate match by a pattern variable from the associated uniqueness scope if
the concrete entity is a node, the pattern variable is a node pattern variable, and the variable is bound to the concrete node,
or the concrete entity is relationship, the pattern variable is a single relationship pattern variable, and the variable is bound to the concrete relationship,
or the concrete entity is a relationship, the pattern variable is variable length relationship pattern variable, and the variable is bound to a list of relationships that contains the concrete relationship (each occurrence counts separately),
or the concrete entity is a node, the pattern variable is variable length relationship pattern variable, the variable is bound to a list of relationships, and the concrete node is one of the interior nodes of one the relationships in that list (each occurrence counts separately),
or nothing else.
Note that this definition itself does not consider uniqueness, it is merely used below to define uniqueness and therefore the same concrete entity can be consumed multiple times by the same instance of a pattern (the same match) according to this definition.
Also note that this definition does not consider named path variables. Paths are always constructed from other patterns in the same uniqueness scope and therefore do not need any additional consideration regarding the uniqueness of their contained entities.
Uniqueness
With these definitions it now becomes possible to precisely describe relationship uniqueness:
A candidate match is relationship-unique in a uniqueness scope if it does not consume a relationship more than once.
By analogy, a candidate match is node-unique if it does not consume a node more than once.
Note that node uniqueness implies relationship uniqueness and prevents matching of self-relationships (relationships with the same start and end node).
Pattern matching in Cypher by default only returns relationship-unique matches.
Requirements
Requested changes
Change Cypher such that it becomes possible to configure uniqueness per uniqueness scope in order to at least enable matching without any uniqueness constraints
Change Cypher to support: homomorphic matching (no uniqueness), isomorphic matching (node uniqueness), and the currently provided semantics (relationship uniqueness)
Requested considerations
Consider the impact of the proposal on path matching semantics, especially regarding cycles
Consider the impact of the proposal on potential changes in possible query result cardinality (esp. regarding the possible introduction of infinite query results)
Consider the extensibility of the proposal regarding the introduction of further uniqueness modes in the future
Interaction with existing features
Do not change the semantics of existing queries without giving a very strong argument
The text was updated successfully, but these errors were encountered:
petraselmer
changed the title
feature request: isomorphic pattern matching and configurable uniqueness
Isomorphic pattern matching and configurable uniqueness
Feb 3, 2017
CIR-2017-174
Cypher pattern matching assumes relationship uniqueness: A relationship can only be matched once per instance of a pattern. This has been criticized occasionally in research papers and by graph analytics practitioners as being a somewhat arbitrary decision that makes it hard to express homomorphic and (node-)isomorphic matching.
This CIR (cypher improvement request) invites proposals to address this, e.g. via a syntactic extension for overriding default uniqueness.
Background
Pattern matching in Cypher can be described as filtering a candidate set of possible matches that are formed by computing the cross product of possible values for all pattern variables.
Candidate filtering only keeps matches for which all relevant predicates evaluate to true and whose relationships are matched in accordance with relationship direction and uniqueness requirements as specified by the relevant patterns.
It would certainly be desirable to further elaborate the exact full semantics of pattern matching at some point but for the purpose of this CIR only the semantics of pattern uniqueness is covered below.
Pattern variable
A variable that is used to name a part of a pattern is called a pattern variable. Currently, there are four kinds of pattern variables:
n
in(n)
r
in()-[r]->()
r
in()-[r*..4]-()
p
inp=()-[*]-()<-(x)
A pattern variable may have already been bound in an outer scope or it may be newly introduced by the syntactic element that contains the pattern.
Uniqueness scope
Every
MATCH
clause forms a uniqueness scope for all pattern variables used to name any of the patterns in the clause. The uniqueness scope does not consider variables from predicates in the associatedWHERE
clause, not even if those predicates are implied by patterns (e.g. patterns with literal maps whose value expressions reference pattern variables introduced in earlier parts of the query).Similarly, other constructs that contain patterns (like pattern comprehensions) form a uniqueness scope for all pattern variables used to name any of their patterns.
Furthermore - and without any loss of generality - it is assumed in the following that all unnamed parts of patterns are named using artificially generated pattern variables, i.e. unnamed patterns are treated as syntactic sugar and everything is considered to be named.
Entity consumption
We say that a given concrete entity (i.e. a node or a relationship) is consumed ("used up") in a candidate match by a pattern variable from the associated uniqueness scope if
Note that this definition itself does not consider uniqueness, it is merely used below to define uniqueness and therefore the same concrete entity can be consumed multiple times by the same instance of a pattern (the same match) according to this definition.
Also note that this definition does not consider named path variables. Paths are always constructed from other patterns in the same uniqueness scope and therefore do not need any additional consideration regarding the uniqueness of their contained entities.
Uniqueness
With these definitions it now becomes possible to precisely describe relationship uniqueness:
A candidate match is relationship-unique in a uniqueness scope if it does not consume a relationship more than once.
By analogy, a candidate match is node-unique if it does not consume a node more than once.
Note that node uniqueness implies relationship uniqueness and prevents matching of self-relationships (relationships with the same start and end node).
Pattern matching in Cypher by default only returns relationship-unique matches.
Requirements
Requested changes
Requested considerations
Interaction with existing features
The text was updated successfully, but these errors were encountered: