Isomorphic pattern matching and configurable uniqueness #174

boggle · 2017-01-25T15:33:00Z

CIR-2017-174

Cypher pattern matching assumes relationship uniqueness: A relationship can only be matched once per instance of a pattern. This has been criticized occasionally in research papers and by graph analytics practitioners as being a somewhat arbitrary decision that makes it hard to express homomorphic and (node-)isomorphic matching.

This CIR (cypher improvement request) invites proposals to address this, e.g. via a syntactic extension for overriding default uniqueness.

Background

Pattern matching in Cypher can be described as filtering a candidate set of possible matches that are formed by computing the cross product of possible values for all pattern variables.
Candidate filtering only keeps matches for which all relevant predicates evaluate to true and whose relationships are matched in accordance with relationship direction and uniqueness requirements as specified by the relevant patterns.

It would certainly be desirable to further elaborate the exact full semantics of pattern matching at some point but for the purpose of this CIR only the semantics of pattern uniqueness is covered below.

Pattern variable

A variable that is used to name a part of a pattern is called a pattern variable. Currently, there are four kinds of pattern variables:

Node pattern variables, e.g. n in (n)
Single relationship pattern variables, e.g. r in ()-[r]->()
Variable length relationship pattern variables, e.g. r in ()-[r*..4]-()
Named path variables in named path patterns, e.g. p in p=()-[*]-()<-(x)

A pattern variable may have already been bound in an outer scope or it may be newly introduced by the syntactic element that contains the pattern.

Uniqueness scope

Every MATCH clause forms a uniqueness scope for all pattern variables used to name any of the patterns in the clause. The uniqueness scope does not consider variables from predicates in the associated WHERE clause, not even if those predicates are implied by patterns (e.g. patterns with literal maps whose value expressions reference pattern variables introduced in earlier parts of the query).

Similarly, other constructs that contain patterns (like pattern comprehensions) form a uniqueness scope for all pattern variables used to name any of their patterns.

Furthermore - and without any loss of generality - it is assumed in the following that all unnamed parts of patterns are named using artificially generated pattern variables, i.e. unnamed patterns are treated as syntactic sugar and everything is considered to be named.

Entity consumption

We say that a given concrete entity (i.e. a node or a relationship) is consumed ("used up") in a candidate match by a pattern variable from the associated uniqueness scope if

the concrete entity is a node, the pattern variable is a node pattern variable, and the variable is bound to the concrete node,
or the concrete entity is relationship, the pattern variable is a single relationship pattern variable, and the variable is bound to the concrete relationship,
or the concrete entity is a relationship, the pattern variable is variable length relationship pattern variable, and the variable is bound to a list of relationships that contains the concrete relationship (each occurrence counts separately),
or the concrete entity is a node, the pattern variable is variable length relationship pattern variable, the variable is bound to a list of relationships, and the concrete node is one of the interior nodes of one the relationships in that list (each occurrence counts separately),
or nothing else.

Note that this definition itself does not consider uniqueness, it is merely used below to define uniqueness and therefore the same concrete entity can be consumed multiple times by the same instance of a pattern (the same match) according to this definition.

Also note that this definition does not consider named path variables. Paths are always constructed from other patterns in the same uniqueness scope and therefore do not need any additional consideration regarding the uniqueness of their contained entities.

Uniqueness

With these definitions it now becomes possible to precisely describe relationship uniqueness:

A candidate match is relationship-unique in a uniqueness scope if it does not consume a relationship more than once.

By analogy, a candidate match is node-unique if it does not consume a node more than once.

Note that node uniqueness implies relationship uniqueness and prevents matching of self-relationships (relationships with the same start and end node).

Pattern matching in Cypher by default only returns relationship-unique matches.

Requirements

Requested changes

Change Cypher such that it becomes possible to configure uniqueness per uniqueness scope in order to at least enable matching without any uniqueness constraints
Change Cypher to support: homomorphic matching (no uniqueness), isomorphic matching (node uniqueness), and the currently provided semantics (relationship uniqueness)

Requested considerations

Consider the impact of the proposal on path matching semantics, especially regarding cycles
Consider the impact of the proposal on potential changes in possible query result cardinality (esp. regarding the possible introduction of infinite query results)
Consider the extensibility of the proposal regarding the introduction of further uniqueness modes in the future

Interaction with existing features

Do not change the semantics of existing queries without giving a very strong argument

The text was updated successfully, but these errors were encountered:

…r#174

boggle added the language feature label Jan 25, 2017

boggle added a commit to boggle/openCypher that referenced this issue Jan 25, 2017

Add proposal for isomorphic pattern matching in response to opencyphe…

8b6513b

…r#174

Mats-SX added the CIR label Jan 26, 2017

petraselmer removed the language feature label Feb 3, 2017

petraselmer changed the title ~~feature request: isomorphic pattern matching and configurable uniqueness~~ Isomorphic pattern matching and configurable uniqueness Feb 3, 2017

sherfert mentioned this issue Oct 9, 2017

Suggestion: returning all nodes connected to another could be hugely more efficient neo4j/neo4j#9031

Closed

thobe mentioned this issue Dec 20, 2018

Configurable pattern matching semantics in response to #174 #175

Open

thobe added the HAS CIP label Dec 20, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Isomorphic pattern matching and configurable uniqueness #174

Isomorphic pattern matching and configurable uniqueness #174

boggle commented Jan 25, 2017 •

edited

Loading

Isomorphic pattern matching and configurable uniqueness #174

Isomorphic pattern matching and configurable uniqueness #174

Comments

boggle commented Jan 25, 2017 • edited Loading

CIR-2017-174

Background

Pattern variable

Uniqueness scope

Entity consumption

Uniqueness

Requirements

Requested changes

Requested considerations

Interaction with existing features

boggle commented Jan 25, 2017 •

edited

Loading