Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Isomorphic pattern matching and configurable uniqueness #174

Open
boggle opened this issue Jan 25, 2017 · 0 comments
Open

Isomorphic pattern matching and configurable uniqueness #174

boggle opened this issue Jan 25, 2017 · 0 comments

Comments

@boggle
Copy link
Contributor

boggle commented Jan 25, 2017

CIR-2017-174

Cypher pattern matching assumes relationship uniqueness: A relationship can only be matched once per instance of a pattern. This has been criticized occasionally in research papers and by graph analytics practitioners as being a somewhat arbitrary decision that makes it hard to express homomorphic and (node-)isomorphic matching.

This CIR (cypher improvement request) invites proposals to address this, e.g. via a syntactic extension for overriding default uniqueness.

Background

Pattern matching in Cypher can be described as filtering a candidate set of possible matches that are formed by computing the cross product of possible values for all pattern variables.
Candidate filtering only keeps matches for which all relevant predicates evaluate to true and whose relationships are matched in accordance with relationship direction and uniqueness requirements as specified by the relevant patterns.

It would certainly be desirable to further elaborate the exact full semantics of pattern matching at some point but for the purpose of this CIR only the semantics of pattern uniqueness is covered below.

Pattern variable

A variable that is used to name a part of a pattern is called a pattern variable. Currently, there are four kinds of pattern variables:

  • Node pattern variables, e.g. n in (n)
  • Single relationship pattern variables, e.g. r in ()-[r]->()
  • Variable length relationship pattern variables, e.g. r in ()-[r*..4]-()
  • Named path variables in named path patterns, e.g. p in p=()-[*]-()<-(x)

A pattern variable may have already been bound in an outer scope or it may be newly introduced by the syntactic element that contains the pattern.

Uniqueness scope

Every MATCH clause forms a uniqueness scope for all pattern variables used to name any of the patterns in the clause. The uniqueness scope does not consider variables from predicates in the associated WHERE clause, not even if those predicates are implied by patterns (e.g. patterns with literal maps whose value expressions reference pattern variables introduced in earlier parts of the query).

Similarly, other constructs that contain patterns (like pattern comprehensions) form a uniqueness scope for all pattern variables used to name any of their patterns.

Furthermore - and without any loss of generality - it is assumed in the following that all unnamed parts of patterns are named using artificially generated pattern variables, i.e. unnamed patterns are treated as syntactic sugar and everything is considered to be named.

Entity consumption

We say that a given concrete entity (i.e. a node or a relationship) is consumed ("used up") in a candidate match by a pattern variable from the associated uniqueness scope if

  • the concrete entity is a node, the pattern variable is a node pattern variable, and the variable is bound to the concrete node,
  • or the concrete entity is relationship, the pattern variable is a single relationship pattern variable, and the variable is bound to the concrete relationship,
  • or the concrete entity is a relationship, the pattern variable is variable length relationship pattern variable, and the variable is bound to a list of relationships that contains the concrete relationship (each occurrence counts separately),
  • or the concrete entity is a node, the pattern variable is variable length relationship pattern variable, the variable is bound to a list of relationships, and the concrete node is one of the interior nodes of one the relationships in that list (each occurrence counts separately),
  • or nothing else.

Note that this definition itself does not consider uniqueness, it is merely used below to define uniqueness and therefore the same concrete entity can be consumed multiple times by the same instance of a pattern (the same match) according to this definition.

Also note that this definition does not consider named path variables. Paths are always constructed from other patterns in the same uniqueness scope and therefore do not need any additional consideration regarding the uniqueness of their contained entities.

Uniqueness

With these definitions it now becomes possible to precisely describe relationship uniqueness:

A candidate match is relationship-unique in a uniqueness scope if it does not consume a relationship more than once.

By analogy, a candidate match is node-unique if it does not consume a node more than once.

Note that node uniqueness implies relationship uniqueness and prevents matching of self-relationships (relationships with the same start and end node).

Pattern matching in Cypher by default only returns relationship-unique matches.

Requirements

Requested changes

  • Change Cypher such that it becomes possible to configure uniqueness per uniqueness scope in order to at least enable matching without any uniqueness constraints
  • Change Cypher to support: homomorphic matching (no uniqueness), isomorphic matching (node uniqueness), and the currently provided semantics (relationship uniqueness)

Requested considerations

  • Consider the impact of the proposal on path matching semantics, especially regarding cycles
  • Consider the impact of the proposal on potential changes in possible query result cardinality (esp. regarding the possible introduction of infinite query results)
  • Consider the extensibility of the proposal regarding the introduction of further uniqueness modes in the future

Interaction with existing features

  • Do not change the semantics of existing queries without giving a very strong argument
@Mats-SX Mats-SX added the CIR label Jan 26, 2017
@petraselmer petraselmer changed the title feature request: isomorphic pattern matching and configurable uniqueness Isomorphic pattern matching and configurable uniqueness Feb 3, 2017
@thobe thobe added the HAS CIP label Dec 20, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants