Nebula Graph Query Language (nGQL)

About nGQL

nGQL is a declarative, textual query language like SQL, but for graphs. Unlike SQL, nGQL is all about expressing graph patterns. nGQL is a work in progress: we will add more features and further simplify the existing ones. For the time being, there may be inconsistencies between the syntax specification and the implementation.

Goals

  • Easy to learn
  • Easy to understand
  • Focus on online queries, while also providing a foundation for offline computation

Features

  • Syntax is close to SQL, but not exactly the same (Easy to learn)
  • Expandable
  • Case insensitive
  • Supports basic graph traversal
  • Supports pattern matching
  • Supports aggregation
  • Supports graph mutation
  • Supports distributed transactions (future release)
  • Statement composition, but NO statement embedding (Easy to read)

Terminology

  • Graph Space : A physically isolated space for each graph
  • Tag : A label associated with a list of properties
    • Each tag has a name (a human-readable string); internally, each tag is assigned a 32-bit integer
    • Each tag is associated with a list of properties; each property has a name and a type
    • There can be dependencies between tags. A dependency is a constraint: for instance, if tag S depends on tag T, then tag S cannot exist unless tag T exists
  • Vertex : A Node in the graph
    • Each vertex has a unique 64-bit (signed integer) ID (VID)
    • Each vertex can associate with multiple tags
  • Edge : A Link between two vertices
    • Each edge can be uniquely identified by a tuple <src_vid, dst_vid, edge_type, rank>
    • Edge type (ET) is a human-readable string; internally, it is assigned a 32-bit integer. The edge type determines the property list (schema) of the edge
    • Edge rank is an immutable user-assigned 64-bit signed integer. It affects the edge order between two vertices. The edge with a higher rank value comes first. When not specified, the default rank value is zero.
    • Each edge can only be of one type
  • Path : A non-forked connection with multiple vertices and edges between them
    • The length of a path is the number of the edges on the path, which is one less than the number of vertices
    • A path can be represented by a list of vertices, edge types, and ranks. An edge is a special path with length == 1
 <vid, <edge_type, rank>, vid, ...>
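
The identity, ordering, and path-length rules above can be sketched in Python. This is only an illustrative model and says nothing about how Nebula stores edges; the names `EdgeKey`, `order_edges`, and `path_length` are hypothetical.

```python
from typing import NamedTuple


class EdgeKey(NamedTuple):
    """Hypothetical model of the tuple that uniquely identifies an edge."""
    src_vid: int
    dst_vid: int
    edge_type: str
    rank: int = 0  # the default rank is zero when not specified


def order_edges(edges):
    """Order edges between two vertices: the higher rank comes first."""
    return sorted(edges, key=lambda edge: edge.rank, reverse=True)


def path_length(path):
    """Length of a path written as [vid, (edge_type, rank), vid, ...].

    Vertices sit at the even positions, so the number of edges is one
    less than the number of vertices.
    """
    vertex_count = len(path[0::2])
    return vertex_count - 1


edges = [
    EdgeKey(1, 2, "friend"),      # rank defaults to 0
    EdgeKey(1, 2, "friend", 5),
    EdgeKey(1, 2, "friend", -3),
]
ordered = order_edges(edges)
path = [1, ("friend", 0), 2, ("friend", 5), 3]
```

The three edges share a source, a destination, and a type, yet remain distinct because their ranks differ.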

Language Specification At A Glance

You can skip this section if you are not familiar with BNF.

General

  • The entire set of statements can be categorized into three classes: query, mutation, and administration
  • Every statement can yield a data set as the result. Each data set contains a schema (column name and type) and multiple data rows

Composition

  • Statements can be composed in two ways:
    • Statements can be piped together using the operator "|", much like the pipe in shell scripts. The result yielded by the previous statement is redirected to the next statement as input
    • Multiple statements can be batched together, separated by ";". The result of the last statement (or of a RETURN statement, if one is executed) is returned as the result of the batch
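
The difference between the two composition forms can be sketched in Python. This is a rough model under stated assumptions: each statement is represented by a hypothetical Python callable that takes an input data set and returns a data set, and the early exit of a RETURN statement inside a batch is not modeled.

```python
def pipe(*statements):
    """Model `a | b | c`: the output of each statement feeds the next one."""
    def run(data=None):
        for statement in statements:
            data = statement(data)
        return data
    return run


def batch(*statements):
    """Model `a; b; c`: every statement runs, the last result is returned."""
    def run():
        result = None
        for statement in statements:
            result = statement(None)  # batched statements get no piped input
        return result
    return run


# Hypothetical adjacency data: vertex ID -> friends.
friends_of = {1: [2, 3], 2: [4], 3: [4, 5]}


def start(_):
    return [1]


def go_friend(vids):
    return sorted({dst for vid in vids for dst in friends_of.get(vid, [])})


two_hops = pipe(start, go_friend, go_friend)
last_only = batch(start, lambda _: [42])
```

Here `two_hops()` chains three statements through their results, while `last_only()` discards the result of `start` and keeps only the final one.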

Data Types

  • Simple type: vid, double, int, bool, string, timestamp
  • vid : 64-bit signed integer, representing a vertex ID
  • List of simple types, such as integer[], double[], string[]
  • Map: A list of KV pairs. The key must be a string, and all values in a given map must be of the same type
  • Object (future release): A list of KV pairs. The key must be a string, and the value can be any simple type
  • Tuple List: Used only for return values. It is composed of both metadata and data (multiple rows). The metadata includes the column names and their types.

Type Conversion

  • A simple typed value can be implicitly converted into a list
  • A list can be implicitly converted into a one-column tuple list
    • "<type>_list" can be used as the column name
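
The two implicit conversions can be illustrated with a small Python sketch. The dictionary layout used for a tuple list (a `schema` entry plus a `rows` entry) is an assumption made for illustration, not the actual representation.

```python
def to_list(value):
    """Implicitly convert a simple typed value into a one-element list."""
    return value if isinstance(value, list) else [value]


def to_tuple_list(values, type_name):
    """Convert a list into a one-column tuple list.

    The result carries the metadata (column name and type) and the rows.
    The column is named "<type>_list", as described above.
    """
    values = to_list(values)
    schema = [(f"{type_name}_list", type_name)]
    rows = [(value,) for value in values]
    return {"schema": schema, "rows": rows}


single = to_tuple_list(7, "integer")
several = to_tuple_list(["a", "b"], "string")
```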

Common BNF

<simple_type> ::= vid | integer | double | float | bool | string | path | timestamp | year | month | date | datetime

<composite_type> ::=

<type> ::= <simple_type> | <composite_type>

<vid_list> ::= vid (, vid)* | "{" vid (, vid)* "}"

<label> ::= [:alpha:] ([:alnum:] | "_")*

<underscore_label> ::= ("_")* <label>

<field_name> ::= <label>

<field_def_list> ::= <field_def> (, <field_def>)*

<field_def> ::= <field_name>:<type>

<tuple_list_decl> ::= <tuple_schema> ":" <tuple_data>

<tuple_schema> ::= <field_def_list>

<tuple_data> ::= <tuple> (, <tuple>)* | "{" <tuple> (, <tuple>)* "}"

<tuple> ::= "(" VALUE (, VALUE)* ")"

<var> ::= "$" <label>
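
For readers more comfortable with regular expressions than BNF, the `<label>`, `<underscore_label>`, and `<var>` rules can be approximated in Python as follows. This sketch assumes that `[:alpha:]` and `[:alnum:]` cover ASCII characters only; the helper names are hypothetical.

```python
import re

# <label> ::= [:alpha:] ([:alnum:] | "_")*   (ASCII-only assumption)
LABEL = r"[A-Za-z][A-Za-z0-9_]*"

LABEL_RE = re.compile(LABEL)
UNDERSCORE_LABEL_RE = re.compile(rf"_*{LABEL}")  # ("_")* <label>
VAR_RE = re.compile(rf"\${LABEL}")               # "$" <label>


def is_label(text):
    return LABEL_RE.fullmatch(text) is not None


def is_underscore_label(text):
    return UNDERSCORE_LABEL_RE.fullmatch(text) is not None


def is_var(text):
    return VAR_RE.fullmatch(text) is not None
```

Under these rules `friend` and `a_b2` are labels, `_id` is only an underscore label, and `$var1` is a variable. The special variables `$-` and `$$` do not fit the `<var>` rule and would need separate handling.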

Statements

Choose a graph space

Nebula supports multiple graph spaces. Data in different graph spaces are physically isolated. Before executing a query, a graph space needs to be selected using the following statement

USE <graphspace_name>

Return a data set

Simply return a single value or a data set

RETURN <return_value_decl>

<return_value_decl> ::= vid | <vid_list> | <tuple_list_decl> | <var>

Create a tag

The following statement defines a new tag

CREATE TAG <tag_name> (<prop_def_list>)

<tag_name> ::= <label>
<prop_def_list> ::= <prop_def>+
<prop_def> ::= <prop_name>,<type>
<prop_name> ::= <label>

Create an edge type

The following statement defines a new edge type

CREATE EDGE <edge_type_name> (<prop_def_list>)

<edge_type_name> ::= <label>

Insert vertices

The following statement inserts one or more vertices

INSERT VERTEX [NO OVERWRITE] <tag_list> VALUES <vertex_list>

<tag_list> ::= <tag_name>(<prop_list>) (, <tag_name>(<prop_list>))*
<vertex_list> ::= <vertex_id>:(<prop_value_list>) (, <vertex_id>:(<prop_value_list>))*
<vertex_id> ::= vid
<prop_list> ::= <prop_name> (, <prop_name>)*
<prop_value_list> ::= VALUE (, VALUE)*
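
To see how the grammar fits together, the following Python sketch renders an INSERT VERTEX statement from plain data. It follows the BNF above only; the literal format of VALUE (JSON-style, with double-quoted strings) and the rule that each vertex supplies one value per listed property are assumptions, and the tag and property names in the example are made up.

```python
import json


def render_insert_vertex(tags, vertices, no_overwrite=False):
    """Render INSERT VERTEX text from the grammar above.

    tags:     list of (tag_name, [prop_name, ...])
    vertices: list of (vid, [value, ...])
    """
    tag_list = ", ".join(f"{name}({', '.join(props)})" for name, props in tags)
    expected = sum(len(props) for _, props in tags)

    rendered = []
    for vid, values in vertices:
        if len(values) != expected:  # assumed rule, not stated in the spec
            raise ValueError(f"vertex {vid}: expected {expected} values")
        # Assumption: VALUE literals are written JSON-style.
        value_list = ", ".join(json.dumps(value) for value in values)
        rendered.append(f"{vid}:({value_list})")

    head = "INSERT VERTEX NO OVERWRITE" if no_overwrite else "INSERT VERTEX"
    return f"{head} {tag_list} VALUES {', '.join(rendered)}"


statement = render_insert_vertex(
    [("player", ["name", "age"])],
    [(100, ["Tim", 42]), (101, ["Tony", 36])],
)
```

Because the syntax specification and the implementation may differ, treat the rendered text as an illustration of the grammar rather than as a statement guaranteed to execute.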

Insert edges

The following statement inserts one or more edges

INSERT EDGE [NO OVERWRITE] <edge_type_name> [(<prop_list>)] VALUES (<edge_value>)+

<edge_value> ::= <vertex_id> -> <vertex_id> [@ <weight>] : <prop_value_list>

Update a vertex

The following statement updates a vertex

UPDATE VERTEX <vertex_id> SET <update_decl> [WHERE <conditions>] [YIELD <field_list>]

<update_decl> ::= <update_form1> | <update_form2>
<update_form1> ::= <prop_name> = <expression> {,<prop_name> = <expression>}*
<update_form2> ::= (<prop_list>) = (<value_list>) | (<prop_list>) = <var>

Update an edge

The following statement updates an edge

UPDATE EDGE <vertex_id> -> <vertex_id> [@<weight>] OF <edge_type> SET <update_decl> [WHERE <conditions>] [YIELD <field_list>]

Traverse the graph

Navigate from given vertices to their neighbors according to the given conditions. It returns either a list of vertex IDs, or a list of tuples

GO [<steps_decl> STEPS] FROM <data_set_decl> [OVER [REVERSELY] <edge_type_decl>] [WHERE <filter_list>] [YIELD <field_list>]

<steps_decl> ::= integer | integer TO integer | UPTO integer
<data_set_decl> ::= <data_set> [[AS] <label>]
<data_set> ::= vid | <vid_list> | <tuple_list_decl> | <var>
<edge_type_decl> ::= <edge_type_list> [AS <label>]
<edge_type_list> ::= <edge_type> {, <edge_type>}*
<edge_type> ::= <label>

<filter_list> ::= <filter> {(AND | OR) <filter>}*
<filter> ::= <expression> (">" | ">=" | "<" | "<=" | "==" | "!=") <expression> | <expression> IN <value_list>
<field_list> ::= <return_field> {, <return_field>}*
<return_field> ::= <expression> [AS <label>]

The WHERE clause only applies to the results that are going to be returned. It is not applied to the intermediate results (see the detailed description of the STEP[S] clause)

When the STEP[S] clause is omitted, one step is implied

When going out for one step from the given vertex, all neighbors are checked against the WHERE clause, and only the results that satisfy the WHERE clause are returned

When going out for more than one step, the WHERE clause is only applied to the final results. It is not applied to the intermediate results. Here is an example

GO 2 STEPS FROM me OVER friend WHERE birthday > "1988/1/1"

As you would expect, this query returns all my friends of friends (fof) whose birthday is after 1988/1/1. The filter is not applied to my friends (the first step)

Here is another example

GO UPTO 3 STEPS FROM me OVER friend WHERE birthday > "1988/1/1"

This query first looks for any of my friends whose birthday is after 1988/1/1. If it finds at least one, it returns all of them. Otherwise, it checks my friends of friends in the same way and returns the results if there are any. If there are still none, it checks my friends of friends of friends.

Similarly, the next query looks for anyone whose birthday is after 1988/1/1, starting from my 3-hop friends and finishing at my 5-hop friends

GO 3 TO 5 STEPS FROM me OVER friend WHERE birthday > "1988/1/1"
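
The three forms of the STEPS clause described above can be modeled with a short Python sketch. It is one reading of the prose, not the engine's algorithm: the graph is a plain adjacency dictionary, `UPTO n` is treated as `1 TO n`, and the `M TO N` form is assumed to stop at the first step that yields a non-empty filtered result. All names and data are hypothetical.

```python
from datetime import date


def neighbors(graph, frontier):
    """Vertices reachable in one step from any vertex in the frontier."""
    return {dst for src in frontier for dst in graph.get(src, ())}


def go(graph, start, first_step, last_step, keep):
    """Model `GO first_step TO last_step STEPS FROM start ... WHERE keep`.

    Steps before `first_step` are never filtered. From `first_step` on,
    the filter is applied to the vertices reached at that step, and the
    first non-empty filtered result is returned.
    """
    frontier = {start}
    for step in range(1, last_step + 1):
        frontier = neighbors(graph, frontier)
        if step >= first_step:
            matched = {vertex for vertex in frontier if keep(vertex)}
            if matched:
                return matched
    return set()


def go_steps(graph, start, n, keep):
    """`GO n STEPS`: filter only the results of the final step."""
    return go(graph, start, n, n, keep)


def go_upto(graph, start, n, keep):
    """`GO UPTO n STEPS`: try step 1, then 2, ... until something matches."""
    return go(graph, start, 1, n, keep)


friend = {"me": ["ann", "bob"], "ann": ["cat"], "bob": ["dan"], "cat": ["eve"]}
birthday = {
    "me": date(1986, 3, 1), "ann": date(1980, 1, 1), "bob": date(1985, 6, 1),
    "cat": date(1990, 5, 1), "dan": date(1975, 2, 1), "eve": date(1995, 9, 1),
}


def born_after_1988(vertex):
    return birthday[vertex] > date(1988, 1, 1)
```

With this data, neither `ann` nor `bob` passes the filter, yet `go_steps(friend, "me", 2, born_after_1988)` still reaches `cat` through `ann`, because the first step is not filtered.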

Search

The following statements look for vertices or edges that match certain conditions

FIND VERTEX WHERE <filter_list> [YIELD <field_list>]

FIND EDGE WHERE <filter_list> [YIELD <field_list>]

Property Reference

It's common to refer to a property in a statement, such as in the WHERE clause and the YIELD clause. In nGQL, a reference to a property is defined as

<property_ref> ::= <object> "." <prop_name>
<object> ::= <alias_name> | <alias_with_tag> | <var>
<alias_name> ::= <label>
<alias_with_tag> ::= <alias_name> "[" <tag_name> "]"

<var> always starts with "$". There are two special variables: $- and $$.

$- refers to the input stream, while $$ refers to the destination objects

All user-defined property names start with a letter. There are a few system property names starting with "_". All property names starting with "_" are reserved.

Built-in properties

  • _id : Vertex ID
  • _type : Edge type
  • _src : Source ID of the edge
  • _dst : Destination ID of the edge
  • _rank : Edge rank number
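
As a rough illustration of the property reference rules, the Python sketch below parses a reference into its parts and flags reserved property names. It assumes ASCII-only labels, accepts leading underscores in property names so that the built-in properties can be referenced, and treats `$-` and `$$` as valid objects. The function name and the returned dictionary are hypothetical.

```python
import re

LABEL = r"[A-Za-z][A-Za-z0-9_]*"
PROP_NAME = rf"_*{LABEL}"  # assumption: allows built-ins such as _dst

# <property_ref> ::= <object> "." <prop_name>
PROPERTY_REF_RE = re.compile(
    rf"(?P<object>\$-|\$\$|\${LABEL}|{LABEL}(?:\[(?P<tag>{LABEL})\])?)"
    rf"\.(?P<prop>{PROP_NAME})"
)


def parse_property_ref(text):
    """Split a property reference into object, optional tag, and property."""
    match = PROPERTY_REF_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a property reference: {text!r}")
    prop = match.group("prop")
    return {
        "object": match.group("object"),
        "tag": match.group("tag"),          # None unless alias[tag] is used
        "prop": prop,
        "reserved": prop.startswith("_"),   # names starting with "_" are reserved
    }
```

For example, `parse_property_ref("$-._dst")` reports the input stream as the object and marks `_dst` as reserved, while `parse_property_ref("p[player].name")` reports the tag `player`.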