Unary Logical Graph Operators
This section provides an overview of unary graph to graph operators, which consume one graph as input.
This section covers the following operators:
- Aggregation
- Pattern Matching
- Transformation
- Property transformation
- Grouping
- KeyedGrouping
- Subgraph
- Sampling
- Group by RollUp
- Split
In general, aggregation operators combine N values into a single one. In Gradoop, they are used to reduce information contained in a graph to a single value.
The aggregate operator is implemented as the aggregate() method of the LogicalGraph class. It takes any number of aggregate functions as input and outputs the original graph with the result of each aggregate function as a new property. Most aggregate functions require a string denoting the name of the property or label they are applied to. In the following, these are referred to as [propertyName] or [labelName].
Gradoop provides the following aggregate functions:
Function | Description | Returns |
---|---|---|
MinProperty | Minimum of a specified property over all edges and vertices | LogicalGraph with the aggregated property min_[propertyName]: [value] |
MinEdgeProperty | Minimum of a specified property over all edges | LogicalGraph with the aggregated property min_[propertyName]: [value] |
MinVertexProperty | Minimum of a specified property over all vertices | LogicalGraph with the aggregated property min_[propertyName]: [value] |
MaxProperty | Maximum of a specified property over all edges and vertices | LogicalGraph with the aggregated property max_[propertyName]: [value] |
MaxEdgeProperty | Maximum of a specified property over all edges | LogicalGraph with the aggregated property max_[propertyName]: [value] |
MaxVertexProperty | Maximum of a specified property over all vertices | LogicalGraph with the aggregated property max_[propertyName]: [value] |
SumProperty | Sum of a specified property over all edges and vertices | LogicalGraph with the aggregated property sum_[propertyName]: [value] |
SumEdgeProperty | Sum of a specified property over all edges | LogicalGraph with the aggregated property sum_[propertyName]: [value] |
SumVertexProperty | Sum of a specified property over all vertices | LogicalGraph with the aggregated property sum_[propertyName]: [value] |
AverageProperty | Arithmetic mean of a specified property over all edges and vertices | LogicalGraph with the aggregated property avg_[propertyName]: [double] |
AverageEdgeProperty | Arithmetic mean of a specified property over all edges | LogicalGraph with the aggregated property avg_[propertyName]: [double] |
AverageVertexProperty | Arithmetic mean of a specified property over all vertices | LogicalGraph with the aggregated property avg_[propertyName]: [double] |
Count | Element count of a graph | LogicalGraph with the aggregated property count: [long] |
EdgeCount | Edge count of a graph | LogicalGraph with the aggregated property edgeCount: [long] |
VertexCount | Vertex count of a graph | LogicalGraph with the aggregated property vertexCount: [long] |
HasLabel | Checks for the presence of an edge or vertex label in the graph (can also be used as a filter) | LogicalGraph with the aggregated property hasLabel_[labelName]: [bool] |
HasEdgeLabel | Checks for the presence of an edge label in the graph (can also be used as a filter) | LogicalGraph with the aggregated property hasEdgeLabel_[labelName]: [bool] |
HasVertexLabel | Checks for the presence of a vertex label in the graph (can also be used as a filter) | LogicalGraph with the aggregated property hasVertexLabel_[labelName]: [bool] |
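As a minimal sketch of the call pattern before the more complete walkthrough below (using the quickstart graph n1 and function names from the table; the result property keys follow the naming scheme above):
// Minimal sketch: count the persons and compute their min/max "age" in a single aggregate call.
LogicalGraph aggregated = n1
  .subgraph(v -> v.getLabel().equals("Person"), e -> true)
  .aggregate(new VertexCount(), new MinVertexProperty("age"), new MaxVertexProperty("age"));
// The graph head of `aggregated` now carries the properties vertexCount, min_age and max_age.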
Consider the social graph from our quickstart section. We can use the aggregate operator to gain further insight into the information contained in that graph. For example, we might be interested in the mean age of all persons. To achieve this, we
- extract a subgraph containing only the vertices labeled "Person"
- aggregate the sum of all the persons' ages
- perform a graph head transformation that adds the quotient of the summed ages and the number of persons to the graph head
// Given the graph n1 from quickstart, we can aggregate the age values and output the result
SumVertexProperty sumPropertyAge = new SumVertexProperty("age");
LogicalGraph meanAgeAggregation =
n1
.subgraph(v -> v.getLabel().equals("Person"), e -> true)
.aggregate(sumPropertyAge, new VertexCount())
.transformGraphHead((currentGraphHead, transformedGraphHead) -> {
int sumAge = currentGraphHead.getPropertyValue("sum_age").getInt();
long numPerson = currentGraphHead.getPropertyValue("vertexCount").getLong();
currentGraphHead.setProperty("meanAge", (double) sumAge / numPerson);
return currentGraphHead;
});
The resulting graph has the following structure:
n1:graph { meanAge: 30.8, vertexCount: 5, sum_age: 154 } [
(p1:Person {name: "Bob", age: 24})-[:friendsWith]->(p2:Person{name: "Alice", age: 30})-[:friendsWith]->(p1)
(p2)-[:friendsWith]->(p3:Person {name: "Jacob", age: 27})-[:friendsWith]->(p2)
(p3)-[:friendsWith]->(p4:Person{name: "Marc", age: 40})-[:friendsWith]->(p3)
(p4)-[:friendsWith]->(p5:Person{name: "Sara", age: 33})-[:friendsWith]->(p4)
]
In graph analytics, pattern matching is one of the most important and challenging tasks.
In Gradoop, pattern matching can be achieved either by defining a query graph using GDL or by executing a Cypher query. While the former method is straightforward and simple to use, the Cypher query language is more powerful. In the following, we showcase the usage and capabilities of both approaches.
The LogicalGraph class implements the method query to apply GDL queries to the data graph. The method returns a GraphCollection containing the matching subgraphs. It applies an engine that uses vertex homomorphism and edge isomorphism as default morphism strategies. As pattern matching is the purpose of this method, only the MATCH and WHERE keywords of the Cypher query language are supported. The vertex and edge data of the data graph elements is attached to the resulting vertices. In its simpler forms, the method uses no statistics about the data graph, which may result in poor runtime performance. Use GraphStatistics to provide statistics for the query planner.
The query method is overloaded. The following table describes each form of the method in detail:
Signature | Description |
---|---|
query(String query) | Simplest form; works as described above. |
query(String query, String constructionPattern) | Allows the creation of new graph elements based on variable bindings of the match pattern. |
query(String query, GraphStatistics graphStatistics) | Allows existing graph statistics to be used by the query planner. |
query(String query, String constructionPattern, GraphStatistics graphStatistics) | Allows both the creation of new graph elements and the use of existing graph statistics. |
query(String query, boolean attachData, MatchStrategy vertexStrategy, MatchStrategy edgeStrategy, GraphStatistics graphStatistics) | If attachData is set to true, the original vertex and edge data is attached to the result. Possible match strategies are ISOMORPHISM and HOMOMORPHISM. |
query(String query, String constructionPattern, boolean attachData, MatchStrategy vertexStrategy, MatchStrategy edgeStrategy, GraphStatistics graphStatistics) | Same as above, with the additional possibility to define a construction pattern. |
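As a sketch of one of the larger overloads, match strategies and statistics can be passed explicitly. Here, stats is assumed to be a previously obtained GraphStatistics instance; the pattern string is only illustrative:
// Sketch: run a GDL pattern with explicit match strategies and precomputed statistics.
String pattern = "MATCH (a:Person)-[:friendsWith]->(b:Person)";
GraphCollection matches = n1.query(
  pattern,
  true,                        // attach the original vertex and edge data to the result
  MatchStrategy.HOMOMORPHISM,  // vertex match strategy
  MatchStrategy.ISOMORPHISM,   // edge match strategy
  stats);                      // previously obtained GraphStatistics (assumption)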
Consider the social graph from the quickstart. Say we are interested in analyzing Marc's circle of friends. Using the query method, we can define a pattern that describes the relation "friend of Marc":
String gdlQuery = "MATCH (a:Person)-[e0:friendsWith]->(b:Person)-[e1:friendsWith]->(c:Person)," +
" (c)-[e2:friendsWith]->(b)-[e3:friendsWith]->(a)" +
" WHERE a <> c AND b.name = \"Marc\"";
GraphCollection friendsOfMarc = n1.query(gdlQuery);
The query above results in the following graph collection:
gc[
g0:graph[
(a:Person {name: Sara, age: 33})-[:friendsWith]->(b:Person {name: Marc, age:40})-[:friendsWith]->(c:Person {name: Jacob, age:27}),
(c)-[:friendsWith]->(b)-[:friendsWith]->(a)
],
g1:graph[
(a:Person {name: Jacob, age: 27})-[:friendsWith]->(b:Person {name: Marc, age:40})-[:friendsWith]->(c:Person {name: Sara, age:33}),
(c)-[:friendsWith]->(b)-[:friendsWith]->(a)
]
]
A useful feature of the query
method is the ability to define new graph elements based on variable bindings of the match pattern.
Let's define a new relation between Marc's friends called 'acquaintance':
Note: It is necessary to define the new element with an assigned variable. This could change in the future.
String gdlQuery = "MATCH (a:Person)-[e0:friendsWith]->(b:Person)-[e1:friendsWith]->(c:Person)," +
" (c)-[e2:friendsWith]->(b)-[e3:friendsWith]->(a)" +
" WHERE a <> c AND b.name = \"Marc\"";
String constructionPattern = "(b)<-[e0]-(a)-[e_new0:acquaintance]->(c)<-[e1]-(b)," +
" (c)-[e_new1:acquaintance]->(a)," +
" (c)-[e2]->(b)-[e3]->(a)";
GraphCollection acquaintances = n1.query(gdlQuery, constructionPattern);
Resulting graph collection:
gc[
g0:graph[
(a:Person {name: Sara, age: 33})-[:friendsWith]->(b:Person {name: Marc, age:40})-[:friendsWith]->(c:Person {name: Jacob, age:27}),
(c)-[:friendsWith]->(b)-[:friendsWith]->(a),
(a)-[:acquaintance]->(c)-[:acquaintance]->(a)
],
g1:graph[
(a:Person {name: Jacob, age: 27})-[:friendsWith]->(b:Person {name: Marc, age:40})-[:friendsWith]->(c:Person {name: Sara, age:33}),
(c)-[:friendsWith]->(b)-[:friendsWith]->(a),
(c)-[:acquaintance]->(a)-[:acquaintance]->(c)
]
]
CAPF (Cypher for Apache Flink) aims to bring the openCypher grammar to Gradoop. The operator is currently a work in progress; its repository can be found here. CAPF is based on Neo4j's Morpheus project.
The CAPFQuery class enables the execution of a Cypher query on a LogicalGraph via the CAPF API. When instantiating a CAPFQuery, it is always necessary to provide a Cypher query string as well as the current ExecutionEnvironment. Optionally, one may also provide a MetaData object in order to speed up execution.
For a detailed discussion of the Cypher syntax, please look here.
Currently, the following clauses are supported: MATCH, RETURN and UNION.
Also, all aggregating functions are supported. A detailed discussion of them can be found here.
The CAPFQueryResult class is a wrapper containing the results of a CAPF query. Since the executed Cypher query can result in either a GraphCollection or a Flink Table, this class provides methods to access both representations of the data. Handling a Flink Table is a bit messy at the moment, but there will be support for it in Gradoop in the near future.
The following examples operate on the combination of the LogicalGraphs n1 and n2 from our quickstart example. The variable env refers to the current ExecutionEnvironment.
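As a sketch of this setup (the combine call and variable names are assumptions about how nc might be built, not part of the quickstart code):
// Sketch: obtain the Flink environment and combine the two quickstart graphs into `nc`.
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
LogicalGraph nc = n1.combine(n2); // union of both graphs via Gradoop's binary combine operator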
Simple matching example
In order to find out who Marc's second-degree friends are, one could write a Cypher query as follows:
String cypherQuery = "MATCH (a:Person {name: \"Marc\"})-[:friendsWith]->(b:Person)-[:friendsWith]->(c:Person)" +
"WHERE a <> c" +
"RETURN c";
// `env` refers to the execution environment
CAPFQuery query = new CAPFQuery(cypherQuery, env);
// `nc` refers to the combination of the LogicalGraph's n1 and n2 from our quickstart
CAPFQueryResult result = query.execute(nc);
GraphCollection resultCollection = result.getGraphs();
The resulting GraphCollection
looks as follows:
g0:graph[
(v0:Person {name: "Mike"})
],
g1:graph[
(v1:Person {name: "Alice"})
]
UNION example
String cypherQuery = "MATCH (v:Person) RETURN v.name AS name UNION MATCH (v:Company) RETURN v.name AS name"
CAPFQueryResult result = new CAPFQuery(cypherQuery, env).execute(nc);
// this query does not result in a graph so we need to access the data via Flink's Table API
BatchTableEnvironment tableEnvironment = (BatchTableEnvironment) result.getTable().tableEnv();
DataSet<Row> resultDataSet = tableEnvironment.toDataSet(result.getTable(), TypeInformation.of(Row.class)).javaSet();
List<Row> resultList = resultDataSet.collect();
System.out.println(resultList);
The code results in the following output: [Jil, Mike, Acme Corp, Globex Inc., Marc, Bob, Paul, Sara, Alice, Jacob]
Aggregating example
Compute the average age of all persons in our graph:
String cypherQuery = "MATCH (v:Person) RETURN avg(v.age)"
CAPFQueryResult result = new CAPFQuery(cypherQuery, env).execute(nc);
// this query does not result in a graph so we need to access the data via Flink's Table API
BatchTableEnvironment tableEnvironment = (BatchTableEnvironment) result.getTable().tableEnv();
DataSet<Row> resultDataSet = tableEnvironment.toDataSet(result.getTable(), TypeInformation.of(Row.class)).javaSet();
List<Row> resultList = resultDataSet.collect();
System.out.println(resultList);
The code results in the following output: [30]
By using the transformation operation on a LogicalGraph, one can alter the elements of a given graph. This alteration can affect the graph head, the vertices or the edges. One can also think of a Gradoop transformation as a modification of a LogicalGraph that changes an EPGM element's data, but neither its identity nor the topological structure of the graph.
Every transformation method of the LogicalGraph expects a TransformationFunction as input. This interface defines a method called apply. The method takes the current version of the element and a copy of that element as input. The copy is initialized with the current structural information (i.e. identifiers, graph membership, source/target identifiers). The implementation can transform the element either by updating the current version and returning it, or by adding the necessary information to the new entity and returning that.
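A sketch of the two options (the property names are illustrative, and it is assumed that TransformationFunction can be given as a lambda, as in the examples below):
// Option 1: update the current element and return it (keeps label and all existing properties).
TransformationFunction<EPGMVertex> keepAndExtend = (current, transformed) -> {
  current.setProperty("processed", true);
  return current;
};
// Option 2: build on the pre-initialized copy and return it
// (the copy only carries structural data, so label and properties start out empty).
TransformationFunction<EPGMVertex> rebuild = (current, transformed) -> {
  transformed.setLabel(current.getLabel());
  transformed.setProperty("name", current.getPropertyValue("name"));
  return transformed;
};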
The following transformation functions are defined as methods for the LogicalGraph
class:
Function | Description | Expects |
---|---|---|
transform | Transforms the elements of the logical graph using the given transformation functions. | TransformationFunction<GraphHead>, TransformationFunction<Vertex>, TransformationFunction<Edge> |
transformGraphHead | Transforms the graph head of the logical graph using the given transformation function. | TransformationFunction<GraphHead> |
transformVertices | Transforms the vertices of the logical graph using the given transformation function. | TransformationFunction<Vertex> |
transformEdges | Transforms the edges of the logical graph using the given transformation function. | TransformationFunction<Edge> |
Consider the social graph from the quickstart example. Suppose every person from the graph g1 attended the same university. We could add this new information to the graph like this:
LogicalGraph n1WithUni = n1
.subgraph(v -> v.getLabel().equals("Person"), e -> true)
.transformVertices((currentVertex, transformedVertex) -> {
currentVertex.setProperty("university", "Leipzig University");
return currentVertex;
});
The resulting graph has the following structure:
g1:graph[
(p1:Person {name: "Bob", age: 24, university: "University Leipzig"})-[:friendsWith]->(p2:Person{name: "Alice", age: 30, university: "University Leipzig"})-[:friendsWith]->(p1)
(p2)-[:friendsWith]->(p3:Person {name: "Jacob", age: 27, university: "University Leipzig"})-[:friendsWith]->(p2)
(p3)-[:friendsWith]->(p4:Person{name: "Marc", age: 40, university: "University Leipzig"})-[:friendsWith]->(p3)
(p4)-[:friendsWith]->(p5:Person{name: "Sara", age: 33, university: "University Leipzig"})-[:friendsWith]->(p4)
]
The property transformation operation is a convenience function for a Transformation that only alters properties. The alteration can affect the properties of the graph head, the vertices or the edges.
Every property transformation method of the LogicalGraph expects a PropertyTransformationFunction as input. This interface defines a method called execute. The method takes the current version of the PropertyValue as input. The implementation can transform the PropertyValue by updating the current version and returning it.
The following property transformation functions are defined as methods of the LogicalGraph class:
Function | Description | Expects |
---|---|---|
transformGraphHeadProperties | Transforms the value of a graph head property matching the specified property key using the given transformation function. | String propertyKey, PropertyTransformationFunction |
transformVertexProperties | Transforms the value of a vertex property matching the specified property key using the given transformation function. | String propertyKey, PropertyTransformationFunction |
transformEdgeProperties | Transforms the value of an edge property matching the specified property key using the given transformation function. | String propertyKey, PropertyTransformationFunction |
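As a quick sketch, the transformVertexProperties convenience method can be called directly with a lambda (assuming PropertyTransformationFunction can be supplied as a lambda and that the age property is stored as an int); the full example below achieves the same rounding via the PropertyTransformation class:
// Sketch: round the "age" property of every person to the nearest multiple of ten.
LogicalGraph rounded = n1
  .subgraph(v -> v.getLabel().equals("Person"), e -> true)
  .transformVertexProperties("age",
    value -> PropertyValue.create((int) (Math.round(value.getInt() / 10.0) * 10)));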
Consider the social graph from the quickstart example. Suppose that we want to round the age of every person from the graph g1. We could update the age property like this:
LogicalGraph n1WithRoundedAge = n1
.subgraph(v -> v.getLabel().equals("Person"), e -> true)
.callForGraph(new PropertyTransformation<>("age",
null, property -> {
property.setLong(((long)(Math.round(property.getLong()/10.0) * 10)));
return property;
}, null));
The resulting graph has the following structure:
g1:graph[
(p1:Person {name: "Bob", age: 20})-[:friendsWith]->(p2:Person{name: "Alice", age: 30})-[:friendsWith]->(p1)
(p2)-[:friendsWith]->(p3:Person {name: "Jacob", age: 30})-[:friendsWith]->(p2)
(p3)-[:friendsWith]->(p4:Person{name: "Marc", age: 40})-[:friendsWith]->(p3)
(p4)-[:friendsWith]->(p5:Person{name: "Sara", age: 30})-[:friendsWith]->(p4)
]
By using the grouping operator, you can achieve a structural grouping of vertices and edges that condenses a graph and thus helps to uncover patterns and statistics hidden in it.
Gradoop defines two strategies applicable for grouping: GROUP_REDUCE and GROUP_COMBINE. By default, GROUP_REDUCE is used. In cases with a skewed distribution of vertex grouping keys, it may be desirable to combine vertex tuples using a CombineGroup transformation to potentially reduce network traffic. This behaviour is represented by the GROUP_COMBINE strategy. For more information, see this.
For grouping, any of the following functions can be applied to a LogicalGraph object.
Function | Description | Expects |
---|---|---|
groupBy | Groups vertices based on the given vertex grouping keys which can be property keys or labels. | List<String> vertexGroupingKeys |
groupBy | Groups vertices and edges based on the given grouping keys which can be property keys or labels. | List<String> vertexGroupingKeys, List<String> edgeGroupingKeys |
groupBy | Groups vertices and edges based on the given grouping keys which can be property keys or labels. Additionally, aggregates are calculated using the given aggregate functions (MaxProperty, MinProperty, SumProperty, Count) and a grouping strategy has to be chosen. | List<String> vertexGroupingKeys, List<AggregateFunction> vertexAggregateFunctions, List<String> edgeGroupingKeys, List<AggregateFunction> edgeAggregateFunctions, GroupingStrategy groupingStrategy |
Consider the social graph from the quickstart example. For our grouping example, we first extract all vertices of type Person and then calculate a rounded age, which we will use for grouping:
LogicalGraph preprocessedGraph = n1
.vertexInducedSubgraph(new ByLabel<>("Person"))
.transformVertices((currentVertex, transformedVertex) -> {
int age = currentVertex.getPropertyValue("age").getInt();
currentVertex.setProperty("age_rounded", (int) MathUtils.round(age, -1));
return currentVertex;
});
LogicalGraph summarizedGraph = preprocessedGraph
.groupBy(Arrays.asList(Grouping.LABEL_SYMBOL, "age_rounded"),
singletonList(new Count("count")),
singletonList(Grouping.LABEL_SYMBOL),
singletonList(new Count("count")),
GroupingStrategy.GROUP_REDUCE);
The resulting graph is shown below the next example.
Alternatively you can use the GroupingBuilder which offers more options. The following example leads to the same result as the traditional grouping operation shown above.
// Perform grouping on preprocessedGraph
LogicalGraph summarizedGraph = new Grouping.GroupingBuilder()
.setStrategy(GroupingStrategy.GROUP_REDUCE)
.addVertexGroupingKey("age_rounded")
.useVertexLabel(true)
.useEdgeLabel(true)
.addVertexAggregateFunction(new Count())
.addEdgeAggregateFunction(new Count())
.<...>build()
.execute(preprocessedGraph);
The resulting graph has the following structure:
Imagine we had not used the subgraph step to remove vertices that are not of type Person, but still wanted to group only the Person vertices by our age_rounded property. For this task, we could define a VertexLabelGroup as the following example shows:
// Perform grouping on preprocessedGraph
LogicalGraph summarizedGraph = new Grouping.GroupingBuilder()
.setStrategy(GroupingStrategy.GROUP_REDUCE)
.useVertexLabel(true)
.useEdgeLabel(true)
.addVertexLabelGroup("Person", "PersonAge", Arrays.asList("age_rounded"), Arrays.asList(new Count()))
.<...>build()
.execute(preprocessedGraph);
This would lead to the following resulting graph:
For adding a VertexLabelGroup
you can use one of the following four functions:
Function | Description | Expects |
---|---|---|
addVertexLabelGroup | Groups vertices with the specified label based on the given grouping keys, which can be property keys or labels. | String label, List<String> groupingKeys |
addVertexLabelGroup | Groups vertices with the specified label based on the given grouping keys, which can be property keys or labels. Super vertices are created using the supplied super vertex label. | String label, String superVertexLabel, List<String> groupingKeys |
addVertexLabelGroup | Groups vertices with the specified label based on the given grouping keys, which can be property keys or labels, and calculates aggregates using the supplied aggregate functions. | String label, List<String> groupingKeys, List<AggregateFunction> aggregateFunctions |
addVertexLabelGroup | Groups vertices with the specified label based on the given grouping keys, which can be property keys or labels, and calculates aggregates using the supplied aggregate functions. Super vertices are created using the supplied super vertex label. | String label, String superVertexLabel, List<String> groupingKeys, List<AggregateFunction> aggregateFunctions |
This can be applied analogously for edges using an EdgeLabelGroup
.
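A sketch of the edge-side counterpart, assuming the builder offers addEdgeLabelGroup with an analogous signature and that the friendsWith edges carry a (hypothetical) since property:
// Sketch: group friendsWith edges by a hypothetical "since" property and count them.
LogicalGraph edgeGrouped = new Grouping.GroupingBuilder()
  .setStrategy(GroupingStrategy.GROUP_REDUCE)
  .useVertexLabel(true)
  .addEdgeLabelGroup("friendsWith", Arrays.asList("since"), Arrays.asList(new Count()))
  .<...>build()
  .execute(preprocessedGraph);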
The keyed grouping operator provides an alternative implementation of graph grouping. This operator can be used to gain insight into the structure of a graph. While the Grouping operator only supports grouping on labels, properties and label groups, keyed grouping allows any attribute (or derived attribute) of an element to be used for grouping. This is achieved through the use of user-definable key-functions.
Key-functions are used to extract a key from an element. This key may have any type supported by Flink. Elements that have the same key according to all key-functions set on the operator are reduced to a single super-element. Users may define their own key-functions by implementing the KeyFunction interface. Some functions are also provided by Gradoop; they may be instantiated from the GroupingKeys factory class.
Currently, the following key-functions are provided:
Function | Description | Parameters | Key type |
---|---|---|---|
label ¹ | Extracts the label of an element. | None. | String |
labelSpecific | Extracts different keys for elements, based on the label of the element. | A Map mapping labels to Lists of key-functions and (optionally) a Map from old label to new label, which may be used to change the labels of elements. | A Tuple containing an internal reference to the label of the element and the values of the key-functions defined for that label. |
nothing ¹ | Extracts a constant value for each element. This results in all elements having the same key, which may be used to reduce all elements to a single element. | None. | Boolean |
property ¹ | Extracts a property from each element. | The property key. | byte[] (the internal representation of a PropertyValue) |
Note: labelSpecific
only supports key-functions that provide a default value. Provided functions annotated with ¹
are examples of those functions.
The KeyedGrouping operator has to be called using the callForGraph method. It may be instantiated using the constructor of the KeyedGrouping class or via the createInstance methods of KeyedGroupingUtils. The latter provides compatibility with the older Grouping operator.
The constructor takes the following parameters:
- A List of vertex grouping keys (KeyFunctions)
- A List of vertex aggregate functions (AggregateFunctions)
- A List of edge grouping keys (KeyFunctions)
- A List of edge aggregate functions (AggregateFunctions)
All of these parameters are optional and may be null or empty lists.
// Assuming the same input as the Grouping example.
List<KeyFunction<EPGMVertex, ?>> vertexKeys = Arrays.asList(
GroupingKeys.label(), GroupingKeys.property("age_rounded"));
LogicalGraph summarizedGraph = preprocessedGraph.callForGraph(new KeyedGrouping<>(
vertexKeys,
singletonList(new Count("count")),
singletonList(GroupingKeys.label()),
singletonList(new Count("count"))));
Taking the same input graph as with the Grouping example and running the same preprocessing steps, the result should be identical to that example.
When the user wants to use a custom key, they have to implement the KeyFunction
interface.
This interface contains multiple functions:
Name | Description | Parameters | Return value |
---|---|---|---|
getKey | Extracts the key object from an element. | The element from which the key is extracted. | The key object (may not be null). |
getType | Gets type information about the keys extracted by the getKey method. | None. | A TypeInformation object used by Flink to determine the type of extracted keys. |
addKeyToElement | Adds an extracted key back to an element. This method is called on super-elements to store the key of the group on the super-element. | An element and a key. | Nothing (void). |
Note: Implementing the addKeyToElement method is optional. A default implementation, which simply discards the key and leaves the element unchanged, is provided by the interface. It is, however, recommended to override this method in order to identify element groups in the resulting summary graph of the KeyedGrouping operator.
To support label-specific grouping with the labelSpecific key-function, KeyFunctionWithDefaultValue has to be implemented instead. This interface extends the KeyFunction interface by only one method, namely getDefaultKey. This method has no parameters and should return the default key used for all elements. This key has to be of the same type as the one returned by getKey and may also not be null.
The grouping example uses a preprocessing step before executing the grouping. The same result may be achieved by defining a custom key-function:
// Define the key-function.
KeyFunction<EPGMVertex, Integer> customKey = new KeyFunction<EPGMVertex, Integer>() {
  @Override public Integer getKey(EPGMVertex v) {
    return (int) MathUtils.round(v.getPropertyValue("age").getInt(), -1);
  }
  @Override public TypeInformation<Integer> getType() {
    return BasicTypeInfo.INT_TYPE_INFO;
  }
  @Override public void addKeyToElement(EPGMVertex element, Object key) {
    element.setProperty("age_rounded", PropertyValue.create(key));
  }
};
// This key function extracts the property "age" and rounds it.
// The rounded property is stored as "age_rounded" on the super-vertex.
List<KeyFunction<EPGMVertex, ?>> vertexKeys = Arrays.asList(
GroupingKeys.label(), customKey);
LogicalGraph summarizedGraph = n1
.vertexInducedSubgraph(new ByLabel<>("Person"))
.callForGraph(new KeyedGrouping<>(
vertexKeys,
singletonList(new Count("count")),
singletonList(GroupingKeys.label()),
singletonList(new Count("count"))));
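To make a key like this usable within labelSpecific, it would have to provide a default value. A minimal sketch of such a variant, assuming the KeyFunctionWithDefaultValue interface described above (the default key of -1 is an arbitrary choice):
KeyFunctionWithDefaultValue<EPGMVertex, Integer> customKeyWithDefault =
  new KeyFunctionWithDefaultValue<EPGMVertex, Integer>() {
    @Override public Integer getKey(EPGMVertex v) {
      // fall back to the default key for vertices without an "age" property
      return v.hasProperty("age")
        ? (int) MathUtils.round(v.getPropertyValue("age").getInt(), -1)
        : getDefaultKey();
    }
    @Override public TypeInformation<Integer> getType() {
      return BasicTypeInfo.INT_TYPE_INFO;
    }
    @Override public Integer getDefaultKey() {
      return -1; // must be non-null and of the same type as getKey
    }
  };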
A subgraph is a logical graph that contains only the vertices and edges fulfilling a given vertex and/or edge predicate. The associated operators are implemented as the functions .subgraph(), .vertexInducedSubgraph() and .edgeInducedSubgraph(), which can be called on a LogicalGraph instance. The first of these allows one of five available execution strategies to be chosen, which are listed in the table below.
Strategy | Description |
---|---|
Subgraph.Strategy.BOTH | Applies both filter functions on the input vertex and edge data sets. This is the default strategy of the .subgraph() function. Note that this strategy does not verify the consistency of the resulting graph. |
Subgraph.Strategy.BOTH_VERIFIED | Applies both filter functions on the input vertex and edge data sets. In addition, this strategy verifies the consistency of the output graph. |
Subgraph.Strategy.VERTEX_INDUCED | Only applies the vertex filter function and adds the incident edges connecting those vertices via a join. This strategy is applied by the .vertexInducedSubgraph() function. |
Subgraph.Strategy.EDGE_INDUCED | Only applies the edge filter function and computes the connecting vertices. This strategy is applied by the .edgeInducedSubgraph() function. |
Subgraph.Strategy.EDGE_INDUCED_PROJECT_FIRST | Only applies the edge filter function and computes the resulting vertices via DISTINCT(π_source(E) ∪ π_target(E)) ⋈ V. |
Frequently used filter expressions are predefined in Gradoop. Make sure to check whether a filter is applicable to vertices (V), edges (E) or both.
The following code shows examples of the different subgraph functions and the use of combined filters:
// Apply vertex and edge filter to the given graph n1 from quickstart
LogicalGraph subgraph = n1
.subgraph(
new LabelIsIn<>("Person", "Company"),
new ByLabel<>("worksAt")
);
// Apply vertex and edge filter with verifying the consistency (no "Company" vertices included)
LogicalGraph verifiedSubgraph = n1
.subgraph(
new LabelIsIn<>("Person", "Company"),
new ByLabel<>("friendsWith")
Subgraph.Strategy.BOTH
).verify();
// Apply only a vertex filter function
LogicalGraph vertexInduced = n1
.vertexInducedSubgraph(new ByProperty<>("age"));
// Apply only an edge filter function
LogicalGraph edgeInduced = n1
.edgeInducedSubgraph(new BySourceId<>(myGradoopId));
// Apply a combined vertex filter function:
LogicalGraph fromCombinedFilters = n1
.vertexInducedSubgraph(
new ByLabel<EPGMVertex>("Person").and(new ByProperty<>("Name", PropertyValue.create("Marc")).negate())
.or(new ByLabel<>("Company")));
Graph sampling is a technique to pick a subset of the vertices and/or edges of an original graph. This can be used to obtain a smaller graph if the whole graph is known or, if the graph is unknown, to explore it. The provided techniques are either vertex based, edge based, traversal based or a combination of these. They are implemented as operators for Gradoop's LogicalGraph and return the sampled graph. The following sampling methods are provided:
Method | Description |
---|---|
RandomVertexSampling | Computes a vertex sampling of the graph. Retains randomly chosen vertices of a given relative amount and all edges whose source and target vertices were chosen. Some unconnected vertices may remain in the sampled graph. Parameters: sampleSize: relative amount of vertices in the sampled graph; randomSeed: seed value for the random number generator |
RandomNonUniformVertexSampling | Computes a vertex sampling of the graph. Retains randomly chosen vertices of a given relative amount and all edges whose source and target vertices were chosen. A degree-dependent value is taken into account to bias the sampling towards high-degree vertices. Some unconnected vertices may remain in the sampled graph. Parameters: sampleSize: relative amount of vertices in the sampled graph; randomSeed: seed value for the random number generator |
RandomLimitedDegreeVertexSampling | Computes a vertex sampling of the graph. Retains all vertices with a degree greater than a given degree threshold and degree type, plus randomly chosen vertices with a degree smaller than or equal to this threshold. Retains all edges whose source and target vertices were chosen. Parameters: sampleSize: relative amount of vertices with a degree smaller than or equal to the given degreeThreshold; randomSeed: seed value for the random number generator; degreeThreshold: threshold for the degrees of sampled vertices; degreeType: type of degree considered for sampling, see VertexDegree |
RandomVertexNeighborhoodSampling | Computes a vertex sampling of the graph. Retains randomly chosen vertices of a given relative amount and includes all neighbors of those vertices in the sample. Retains all edges whose source and target vertices were chosen. Parameters: sampleSize: relative amount of vertices in the sampled graph; randomSeed: seed value for the random number generator; neighborType: type of neighborhood considered for sampling, see Neighborhood |
RandomEdgeSampling | Computes an edge sampling of the graph. Retains randomly chosen edges of a given relative amount together with their source and target vertices. No unconnected vertices remain in the sampled graph. Parameters: sampleSize: relative amount of edges in the sampled graph; randomSeed: seed value for the random number generator |
RandomVertexEdgeSampling | Computes an edge sampling of the graph. First selects randomly chosen vertices of a given relative amount and all edges whose source and target vertices were chosen. Then randomly chooses edges of a given relative amount from this set of edges, together with their source and target vertices. No unconnected vertices remain in the sampled graph. Parameters: vertexSampleSize: relative amount of vertices in the sampled graph; edgeSampleSize: relative amount of edges in the sampled graph; randomSeed: seed value for the random number generator; vertexEdgeSamplingType: can be simple, nonUniform or nonUniformHybrid |
PageRankSampling | Computes a PageRank sampling of the graph using the Gradoop wrapper of Flink's PageRank algorithm. Retains all vertices with a PageRank score greater than, or equal to/smaller than, a given sampling threshold, depending on the parameters set. If all vertices have the same PageRank score, it can be decided whether to sample all vertices or none of them. Retains all edges whose source and target vertices were chosen. Some unconnected vertices may remain in the sampled graph. Parameters: dampeningFactor: probability of following an out-link, otherwise a jump to a random vertex is performed; maxIteration: maximum number of algorithm iterations; threshold: threshold for the PageRank score determining whether a vertex is sampled, e.g. 0.5; sampleGreaterThanThreshold: sample vertices with a PageRank score greater than (true) or equal to/smaller than (false) the threshold; keepVerticesIfSameScore: sample all vertices (true) or none of them (false) if all have the same PageRank score |
// read the original graph
LogicalGraph graph = readLogicalGraph("path/to/graph", "json");
// define sampling parameters
float sampleSize = 0.3f; // retain vertex with degree smaller or equal degreeThreshold with a probability of 30%
long randomSeed = 0L; // random seed can be 0
long degreeThreshold = 3L; // sample all vertices with degree of degreeType greater than 3
VertexDegree degreeType = VertexDegree.IN; // consider in-degree for sampling
// call sampling method
LogicalGraph sample = new RandomLimitedDegreeVertexSampling(
sampleSize, randomSeed, degreeThreshold, degreeType).sample(graph);
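For comparison, the simplest method only needs a sample size and a seed. A sketch, assuming a (sampleSize, randomSeed) constructor analogous to the one shown above; the values are illustrative:
// Sketch: retain roughly 30% of the vertices (plus the edges between them), seeded for reproducibility.
LogicalGraph vertexSample = new RandomVertexSampling(0.3f, 42L).sample(graph);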
The rollUp operation allows aggregations at multiple levels using a single query. The operator can be used by instantiating the class directly with the desired options or by calling one of the following convenience methods of LogicalGraph:
Function | Description | Returns |
---|---|---|
groupVerticesByRollUp | Runs the rollUp operation with changing aggregation level via varying combinations of vertex grouping keys. | GraphCollection with the differently grouped graphs |
groupEdgesByRollUp | Runs the rollUp operation with changing aggregation level via varying combinations of edge grouping keys. | GraphCollection with the differently grouped graphs |
Both methods require the same parameters:
- A list containing the vertex property keys used for grouping.
- A list of property value aggregators used for grouping on vertices.
- A list containing the edge property keys used for grouping.
- A list of property value aggregators used for grouping on edges.
For the following examples, consider a graph containing persons as vertices and phone calls between those persons as edges. The vertices contain country, state and city as separate properties; the edges contain the time of the call as separate year, month, day, hour and minute properties.
The following code leads to a graph collection containing graphs grouped by {(:label, country, state, city), (:label, country, state), (:label, country), (:label)}. With a single query, you can thus see how many persons initiated a phone call per city, per state, per country and overall.
// Apply vertex based rollUp on previously described graph
List<String> vertexGroupingKeys =
Arrays.asList(Grouping.LABEL_SYMBOL, "country", "state", "city");
List<AggregateFunction> vertexAggregators =
Collections.singletonList(new Count("count"));
List<String> edgeGroupingKeys = Collections.emptyList();
List<AggregateFunction> edgeAggregators = Collections.emptyList();
GraphCollection output =
n1.groupVerticesByRollUp(
vertexGroupingKeys,
vertexAggregators,
edgeGroupingKeys,
edgeAggregators);
Analogously, the following code leads to an edge-based rollUp and therefore to a graph collection containing graphs grouped by {(year, month, day, hour, minute), (year, month, day, hour), (year, month, day), (year, month), (year)}. You can thus see how many phone calls were initiated within the same minute, hour, day, month or year.
// Apply edge based rollUp on previously described graph
List<String> vertexGroupingKeys = Collections.emptyList();
List<AggregateFunction> vertexAggregators = Collections.emptyList();
List<String> edgeGroupingKeys =
Arrays.asList("year", "month", "day", "hour", "minute");
List<AggregateFunction> edgeAggregators =
Collections.singletonList(new Count("count"));
GraphCollection output =
n1.groupEdgesByRollUp(
vertexGroupingKeys,
vertexAggregators,
edgeGroupingKeys,
edgeAggregators);
The split operator splits a given LogicalGraph into a GraphCollection based on the values of a pre-defined property that is present on a set of vertices in the graph. The operator supports overlapping logical graphs, where a vertex can belong to more than one logical graph. Edges whose source and target vertex have no graphs in common are removed from the resulting collection.
The following code showcases the simplest application of the split operator, which is the LogicalGraph's splitBy method:
We are given an undirected graph that models the relationships between employees. The graph consists of five vertices labeled employee, which are connected by edges labeled worksWith.
String coworkingGraph = "coworking[" +
"(v0:employee {age: 30, employedSince: 2010, role: \"developer\"})" +
"(v1:employee {age: 25, employedSince: 2010, role: \"developer\"})" +
"(v2:employee {age: 36, employedSince: 2013, role: \"manager\"})" +
"(v3:employee {age: 29, employedSince: 2013, role: \"developer\"})" +
"(v4:employee {age: 35, employedSince: 2013, role: \"manager\"})" +
"(v0)-[:worksWith]->(v1)-[:worksWith]->(v0)" +
"(v0)-[:worksWith]->(v3)-[:worksWith]->(v0)" +
"(v1)-[:worksWith]->(v2)-[:worksWith]->(v1)" +
"(v1)-[:worksWith]->(v4)-[:worksWith]->(v1)" +
"(v2)-[:worksWith]->(v4)-[:worksWith]->(v2)" +
"(v2)-[:worksWith]->(v3)-[:worksWith]->(v2)" +
"(v3)-[:worksWith]->(v4)-[:worksWith]->(v1)" +
"]";
After we initialized the database and retrieved a LogicalGraph
, we can use splitBy
as follows:
GraphCollection result = coworking.splitBy("role");
This yields the following GraphCollection
:
gc [
g0: [
(v2)-[:worksWith]->(v4)
(v4)-[:worksWith]->(v2)
]
g1: [
(v0)-[:worksWith]->(v1)
(v1)-[:worksWith]->(v0)
(v3)-[:worksWith]->(v0)
(v0)-[:worksWith]->(v3)
]
]