Gradoop DataSinks

This section provides an overview of Gradoop-specific data sinks.

Gradoop Data Sinks
DOTDataSink
GDLDataSink
HBaseDataSink
AccumuloDataSink
TLFDataSink
JSONDataSink (deprecated since v0.5.0)
CSVDataSink

DOTDataSink

Writes an EPGM representation into a single DOT file. For more information, see the Wikipedia article on DOT. The path can be local (file://) or HDFS (hdfs://). The format is documented in DOTFileFormat.

DOTDataSink example

LogicalGraph logicalGraph = ...
DataSink dataSink = new DOTDataSink("/path/to/out.dot", true);
dataSink.write(logicalGraph);
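The file-based sinks on this page accept local (file://) and HDFS (hdfs://) paths. The following is a minimal sketch, not part of Gradoop, that only inspects a path string to show which of the two schemes it names. The class name SinkPathSketch and the host namenode:8020 are hypothetical. How a path without a scheme is resolved depends on the Flink file system configuration, so the sketch reports it as "unspecified".

```java
import java.net.URI;

// Illustration only: inspects a path string, it does not call Gradoop or Flink.
class SinkPathSketch {

  /** Returns "local", "hdfs", "unspecified" (no scheme) or "other". */
  static String describe(String path) {
    String scheme = URI.create(path).getScheme();
    if (scheme == null) {
      // e.g. "/path/to/out.dot": resolution is left to the runtime configuration
      return "unspecified";
    }
    switch (scheme) {
      case "file":
        return "local";
      case "hdfs":
        return "hdfs";
      default:
        return "other";
    }
  }

  public static void main(String[] args) {
    System.out.println(describe("file:///tmp/out.dot"));
    System.out.println(describe("hdfs://namenode:8020/graphs/out.dot")); // hypothetical host
    System.out.println(describe("/path/to/out.dot"));
  }
}
```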

Example out.dot

digraph 0
{
   gradoopId1 [label="person",name="Bob",age="20"];
   gradoopId2 [label="person",name="Alice",age="20"];
   gradoopId3;
   gradoopId4;
   gradoopId1->gradoopId2 [label="knows",since="2003"];
   gradoopId2->gradoopId1 [label="knows",since="2003"];
   gradoopId3->gradoopId4;
}
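To make the shape of the lines in out.dot easier to read, here is a small stand-alone sketch that formats vertex and edge lines in the same style. It is not the DOTDataSink implementation. The class DotSketch and its methods are hypothetical helpers, and the sketch does not escape quotes inside values.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Illustration only: mimics the line layout of the example out.dot above.
class DotSketch {

  /** Formats attributes as ` [k1="v1",k2="v2"]`; returns "" for an empty map. */
  static String attributes(Map<String, String> attrs) {
    if (attrs.isEmpty()) {
      return "";
    }
    StringJoiner joiner = new StringJoiner(",", " [", "]");
    attrs.forEach((key, value) -> joiner.add(key + "=\"" + value + "\""));
    return joiner.toString();
  }

  /** One vertex line: id followed by optional attributes. */
  static String vertexLine(String id, Map<String, String> attrs) {
    return "   " + id + attributes(attrs) + ";";
  }

  /** One directed edge line: source->target followed by optional attributes. */
  static String edgeLine(String sourceId, String targetId, Map<String, String> attrs) {
    return "   " + sourceId + "->" + targetId + attributes(attrs) + ";";
  }

  public static void main(String[] args) {
    // LinkedHashMap keeps insertion order, so "label" is printed first
    Map<String, String> bob = new LinkedHashMap<>();
    bob.put("label", "person");
    bob.put("name", "Bob");
    bob.put("age", "20");

    Map<String, String> knows = new LinkedHashMap<>();
    knows.put("label", "knows");
    knows.put("since", "2003");

    System.out.println("digraph 0");
    System.out.println("{");
    System.out.println(vertexLine("gradoopId1", bob));
    System.out.println(vertexLine("gradoopId3", new LinkedHashMap<>()));
    System.out.println(edgeLine("gradoopId1", "gradoopId2", knows));
    System.out.println("}");
  }
}
```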

GDLDataSink

Writes an EPGM representation into a GDL file. The path can be local (file://) or HDFS (hdfs://).

GDLDataSink example

LogicalGraph logicalGraph = ...
DataSink dataSink = new GDLDataSink("/path/to/out.gdl");
dataSink.write(logicalGraph);

Example out.gdl

g0:Community {interest:"Hadoop",vertexCount:3}[
(v_Person_1:Person {gender:"m",city:"Dresden",name:"Dave",age:40})
(v_Person_2:Person {locIP:"127.0.0.1",gender:"m",city:"Berlin",name:"Frank",age:35})
(v_Person_3:Person {gender:"f",city:"Dresden",name:"Carol",age:30})


(v_Person_3)-[e_knows_15:knows{since:2014}]->(v_Person_1)
(v_Person_2)-[e_knows_1:knows{since:2015}]->(v_Person_1)
(v_Person_2)-[e_knows_2:knows{since:2015}]->(v_Person_3)
(v_Person_1)-[e_knows_14:knows{since:2014}]->(v_Person_3)
(v_Person_3)-[e_knows_15]->(v_Person_1)
(v_Person_1)-[e_knows_14]->(v_Person_3)
(v_Person_3)-[e_knows_15]->(v_Person_1)
]

HBaseDataSink

Converts the runtime representation of EPGM elements into a persistent representation and writes it to HBase. By default, graphs are stored in three tables: graph_heads, vertices and edges. The following example shows the result of inserting the graph below (represented as GDL) into an HBase database.

Example graph as GDL

g1:graph[(p1:Person{name:"Bob",age:24})-[:friendsWith]->(p2:Person{name:"Alice",age:30})]

Resulting HBase Table structure (ids are shortened for readability)

hbase> scan 'graph_heads'
ROW       COLUMN+CELL                                                                                                        
 x03      column=e:x06,  timestamp=1521125952854, value=                                        
 x03      column=m:l,    timestamp=1521125952854, value=graph                                                                   
 x03      column=v:x04,  timestamp=1521125952854, value=                                        
 x03      column=v:x05,  timestamp=1521125952854, value=                                        
1 row(s) in 0.0850 seconds

hbase> scan 'vertices'
ROW       COLUMN+CELL                                                                                                        
 x04      column=m:g,    timestamp=1521125953029, value=x03                                       
 x04      column=m:l,    timestamp=1521125953029, value=Person                                                                  
 x04      column=oe:x06, timestamp=1521125953029, value=                                       
 x04      column=p:age,  timestamp=1521125953029, value=\x02\x00\x00\x00\x18                                                  
 x04      column=p:name, timestamp=1521125953029, value=\x06\x00\x03Bob                                                      
 x05      column=ie:x06, timestamp=1521125953002, value=                                       
 x05      column=m:g,    timestamp=1521125953002, value=x03                                       
 x05      column=m:l,    timestamp=1521125953002, value=Person                                                                  
 x05      column=p:age,  timestamp=1521125953002, value=\x02\x00\x00\x00\x1E                                                  
 x05      column=p:name, timestamp=1521125953002, value=\x06\x00\x05Alice                                                    
2 row(s) in 0.0240 seconds

hbase> scan 'edges'
ROW       COLUMN+CELL                                                                                                        
 x06      column=m:g,    timestamp=1521125952789, value=x03                                       
 x06      column=m:l,    timestamp=1521125952789, value=friendsWith                                                             
 x06      column=m:s,    timestamp=1521125952789, value=x04                                       
 x06      column=m:t,    timestamp=1521125952789, value=x05                                       
1 row(s) in 0.0320 seconds

The graph_heads table stores the graph declaration and the row-keys of its vertices and edges.

The vertices table contains one row per vertex. Each of the two rows has five columns: the row-key of the graph (column=m:g), the label of the vertex (column=m:l), the row-key of the outgoing or incoming edge (column=oe:x06 or column=ie:x06) and one column per property (column=p:age and column=p:name).

The single edge is stored in the edges table. Its columns reference the row-key of the graph (column=m:g), the label of the edge (column=m:l), the row-key of the source vertex (column=m:s) and the row-key of the target vertex (column=m:t).
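The property values in the scan output are shown as raw bytes, for example \x02\x00\x00\x00\x18 for age 24 and \x06\x00\x03Bob for the name. Judging only from these values, each cell appears to start with a type byte, followed by a 4-byte big-endian integer or by a 2-byte length and the string bytes. The sketch below decodes values under that assumption. The layout and the two type codes are inferred from this example and are not taken from the Gradoop API, and the class PropertyBytesSketch is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Illustration only: decodes the two value shapes visible in the scan output.
class PropertyBytesSketch {

  // Type codes inferred from the example cells, not from Gradoop documentation
  static final byte TYPE_INTEGER = 0x02;
  static final byte TYPE_STRING = 0x06;

  static Object decode(byte[] raw) {
    ByteBuffer buffer = ByteBuffer.wrap(raw); // ByteBuffer is big-endian by default
    byte type = buffer.get();
    switch (type) {
      case TYPE_INTEGER: {
        return buffer.getInt();
      }
      case TYPE_STRING: {
        int length = buffer.getShort(); // assumed 2-byte length prefix
        byte[] chars = new byte[length];
        buffer.get(chars);
        return new String(chars, StandardCharsets.UTF_8);
      }
      default:
        throw new IllegalArgumentException("Type byte not covered by this sketch: " + type);
    }
  }

  public static void main(String[] args) {
    // p:age of row x04: \x02\x00\x00\x00\x18
    System.out.println(decode(new byte[] {0x02, 0x00, 0x00, 0x00, 0x18}));
    // p:name of row x04: \x06\x00\x03Bob
    System.out.println(decode(new byte[] {0x06, 0x00, 0x03, 'B', 'o', 'b'}));
  }
}
```

Under this reading, 0x18 is 24 (Bob's age) and 0x1E is 30 (Alice's age), which matches the GDL input graph.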

AccumuloDataSink

Writes an EPGM representation of a graph or a graph collection to an Apache Accumulo® store. By default, graphs are stored in three tables: graph, vertex and edge.

AccumuloDataSink example

// flink execution env
ExecutionEnvironment env = ExecutionEnvironment.createLocalEnvironment();
// create gradoop accumulo configuration
GradoopAccumuloConfig config = GradoopAccumuloConfig.create(env)  
  .set(GradoopAccumuloConfig.ACCUMULO_USER, {user})
  .set(GradoopAccumuloConfig.ACCUMULO_INSTANCE, {instance})
  .set(GradoopAccumuloConfig.ZOOKEEPER_HOSTS, {comma separated zookeeper host list})
  .set(GradoopAccumuloConfig.ACCUMULO_PASSWD, {password})
  .set(GradoopAccumuloConfig.ACCUMULO_TABLE_PREFIX, {table prefix});
// create store
AccumuloStore graphStore = new AccumuloStore(config);
// create sink
AccumuloDataSink dataSink = new AccumuloDataSink(graphStore);

dataSink.write(someOtherSource.getGraphCollection(), true);

TLFDataSink

Writes an EPGM representation into a single TLF file. See "The vertex-transitive TLF-planar graphs" for further details about TLF. Paths can be local (file://) or HDFS (hdfs://). The exact format is documented in TLFFileFormat.

TLFDataSink example

DataSource dataSource = ...
DataSink dataSink = new TLFDataSink("/path/to/out.tlf", config);
dataSink.write(dataSource.getGraphCollection(), true);
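This page defers to TLFFileFormat for the exact layout. As an assumption, and not a statement about what TLFDataSink emits, the sketch below uses the line-based layout commonly seen in TLF graph-mining datasets: a "t # id" line per graph, "v id label" per vertex and "e source target label" per edge. The class TlfSketch and the labels A, B and a are hypothetical. Check TLFFileFormat before relying on any detail.

```java
// Illustration only: assumes the common "t / v / e" line layout for TLF.
class TlfSketch {

  /** Start of a new graph (transaction). */
  static String graphLine(long graphId) {
    return "t # " + graphId;
  }

  /** One vertex with a graph-local id and a label. */
  static String vertexLine(long vertexId, String label) {
    return "v " + vertexId + " " + label;
  }

  /** One edge between two graph-local vertex ids with a label. */
  static String edgeLine(long sourceId, long targetId, String label) {
    return "e " + sourceId + " " + targetId + " " + label;
  }

  public static void main(String[] args) {
    String graph = String.join("\n",
        graphLine(0),
        vertexLine(0, "A"),
        vertexLine(1, "B"),
        edgeLine(0, 1, "a"));
    System.out.println(graph);
  }
}
```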

JSONDataSink (deprecated since v0.5.0)

Writes an EPGM representation into three separate JSON files: one for the vertex declarations, one for the edge declarations and one for the graph declarations. Paths can be local (file://) or HDFS (hdfs://). The exact format is documented in the classes GraphHeadToJSON, VertexToJSON and EdgeToJSON.

JSONDataSink example

String graphFile = "/path/to/graphs.json";
String vertexFile = "/path/to/nodes.json";
String edgeFile = "/path/to/edges.json";

DataSink jsonDataSink = new JSONDataSink(graphFile, vertexFile, edgeFile, config);
jsonDataSink.write(dataSource.getGraphCollection(), true);

CSVDataSink

Writes a logical graph or a graph collection as CSV files. Paths can be local (file://) or HDFS (hdfs://). Four *.csv files will be created in the specified directory: graphs.csv, edges.csv, vertices.csv and metadata.csv.

CSVDataSink example

LogicalGraph logicalGraph = ...
DataSink csvDataSink = new CSVDataSink("/path/to/csv/out/", config);
csvDataSink.write(logicalGraph, true);
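After the Flink job has finished, one way to confirm that the four outputs named above were produced is to look for them in the target directory. The following is a minimal sketch for a local output directory only (it does not handle hdfs:// paths). The class CsvOutputCheckSketch is a hypothetical helper and not part of Gradoop. It uses Files.exists, so it accepts an entry whether it is a single file or a directory of part files.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: checks a local directory for the four names listed in the text.
class CsvOutputCheckSketch {

  static final List<String> EXPECTED =
      Arrays.asList("graphs.csv", "edges.csv", "vertices.csv", "metadata.csv");

  /** Returns the expected entries that are missing below the given local directory. */
  static List<String> missingEntries(Path outputDirectory) {
    List<String> missing = new ArrayList<>();
    for (String name : EXPECTED) {
      if (!Files.exists(outputDirectory.resolve(name))) {
        missing.add(name);
      }
    }
    return missing;
  }

  public static void main(String[] args) {
    // Pass the sink's output directory as the first argument
    Path directory = Paths.get(args.length > 0 ? args[0] : "/path/to/csv/out/");
    System.out.println("Missing: " + missingEntries(directory));
  }
}
```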

CSVDataSink example (graph collection)

GraphCollection graphCollection = ...
DataSink csvDataSink = new CSVDataSink("/path/to/csv/out/", config);
csvDataSink.write(graphCollection, true);

If a metadata file already exists, it can be reused.

CSVDataSink example (with existing metadata file)

LogicalGraph logicalGraph = ...
DataSink csvDataSink = new CSVDataSink("/path/to/csv/out/", "/path/to/metadata.csv", config);
csvDataSink.write(logicalGraph, true);