Gradoop DataSinks
This section provides an overview of Gradoop-specific data sinks.
DOTDataSink writes an EPGM representation into one DOT file. For more information, see the Wikipedia article on DOT. The path can be local (file://) or HDFS (hdfs://). The format is documented in DOTFileFormat.
DOTDataSink example
LogicalGraph logicalGraph = ...
DataSink dataSink = new DOTDataSink("/path/to/out.dot", true);
dataSink.write(logicalGraph);
Example out.dot
digraph 0
{
gradoopId1 [label="person",name="Bob",age="20"];
gradoopId2 [label="person",name="Alice",age="20"];
gradoopId3;
gradoopId4;
gradoopId1->gradoopId2 [label="knows",since="2003"];
gradoopId2->gradoopId1 [label="knows",since="2003"];
gradoopId3->gradoopId4;
}
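The shape of the out.dot example above can be reproduced with a few lines of plain Java. The following is only a sketch of the text layout, not the code DOTDataSink uses: the class DotSketch and its methods are hypothetical names introduced for illustration, property values are treated as plain strings, and quotes inside values are not escaped.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of Gradoop: renders statements in the
// shape of the out.dot example.
class DotSketch {

  // Builds an attribute list such as [label="person",name="Bob"].
  // Returns an empty string when there is neither a label nor a property.
  static String attributes(String label, Map<String, String> properties) {
    if (label == null && properties.isEmpty()) {
      return "";
    }
    StringJoiner joiner = new StringJoiner(",", " [", "]");
    if (label != null) {
      joiner.add("label=\"" + label + "\"");
    }
    for (Map.Entry<String, String> entry : properties.entrySet()) {
      joiner.add(entry.getKey() + "=\"" + entry.getValue() + "\"");
    }
    return joiner.toString();
  }

  // One vertex statement: the id followed by its optional attributes.
  static String vertex(String id, String label, Map<String, String> properties) {
    return id + attributes(label, properties) + ";";
  }

  // One directed edge statement: source->target plus optional attributes.
  static String edge(String source, String target, String label,
      Map<String, String> properties) {
    return source + "->" + target + attributes(label, properties) + ";";
  }

  // Wraps the statements into a digraph block.
  static String graph(String graphId, List<String> statements) {
    StringBuilder builder = new StringBuilder("digraph " + graphId + "\n{\n");
    for (String statement : statements) {
      builder.append(statement).append("\n");
    }
    return builder.append("}\n").toString();
  }

  public static void main(String[] args) {
    Map<String, String> bob = new LinkedHashMap<>();
    bob.put("name", "Bob");
    bob.put("age", "20");
    Map<String, String> since = new LinkedHashMap<>();
    since.put("since", "2003");
    Map<String, String> none = new LinkedHashMap<>();

    System.out.print(graph("0", Arrays.asList(
        vertex("gradoopId1", "person", bob),
        vertex("gradoopId3", null, none),
        edge("gradoopId1", "gradoopId2", "knows", since),
        edge("gradoopId3", "gradoopId4", null, none))));
  }
}
```

A LinkedHashMap is used so that properties are written in insertion order, which keeps the output stable between runs.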
HBaseDataSink converts the runtime representation of EPGM elements into persistent representations and writes them to HBase. By default, graphs are stored in three tables: graph_heads, vertices and edges. The following example shows the result of inserting the graph below (represented as GDL) into an HBase database.
Example graph as GDL
g1:graph[(p1:Person{name:"Bob",age:24})-[:friendsWith]->(p2:Person{name:"Alice",age:30})]
Resulting HBase Table structure (ids are shortened for readability)
hbase> scan 'graph_heads'
ROW COLUMN+CELL
x03 column=e:x06, timestamp=1521125952854, value=
x03 column=m:l, timestamp=1521125952854, value=graph
x03 column=v:x04, timestamp=1521125952854, value=
x03 column=v:x05, timestamp=1521125952854, value=
1 row(s) in 0.0850 seconds
hbase> scan 'vertices'
ROW COLUMN+CELL
x04 column=m:g, timestamp=1521125953029, value=x03
x04 column=m:l, timestamp=1521125953029, value=Person
x04 column=oe:x06, timestamp=1521125953029, value=
x04 column=p:age, timestamp=1521125953029, value=\x02\x00\x00\x00\x18
x04 column=p:name, timestamp=1521125953029, value=\x06\x00\x03Bob
x05 column=ie:x06, timestamp=1521125953002, value=
x05 column=m:g, timestamp=1521125953002, value=x03
x05 column=m:l, timestamp=1521125953002, value=Person
x05 column=p:age, timestamp=1521125953002, value=\x02\x00\x00\x00\x1E
x05 column=p:name, timestamp=1521125953002, value=\x06\x00\x05Alice
2 row(s) in 0.0240 seconds
hbase> scan 'edges'
ROW COLUMN+CELL
x06 column=m:g, timestamp=1521125952789, value=x03
x06 column=m:l, timestamp=1521125952789, value=friendsWith
x06 column=m:s, timestamp=1521125952789, value=x04
x06 column=m:t, timestamp=1521125952789, value=x05
1 row(s) in 0.0320 seconds
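The p:age and p:name cells in the vertices scan above are binary. Judging only from that output, each value appears to start with a one-byte type marker (\x02 for the integer, \x06 for the string), followed by a 4-byte big-endian integer or by a 2-byte length and the string bytes. The sketch below decodes the four values under that assumption. The class name and the two constants are hypothetical and inferred from the scan output, not taken from the Gradoop source.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical decoder for the property cells shown in the scan output.
class PropertyBytesSketch {

  // Type markers as they appear in the scan output (assumption).
  static final byte TYPE_INTEGER = 0x02;
  static final byte TYPE_STRING = 0x06;

  static Object decode(byte[] raw) {
    // ByteBuffer reads multi-byte values in big-endian order by default.
    ByteBuffer buffer = ByteBuffer.wrap(raw);
    byte type = buffer.get();
    switch (type) {
      case TYPE_INTEGER: {
        return buffer.getInt();
      }
      case TYPE_STRING: {
        // Two-byte unsigned length, then that many bytes of text.
        int length = buffer.getShort() & 0xFFFF;
        byte[] text = new byte[length];
        buffer.get(text);
        return new String(text, StandardCharsets.UTF_8);
      }
      default:
        throw new IllegalArgumentException("Unhandled type marker: " + type);
    }
  }

  public static void main(String[] args) {
    byte[] age = {0x02, 0x00, 0x00, 0x00, 0x18};    // value of p:age in row x04
    byte[] name = {0x06, 0x00, 0x03, 'B', 'o', 'b'}; // value of p:name in row x04
    System.out.println(decode(age));  // prints 24
    System.out.println(decode(name)); // prints Bob
  }
}
```

The decoded values match the GDL input: 0x18 is 24 for Bob and 0x1E is 30 for Alice.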
The graph_heads table stores the graph declaration and the row-keys of its vertices and edges.
The vertices table contains both vertices, each described by five columns: the row-key of the graph (column=m:g), the label of the vertex (column=m:l), the row-key of the incoming or outgoing edge (column=ie:x06 or column=oe:x06) and the two properties (column=p:age and column=p:name).
The single edge is stored in the edges table, with columns for the row-key of the graph (column=m:g), the label of the edge (column=m:l), the row-key of the source vertex (column=m:s) and the row-key of the target vertex (column=m:t).
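The cross-references between the three tables described above can be modelled with nested maps. The following sketch is an in-memory illustration only: it does not use the HBase client API, the class HBaseLayoutSketch is a hypothetical name, and properties are left out. Each table maps a row-key to its columns, and adding an edge touches all three tables.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of the default table layout.
class HBaseLayoutSketch {

  // Each table maps row-key -> (column -> value); TreeMap keeps keys sorted.
  final Map<String, Map<String, String>> graphHeads = new TreeMap<>();
  final Map<String, Map<String, String>> vertices = new TreeMap<>();
  final Map<String, Map<String, String>> edges = new TreeMap<>();

  private static void put(Map<String, Map<String, String>> table,
      String row, String column, String value) {
    table.computeIfAbsent(row, key -> new TreeMap<>()).put(column, value);
  }

  void addGraph(String graphId, String label) {
    put(graphHeads, graphId, "m:l", label);
  }

  void addVertex(String graphId, String vertexId, String label) {
    put(vertices, vertexId, "m:g", graphId);
    put(vertices, vertexId, "m:l", label);
    // The graph head lists its vertices as empty-valued columns.
    put(graphHeads, graphId, "v:" + vertexId, "");
  }

  void addEdge(String graphId, String edgeId, String label,
      String sourceId, String targetId) {
    put(edges, edgeId, "m:g", graphId);
    put(edges, edgeId, "m:l", label);
    put(edges, edgeId, "m:s", sourceId);
    put(edges, edgeId, "m:t", targetId);
    // The source vertex records an outgoing edge, the target an incoming one.
    put(vertices, sourceId, "oe:" + edgeId, "");
    put(vertices, targetId, "ie:" + edgeId, "");
    put(graphHeads, graphId, "e:" + edgeId, "");
  }

  public static void main(String[] args) {
    HBaseLayoutSketch sketch = new HBaseLayoutSketch();
    sketch.addGraph("x03", "graph");
    sketch.addVertex("x03", "x04", "Person");
    sketch.addVertex("x03", "x05", "Person");
    sketch.addEdge("x03", "x06", "friendsWith", "x04", "x05");
    System.out.println("graph_heads: " + sketch.graphHeads);
    System.out.println("vertices:    " + sketch.vertices);
    System.out.println("edges:       " + sketch.edges);
  }
}
```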
HBaseDataSink example
GraphCollection graphCollection = ...
// create hbase and gradoop-hbase configuration
Configuration hBaseConfiguration = HBaseConfiguration.create();
GradoopHBaseConfig<GraphHead, Vertex, Edge> gradoopConfig = GradoopHBaseConfig.getDefaultConfig(env);
// get the EPGM store
HBaseEPGMStore<GraphHead, Vertex, Edge> store = HBaseEPGMStoreFactory.createOrOpenEPGMStore(hBaseConfiguration, gradoopConfig);
// create the HBaseDataSink and write the GraphCollection
DataSink sink = new HBaseDataSink(store, config);
sink.write(graphCollection, true);
store.flush();
store.close();
TLFDataSink writes an EPGM representation into one TLF file. See "The vertex-transitive TLF-planar graphs" for further details about TLF. Paths can be local (file://) or HDFS (hdfs://). The exact format is documented in TLFFileFormat.
TLFDataSink example
DataSource dataSource = ...
DataSink dataSink = new TLFDataSink("/path/to/out.tlf", config);
dataSink.write(dataSource.getGraphCollection(), true);
JSONDataSink writes an EPGM representation into three separate JSON files: one for the vertex declarations, one for the edge declarations and one for the graph declarations. Paths can be local (file://) or HDFS (hdfs://). The exact format is documented in the classes GraphHeadToJSON, VertexToJSON and EdgeToJSON.
JSONDataSink example
String graphFile = "/path/to/graphs.json";
String vertexFile = "/path/to/nodes.json";
String edgeFile = "/path/to/edges.json";
DataSink jsonDataSink = new JSONDataSink(graphFile, vertexFile, edgeFile, config);
jsonDataSink.write(dataSource.getGraphCollection(), true);
CSVDataSink is a graph data sink that writes a logical graph as CSV files. Paths can be local (file://) or HDFS (hdfs://).
CSVDataSink example
LogicalGraph logicalGraph = ...
DataSink csvDataSink = new CSVDataSink("/path/to/csv/out/", config);
csvDataSink.write(logicalGraph, true);
If a metadata file already exists, it can be reused.
CSVDataSink example (with existing metadata file)
LogicalGraph logicalGraph = ...
DataSink csvDataSink = new CSVDataSink("/path/to/csv/out/", "/path/to/metadata.csv", config);
csvDataSink.write(logicalGraph, true);