Gradoop DataSinks
This section provides an overview of Gradoop-specific data sinks.
Gradoop Data Sinks |
---|
DOTDataSink |
GDLDataSink |
HBaseDataSink |
AccumuloDataSink |
TLFDataSink |
JSONDataSink (deprecated since v0.5.0) |
CSVDataSink |
DOTDataSink
Writes an EPGM representation into one DOT file. For more information, see the Wikipedia article on DOT. The path can be local (file://) or HDFS (hdfs://). The format is documented in DOTFileFormat.
DOTDataSink example
LogicalGraph logicalGraph = ...
DataSink dataSink = new DOTDataSink("/path/to/out.dot", true);
dataSink.write(logicalGraph);
Example out.dot
digraph 0
{
gradoopId1 [label="person",name="Bob",age="20"];
gradoopId2 [label="person",name="Alice",age="20"];
gradoopId3;
gradoopId4;
gradoopId1->gradoopId2 [label="knows",since="2003"];
gradoopId2->gradoopId1 [label="knows",since="2003"];
gradoopId3->gradoopId4;
}
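The way labels and properties turn into DOT attribute lists in out.dot can be sketched in plain Java. This is only an illustration of the output shape shown above and does not use or reproduce the Gradoop API; the class DotSketch and its methods are hypothetical names introduced here, and quoting of special characters inside values is deliberately left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of Gradoop: renders lines in the style of out.dot.
final class DotSketch {

  // Builds an attribute list such as ` [label="person",name="Bob"]`.
  // Returns an empty string when there is neither a label nor a property.
  static String attributes(String label, Map<String, String> properties) {
    if (label == null && properties.isEmpty()) {
      return "";
    }
    StringJoiner joiner = new StringJoiner(",", " [", "]");
    if (label != null) {
      joiner.add("label=\"" + label + "\"");
    }
    for (Map.Entry<String, String> entry : properties.entrySet()) {
      joiner.add(entry.getKey() + "=\"" + entry.getValue() + "\"");
    }
    return joiner.toString();
  }

  // Renders one vertex statement.
  static String vertex(String id, String label, Map<String, String> properties) {
    return id + attributes(label, properties) + ";";
  }

  // Renders one directed edge statement.
  static String edge(String sourceId, String targetId, String label,
                     Map<String, String> properties) {
    return sourceId + "->" + targetId + attributes(label, properties) + ";";
  }

  public static void main(String[] args) {
    Map<String, String> bob = new LinkedHashMap<>();
    bob.put("name", "Bob");
    bob.put("age", "20");
    Map<String, String> since = new LinkedHashMap<>();
    since.put("since", "2003");

    System.out.println("digraph 0");
    System.out.println("{");
    System.out.println(vertex("gradoopId1", "person", bob));
    System.out.println(vertex("gradoopId3", null, new LinkedHashMap<>()));
    System.out.println(edge("gradoopId1", "gradoopId2", "knows", since));
    System.out.println("}");
  }
}
```

A LinkedHashMap is used so that the properties are printed in insertion order, which keeps the sketch deterministic.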
GDLDataSink
Writes an EPGM representation into a GDL file. The path can be local (file://) or HDFS (hdfs://).
GDLDataSink example
LogicalGraph logicalGraph = ...
DataSink dataSink = new GDLDataSink("/path/to/out.gdl");
dataSink.write(logicalGraph);
Example out.gdl
g0:Community {interest:"Hadoop",vertexCount:3}[
(v_Person_1:Person {gender:"m",city:"Dresden",name:"Dave",age:40})
(v_Person_2:Person {locIP:"127.0.0.1",gender:"m",city:"Berlin",name:"Frank",age:35})
(v_Person_3:Person {gender:"f",city:"Dresden",name:"Carol",age:30})
(v_Person_3)-[e_knows_15:knows{since:2014}]->(v_Person_1)
(v_Person_2)-[e_knows_1:knows{since:2015}]->(v_Person_1)
(v_Person_2)-[e_knows_2:knows{since:2015}]->(v_Person_3)
(v_Person_1)-[e_knows_14:knows{since:2014}]->(v_Person_3)
(v_Person_3)-[e_knows_15]->(v_Person_1)
(v_Person_1)-[e_knows_14]->(v_Person_3)
(v_Person_3)-[e_knows_15]->(v_Person_1)
]
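To make the structure of the vertex and edge lines in out.gdl easier to read, the following plain-Java sketch rebuilds two of them from their parts. It is an illustration only and says nothing about how GDLDataSink is implemented; GdlSketch and its methods are hypothetical names, and the rule "strings are quoted, other values are printed as-is" is an assumption read off the example above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of Gradoop: renders lines in the style of out.gdl.
final class GdlSketch {

  // Renders a property map as {key:value,...}; String values are quoted.
  static String properties(Map<String, Object> properties) {
    StringJoiner joiner = new StringJoiner(",", "{", "}");
    for (Map.Entry<String, Object> entry : properties.entrySet()) {
      Object value = entry.getValue();
      String rendered = value instanceof String
          ? "\"" + value + "\""
          : String.valueOf(value);
      joiner.add(entry.getKey() + ":" + rendered);
    }
    return joiner.toString();
  }

  // Renders a vertex definition: (variable:Label {properties}).
  static String vertex(String variable, String label, Map<String, Object> props) {
    return "(" + variable + ":" + label + " " + properties(props) + ")";
  }

  // Renders an edge definition: (source)-[variable:label{properties}]->(target).
  static String edge(String source, String variable, String label,
                     Map<String, Object> props, String target) {
    return "(" + source + ")-[" + variable + ":" + label + properties(props)
        + "]->(" + target + ")";
  }

  public static void main(String[] args) {
    Map<String, Object> dave = new LinkedHashMap<>();
    dave.put("gender", "m");
    dave.put("city", "Dresden");
    dave.put("name", "Dave");
    dave.put("age", 40);
    Map<String, Object> since = new LinkedHashMap<>();
    since.put("since", 2014);

    System.out.println(vertex("v_Person_1", "Person", dave));
    System.out.println(edge("v_Person_3", "e_knows_15", "knows", since, "v_Person_1"));
  }
}
```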
HBaseDataSink
Converts the runtime representation of EPGM elements into persistent representations and writes them to HBase. By default, graphs are stored in three tables: graph_heads, vertices and edges. The following example shows the result of inserting the graph below (represented as GDL) into an HBase database.
Example graph as GDL
g1:graph[(p1:Person{name:"Bob",age:24})-[:friendsWith]->(p2:Person{name:"Alice",age:30})]
Resulting HBase Table structure (ids are shortened for readability)
hbase> scan 'graph_heads'
ROW COLUMN+CELL
x03 column=e:x06, timestamp=1521125952854, value=
x03 column=m:l, timestamp=1521125952854, value=graph
x03 column=v:x04, timestamp=1521125952854, value=
x03 column=v:x05, timestamp=1521125952854, value=
1 row(s) in 0.0850 seconds
hbase> scan 'vertices'
ROW COLUMN+CELL
x04 column=m:g, timestamp=1521125953029, value=x03
x04 column=m:l, timestamp=1521125953029, value=Person
x04 column=oe:x06, timestamp=1521125953029, value=
x04 column=p:age, timestamp=1521125953029, value=\x02\x00\x00\x00\x18
x04 column=p:name, timestamp=1521125953029, value=\x06\x00\x03Bob
x05 column=ie:x06, timestamp=1521125953002, value=
x05 column=m:g, timestamp=1521125953002, value=x03
x05 column=m:l, timestamp=1521125953002, value=Person
x05 column=p:age, timestamp=1521125953002, value=\x02\x00\x00\x00\x1E
x05 column=p:name, timestamp=1521125953002, value=\x06\x00\x05Alice
2 row(s) in 0.0240 seconds
hbase> scan 'edges'
ROW COLUMN+CELL
x06 column=m:g, timestamp=1521125952789, value=x03
x06 column=m:l, timestamp=1521125952789, value=friendsWith
x06 column=m:s, timestamp=1521125952789, value=x04
x06 column=m:t, timestamp=1521125952789, value=x05
1 row(s) in 0.0320 seconds
The graph_heads table stores the graph declaration and the row keys of its vertices and edges.
The vertices table contains both vertices, each with five columns: the row key of the graph (column=m:g), the label of the vertex (column=m:l), the row key of the incoming or outgoing edge (column=ie:x06 or column=oe:x06) and the two properties (column=p:age and column=p:name).
The single edge is stored in the edges table, with columns for the row key of the graph (column=m:g), the label of the edge (column=m:l), the row key of the source vertex (column=m:s) and the row key of the target vertex (column=m:t).
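The property cells in the scan output (for example \x02\x00\x00\x00\x18 for age and \x06\x00\x03Bob for name) can be read back with a few lines of plain Java. The sketch below assumes a layout inferred only from this sample: a leading type byte (0x02 for a 4-byte big-endian integer, 0x06 for a string), and for strings a 2-byte length followed by the characters. These type codes and the class name PropertyBytesSketch are assumptions made for illustration, not a statement about Gradoop's serialization code.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical decoder for the two property encodings visible in the scan output.
final class PropertyBytesSketch {

  // Assumed type codes, inferred from the p:age and p:name cells above.
  static final byte TYPE_INTEGER = 0x02;
  static final byte TYPE_STRING = 0x06;

  // Decodes one cell value into an Integer or a String.
  static Object decode(byte[] raw) {
    ByteBuffer buffer = ByteBuffer.wrap(raw); // ByteBuffer is big-endian by default
    byte type = buffer.get();
    switch (type) {
      case TYPE_INTEGER:
        return buffer.getInt();
      case TYPE_STRING:
        int length = buffer.getShort();       // assumed 2-byte length prefix
        byte[] chars = new byte[length];
        buffer.get(chars);
        return new String(chars, StandardCharsets.UTF_8);
      default:
        throw new IllegalArgumentException("Unhandled type byte: " + type);
    }
  }

  public static void main(String[] args) {
    byte[] age = {0x02, 0x00, 0x00, 0x00, 0x18};
    byte[] name = {0x06, 0x00, 0x03, 'B', 'o', 'b'};
    System.out.println(decode(age));  // prints 24
    System.out.println(decode(name)); // prints Bob
  }
}
```

Decoding the two age cells this way yields 24 and 30, which matches the ages in the GDL example graph.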
AccumuloDataSink
Writes an EPGM representation of a graph or a graph collection to an Apache Accumulo® store. By default, graphs are stored in three tables: graph, vertex and edge.
AccumuloDataSink example
// flink execution env
ExecutionEnvironment env = ExecutionEnvironment.createLocalEnvironment();
// create gradoop accumulo configuration
GradoopAccumuloConfig config = GradoopAccumuloConfig.create(env)
.set(GradoopAccumuloConfig.ACCUMULO_USER, {user})
.set(GradoopAccumuloConfig.ACCUMULO_INSTANCE, {instance})
.set(GradoopAccumuloConfig.ZOOKEEPER_HOSTS, {comma separated zookeeper host list})
.set(GradoopAccumuloConfig.ACCUMULO_PASSWD, {password})
.set(GradoopAccumuloConfig.ACCUMULO_TABLE_PREFIX, {table prefix});
// create store
AccumuloStore graphStore = new AccumuloStore(config);
// create sink
AccumuloDataSink dataSink = new AccumuloDataSink(graphStore);
dataSink.write(someOtherSource.getGraphCollection(), true);
TLFDataSink
Writes an EPGM representation into one TLF file. See The vertex-transitive TLF-planar graphs for further details about TLF. Paths can be local (file://) or HDFS (hdfs://). The exact format is documented in TLFFileFormat.
TLFDataSink example
DataSource dataSource = ...
DataSink dataSink = new TLFDataSink("/path/to/out.tlf", config);
dataSink.write(dataSource.getGraphCollection(), true);
JSONDataSink
Writes an EPGM representation into three separate JSON files: one for the vertex declaration, one for the edge declaration and one for the graph declaration. Paths can be local (file://) or HDFS (hdfs://). The exact format is documented in the classes GraphHeadToJSON, VertexToJSON and EdgeToJSON.
JSONDataSink example
String graphFile = "/path/to/graphs.json";
String vertexFile = "/path/to/nodes.json";
String edgeFile = "/path/to/edges.json";
DataSink jsonDataSink = new JSONDataSink(graphFile, vertexFile, edgeFile, config);
jsonDataSink.write(dataSource.getGraphCollection(), true);
CSVDataSink
A graph data sink that writes the logical graph as CSV files. Paths can be local (file://) or HDFS (hdfs://). Four *.csv files are created in the specified directory: graphs.csv, edges.csv, vertices.csv and metadata.csv.
CSVDataSink example
LogicalGraph logicalGraph = ...
DataSink csvDataSink = new CSVDataSink("/path/to/csv/out/", config);
csvDataSink.write(logicalGraph, true);
CSVDataSink example (graph collection)
GraphCollection graphCollection = ...
DataSink csvDataSink = new CSVDataSink("/path/to/csv/out/", config);
csvDataSink.write(graphCollection, true);
If a metadata file already exists, it can be reused.
CSVDataSink example (with existing metadata file)
LogicalGraph logicalGraph = ...
DataSink csvDataSink = new CSVDataSink("/path/to/csv/out/", "/path/to/metadata.csv", config);
csvDataSink.write(logicalGraph, true);