Data Output
Gabor Szarnyas edited this page May 30, 2021
Datagen provides a mechanism for implementing the serialization of the datasets. It allows users to define their own output formats or to ingest the data directly into a data store. The mechanism is based on three abstract classes, which have to be extended and specified in the configuration file as explained in Compilation_Execution. For each class, only the initialize and close methods and the different versions of the serialize method (one for each entity) have to be implemented. The abstract classes are the following:
- `StaticSerializer`: This class serializes all the entities that are independent of the dataset sizes, that is, tags, tagClasses, organisations and places.
- `DynamicPersonSerializer`: This class serializes the Persons, Knows, studyAt and workAt relationships.
- `DynamicActivitySerializer`: This class serializes all the entities related to person activity generation, that is, Forums, Posts, Comments and likes.
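As a rough illustration of the extension pattern described above, the following self-contained Java sketch writes tags in a custom pipe-separated format. It does not use Datagen's real classes: `StaticSerializerSketch`, `Tag`, `PipeSeparatedTagSerializer` and `SerializerDemo` are hypothetical stand-ins, and the method signatures are assumptions, since the actual abstract classes take Datagen-specific arguments and declare one serialize overload per entity.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

// Hypothetical stand-in for a Datagen entity; the real entity classes differ.
final class Tag {
    final long id;
    final String name;

    Tag(long id, String name) {
        this.id = id;
        this.name = name;
    }
}

// Hypothetical simplification of an abstract serializer: initialize, close,
// and one serialize overload per entity (only Tag is shown here).
abstract class StaticSerializerSketch {
    abstract void initialize(Writer out) throws IOException;

    abstract void serialize(Tag tag) throws IOException;

    abstract void close() throws IOException;
}

// A user-defined format: a header row followed by pipe-separated lines.
final class PipeSeparatedTagSerializer extends StaticSerializerSketch {
    private Writer out;

    @Override
    void initialize(Writer out) throws IOException {
        this.out = out;
        out.write("id|name\n");
    }

    @Override
    void serialize(Tag tag) throws IOException {
        out.write(tag.id + "|" + tag.name + "\n");
    }

    @Override
    void close() throws IOException {
        out.flush();
    }
}

// Drives the serializer through its lifecycle and returns what was written.
final class SerializerDemo {
    static String run() {
        StringWriter buffer = new StringWriter();
        StaticSerializerSketch serializer = new PipeSeparatedTagSerializer();
        try {
            serializer.initialize(buffer);
            serializer.serialize(new Tag(1L, "Mozart"));
            serializer.serialize(new Tag(2L, "Bach"));
            serializer.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toString();
    }
}
```

Calling `SerializerDemo.run()` returns the header line followed by one line per tag. A serializer that ingests directly into a data store would follow the same lifecycle, opening a connection in initialize and releasing it in close.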
Currently, by default we provide the serializer classes for the `CsvBasic`, `CsvComposite`, `CsvMergeForeign`, `CsvCompositeMergeForeign` and `Turtle` formats. These are documented in the LDBC SNB benchmark specification document. Some general guidelines:
- Use `CsvBasic` for graph databases that support CSV import.
- Use `CsvComposite` for graph databases that support CSV import and composite data structures.
- Use `CsvMergeForeign` for relational databases.
- Use `CsvCompositeMergeForeign` for relational databases that support composite data structures.
- Use `Turtle` for RDF tools and graph-based tools that support it.