Skip to content

Commit

Permalink
Implement cluster support
Browse files Browse the repository at this point in the history
  • Loading branch information
flyingsilverfin committed Apr 4, 2023
1 parent eb82705 commit 9b09c68
Show file tree
Hide file tree
Showing 14 changed files with 169 additions and 91 deletions.
161 changes: 98 additions & 63 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,77 +1,94 @@


![TypeDBLoader_icon](https://github.com/bayer-science-for-a-better-life/grami/blob/master/typedbloader.png?raw=true)
---
---
###

###

[![TypeDB Loader Test](https://github.com/bayer-science-for-a-better-life/grami/actions/workflows/testandbuild.yaml/badge.svg)](https://github.com/bayer-science-for-a-better-life/grami/actions/workflows/testandbuild.yaml)
[![TypeDB Loader Build](https://github.com/bayer-science-for-a-better-life/grami/actions/workflows/release.yaml/badge.svg)](https://github.com/bayer-science-for-a-better-life/grami/actions/workflows/release.yaml)

###

---

If your [TypeDB](https://github.com/vaticle/typedb) project
- has a lot of data
- and you want/need to focus on schema design, inference, and querying

Use TypeDB Loader to take care of your data migration for you. TypeDB Loader streams data from files and migrates them into TypeDB **at scale**!

- has a lot of data
- and you want/need to focus on schema design, inference, and querying

Use TypeDB Loader to take care of your data migration for you. TypeDB Loader streams data from files and migrates them
into TypeDB **at scale**!

## Features:
- Data Input:

- Data Input:
- data is streamed to reduce memory requirements
- supports any tabular data file with your separator of choice (i.e.: csv, tsv, whatever-sv...)
- supports gzipped files
- ignores unnecessary columns
- [Attribute](https://github.com/typedb-osi/typedb-loader/wiki/02-Loading-Attributes), [Entity](https://github.com/typedb-osi/typedb-loader/wiki/03-Loading-Entities), [Relation](https://github.com/typedb-osi/typedb-loader/wiki/04-Loading-Relations) Loading:
- [Attribute](https://github.com/typedb-osi/typedb-loader/wiki/02-Loading-Attributes), [Entity](https://github.com/typedb-osi/typedb-loader/wiki/03-Loading-Entities), [Relation](https://github.com/typedb-osi/typedb-loader/wiki/04-Loading-Relations)
Loading:
- load required/optional attributes of any TypeDB type (string, boolean, long, double, datetime)
- load required/optional role players (attribute / entity / relation)
- load list-like attribute columns as n attributes (recommended procedure until attribute lists are fully supported by TypeDB)
- load list-like attribute columns as n attributes (recommended procedure until attribute lists are fully supported
by TypeDB)
- load list-like player columns as n players for a relation
- load entity if not present - if present, either do not write or append attributes
- [Appending Attributes](https://github.com/typedb-osi/typedb-loader/wiki/05-Appending-Attributes) to existing things
- [Append-Attribute-Or-Insert-Entity](https://github.com/typedb-osi/typedb-loader/wiki/06-Append-Or-Insert) for entities
- Data Validation:
- validate input data rows and log issues for easy diagnosis input data-related issues (i.e. missing attributes/players, invalid characters...)
- Configuration Validation:
- write your configuration with confidence: warnings will display useful information for fine tuning, errors will let you know what you forgot. All BEFORE the database is touched.
- Performance:
- parallelized asynchronous writes to TypeDB to make the most of your hardware configuration, optimized with engineers @vaticle
- Stop/Restart (in re-implementation, currently NOT available):
- [Appending Attributes](https://github.com/typedb-osi/typedb-loader/wiki/05-Appending-Attributes) to existing things
- [Append-Attribute-Or-Insert-Entity](https://github.com/typedb-osi/typedb-loader/wiki/06-Append-Or-Insert) for entities
- Data Validation:
- validate input data rows and log issues for easy diagnosis input data-related issues (i.e. missing
attributes/players, invalid characters...)
- Configuration Validation:
- write your configuration with confidence: warnings will display useful information for fine tuning, errors will
let you know what you forgot. All BEFORE the database is touched.
- Performance:
- parallelized asynchronous writes to TypeDB to make the most of your hardware configuration, optimized with
engineers @vaticle
- Stop/Restart (in re-implementation, currently NOT available):
- tracking of your migration status to stop/restart, or restart after failure

- [Basic Column Preprocessing using RegEx's](https://github.com/typedb-osi/typedb-loader/wiki/08-Preprocessing)
- [Basic Column Preprocessing using RegEx's](https://github.com/typedb-osi/typedb-loader/wiki/08-Preprocessing)

Create a Loading Configuration ([example](https://github.com/typedb-osi/typedb-loader/blob/master/src/test/resources/phoneCalls/config.json)) and use TypeDB Loader
- as an [executable CLI](https://github.com/typedb-osi/typedb-loader/wiki/10-TypeDB-Loader-as-Executable-CLI) - no coding
- in [your own Java project](https://github.com/typedb-osi/typedb-loader/wiki/09-TypeDB-Loader-as-Dependency) - easy API
Create a Loading
Configuration ([example](https://github.com/typedb-osi/typedb-loader/blob/master/src/test/resources/phoneCalls/config.json))
and use TypeDB Loader

- as an [executable CLI](https://github.com/typedb-osi/typedb-loader/wiki/10-TypeDB-Loader-as-Executable-CLI) - no
coding
- in [your own Java project](https://github.com/typedb-osi/typedb-loader/wiki/09-TypeDB-Loader-as-Dependency) - easy API

## How it works:

To illustrate how to use TypeDB Loader, we will use a slightly extended version of the "phone-calls" example [dataset](https://github.com/typedb-osi/typedb-loader/tree/master/src/test/resources/phoneCalls) and [schema](https://github.com/typedb-osi/typedb-loader/blob/master/src/test/resources/phoneCalls/schema.gql) from the TypeDB developer documentation:
To illustrate how to use TypeDB Loader, we will use a slightly extended version of the "phone-calls"
example [dataset](https://github.com/typedb-osi/typedb-loader/tree/master/src/test/resources/phoneCalls)
and [schema](https://github.com/typedb-osi/typedb-loader/blob/master/src/test/resources/phoneCalls/schema.gql) from the
TypeDB developer documentation:

### Configuration

The configuration file tells TypeDB Loader what things you want to insert for each of your data files and how to do it.

Here are some example:

- [Attribute Examples](https://github.com/typedb-osi/typedb-loader/wiki/02-Loading-Attributes)
- [Entity Examples](https://github.com/typedb-osi/typedb-loader/wiki/03-Loading-Entities)
- [Relation Examples](https://github.com/typedb-osi/typedb-loader/wiki/04-Loading-Relations)
- [Nested Relation - Match by Attribute(s) Example](https://github.com/typedb-osi/typedb-loader/wiki/04-Loading-Relations#loading-relations-with-entityrelation-players-matched-on-attribute-ownerships-incl-nested-relations)
- [Nested Relation - Match by Player(s) Example](https://github.com/typedb-osi/typedb-loader/wiki/04-Loading-Relations#loading-relations-relation-players-matching-on-players-in-playing-relation-incl-nested-relations)
- [Attribute-Player Relation Example](https://github.com/typedb-osi/typedb-loader/wiki/04-Loading-Relations#loading-relations-with-attribute-players)
- [Custom Migration Order Example](https://github.com/typedb-osi/typedb-loader/wiki/07-Custom-Load-Order)
- [Attribute Examples](https://github.com/typedb-osi/typedb-loader/wiki/02-Loading-Attributes)
- [Entity Examples](https://github.com/typedb-osi/typedb-loader/wiki/03-Loading-Entities)
- [Relation Examples](https://github.com/typedb-osi/typedb-loader/wiki/04-Loading-Relations)
- [Nested Relation - Match by Attribute(s) Example](https://github.com/typedb-osi/typedb-loader/wiki/04-Loading-Relations#loading-relations-with-entityrelation-players-matched-on-attribute-ownerships-incl-nested-relations)
- [Nested Relation - Match by Player(s) Example](https://github.com/typedb-osi/typedb-loader/wiki/04-Loading-Relations#loading-relations-relation-players-matching-on-players-in-playing-relation-incl-nested-relations)
- [Attribute-Player Relation Example](https://github.com/typedb-osi/typedb-loader/wiki/04-Loading-Relations#loading-relations-with-attribute-players)
- [Custom Migration Order Example](https://github.com/typedb-osi/typedb-loader/wiki/07-Custom-Load-Order)

For detailed documentation, please refer to the [WIKI](https://github.com/bayer-science-for-a-better-life/grami/wiki).

The [config](https://github.com/typedb-osi/typedb-loader/tree/master/src/test/resources/phoneCalls/config.json) in the phone-calls test is a good starting example of a configuration.
The [config](https://github.com/typedb-osi/typedb-loader/tree/master/src/test/resources/phoneCalls/config.json) in the
phone-calls test is a good starting example of a configuration.

### Migrate Data

Once your configuration files are complete, you can use TypeDB Loader in one of two ways:

1. As an executable command line interface - no coding required:
1. As an executable command line interface - no coding required:

```Shell
./bin/typedbloader load \
Expand All @@ -83,65 +100,83 @@ Once your configuration files are complete, you can use TypeDB Loader in one of

[See details here](https://github.com/typedb-osi/typedb-loader/wiki/10-TypeDB-Loader-as-Executable-CLI)

2. As a dependency in your own Java code:
2. As a dependency in your own Java code:

```Java
public class LoadingData {

public void loadData() {
String uri = "localhost:1729";
String config = "path/to/your/config.json";
String database = "databaseName";
String[] args = {
"load",
"-tdb", uri,
"-c", config,
"-db", database,
"-cm"
};

LoadOptions options = LoadOptions.parse(args);
TypeDBLoader loader = new TypeDBLoader(options);
loader.load();
}
public void loadData() {
String uri = "localhost:1729";
String config = "path/to/your/config.json";
String database = "databaseName";

String[] args = {
"load",
"-tdb", uri,
"-c", config,
"-db", database,
"-cm"
};

LoadOptions options = LoadOptions.parse(args);
TypeDBLoader loader = new TypeDBLoader(options);
loader.load();
}
}
```

[See details here](https://github.com/typedb-osi/typedb-loader/wiki/09-TypeDB-Loader-as-Dependency)


## Step-by-Step Tutorial

A complete tutorial for TypeDB version >= 2.5.0 is in work and will be published asap.

An example of configuration and usage of TypeDB Loader on real data can be found [in the TypeDB Examples](https://github.com/vaticle/typedb-examples/tree/master/biology/catalogue_of_life).
An example of configuration and usage of TypeDB Loader on real data can be
found [in the TypeDB Examples](https://github.com/vaticle/typedb-examples/tree/master/biology/catalogue_of_life).

A complete tutorial for TypeDB (Grakn) version < 2.0 can be found [on Medium](https://medium.com/@hkuich/introducing-grami-a-data-migration-tool-for-grakn-d4051582f867).
A complete tutorial for TypeDB (Grakn) version < 2.0 can be
found [on Medium](https://medium.com/@hkuich/introducing-grami-a-data-migration-tool-for-grakn-d4051582f867).

There is an [example repository](https://github.com/bayer-science-for-a-better-life/grami-example) for your convenience.

## Connecting to TypeDB Cluster

To connect to TypeDB Cluster, a set of options is provided:
```
--typedb-cluster=<address:port>
--username=<username>
--password // can be asked for interactively
--tls-enabled
--tls-root-ca=<path/to/CA/cert>
```

## Compatibility Table

| TypeDB Loader | TypeDB Client (internal) | TypeDB | TypeDB Cluster |
| :------------: | :---------------------: | :-------------: | :------------: |
| 1.1.0 to 1.2.0 | 2.8.0 | 2.8.x | N/A |
| 1.0.0 | 2.5.0 to 2.7.1 | 2.5.x to 2.7.x | N/A |
| 0.1.1 | 2.0.0 to 2.5.0 | 2.0.x to 2.4.x | N/A |
| <0.1 | 1.8.0 | 1.8.x | N/A |
Ranges are [inclusive, exclusive).

| TypeDB Loader | TypeDB Client (internal) | TypeDB | TypeDB Cluster |
|:--------------:|:------------------------:|:---------------:|:--------------:|
| 1.6.0 | 2.14.0 | 2.14.x | 2.14.x |
| 1.2.0 to 1.6.0 | 2.8.0 - 2.14.0 | 2.8.0 to 2.14.0 | N/A |
| 1.1.0 to 1.2.0 | 2.8.0 | 2.8.x | N/A |
| 1.0.0 | 2.5.0 to 2.7.1 | 2.5.x to 2.7.x | N/A |
| 0.1.1 | 2.0.0 to 2.5.0 | 2.0.x to 2.4.x | N/A |
| <0.1 | 1.8.0 | 1.8.x | N/A |

* [Type DB](https://github.com/vaticle/typedb)

Find the Readme for GraMi for grakn < 2.0 [here](https://github.com/bayer-science-for-a-better-life/grami/blob/b3d6d272c409d6c40254354027b49f90b255e1c3/README.md)

## Contributions

TypeDB Loader was built @[Bayer AG](https://www.bayer.com/) in the Semantic and Knowledge Graph Technology Group with the support of the engineers @[Grakn Labs](https://github.com/orgs/vaticle/people).
TypeDB Loader was built @[Bayer AG](https://www.bayer.com/) in the Semantic and Knowledge Graph Technology Group with
the support of the engineers @[Vaticle](https://github.com/vaticle).

## Licensing

This repository includes software developed at [Bayer AG](https://www.bayer.com/). It is released under the [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0).

This repository includes software developed at [Bayer AG](https://www.bayer.com/). It is released under
the [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0).

## Credits

Icon in banner by [Freepik](https://www.freepik.com") from [Flaticon](https://www.flaticon.com/)
8 changes: 4 additions & 4 deletions build.gradle
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@ plugins {
}

group 'com.vaticle.typedb-osi'
version '1.5.1'
version '1.6.0'

repositories {
mavenCentral()
Expand All @@ -15,13 +15,13 @@ repositories {
}

dependencies {
implementation("com.vaticle.typedb:typedb-client:2.11.1")
implementation("com.vaticle.typeql:typeql-grammar:2.9.0")
implementation("com.vaticle.typedb:typedb-client:2.14.2")
implementation("com.vaticle.typeql:typeql-grammar:2.14.0")
implementation("com.google.code.gson:gson:2.8.6")
implementation("org.slf4j:slf4j-api:1.7.25")
implementation("org.apache.logging.log4j:log4j-api:2.17.1")
implementation("org.apache.logging.log4j:log4j-core:2.17.1")
implementation("info.picocli:picocli:4.5.1")
implementation("info.picocli:picocli:4.6.1")
implementation("org.apache.commons:commons-csv:1.8")
implementation("commons-io:commons-io:2.8.0")
compileOnly("info.picocli:picocli-codegen:4.5.1")
Expand Down
30 changes: 27 additions & 3 deletions src/main/java/com/vaticle/typedb/osi/loader/cli/LoadOptions.java
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,8 @@

import picocli.CommandLine;

import javax.annotation.Nullable;

@CommandLine.Command(name = "load", description = "load data and/or schema", mixinStandardHelpOptions = true)
public class LoadOptions {
@CommandLine.Spec
Expand All @@ -26,12 +28,27 @@ public class LoadOptions {
@CommandLine.Option(names = {"-c", "--config"}, description = "config file in JSON format", required = true)
public String dataConfigFilePath;

@CommandLine.Option(names = {"-db", "--database"}, description = "target database in your grakn instance", required = true)
@CommandLine.Option(names = {"-db", "--database"}, description = "target database in your TypeDB instance", required = true)
public String databaseName;

@CommandLine.Option(names = {"-tdb", "--typedb"}, description = "optional - TypeDB server in format: server:port (default: localhost:1729)", defaultValue = "localhost:1729")
@CommandLine.Option(names = {"-tdb", "--typedb"}, description = "Connect to TypeDB Core server in format: server:port (default: localhost:1729)", defaultValue = "localhost:1729")
public String typedbURI;

@CommandLine.Option(names = {"-tdbc", "--typedb-cluster"}, description = "Connect to TypeDB Cluster instead of TypeDB Core. Specify a cluster server with 'server:port'.")
public String typedbClusterURI;

@CommandLine.Option(names = {"--username"}, description = "Username")
public @Nullable String username;

@CommandLine.Option(names = {"--password"}, description = "Password", interactive = true, arity = "0..1")
public @Nullable String password;

@CommandLine.Option(names = {"--tls-enabled"}, description = "Connect to TypeDB Cluster with TLS encryption")
public boolean tlsEnabled;

@CommandLine.Option( names = {"--tls-root-ca"}, description = "Path to the TLS root CA file")
public @Nullable String tlsRootCAPath;

@CommandLine.Option(names = {"-cm", "--cleanMigration"}, description = "optional - delete old schema and data and restart migration from scratch - default: continue previous migration, if exists", defaultValue = "false")
public boolean cleanMigration;

Expand Down Expand Up @@ -74,7 +91,14 @@ public void print() {
spec.commandLine().getOut().println("TypeDB Loader started with parameters:");
spec.commandLine().getOut().println("\tconfiguration: " + dataConfigFilePath);
spec.commandLine().getOut().println("\tdatabase name: " + databaseName);
spec.commandLine().getOut().println("\tTypeDB server: " + typedbURI);
if (typedbClusterURI != null) {
spec.commandLine().getOut().println("\tTypeDB cluster URI: " + typedbClusterURI);
spec.commandLine().getOut().println("\tTypeDB cluster username: " + username);
spec.commandLine().getOut().println("\tTypeDB cluster TLS enabled: " + tlsEnabled);
spec.commandLine().getOut().println("\tTypeDB cluster TLS path: " + (tlsRootCAPath == null ? "N/A" : tlsRootCAPath));
} else {
spec.commandLine().getOut().println("\tTypeDB server URI: " + typedbURI);
}
spec.commandLine().getOut().println("\tdelete database and all data in it for a clean new migration?: " + cleanMigration);
spec.commandLine().getOut().println("\treload schema (if not doing clean migration): " + loadSchema);
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -42,7 +42,7 @@ public TypeDBLoader(LoadOptions options) {

public void load() {
Util.info("validating your config...");
TypeDBClient schemaClient = TypeDBUtil.getClient(options.typedbURI);
TypeDBClient schemaClient = TypeDBUtil.getClient(options);
ConfigurationValidation cv = new ConfigurationValidation(dc);
HashMap<String, ArrayList<String>> validationReport = new HashMap<>();
ArrayList<String> errors = new ArrayList<>();
Expand Down Expand Up @@ -83,7 +83,7 @@ public void load() {
Instant start = Instant.now();
try {
AsyncLoaderWorker asyncLoaderWorker = null;
try (TypeDBClient client = TypeDB.coreClient(options.typedbURI, Runtime.getRuntime().availableProcessors())) {
try (TypeDBClient client = TypeDBUtil.getClient(options)) {
Runtime.getRuntime().addShutdownHook(
NamedThreadFactory.create(AsyncLoaderWorker.class, "shutdown").newThread(client::close)
);
Expand Down
Loading

0 comments on commit 9b09c68

Please sign in to comment.