Merge pull request #13 from hkuich/playerMatchAttributeCasting
Player match attribute casting
hkuich authored Jan 8, 2021
2 parents c3f9e94 + 444d8ff commit 9bf7e31
Showing 25 changed files with 488 additions and 74 deletions.
147 changes: 141 additions & 6 deletions README.md
@@ -117,7 +117,7 @@ Do above for all entities and their attributes in the schema. GraMi will ensure

For each relation in your schema, define a processor object that specifies
- each relation attribute, its value type, and whether it is required
- each relation player entity type, role, identifying attribute in the data file and its value type, as well as whether the player is required
- each relation player of type entity, its role, identifying attribute in the data file and value type, as well as whether the player is required

For example, given the following relation in your schema:

@@ -126,7 +126,8 @@ call sub relation,
relates caller,
relates callee,
has started-at,
has duration;
has duration,
plays past-call;
```

Add the following processor object:
@@ -143,8 +144,7 @@ Add the following processor object:
"uniquePlayerId": "phone-number", // using attribute phone-number as unique identifier for type person
"idValueType": "string", // of value type string
"roleType": "caller", // inserts person as player the role caller
"required": true // which is a required role for each data record
"required": true // which is a required role for each data record
},
"callee": { // ID of player generator
"playerType": "person", // matches entity of type person
@@ -172,6 +172,64 @@ Add the following processor object:

Do above for all relations and their players and attributes in the schema. GraMi will ensure that all values in your data files adhere to the value type specified or try to cast them. GraMi will also ensure that no data records enter grakn that are incomplete (missing required attributes/players).

##### Relation-Of-Relation Processors

Grakn comes with the powerful feature of using relations as players in other relations.

For each relation-of-relation in your schema, define a processor object that specifies
- each relation attribute, its value type, and whether it is required
- each relation player of type entity, its role, identifying attribute in the data file and its value type, as well as whether the player is required
- each relation player of type relation, its role, identifying attribute in the data file and its value type, as well as whether the player is required

For example, given the following relation in your schema:

```GraphQL
person sub entity,
...,
plays peer;

call sub relation,
relates caller,
relates callee,
has started-at,
has duration,
plays past-call;

communication-channel sub relation,
relates peer,
relates past-call;
```

Add the following processor object:

```
{
"processor": "communication-channel", // the ID of your processor
"processorType": "relation-of-relation", // creates a relation-of-relation
"schemaType": "communication-channel", // of type communication-channel
"conceptGenerators": {
"players": { // with the following players according to schema
"peer": { // ID of player generator
"playerType": "person", // matches entity of type person
"uniquePlayerId": "phone-number", // using attribute phone-number as unique identifier for type person
"idValueType": "string", // of value type string
"roleType": "peer", // inserts person as player of the role peer
"required": true // which is a required role for each data record
},
"past-call": { // ID of player generator
"playerType": "call", // matches relation of type call
"uniquePlayerId": "started-at", // using attribute started-at as unique identifier for type call
"idValueType": "date", // of value type date
"roleType": "past-call", // inserts call as player of the role past-call
"required": true // which is a required role for each data record
}
}
}
}
```

Just remember that relation-of-relations must be added AFTER the relations that act as their players have been migrated. GraMi migrates all relation-of-relations after it has migrated entities and relations, but keep this in mind as you build your graph: a relation is only inserted as expected when all of its players are already present.
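The ordering constraint can be sketched as follows. This is only an illustration, not GraMi's implementation: the class `MigrationOrderSketch` and its method are hypothetical, and the type names `entity` and `relation` are assumed to sit alongside the `relation-of-relation` processor type used in the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class MigrationOrderSketch {

    // Assumed migration order of processor types: players must exist before the relations that use them.
    static final List<String> ORDER = Arrays.asList("entity", "relation", "relation-of-relation");

    // Hypothetical helper: returns processor names sorted by the migration order of their type.
    static List<String> inMigrationOrder(Map<String, String> processorTypes) {
        for (String type : processorTypes.values()) {
            if (!ORDER.contains(type)) {
                throw new IllegalArgumentException("unknown processor type: " + type);
            }
        }
        List<String> names = new ArrayList<>(processorTypes.keySet());
        // List.sort is stable, so processors of the same type keep their configured order.
        names.sort(Comparator.comparingInt(name -> ORDER.indexOf(processorTypes.get(name))));
        return names;
    }

    public static void main(String[] args) {
        Map<String, String> processorTypes = new LinkedHashMap<>();
        processorTypes.put("communication-channel", "relation-of-relation");
        processorTypes.put("person", "entity");
        processorTypes.put("call", "relation");
        System.out.println(inMigrationOrder(processorTypes));
    }
}
```

With the phone-calls example, `person` would come first, then `call`, and `communication-channel` last, because it needs both as players.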

See the [full configuration file for phone-calls here](https://github.com/bayer-science-for-a-better-life/grami/tree/master/src/test/resources/phone-calls/processorConfig.json).

#### Data Configuration
@@ -257,7 +315,7 @@ The data config entry would be:
"players": [ // player columns present in the data file
{
"columnName": "caller_id", // column name in data file
"generator": "caller" // player generator in processor call to be used for the column
"generator": "caller" // player generator in processor call to be used for the column
},
{
"columnName": "callee_id", // column name in data file
@@ -267,7 +325,7 @@ The data config entry would be:
"attributes": [ // attribute columns present in the data file
{
"columnName": "started_at", // column name in data file
"generator": "started-at" // attribute generator in processor call to be used for the column
"generator": "started-at" // attribute generator in processor call to be used for the column
},
{
"columnName": "duration", // column name in data file
@@ -279,6 +337,83 @@ The data config entry would be:

Do above for all data files that need to be migrated.

Please note that you can also set a listSeparator for a player column that holds a list of values:

Your data might look like:

```
company_name,person_id
Unity,+62 999 888 7777###+62 999 888 7778
```

```
"contract": {
"dataPath": "src/test/resources/phone-calls/contract.csv",
"separator": ",",
"processor": "contract",
"players": [
{
"columnName": "company_name",
"generator": "provider"
},
{
"columnName": "person_id",
"generator": "customer",
"listSeparator": "###" // like this!
}
],
"batchSize": 100,
"threads": 4
}
```

For troubleshooting, it can help to set the problematic data configuration entry to a single thread, as the error log messages from grakn are more verbose and specific that way.
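How a list-valued player column is exploded can be sketched roughly as follows. `ListSeparatorSketch` and `explode` are hypothetical names, not part of GraMi. The sketch also uses `Pattern.quote` so that the separator is taken literally, which is an assumption of this example and not necessarily what GraMi does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class ListSeparatorSketch {

    // Hypothetical helper: splits one CSV cell into player identifiers.
    static List<String> explode(String cell, String listSeparator) {
        List<String> ids = new ArrayList<>();
        if (cell == null) {
            return ids;
        }
        // Without a listSeparator the whole cell is treated as a single identifier.
        String[] parts = (listSeparator == null)
                ? new String[] {cell}
                : cell.split(Pattern.quote(listSeparator));
        for (String part : parts) {
            String cleaned = part.trim();
            // Empty fragments (for example from doubled separators) are skipped.
            if (!cleaned.isEmpty()) {
                ids.add(cleaned);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(explode("+62 999 888 7777###+62 999 888 7778", "###"));
    }
}
```

For the `contract.csv` row above, this would yield two customer identifiers, each of which becomes its own player in the same relation.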

##### Relation-of-Relation Data Config Entries

Given the data file [communication-channel.csv](https://github.com/bayer-science-for-a-better-life/grami/tree/master/src/test/resources/phone-calls/communication-channel.csv):

```CSV
peer_1,peer_2,call_started_at
+54 398 559 0423,+48 195 624 2025,2018-09-16T22:24:19
+54 398 559 0423,+48 195 624 2025,2018-09-17T22:24:19
+54 398 559 0423,+48 195 624 2025,2018-09-18T22:24:19
+54 398 559 0423,+48 195 624 2025,2018-09-19T22:24:19
+54 398 559 0423,+48 195 624 2025,2018-09-20T22:24:19
+263 498 495 0617,+33 614 339 0298,2018-09-11T22:10:34###2018-09-12T22:10:34###2018-09-13T22:10:34###2018-09-14T22:10:34###2018-09-15T22:10:34###2018-09-16T22:10:34
+54 398 559 0423,+7 552 196 4096,2018-09-25T20:24:59
...
```

The data config entry would be:

```
"communication-channel": {
"dataPath": "/your/absolute/path/to/communication-channel.csv", // the absolute path to your data file
"separator": ",", // the separation character used in your data file (alternatives: "\t", ";", etc...)
"processor": "communication-channel", // processor from processor config file
"batchSize": 100, // batchSize to be used for this data file
"threads": 4, // # of threads to be used for this data file
"players": [ // player columns present in the data file
{
"columnName": "peer_1", // column name in data file
"generator": "peer" // player generator in processor call to be used for the column
},
{
"columnName": "peer_2", // column name in data file
"generator": "peer" // player generator in processor call to be used for the column
},
{
"columnName": "call_started_at", // column name in data file
"generator": "past-call", // player generator in processor call to be used for the column
"listSeparator": "###"
}
]
}
```

Do above for all data files that need to be migrated.

For troubleshooting, it can help to set the problematic data configuration entry to a single thread, as the error log messages from grakn are more verbose and specific that way.

See the [full configuration file for phone-calls here](https://github.com/bayer-science-for-a-better-life/grami/tree/master/src/test/resources/phone-calls/dataConfig.json).
2 changes: 1 addition & 1 deletion build.gradle
@@ -5,7 +5,7 @@ plugins {
}

group 'com.github.bayer-science-for-a-better-life'
version '0.0.2'
version '0.0.2-hotfix-1'

repositories {
mavenCentral()
2 changes: 1 addition & 1 deletion src/main/java/generator/GeneratorUtil.java
@@ -68,7 +68,7 @@ public static StatementInstance cleanExplodeAdd(StatementInstance pattern, Strin
}
}

private static StatementInstance addAttributeOfColumnType(StatementInstance pattern, String conceptType, String valueType, String cleanedValue) {
public static StatementInstance addAttributeOfColumnType(StatementInstance pattern, String conceptType, String valueType, String cleanedValue) {
if (valueType.equals("string")) {
pattern = pattern.has(conceptType, cleanedValue);
} else if (valueType.equals("long")) {
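Making `addAttributeOfColumnType` public lets the relation generator cast a player's identifying value according to its `idValueType`, where it previously always matched on the raw string. The idea of dispatching on the declared value type can be sketched as below. `ValueTypeCastSketch` is a hypothetical class that returns plain Java objects instead of building a Graql statement. Only `string` and `long` are visible in the hunk above; the other type names are assumptions.

```java
import java.time.LocalDateTime;

final class ValueTypeCastSketch {

    // Hypothetical helper: casts a cleaned CSV token according to a declared value type.
    static Object cast(String valueType, String cleanedValue) {
        switch (valueType) {
            case "string":
                return cleanedValue;
            case "long":
                return Long.parseLong(cleanedValue);
            case "double": // assumed type name
                return Double.parseDouble(cleanedValue);
            case "boolean": // assumed type name
                return Boolean.parseBoolean(cleanedValue);
            case "datetime": // assumed type name; expects ISO-8601 local date-time
                return LocalDateTime.parse(cleanedValue);
            default:
                throw new IllegalArgumentException("unsupported value type: " + valueType);
        }
    }

    public static void main(String[] args) {
        System.out.println(cast("long", "42"));
        System.out.println(cast("datetime", "2018-09-16T22:24:19"));
    }
}
```

A typed identifier matters for players such as `call`, which the README example identifies by `started-at`, a non-string attribute.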
100 changes: 51 additions & 49 deletions src/main/java/generator/RelationInsertGenerator.java
@@ -3,6 +3,7 @@
import static generator.GeneratorUtil.idxOf;
import static generator.GeneratorUtil.cleanToken;
import static generator.GeneratorUtil.addAttribute;
import static generator.GeneratorUtil.addAttributeOfColumnType;

import configuration.DataConfigEntry;
import configuration.ProcessorConfigEntry;
@@ -12,10 +13,7 @@
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Map;
import java.util.*;

public class RelationInsertGenerator extends InsertGenerator {

@@ -62,12 +60,13 @@ public ArrayList<ArrayList<Statement>> graknRelationshipQueryFromRow(String row,
appLogger.debug("processing tokenized row: " + Arrays.toString(tokens));
GeneratorUtil.malformedRow(row, tokens, headerTokens.length);

ArrayList<Statement> matchStatements = new ArrayList<>(playersMatch(tokens, headerTokens, insertCounter));
ArrayList<Statement> miStatements = new ArrayList<>(createPlayerMatchAndInsert(tokens, headerTokens, insertCounter));
ArrayList<Statement> matchStatements = new ArrayList<>(miStatements.subList(0, miStatements.size() - 1));
ArrayList<Statement> insertStatements = new ArrayList<>();

if (!matchStatements.isEmpty()) {
StatementInstance insert = playersInsert(matchStatements, insertCounter);
insert = relationInsert(insert);
StatementInstance playerInsert = (StatementInstance) miStatements.subList(miStatements.size() - 1, miStatements.size()).get(0);
StatementInstance insert = relationInsert(playerInsert);
if (dce.getAttributes() != null) {
for (DataConfigEntry.GeneratorSpecification attDataConfigEntry : dce.getAttributes()) {
insert = addAttribute(tokens, insert, headerTokens, attDataConfigEntry, gce.getAttributeGenerator(attDataConfigEntry.getGenerator()));
@@ -101,61 +100,61 @@ private String assembleQuery(ArrayList<ArrayList<Statement>> queries) {
return ret.toString();
}

private StatementInstance relationInsert(StatementInstance si) {
if (si != null) {
si = si.isa(gce.getSchemaType());
return si;
} else {
return null;
}
}

private StatementInstance playersInsert(ArrayList<Statement> matchStatements, int insertCounter) {
Statement s = Graql.var("rel-" + insertCounter);
int playerCounter = 0;
for (DataConfigEntry.GeneratorSpecification dataPlayer : dce.getPlayers()) {
ProcessorConfigEntry.ConceptGenerator playerGenerator = gce.getPlayerGenerator(dataPlayer.getGenerator());
boolean insert = false;
for (Statement st :matchStatements) {
//need to have player in match statement or cannot insert as player in relation
if (st.toString().contains(playerGenerator.getUniquePlayerId())) {
insert = true;
}
}
if (insert) {
s = s.rel(playerGenerator.getRoleType(), playerGenerator.getPlayerType() + "-" + playerCounter + "-" + insertCounter);
}
playerCounter++;
}
if (s.toString().contains("(")) {
return (StatementInstance) s;
} else {
return null;
}
}

private Collection<? extends Statement> playersMatch(String[] tokens, String[] headerTokens, int insertCounter) {
private Collection<? extends Statement> createPlayerMatchAndInsert(String[] tokens, String[] headerTokens, int insertCounter) {
ArrayList<Statement> players = new ArrayList<>();
Statement playersInsertStatement = Graql.var("rel-" + insertCounter);
int playerCounter = 0;
for (DataConfigEntry.GeneratorSpecification playerDataConfigEntry : dce.getPlayers()) {
ProcessorConfigEntry.ConceptGenerator playerGenerator = gce.getPlayerGenerator(playerDataConfigEntry.getGenerator());
int playerDataIndex = idxOf(headerTokens, playerDataConfigEntry);

if(playerDataIndex == -1) {
appLogger.error("The column header in your dataconfig mapping to the uniquePlayerId [" + playerGenerator.getUniquePlayerId() + "] cannot be found in the file you specified.");
}
if (tokens.length > playerDataIndex &&
!cleanToken(tokens[playerDataIndex]).isEmpty()) {
StatementInstance ms = Graql
.var(playerGenerator.getPlayerType() + "-" + playerCounter + "-" + insertCounter)
.isa(playerGenerator.getPlayerType()).has(playerGenerator.getUniquePlayerId(),
cleanToken(tokens[playerDataIndex]));
players.add(ms);

if (tokens.length > playerDataIndex && // make sure that there are enough tokens in the row for your column of interest
!cleanToken(tokens[playerDataIndex]).isEmpty()) { // make sure that after cleaning, there is more than an empty string
String listSeparator = playerDataConfigEntry.getListSeparator();
if(listSeparator != null) {
for (String exploded: tokens[playerDataIndex].split(listSeparator)) {
if(!cleanToken(exploded).isEmpty()) {
String playerVariable = playerGenerator.getPlayerType() + "-" + playerCounter + "-" + insertCounter;
players.add(createPlayerMatchStatement(exploded, playerGenerator, playerVariable));
playersInsertStatement = playersInsertStatement.rel(playerGenerator.getRoleType(), playerVariable);
playerCounter++;
}
}
} else { // single player, no listSeparator
String playerVariable = playerGenerator.getPlayerType() + "-" + playerCounter + "-" + insertCounter;
players.add(createPlayerMatchStatement(cleanToken(tokens[playerDataIndex]), playerGenerator, playerVariable));
playersInsertStatement = playersInsertStatement.rel(playerGenerator.getRoleType(), playerVariable);
playerCounter++;
}
}
playerCounter++;
}
players.add(playersInsertStatement);
return players;
}

private StatementInstance createPlayerMatchStatement(String token, ProcessorConfigEntry.ConceptGenerator playerGenerator, String playerVariable) {
String cleanedValue = cleanToken(token);
StatementInstance ms = Graql
.var(playerVariable)
.isa(playerGenerator.getPlayerType());
ms = addAttributeOfColumnType(ms, playerGenerator.getUniquePlayerId(), playerGenerator.getIdValueType(), cleanedValue);
//.has(playerGenerator.getUniquePlayerId(), cleanedValue);
return ms;
}

private StatementInstance relationInsert(StatementInstance si) {
if (si != null) {
si = si.isa(gce.getSchemaType());
return si;
} else {
return null;
}
}

private boolean isValid(ArrayList<ArrayList<Statement>> si) {
ArrayList<Statement> matchStatements = si.get(0);
ArrayList<Statement> insertStatements = si.get(1);
@@ -169,6 +168,9 @@ private boolean isValid(ArrayList<ArrayList<Statement>> si) {
if (!matchStatement.toString().contains("isa " + generatorEntry.getValue().getPlayerType())) {
return false;
}
if (!insertStatement.contains(generatorEntry.getValue().getRoleType())) {
return false;
}
}
// missing required attribute
for (Map.Entry<String, ProcessorConfigEntry.ConceptGenerator> generatorEntry: gce.getRequiredAttributes().entrySet()) {
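The reworked `createPlayerMatchAndInsert` returns the player match statements followed by a single relation insert statement as the last list element, which the caller then splits off. A simplified sketch of that shape, using plain strings instead of the Graql API, might look like this. `MatchInsertSketch` and `Player` are hypothetical, the rendered strings only approximate Graql syntax, and value typing is ignored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class MatchInsertSketch {

    // Hypothetical stand-in for a player generator plus the identifiers found in one row.
    static final class Player {
        final String role;
        final String type;
        final String idAttribute;
        final List<String> ids;

        Player(String role, String type, String idAttribute, List<String> ids) {
            this.role = role;
            this.type = type;
            this.idAttribute = idAttribute;
            this.ids = ids;
        }
    }

    // Returns one match pattern per player value, followed by ONE insert pattern as the last element.
    static List<String> build(List<Player> players, String relationType, int insertCounter) {
        List<String> statements = new ArrayList<>();
        StringBuilder roles = new StringBuilder();
        int playerCounter = 0;
        for (Player p : players) {
            for (String id : p.ids) {
                // Each exploded value gets its own variable so it can be referenced in the insert.
                String var = "$" + p.type + "-" + playerCounter + "-" + insertCounter;
                statements.add(var + " isa " + p.type + ", has " + p.idAttribute + " \"" + id + "\";");
                if (roles.length() > 0) {
                    roles.append(", ");
                }
                roles.append(p.role).append(": ").append(var);
                playerCounter++;
            }
        }
        statements.add("$rel-" + insertCounter + " (" + roles + ") isa " + relationType + ";");
        return statements;
    }

    public static void main(String[] args) {
        List<Player> players = Arrays.asList(
                new Player("peer", "person", "phone-number", Arrays.asList("+54 398 559 0423")),
                new Player("peer", "person", "phone-number", Arrays.asList("+48 195 624 2025")),
                new Player("past-call", "call", "started-at",
                        Arrays.asList("2018-09-16T22:24:19", "2018-09-17T22:24:19")));
        List<String> all = build(players, "communication-channel", 0);
        // The caller separates the match part from the trailing insert statement.
        List<String> match = all.subList(0, all.size() - 1);
        String insert = all.get(all.size() - 1);
        match.forEach(System.out::println);
        System.out.println(insert);
    }
}
```

Keeping the insert statement as the last element lets a single pass over the players build both parts with matching variable names, at the cost of the caller having to know that convention.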
3 changes: 3 additions & 0 deletions src/main/java/migrator/GraknMigrator.java
@@ -74,6 +74,9 @@ public void migrate(boolean migrateEntities, boolean migrateRelations) throws IO
}

private void migrateThingsInOrder(GraknClient.Session session, boolean migrateEntities, boolean migrateRelations) throws IOException {
if(!migrateEntities && migrateRelations) {
migrateEntities = true;
}
if (migrateEntities) {
appLogger.info("migrating entities...");
getStatusAndMigrate(session, "entity");