This repository has been archived by the owner on Nov 18, 2023. It is now read-only.

Pytorch Geometric support (deprecates KGCN) #161

Merged 124 commits on Jul 28, 2022
Commits
124 commits
4bc5ae9
Upgrade to TypeDB 2.10.0 and associated build configurations. Won't b…
Jun 8, 2022
e01ed36
Buildable state
Jun 10, 2022
e2998c9
Baseline example for CSV loads to PyTorch Geometric
Jun 15, 2022
b6c603c
WIP swapping out the KGCN for a PyTorch GCN
Jun 15, 2022
7b75171
WIP
Jun 17, 2022
c96fbb9
Adds binary_relations_to_edges
Jun 17, 2022
5c1fa52
WIP
Jun 21, 2022
aacc489
Change StandardKGCNNetworkxTransform into explicitly a feature encode…
Jul 5, 2022
d2a6f3d
Properly prepare nodes and edges with necessary additions/deletions f…
Jul 5, 2022
55da3db
Include all role supertypes in case the user uses them in their queri…
Jul 5, 2022
aabe9f8
Finish conversion to a HeteroData object
Jul 5, 2022
05456fb
Give the random link splitter only the edge we want to predict
Jul 6, 2022
49f6768
Remove candidate-diagnosis, kgcn and doctor concepts from the diagnos…
Jul 6, 2022
756bbe7
Handle an error for missing data and tell the user what to do
Jul 6, 2022
8d1e8f8
Some fixes for the generated data
Jul 6, 2022
715e86f
There were more missing types to be detected, now fixed, but encoder …
Jul 7, 2022
552ee34
Use a different model, called HAN. Evades several errors that perhaps…
Jul 7, 2022
243803f
The learner now sees the negative training examples
Jul 8, 2022
2d7bad4
Use a simple dot product to compute edge logits and therefore the app…
Jul 8, 2022
4f72eef
WIP improving features
Jul 8, 2022
1d54acd
Non-ideal fix to the graph structure breaking. It breaks when the fea…
Jul 8, 2022
d065eeb
Log scalars and histograms to tensorboard
Jul 12, 2022
cd95ef0
Use a HGTConv rather than a HANConv as the heterogeneous convolution …
Jul 12, 2022
77ef6ef
WIP removing data leak of diagnosis edges. edge_label needs to be sav…
Jul 12, 2022
aea16d0
Make age a less significant feature
Jul 14, 2022
d635151
Remove diagnosis edges to ensure there's no data leakage. Might also …
Jul 14, 2022
ad9bc2b
Change the type used to represent past history of diagnosis of a pare…
Jul 14, 2022
d204d24
Clean up encoder
Jul 14, 2022
42ac773
Minor changes
Jul 18, 2022
d9c819f
Remove PyG example scripts
Jul 18, 2022
503a150
Created a dictionary of concepts by type in order to be able to map b…
Jul 18, 2022
23dfe73
Remove unneeded tests and update BUILD files
Jul 19, 2022
3fcefdf
Fix automation.yml
Jul 19, 2022
7b5c5b2
Upgrade to Python 3.7.2
Jul 19, 2022
1b8bff6
Fix result reporting
Jul 19, 2022
4b2e95c
Cleaning, refactoring, added confusion matrix from TypeDB
Jul 20, 2022
7e910c3
Update dependencies and fix build issues
Jul 20, 2022
7105217
Rename kgcn_data_loader to pytorch_geometric
Jul 20, 2022
9a001cd
Update automation.yml to match
Jul 20, 2022
1ed0cf2
Move all of the code relating to the diagnosis example into one place
Jul 20, 2022
27eaf04
Move thing and type libraries up a level
Jul 20, 2022
a705f21
Fix some license headers
Jul 20, 2022
152815a
Remove superfluous test file
Jul 20, 2022
eb7126a
Remove references to kgcn from test structure
Jul 20, 2022
119eea5
Move simple test parent class next to its usages
Jul 20, 2022
410d7eb
Update checkstyle to exclude inappropriate files
Jul 20, 2022
ffdff09
Add missing exclusions for checkstyles for .md files
Jul 20, 2022
23645b2
Use Python 3.7.2 for deployment
Jul 20, 2022
ce042a8
Fix imports
Jul 20, 2022
b989e07
Fix README exclusions again
Jul 20, 2022
721d03b
Move typedb related code to a module and networkx to another
Jul 20, 2022
e101e22
Missing license header
Jul 20, 2022
0438a51
Fix imports
Jul 20, 2022
ebadf39
Fix paths
Jul 20, 2022
bd8ed87
Typo in automation
Jul 20, 2022
14f3bba
Flatten the networkx package
Jul 20, 2022
3135314
Add omitted requirements
Jul 20, 2022
f514bba
Remove reference to non-existant dir
Jul 21, 2022
6c5a0c1
Correct path to networkx package
Jul 21, 2022
8b43d4b
Update requirements to slimmest possible
Jul 21, 2022
51492bf
Change package name back to vaticle-kglib for now
Jul 21, 2022
767c946
Revert "Update requirements to slimmest possible"
Jul 21, 2022
152dfba
Figure out that it's torch-scatter and torch-sparse throwing an error…
Jul 21, 2022
c69db2f
Add missing dependency
Jul 21, 2022
54993aa
Simplify naming in networkx packaage
Jul 21, 2022
14b2428
Fixes for networkx rename
Jul 21, 2022
2a14150
Fix the ModuleError torch for missing torch, see https://github.com/p…
Jul 21, 2022
f683b82
Add missing dep
Jul 21, 2022
f713ae3
Add more deps
Jul 21, 2022
dadde8e
More deps fixes
Jul 21, 2022
9b5613e
Deps fixes for end-to-end test
Jul 21, 2022
a218275
Remove auto-deleting diagnosis database on each run. Can't be enabled…
Jul 21, 2022
d16a8e7
Refactor to remove the solution field from networkx graphs. No longer…
Jul 21, 2022
b3719d4
Try using a requirements file specifically to fill the install_requir…
Jul 21, 2022
945a348
Try using absolute path to the schema and data files
Jul 21, 2022
a64453f
Lower the threshold accuracy required to pass end-to-end tests (havin…
Jul 21, 2022
66a8e2f
README updates and relevant amendments
Jul 22, 2022
42b1b61
Extra comments
Jul 22, 2022
0e2c3ce
Refactoring
Jul 22, 2022
ff2bc07
Use enum value
Jul 22, 2022
4f85438
Fix overloaded name
Jul 22, 2022
837e5ca
WIP updating READMEs and their images
Jul 22, 2022
dddc93c
Install python 3.7 for deploy-pip-snapshot to fix a syntax error
Jul 22, 2022
685bf94
WIP on READMEs
Jul 22, 2022
64481ef
Fix two links, remove tensorboard image as it feels redundant
Jul 22, 2022
6513637
Check test-deployment-pip is working
Jul 22, 2022
8344d68
Try relative link again
Jul 22, 2022
fcd30c4
Install python 3.7.2 for test-deployment-pip
Jul 22, 2022
a12b565
README fixes
Jul 22, 2022
40c1bd5
Fix or remove TODOs
Jul 22, 2022
61ee681
Fix dependencies on TypeDB (by removing them because we depend on an …
Jul 22, 2022
ab342c5
Bump TypeDB artifact to 2.11.1
Jul 22, 2022
51dde5d
Fix path that was still using python3.6 rather than python3.7
Jul 25, 2022
d43a084
Depend on client python by tag
Jul 25, 2022
03a2ed9
Revert "Depend on client python by tag"
Jul 25, 2022
b8a54b1
Depend on client python by tag
Jul 25, 2022
1297aae
Revert "Depend on client python by tag"
Jul 25, 2022
a32920d
Fix typedb extractor target name
Jul 25, 2022
77a0b2c
Correct and update some CI configs
Jul 25, 2022
5d1c3e0
Focus on rest-deployment-pip and pip install requirements due to issu…
Jul 25, 2022
444283e
Run the example as per the readme rather than via a deployment test
Jul 25, 2022
772db76
Update READMES, remove deployment test since it's all done in the aut…
Jul 25, 2022
5df8f0d
Change the workspace name to vaticle_typedb_client_python and the pac…
Jul 25, 2022
e02e036
Reduce test timeout
Jul 25, 2022
7553308
Remove typedb-common dependency
Jul 25, 2022
6135153
Continue changing name to typedb-kglib
Jul 26, 2022
966207e
Update the validate deps step to match client python
Jul 26, 2022
5bb3147
Fix name of typedb-client dependency from pip's perspective
Jul 26, 2022
75430b3
Use requirements.txt not install_requires.txt to validate the deps
Jul 26, 2022
1476b6f
Rename KGLIB to TypeDB-ML!
Jul 27, 2022
c94f439
Remove __init__.py
Jul 27, 2022
8ee8997
Use typedb_ml as the root package rather than `typedb` due to otherwi…
Jul 27, 2022
012dde3
Update client-python dep
Jul 27, 2022
49e09fb
Fix remaining todos
Jul 27, 2022
c25d0be
Move examples to root directory
Jul 27, 2022
45adbca
Move end-to-end tests dir to root directory
Jul 27, 2022
7cb05bb
Remove query handles in favour of a less confusing Query object. Also…
Jul 27, 2022
ce38e74
Update the networkx conversion tests accordingly
Jul 27, 2022
0bbb9c7
Fix tests deps
Jul 27, 2022
ebf02dd
Update client-python dep
Jul 27, 2022
537be1d
Depend on tag not commit
Jul 27, 2022
869be8b
Use verbose option names so that dependency validation ignores these …
Jul 28, 2022
e8788b2
Uncomment automation.yml elements to have normal flow back
Jul 28, 2022
0ba7cc3
Bump VERSION
Jul 28, 2022
2 changes: 1 addition & 1 deletion .bazelversion
@@ -1 +1 @@
-4.0.0
+5.1.1
2 changes: 1 addition & 1 deletion .github/ISSUE_TEMPLATE/bug-report.md
@@ -14,7 +14,7 @@ Please replace every line in curly brackets ( { like this } ) with appropriate a

 1. OS (where TypeDB server runs): { e.g. Mac OS 10, Windows 10, Ubuntu 16.4, etc. }
 2. TypeDB version (and platform): { e.g. TypeDB 2.1.0, or TypeDB Cluster 2.1.1 on Google Cloud }
-3. TypeDB KGLIB and client-python version: { e.g. KGLIB 0.1 and client-python 1.4 }
+3. TypeDB, typedb-ml and client-python version: { e.g. typedb-ml 0.2 and client-python 1.4 }
 4. Python version: { e.g. 2.7, 3.6, etc. }
 5. Other environment details:

2 changes: 1 addition & 1 deletion .gitignore
@@ -35,6 +35,6 @@ tmp/
 __pycache__/

 # Data input/output directories
-kglib/kgcn_tensorflow/examples/diagnosis/events/
+examples/diagnosis/events/

 *.egg-info/
75 changes: 42 additions & 33 deletions .grabl/automation.yml
@@ -24,8 +24,6 @@ config:
 dependencies:
 dependencies: [build]
 typedb-client-python: [build, release]
-typedb-common: [build, release]
-typedb: [build, release]

 build:
 quality:
@@ -37,47 +35,56 @@ build:
 build:
 image: vaticle-ubuntu-20.04
 command: |
-pyenv install -s 3.6.3
-pyenv global 3.6.3 system
+pyenv install -s 3.7.2
+pyenv global 3.7.2 system
 bazel build //...
 bazel run @vaticle_dependencies//tool/checkstyle:test-coverage
 bazel test $(bazel query 'kind(checkstyle_test, //...)') --test_output=errors
 test-markdown-link-health:
 image: vaticle-ubuntu-20.04
 command: |
 find . -name \*.md | xargs -L1 npx [email protected] -v
-test-kgcn-data-loader:
+test-pytorch-geometric:
 image: vaticle-ubuntu-20.04
 timeout: "10m"
 command: |
-pyenv install -s 3.6.3
-pyenv global 3.6.3 system
-bazel test //kglib/utils/... --test_output=streamed --spawn_strategy=standalone --action_env=PATH
-test-utils:
+pyenv install -s 3.7.2
+pyenv global 3.7.2 system
+bazel test //typedb_ml/pytorch_geometric/... --test_output=streamed --spawn_strategy=standalone --action_env=PATH
+test-typedb:
 image: vaticle-ubuntu-20.04
 timeout: "10m"
 command: |
-pyenv install -s 3.6.3
-pyenv global 3.6.3 system
-bazel test //kglib/kgcn_data_loader/... --test_output=streamed --spawn_strategy=standalone --action_env=PATH
-test-kgcn-tensorflow:
+pyenv install -s 3.7.2
+pyenv global 3.7.2 system
+bazel test //typedb_ml/typedb/... --test_output=streamed --spawn_strategy=standalone --action_env=PATH
+test-networkx:
 image: vaticle-ubuntu-20.04
 timeout: "10m"
 command: |
-pyenv install -s 3.6.3
-pyenv global 3.6.3 system
-bazel test //kglib/kgcn_tensorflow/... --test_output=streamed --spawn_strategy=standalone --action_env=PATH
+pyenv install -s 3.7.2
+pyenv global 3.7.2 system
+bazel test //typedb_ml/networkx/... --test_output=streamed --spawn_strategy=standalone --action_env=PATH
+test-examples:
+image: vaticle-ubuntu-20.04
+timeout: "10m"
+command: |
+pyenv install -s 3.7.2
+pyenv global 3.7.2 system
+bazel test //examples/... --test_output=streamed --spawn_strategy=standalone --action_env=PATH
 test-end-to-end:
 image: vaticle-ubuntu-20.04
-timeout: "30m"
+timeout: "10m"
 command: |
-pyenv install -s 3.6.3
-pyenv global 3.6.3 system
-bazel test //kglib/tests/end_to_end:diagnosis --test_output=streamed --spawn_strategy=standalone --action_env=PATH
+pyenv install -s 3.7.2
+pyenv global 3.7.2 system
+bazel test //tests/end_to_end:diagnosis --test_output=streamed --spawn_strategy=standalone --action_env=PATH
 deploy-pip-snapshot:
 image: vaticle-ubuntu-20.04
-dependencies: [build, test-kgcn-data-loader, test-utils, test-kgcn-tensorflow, test-end-to-end]
+dependencies: [build, test-pytorch-geometric, test-typedb, test-networkx, test-examples, test-end-to-end]
 command: |
+pyenv install -s 3.7.2
+pyenv global 3.7.2 system
 export DEPLOY_PIP_USERNAME=$REPO_VATICLE_USERNAME
 export DEPLOY_PIP_PASSWORD=$REPO_VATICLE_PASSWORD
 bazel run --define version=$(git rev-parse HEAD) //:deploy-pip -- snapshot
@@ -89,15 +96,17 @@ build:
 branch: master
 type: foreground
 command: |
-pyenv global 3.6.10
+pyenv install -s 3.7.2
+pyenv global 3.7.2
 pip3 install -U pip
 sudo unlink /usr/bin/python3
 sudo ln -s $(which python3) /usr/bin/python3
-sudo ln -s /usr/share/pyshared/lsb_release.py /opt/pyenv/versions/3.6.10/lib/python3.6/site-packages/lsb_release.py
-bazel run //test:typedb-extractor -- typedb-all-linux
+sudo ln -s /usr/share/pyshared/lsb_release.py /opt/pyenv/versions/3.7.2/lib/python3.7/site-packages/lsb_release.py
+bazel run //tests/end_to_end:typedb-extractor-linux -- typedb-all-linux
 ./typedb-all-linux/typedb server &
-pip install --extra-index-url https://repo.vaticle.com/repository/pypi-snapshot/simple typedb-kglib==0.0.0-$GRABL_COMMIT
-cd kglib/tests/deployment/ && python -m unittest kgcn.diagnosis && export TEST_SUCCESS=0 ||
+pip install -r requirements.txt
+pip install --extra-index-url https://repo.vaticle.com/repository/pypi-snapshot/simple typedb-ml==0.0.0-$GRABL_COMMIT
+python -m examples.diagnosis.diagnosis "./typedb-all-linux" && export TEST_SUCCESS=0 ||
 export TEST_SUCCESS=1
 kill $(jps | awk '/TypeDBServer/ {print $1}')
 exit $TEST_SUCCESS
@@ -108,27 +117,27 @@ release:
 validation:
 validate-dependencies:
 image: vaticle-ubuntu-20.04
-command: bazel test //:release-validate-deps --test_output=streamed
+command: bazel test //:release-validate-python-deps --test_output=streamed
 deployment:
 deploy-github:
 image: vaticle-ubuntu-20.04
 command: |
-pyenv install -s 3.6.10
-pyenv global 3.6.10 system
+pyenv install -s 3.7.2
+pyenv global 3.7.2 system
 pip3 install -U pip
 pip install certifi
 export ARTIFACT_USERNAME=$REPO_VATICLE_USERNAME
 export ARTIFACT_PASSWORD=$REPO_VATICLE_PASSWORD
 bazel run @vaticle_dependencies//distribution/artifact:create-netrc
-export RELEASE_NOTES_TOKEN=$REPO_GITHUB_TOKEN
-bazel run @vaticle_dependencies//tool/release:create-notes -- kglib $(cat VERSION) ./RELEASE_TEMPLATE.md
+export NOTES_CREATE_TOKEN=$REPO_GITHUB_TOKEN
+bazel run @vaticle_dependencies//tool/release/notes:create -- $GRABL_OWNER $GRABL_REPO $GRABL_COMMIT $(cat VERSION) ./RELEASE_TEMPLATE.md
 export DEPLOY_GITHUB_TOKEN=$REPO_GITHUB_TOKEN
 bazel run --define version=$(cat VERSION) //:deploy-github -- $GRABL_COMMIT
 deploy-pip-release:
 image: vaticle-ubuntu-20.04
 command: |
-pyenv install -s 3.6.10
-pyenv global 3.6.10 system
+pyenv install -s 3.7.2
+pyenv global 3.7.2 system
 pip3 install -U pip
 export ARTIFACT_USERNAME=$REPO_VATICLE_USERNAME
 export ARTIFACT_PASSWORD=$REPO_VATICLE_PASSWORD
35 changes: 19 additions & 16 deletions BUILD
@@ -23,27 +23,25 @@ exports_files(["requirements.txt", "RELEASE_TEMPLATE.md"])

 load("@rules_python//python:defs.bzl", "py_library", "py_test")

-load("@vaticle_kglib_pip//:requirements.bzl",
-vaticle_kglib_requirement = "requirement")
+load("@vaticle_typedb_ml_pip//:requirements.bzl", vaticle_typedb_ml_requirement = "requirement")

 load("@vaticle_bazel_distribution//github:rules.bzl", "deploy_github")
 load("@vaticle_bazel_distribution//pip:rules.bzl", "assemble_pip", "deploy_pip")
-load("@vaticle_kglib_pip//:requirements.bzl",
-vaticle_kglib_requirement = "requirement")
+load("@vaticle_typedb_ml_pip//:requirements.bzl", vaticle_typedb_ml_requirement = "requirement")

 load("@vaticle_dependencies//distribution:deployment.bzl", "deployment")
 load("//:deployment.bzl", github_deployment = "deployment")
-load("@vaticle_dependencies//tool/release/deps:rules.bzl", "release_validate_deps")
+load("@vaticle_dependencies//tool/release/deps:rules.bzl", "release_validate_python_deps")

 load("@vaticle_dependencies//tool/checkstyle:rules.bzl", "checkstyle_test")

 assemble_pip(
 name = "assemble-pip",
-target = "//kglib:kglib",
-package_name = "vaticle-kglib",
+target = "//typedb_ml:typedb-ml",
+package_name = "typedb-ml",
 classifiers = [
 "Programming Language :: Python :: 3",
-"Programming Language :: Python :: 3.6",
+"Programming Language :: Python :: 3.7",
 "License :: OSI Approved :: Apache Software License",
 "Operating System :: OS Independent",
 "Intended Audience :: Developers",
@@ -54,11 +52,11 @@ assemble_pip(
 "Topic :: Software Development :: Libraries",
 "Topic :: Software Development :: Libraries :: Python Modules"
 ],
-url = "https://github.com/vaticle/kglib",
+url = "https://github.com/vaticle/typedb-ml",
 author = "Vaticle",
 author_email = "[email protected]",
 license = "Apache-2.0",
-requirements_file = "//:requirements.txt",
+requirements_file = "//:install_requires.txt",
 keywords = ["machine learning", "logical reasoning", "knowledege graph", "typedb", "database", "graph",
 "knowledgebase", "knowledge-engineering"],

@@ -73,14 +71,12 @@ deploy_pip(
 release = deployment["pypi.release"],
 )

-release_validate_deps(
-name = "release-validate-deps",
-refs = "@vaticle_kglib_workspace_refs//:refs.json",
+release_validate_python_deps(
+name = "release-validate-python-deps",
+requirements = "//:requirements.txt",
 tagged_deps = [
-"@vaticle_typedb",
-"@vaticle_typedb_client_python",
+"typedb-client",
 ],
-tags = ["manual"]
 )

 checkstyle_test(
@@ -89,6 +85,13 @@ checkstyle_test(
 "*",
 ".grabl/*",
 ]),
+exclude = glob([
+"*.md"
+]) + [
+".bazelversion",
+"LICENSE",
+"VERSION",
+],
 license_type = "apache-header",
 )

74 changes: 39 additions & 35 deletions README.md
@@ -1,68 +1,71 @@
-[![GitHub release](https://img.shields.io/github/release/vaticle/kglib.svg)](https://github.com/vaticle/typedb/releases/latest)
+[![GitHub release](https://img.shields.io/github/release/vaticle/typedb-ml.svg)](https://github.com/vaticle/typedb/releases/latest)
 [![Discord](https://img.shields.io/discord/665254494820368395?color=7389D8&label=chat&logo=discord&logoColor=ffffff)](https://vaticle.com/discord)
 [![Discussion Forum](https://img.shields.io/discourse/https/forum.vaticle.com/topics.svg)](https://forum.vaticle.com)
 [![Stack Overflow](https://img.shields.io/badge/stackoverflow-typedb-796de3.svg)](https://stackoverflow.com/questions/tagged/typedb)
 [![Stack Overflow](https://img.shields.io/badge/stackoverflow-typeql-3dce8c.svg)](https://stackoverflow.com/questions/tagged/typeql)

-# TypeDB KGLIB (Knowledge Graph Library)
+# TypeDB ML
+_Previously known as KGLIB._

-**KGLIB provides tools to enable machine learning with [TypeDB](https://github.com/vaticle/typedb).**
+**TypeDB ML provides tools to enable graph algorithms and machine learning with [TypeDB](https://github.com/vaticle/typedb).**

-This library is under development and will henceforth be transformed into primarily infrastructure tools and integrations between TypeDB and machine learning libraries.
+There are integrations for [NetworkX](https://networkx.org) and for [PyTorch Geometric (PyG)](https://github.com/pyg-team/pytorch_geometric).

-## Machine Learning Pipeline
+[NetworkX](https://networkx.org) integration allows you to use a [large library of algorithms](https://networkx.org/documentation/stable/reference/algorithms/index.html) over graph data exported from TypeDB.

-![Flow Diagram](kglib/kgcn_tensorflow/.images/knowledge_graph_machine_learning.png)
+[PyTorch Geometric (PyG)](https://github.com/pyg-team/pytorch_geometric) integration gives you a toolbox to build Graph Neural Networks (GNNs) for your TypeDB data, with an example included for link prediction (or: binary relation prediction, in TypeDB terms). The structure of the GNNs are totally customisable, with network components for popular topics such as graph attention and graph transformers built-in.

-The pipeline provided helps by allowing us to extract subgraphs from TypeDB. Each subgraph is a training example, which are sent to the learner in batches. Algorithms using this approach are scalable since they do not need to hold the whole graph in memory for training.
+## Features

-The pipeline is as follows:
-1. Extract data from `TypeDB` into Python [NetworkX](https://networkx.org) in-memory subgraphs by specifying multiple [TypeQL](https://github.com/vaticle/typeql) queries.
-2. Encode the nodes and edges of the NetworkX graphs
-3. Either (a) transform the encoded values into features, ready for input into a graph/geometric learning pipeline (for example the upcoming PyTorch implementation); or (b) Embed the encoded values according to the Types present in your database (TensorFlow only, PyTorch coming soon). This type-centric embedding is crucial to extracting the context explicitly captured in TypeDB's Type System.
-4. Feed the features to a learning algorithm (see below)
-5. Optionally, store the predictions made by the learner in TypeDB. These predictions can then be queried using TypeQL. This means we can trivially run more learning tasks over the knowledge base, including the newly made predictions. This is knowledge graph completion.
-
-## Learning Algorithms
-This repo contains one algorithmic implementation: [*Knowledge Graph Convolutional Network* (KGCN)](kglib/kgcn_tensorflow). This is a generic method for relation predication over any TypeDB database. There is a [full worked example](kglib/kgcn_tensorflow/examples/diagnosis) and an explanation of how the approach works.
-
-You are encouraged to use the tools available in KGLIB to interface TypeDB to your own algorithmic implementations, or to use/leverage prebuilt implementations available from popular libraries such as [PyTorch Geometric](https://github.com/rusty1s/pytorch_geometric) or [Graph Nets](https://github.com/deepmind/graph_nets) (TensorFlow/Sonnet).
+### NetworkX
+- Declare the graph structure of your queries, with optional sampling functions.
+- Query a TypeDB instance and combine many results across many queries into a single graph (`build_graph_from_queries`).
+### PyTorch Geometric
+- A `DataSet` object to lazily load graphs from a TypeDB instance. Each graph is converted to a PyG `Data` object.
+- It's most natural to work with `HeteroData` objects since all data in TypeDB has a type. This conversion is available by default in PyG, but TypeDB-ML provides `store_concepts_by_type` to map concepts by type so that they can be re-associated after learning is finished.
+- A `FeatureEncoder` to orchestrate encoders to generate features for graphs.
+- Encoders for Continuous and Categorical values to apply encodings/embedding spaces to the types and attribute values present in TypeDB data.
+- A [full example for link prediction](examples/diagnosis)
+### Other
+- Example usage of Tensorboard for PyG `HeteroData`

 ## Resources
-You may find the following resources useful:
-- [Strongly Typed Data for Machine Learning](https://www.youtube.com/watch?v=qhUyurWMiSQ) (YouTube)
-- [How Can We Complete a Knowledge Graph?](https://www.youtube.com/watch?v=nYDi1_UaFtU) (YouTube)
+You may find the following resources useful, particularly to understand why TypeDB-ML started:
+- [Strongly Typed Data for Machine Learning](https://www.youtube.com/watch?v=qhUyurWMiSQ) (YouTube, 2021)
+- [How Can We Complete a Knowledge Graph?](https://www.youtube.com/watch?v=nYDi1_UaFtU) (YouTube, 2018)

 ## Quickstart

-### Requirements
+### Install

+- Python >= 3.7.x
+
-- Python >= 3.6, <= 3.7.x (TensorFlow 1.14.0 doesn't support later Python versions).
+- Grab the `requirements.txt` file from [here](requirements.txt) and install the requirements with `pip install requirements.txt`. This is due to some intricacies installing PyG's dependencies, see [here](https://github.com/pyg-team/pytorch_geometric/issues/861) for details.

-- KGLIB installed via pip: `pip install typedb-kglib`.
+- Installed TypeDB-ML: `pip install typedb-ml`.

-- [TypeDB 2.1.1](https://github.com/vaticle/typedb/releases) running in the background.
+- [TypeDB 2.11.1](https://github.com/vaticle/typedb/releases) running in the background.

-- `typedb-client-python` 2.1.0 ([PyPi](https://pypi.org/project/typedb-client/), [GitHub release](https://github.com/vaticle/typedb-client-python/releases)). This should be installed automatically when you `pip install typedb-kglib`.
+- `typedb-client-python` 2.11.x ([PyPi](https://pypi.org/project/typedb-client/), [GitHub release](https://github.com/vaticle/typedb-client-python/releases)). This should be installed automatically when you `pip install typedb-ml`.

 ### Run the Example

-Take a look at [*Knowledge Graph Convolutional Networks* (KGCNs)](kglib/kgcn_tensorflow) to see a walkthrough of how to use the library.
+Take a look at the [PyTorch Geometric heterogeneous link prediction example](examples/diagnosis) to see how to use TypeDB-ML to build a GNN on TypeDB data.

 ### Building from source

-It's expected that you will use Pip to install, but should you need to make your own changes to the library, and import it into your project, you can build from source as follows.
+It's expected that you will use Pip to install, but should you need to make your own changes to the library, and import it into your project, you can build from source as follows:

-Clone KGLIB:
+Clone TypeDB-ML:

 ```
-git clone git@github.com:vaticle/kglib.git
+git clone git@github.com:vaticle/typedb-ml.git
 ```

 Go into the project directory:

 ```
-cd kglib
+cd typedb-ml
 ```

 Build all targets:
@@ -71,10 +74,10 @@ Build all targets:
 bazel build //...
 ```

-Run all tests. Requires Python 3.6+ on your `PATH`. Test dependencies are for Linux since that is the CI environment:
+Run all tests. Requires Python 3.7+ on your `PATH`. Test dependencies are for Linux since that is the CI environment:

 ```
-bazel test //kglib/... --test_output=streamed --spawn_strategy=standalone --action_env=PATH
+bazel test //typedb_ml/... --test_output=streamed --spawn_strategy=standalone --action_env=PATH
 ```

 Build the pip distribution. Outputs to `bazel-bin`:
@@ -85,6 +88,7 @@ bazel build //:assemble-pip

 ## Development

-To follow the development conversation, please join the [Vaticle Discord](https://discord.com/invite/grakn), and join the `#kglib` channel. Alternatively, start a new topic on the [Vaticle Discussion Forum](https://forum.vaticle.com).
+To follow the development conversation, please join the [Vaticle Discord](https://discord.com/invite/vaticle), and join the `#typedb-ml` channel. Alternatively, start a new topic on the [Vaticle Discussion Forum](https://forum.vaticle.com).

-KGLIB requires that you have migrated your data into a [TypeDB](https://github.com/vaticle/typedb) or TypeDB Cluster instance. There is an [official examples repo](https://github.com/vaticle/examples) for how to go about this, and information available on [migration in the docs](https://docs.vaticle.com/docs/examples/phone-calls-migration-python). Alternatively, there are fantastic community-led projects growing in the [TypeDB OSI](https://typedb.org) to facilitate fast and easy data loading.
+TypeDB-ML requires that you have migrated your data into a [TypeDB](https://github.com/vaticle/typedb) or TypeDB
+Cluster instance. There is an [official examples repo](https://github.com/vaticle/examples) for how to go about this, and information available on [migration in the docs](https://docs.vaticle.com/docs/examples/phone-calls-migration-python). Alternatively, there are fantastic community-led projects growing in the [TypeDB OSI](https://typedb.org) to facilitate fast and easy data loading, for example [TypeDB Loader](https://github.com/typedb-osi/typedb-loader).
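
The README changes above say that `store_concepts_by_type` maps concepts by type so that they can be re-associated after learning is finished. The sketch below illustrates that idea only and is not TypeDB-ML's implementation: a heterogeneous graph addresses each node by type plus a per-type index, so a lookup from (type, index) back to the original concept is needed to interpret predictions. The function name `group_concepts_by_type`, the concept ids and the type labels are all hypothetical, and only the standard library is used.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple


def group_concepts_by_type(
    concepts: Iterable[Tuple[Hashable, str]],
) -> Tuple[Dict[str, List[Hashable]], Dict[Hashable, Tuple[str, int]]]:
    """Group (concept_id, type_label) pairs by type label.

    Returns two lookups:
      by_type:  type label -> list of concept ids; a concept's position in
                its list is its per-type node index.
      index_of: concept id -> (type label, per-type node index).
    """
    by_type: Dict[str, List[Hashable]] = defaultdict(list)
    index_of: Dict[Hashable, Tuple[str, int]] = {}
    for concept_id, type_label in concepts:
        if concept_id in index_of:
            continue  # skip duplicates, e.g. a concept matched by several queries
        index_of[concept_id] = (type_label, len(by_type[type_label]))
        by_type[type_label].append(concept_id)
    return dict(by_type), index_of


# Hypothetical concepts; "iid-1" appears twice to show de-duplication.
concepts = [
    ("iid-1", "person"),
    ("iid-2", "disease"),
    ("iid-3", "person"),
    ("iid-1", "person"),
]
by_type, index_of = group_concepts_by_type(concepts)

# Suppose a learner predicts a link between person node 1 and disease node 0.
predicted_edge = (("person", 1), ("disease", 0))
(src_type, src_idx), (dst_type, dst_idx) = predicted_edge

# Map the per-type indices back to the original concept ids.
print(by_type[src_type][src_idx], "->", by_type[dst_type][dst_idx])
```

Keeping both directions of the lookup means features can be written into per-type arrays while the graph is built, and predicted edges can be translated back into concepts once training is done.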
2 changes: 1 addition & 1 deletion VERSION
@@ -1 +1 @@
-0.2.2
+0.3.0