From c06f7f3bd9c764701869e996085c3d317110a9bd Mon Sep 17 00:00:00 2001
From: David Pitera
Date: Fri, 9 Mar 2018 13:21:53 -0500
Subject: [PATCH] Document graph cache and bindings for JG clusters [skip ci]

Issues: #938

Signed-off-by: David Pitera
---
 docs/multinodejanusgraphcluster.adoc | 71 ++++++++++++++++++++++++++++
 1 file changed, 71 insertions(+)
 create mode 100644 docs/multinodejanusgraphcluster.adoc

diff --git a/docs/multinodejanusgraphcluster.adoc b/docs/multinodejanusgraphcluster.adoc
new file mode 100644
index 00000000000..a8a14b8ef1f
--- /dev/null
+++ b/docs/multinodejanusgraphcluster.adoc
@@ -0,0 +1,71 @@
+[[things-to-consider-in-a-multi-node-janusgraph-cluster]]
+== Things to Consider in a Multi-Node JanusGraph Cluster
+
+JanusGraph is a distributed graph database, which means it can be set up in a multi-node cluster. However, when working in such an environment, there are important things to consider. Furthermore, if configured properly, JanusGraph handles some of these special considerations for the user.
+
+[[dynamic-graphs]]
+=== Dynamic Graphs
+
+JanusGraph supports <>. This is a deviation from the way in which standard GremlinServer implementations allow one to access a graph. Traditionally, users create bindings to graphs at server start by configuring the `gremlin-server.yaml` file accordingly. For example, if the `graphs` section of your YAML file looks like this:
+
+[source, properties]
+----
+graphs: {
+  graph1: conf/graph1.properties,
+  graph2: conf/graph2.properties
+}
+----
+
+then the String `graph1` is bound to the graph that the server opens from the supplied properties file, and you access that graph on the GremlinServer through this binding. The same holds true for `graph2`.
+
+However, if we use the `ConfiguredGraphFactory` to dynamically create graphs, then those graphs are managed by the <> and the graph configurations are managed by the <>. This is especially useful because it (1) allows you to define graph configurations after the server has started and (2) allows the graph configurations to be managed in a persisted and distributed manner across your JanusGraph cluster.
+
+To properly use the `ConfiguredGraphFactory`, you must configure every GremlinServer in your cluster to use the `JanusGraphManager` and the `ConfigurationManagementGraph`. This procedure is explained in detail <>.
+
+[[ensuring-graph-references-are-up-to-date-across-the-cluster]]
+==== Ensuring Graph References Are Up-To-Date Across All JanusGraph Nodes
+
+If you configure all your JanusGraph servers to use the <>, JanusGraph will ensure all graph representations are up-to-date across all JanusGraph nodes in your cluster.
+
+For example, if you update or delete the configuration of a graph on one JanusGraph node, then that graph must be evicted from the cache on _every JanusGraph node in the cluster_. Otherwise, graph representations may be inconsistent across your cluster. JanusGraph automatically handles this eviction using a messaging log queue through the backend system that the graph in question is configured to use.
+
+If one of your servers is configured incorrectly, then it may not be able to successfully remove the graph from the cache.
+
+[IMPORTANT]
+====
+Updates to your <> are not applied to graphs or graph configurations previously created from that template configuration. To update the individual graph configurations, you must use the <>. These update APIs will _then_ trigger graph cache eviction across all JanusGraph nodes in your cluster.
+====
+
+[[accessing-graph-and-traversal-objects-through-bindings]]
+==== Accessing Graph and Traversal Objects Through String Bindings Across Your Cluster
+
+JanusGraph has the ability to bind dynamically created graphs and their traversal references to `` and `_traversal`, respectively, across all JanusGraph nodes in your cluster, with a maximum lag of 20 seconds for the binding to take effect on any node in the cluster. Read more about this <>.
+
+JanusGraph accomplishes this by having each node in your cluster poll the `ConfigurationManagementGraph` for all graphs for which you have created configurations. The `JanusGraphManager` will then open each such graph with its persisted configuration, store it in its graph cache, and bind the `` to the graph reference on the `GremlinExecutor` as well as bind `_traversal` to the graph's traversal reference on the `GremlinExecutor`.
+
+This allows you to access a dynamically created graph and its traversal reference by their string bindings on every node in your JanusGraph cluster. This is particularly important when working with GremlinServer clients.
+
+[[set-up]]
+===== Set Up
+
+To set up your cluster to bind dynamically created graphs and their traversal references, you must:
+
+1. Configure each node to use the <>.
+
+2. Configure each node to use a `JanusGraphChannelizer`, which injects lower-level GremlinServer components, like the `GremlinExecutor`, into the JanusGraph project, giving us greater control of the GremlinServer.
+
+To configure each node to use a `JanusGraphChannelizer`, update the node's `gremlin-server.yaml`:
+
+[source, properties]
+----
+channelizer: org.janusgraph.channelizers.JanusGraphWebSocketChannelizer
+----
+
+There are a few channelizers you can choose from:
+
+1. org.janusgraph.channelizers.JanusGraphWebSocketChannelizer
+2. org.janusgraph.channelizers.JanusGraphHttpChannelizer
+3. org.janusgraph.channelizers.JanusGraphNioChannelizer
+4. org.janusgraph.channelizers.JanusGraphWsAndHttpChannelizer
+
+All of the channelizers share the exact same functionality as their TinkerPop counterparts.
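+
+As a rough sketch of the two set-up steps combined, the relevant part of each node's `gremlin-server.yaml` might look like the following. The `graphManager` entry and the properties file path are assumptions that do not appear in the text above; treat them as illustrative and check them against the `ConfiguredGraphFactory` documentation for your JanusGraph version.
+
+[source, properties]
+----
+# Assumed entry: hands graph management to the JanusGraphManager.
+graphManager: org.janusgraph.graphdb.management.JanusGraphManager
+# Any of the four JanusGraph channelizers listed above can be used here.
+channelizer: org.janusgraph.channelizers.JanusGraphWebSocketChannelizer
+graphs: {
+  # Assumed path: properties file that backs the ConfigurationManagementGraph.
+  ConfigurationManagementGraph: conf/JanusGraph-configurationmanagement.properties
+}
+----
+
+Under these assumptions, every node in the cluster would carry the same entries, since both set-up steps must be completed on each node for the string bindings to be available there.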