Merge pull request JanusGraph#953 from dpitera/documentation-jg-cluster-938

Document graph cache and bindings for JG clusters
davidclement90 · Apr 3, 2018 · 8ec173f · 8ec173f
2 parents 9071f9c + 2cc4276
commit 8ec173f

Showing 3 changed files with 115 additions and 2 deletions.
diff --git a/docs/basics.adoc b/docs/basics.adoc
@@ -864,6 +864,8 @@ It is possible to extend Gremlin Server with other means of communication by imp
 
 include::configuredgraphfactory.adoc[]
 
+include::multinodejanusgraphcluster.adoc[]
+
 [[indexes]]
 == Indexing for Better Performance
 

diff --git a/docs/configuredgraphfactory.adoc b/docs/configuredgraphfactory.adoc
@@ -85,7 +85,7 @@ for which you have instantiated _and_ the references are stored inside the `Janu
 
 IMPORTANT: This is an irreversible operation that will delete all graph and index data.
 
-IMPORTANT: To ensure all graph representations are consistent across all JanusGraph nodes in your cluster, this removes the graph from the `JanusGraphManager` graph cache on every node in the cluster, assuming each node has been properly configured to use the `JanusGraphManager`.
+IMPORTANT: To ensure all graph representations are consistent across all JanusGraph nodes in your cluster, this removes the graph from the `JanusGraphManager` graph cache on every node in the cluster, assuming each node has been properly configured to use the `JanusGraphManager`. To learn more about this feature and how to configure your server to use it, see <<multinodejanusgraphcluster.adoc#graph-reference-consistency, Graph Reference Consistency>>.
 
 [[configuring-JanusGraph-server-for-configuredgraphfactory]]
 === Configuring JanusGraph Server for ConfiguredGraphFactory
@@ -210,7 +210,7 @@ the property `graph.graphname` go through the `JanusGraphManager` which
 keeps track of graph references created on the given JVM. Think of it as
 a graph cache. For this reason:
 
-IMPORTANT: Any updates to a graph configuration results in the eviction of the relevant graph from the graph cache on every node in the JanusGraph cluster, assuming each node has been configured properly to use the `JanusGraphManager`.
+IMPORTANT: Any updates to a graph configuration results in the eviction of the relevant graph from the graph cache on every node in the JanusGraph cluster, assuming each node has been configured properly to use the `JanusGraphManager`. To learn more about this feature and how to configure your server to use it, see <<multinodejanusgraphcluster.adoc#graph-reference-consistency, Graph Reference Consistency>>.
 
 Since graphs created using the template configuration first create a
 configuration for that graph in question using a copy and create method,
@@ -386,6 +386,8 @@ def g3 = ConfiguredGraphFactory.create("graph3"); //storage directory === /data/
 
 Graphs created using the ConfiguredGraphFactory are bound to the executor context on the Gremlin Server by the "graph.graphname" property, and the graph's traversal reference is bound to the context by "<graphname>_traversal". This means, on subsequent connections to the server after the first time you create/open a graph, you can access the graph and traversal references by the "<graphname>" and "<graphname>_traversal" properties.
 
+To learn more about this feature and how to configure your server to use it, see <<multinodejanusgraphcluster.adoc#dynamic-graph-and-traversal-bindings, Dynamic Graph and Traversal Bindings>>.
+
 IMPORTANT: If you are connected to a remote Gremlin Server using the Gremlin Console and a *sessioned* connection, then you will have to reconnect to the server to bind the variables. This is also true for any sessioned WebSocket connection.
 
 IMPORTANT: The JanusGraphManager rebinds every graph stored on the ConfigurationManagementGraph (or those for which you have created configurations) every 20 seconds. This means your graph and traversal bindings for graphs created using the ConfiguredGraphFactory will be available on all JanusGraph nodes with a maximum of a 20 second lag. It also means that a binding will still be available on a node after a server restart.

diff --git a/docs/multinodejanusgraphcluster.adoc b/docs/multinodejanusgraphcluster.adoc
@@ -0,0 +1,109 @@
+[[things-to-consider-in-a-multi-node-janusgraph-cluster]]
+== Things to Consider in a Multi-Node JanusGraph Cluster
+
+JanusGraph is a distributed graph database, which means it can be set up in a multi-node cluster. However, when working in such an environment, there are important things to consider. If configured properly, JanusGraph handles some of these considerations for you.
+
+[[dynamic-graphs]]
+=== Dynamic Graphs
+
+JanusGraph supports <<configuredgraphfactory.adoc#configuredgraphfactory,dynamically creating graphs>>. This is a deviation from the way in which standard Gremlin Server implementations allow one to access a graph. Traditionally, users create bindings to graphs at server start by configuring the `gremlin-server.yaml` file accordingly. For example, if the `graphs` section of your YAML file looks like this:
+
+[source, properties]
+----
+graphs: {
+  graph1: conf/graph1.properties,
+  graph2: conf/graph2.properties
+}
+----
+
+then the Gremlin Server binds the name `graph1` to the graph opened from `conf/graph1.properties` and the name `graph2` to the graph opened from `conf/graph2.properties`. You access each graph on the server through its bound name.
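+
+As a minimal sketch of what that binding means in practice, a script sent from the Gremlin Console can refer to `graph1` by name. The `conf/remote.yaml` path is an assumption; point it at whatever remote configuration your console uses:
+
+[source, gremlin]
+----
+gremlin> :remote connect tinkerpop.server conf/remote.yaml
+gremlin> :> graph1.traversal().V().count()
+----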
+
+However, if we use the `ConfiguredGraphFactory` to dynamically create graphs, then those graphs are managed by the <<configuredgraphfactory.adoc#JanusGraphmanager,JanusGraphManager>> and the graph configurations are managed by the <<configuredgraphfactory.adoc#configurationmanagementgraph,ConfigurationManagementGraph>>. This is especially useful because it allows you to define graph configurations after the server has started, and because the graph configurations are persisted and distributed across your JanusGraph cluster.
+
+To properly use the `ConfiguredGraphFactory`, you must configure every Gremlin Server in your cluster to use the `JanusGraphManager` and the `ConfigurationManagementGraph`. This procedure is explained in detail <<configuredgraphfactory.adoc#configuring-JanusGraph-server-for-configuredgraphfactory,here>>.
+
+[[graph-reference-consistency]]
+==== Graph Reference Consistency
+
+If you configure all your JanusGraph servers to use the <<configuredgraphfactory.adoc#configuring-JanusGraph-server-for-configuredgraphfactory,ConfiguredGraphFactory>>, JanusGraph will ensure all graph representations are up-to-date across all JanusGraph nodes in your cluster.
+
+For example, if you update or delete the configuration of a graph on one JanusGraph node, then we must evict that graph from the cache on _every JanusGraph node in the cluster_. Otherwise, we may have inconsistent graph representations across your cluster. JanusGraph automatically handles this eviction using a messaging log queue through the backend system that the graph in question is configured to use.
+
+If one of your servers is configured incorrectly, then it may not be able to successfully remove the graph from the cache.
+
+[IMPORTANT]
+====
+Updates to your <<configuredgraphfactory.adoc#template-configuration,TemplateConfiguration>> do not update graphs or graph configurations previously created using that template configuration. If you want to update the individual graph configurations, you must do so using the <<configuredgraphfactory.adoc#updating-configurations,available update APIs>>. These update APIs will _then_ result in the graph cache eviction across all JanusGraph nodes in your cluster.
+====
+
+[[dynamic-graph-and-traversal-bindings]]
+==== Dynamic Graph and Traversal Bindings
+
+JanusGraph has the ability to bind dynamically created graphs and their traversal references to `<graph.graphname>` and `<graph.graphname>_traversal`, respectively, across all JanusGraph nodes in your cluster, with a maximum lag of 20 seconds for the binding to take effect on any node in the cluster. Read more about this <<configuredgraphfactory.adoc#graph-and-traversal-bindings, here>>.
+
+JanusGraph accomplishes this by having each node in your cluster poll the `ConfigurationManagementGraph` for all graphs for which you have created configurations. The `JanusGraphManager` will then open each such graph with its persisted configuration, store it in its graph cache, and bind `<graph.graphname>` to the graph reference and `<graph.graphname>_traversal` to the graph's traversal reference on the `GremlinExecutor`.
+
+This allows you to access a dynamically created graph and its traversal reference by their string bindings on every node in your JanusGraph cluster. This is particularly important when working with Gremlin Server clients and when using <<tinkerpop-with-remote,TinkerPop's withRemote functionality>>.
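+
+A minimal sketch of this behavior, assuming a sessionless Gremlin Console connection (`conf/remote.yaml` is an assumed path) and a template configuration that already exists so that `create` can succeed:
+
+[source, gremlin]
+----
+gremlin> :remote connect tinkerpop.server conf/remote.yaml
+gremlin> :> ConfiguredGraphFactory.create("graph1")
+gremlin> :> graph1_traversal.V().count()
+----
+
+The last script is only guaranteed to resolve `graph1_traversal` on every node once the 20 second rebinding interval has passed.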
+
+[[set-up]]
+===== Set Up
+
+To set up your cluster to bind dynamically created graphs and their traversal references, you must:
+
+1. Configure each node to use the <<configuredgraphfactory.adoc#configuring-JanusGraph-server-for-configuredgraphfactory,ConfiguredGraphFactory>>.
+
+2. Configure each node to use a `JanusGraphChannelizer`, which injects lower-level Gremlin Server components, like the GremlinExecutor, into the JanusGraph project, giving us greater control of the Gremlin Server.
+
+To configure each node to use a `JanusGraphChannelizer`, we must update the `gremlin-server.yaml` to do so:
+
+[source, properties]
+----
+channelizer: org.janusgraph.channelizers.JanusGraphWebSocketChannelizer
+----
+
+There are a few channelizers you can choose from:
+
+1. org.janusgraph.channelizers.JanusGraphWebSocketChannelizer
+2. org.janusgraph.channelizers.JanusGraphHttpChannelizer
+3. org.janusgraph.channelizers.JanusGraphNioChannelizer
+4. org.janusgraph.channelizers.JanusGraphWsAndHttpChannelizer
+
+All of the channelizers share the exact same functionality as their TinkerPop counterparts.
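+
+Putting both set-up steps together, the relevant portion of a node's `gremlin-server.yaml` might look like the following sketch. The `graphManager` and `graphs` entries follow the ConfiguredGraphFactory set-up referenced in step 1, and the properties file path is an assumption to adapt to your deployment:
+
+[source, properties]
+----
+graphManager: org.janusgraph.graphdb.management.JanusGraphManager
+channelizer: org.janusgraph.channelizers.JanusGraphWebSocketChannelizer
+graphs: {
+  ConfigurationManagementGraph: conf/JanusGraph-configurationmanagement.properties
+}
+----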
+
+[[tinkerpop-with-remote]]
+===== Using TinkerPop's withRemote Functionality
+
+Since traversal references are bound on the JanusGraph servers, we can make use of http://tinkerpop.apache.org/docs/current/reference/#connecting-via-remotegraph[TinkerPop's withRemote functionality]. This allows one to run Gremlin queries locally against a remote graph reference. Traditionally, one runs queries against remote Gremlin Servers by sending String script representations, which are processed on the remote server, with the response serialized and sent back. However, TinkerPop also allows for the use of `remoteGraph`, which could be useful if you are building a TinkerPop-compliant graph infrastructure that is easily transferable to multiple implementations.
+
+
+To use this functionality in JanusGraph, we must first ensure we have created a graph on the remote JanusGraph cluster:
+
+```
+ConfiguredGraphFactory.create("graph1");
+```
+
+Next, we must wait 20 seconds to ensure the traversal reference is bound on every JanusGraph node in the remote cluster.
+
+Finally, we can locally make use of the `withRemote` method to access a local reference to a remote graph:
+
+[source, gremlin]
+----
+gremlin> cluster = Cluster.open('conf/remote-objects.yaml')
+==>localhost/127.0.0.1:8182
+gremlin> graph = EmptyGraph.instance()
+==>emptygraph[empty]
+gremlin> g = graph.traversal().withRemote(DriverRemoteConnection.using(cluster, "graph1_traversal"))
+==>graphtraversalsource[emptygraph[empty], standard]
+----
+
+For completeness, the above `conf/remote-objects.yaml` should tell the `Cluster` API how to access the remote JanusGraph servers; for example, it may look like:
+
+[source, properties]
+----
+hosts: [remoteaddress1.com, remoteaddress2.com]
+port: 8182
+username: admin
+password: password
+connectionPool: { enableSsl: true }
+serializer: { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
+----