-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Evict graph from cache upon configuration updates #897
Conversation
@@ -744,6 +746,30 @@ public IndexJobFuture updateIndex(Index index, SchemaAction updateAction) { | |||
return future; | |||
} | |||
|
|||
/** | |||
* Upon the open managementsystem's commit, this graph will be evicted from the cache on all JanusGraph nodes in your | |||
* cluster, once there are no open transactions on this graph on each respective JanusGraph node. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and assuming each node in the cluster is configured correctly to use the JGM
f7e5a6a
to
82cbadd
Compare
de34a61
to
06d519e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you reference the issue number in the PR description? Also do you think we'll eventually want to move JanusGraphManager to the janusgraph-server module?
|
||
public enum GraphCacheEvictionAction { | ||
|
||
EVICT, DO_NOT_EVICT |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Split to separate lines
@@ -0,0 +1,21 @@ | |||
// Copyright 2017 JanusGraph Authors |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"2018" or "2017-2018" depending on the advice in the other PR
@@ -88,7 +92,9 @@ public void read(Message message) { | |||
long typeId = VariableLong.readPositive(in); | |||
schemaCache.expireSchemaElement(typeId); | |||
} | |||
Thread ack = new Thread(new SendAckOnTxClose(evictionId, senderId, graph.getOpenTransactions())); | |||
GraphCacheEvictionAction action = serializer.readObjectNotNull(in, GraphCacheEvictionAction.class); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
final
Thread ack = new Thread(new SendAckOnTxClose(evictionId, senderId, graph.getOpenTransactions())); | ||
GraphCacheEvictionAction action = serializer.readObjectNotNull(in, GraphCacheEvictionAction.class); | ||
Preconditions.checkNotNull(action); | ||
Thread ack = new Thread(new SendAckOnTxClose(evictionId, senderId, graph.getOpenTransactions(), action, graph.getGraphName())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
final
@@ -127,6 +134,8 @@ public void sendCacheEviction(Set<JanusGraphSchemaVertex> updatedTypes, | |||
assert type.hasId(); | |||
VariableLong.writePositive(out,type.longId()); | |||
} | |||
if (evictGraphFromCache) out.writeObjectNotNull(EVICT); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Split to separate lines and add curly braces
/** | ||
* Upon the open managementsystem's commit, this graph will be asynchronously evicted from the cache on all JanusGraph nodes in your | ||
* cluster, once there are no open transactions on this graph on each respective JanusGraph node | ||
* and assuming each node is correctly configured to use the {@link JanusGraphManager}. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to import JanusGraph
manager or include full path in link.
final JanusGraphManager jgm = JanusGraphManager.getInstance(); | ||
final boolean janusGraphManagerIsInBadState = null == jgm && action.equals(EVICT); | ||
if (!txStillOpen && janusGraphManagerIsInBadState) { | ||
log.error("JanusGraphManager should be instantiated on this server, but it is not. " + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a test to cover this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that this would be hit asynchronously on a node other than the one the original action was made on, so this is a log action-- not the throwing of an error. I can add a test that ensures if the JGM is null that we don't remove it from the cache, but that is pretty obvious.
In terms of testing this functionality across multi-node deployments, I would like to avoid if we can as it would be ~non-trivial as there is no infra in place to do so, and there is also no infra in place to test the existing ManagementLogger.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ManagementLogger seems to be otherwise well covered in tests (see Codecov). If the test you describe hits this case I think it's worth including.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That code-coverage analysis is pretty cool!
I can hit the case with the simpler test of opening a graph without configuring the JGM and trying to evict the graph from the cache.
@@ -191,12 +208,24 @@ public void run() { | |||
txStillOpen = true; | |||
} | |||
} | |||
if (!txStillOpen) { | |||
final JanusGraphManager jgm = JanusGraphManager.getInstance(); | |||
final boolean janusGraphManagerIsInBadState = null == jgm && action.equals(EVICT); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If jgm
is null couldn't that just mean that JanusGraph Server is not being used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only way action.equals(EVICT)
is if the user actively called mgmt.evictGraphFromCache() on their graph's mangementSystem, or if it was called by the ConfiguredGraphFactory updateConfiguration or removeConfiguration.
The issue in this scenario is indeed that a user is using the JanusGraph Server (more specifically configured to use the JanusGraphManager) for one of their JG nodes but not the other node (the one that received this eviction request).
06d519e
to
aeb34e5
Compare
@sjudeng Pushed new code with your suggested updates and I added 2 tests, both testing the graph management logger's graph cache eviction logic; one when the JGM is null (mostly an obvious test, but we will see the code get covered in the codecov analysis) and one when it is not (the test proving the functionality works, i.e. the log can successfully evict graphs from the cache). |
Sorry, I forgot to answer this question. Artificially, maybe the JanusGraphManager should live in the |
Thanks for the explanation. It looks to me like ConfiguredGraphFactory could maybe be moved over as is. I don't see any integration to other classes in core. JGM would be more painful given integration in JanusGraphFactory and (with this PR) ManagementLogger. Can we create a separate issue to investigate further, maybe specifically for moving over ConfiguredGraphFactory if you agree that would be straight-forward, as well as for identifying/moving any other relevant classes if any? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
if (!txStillOpen) { | ||
final JanusGraphManager jgm = JanusGraphManager.getInstance(); | ||
final boolean janusGraphManagerIsInBadState = null == jgm && action.equals(EVICT); | ||
if (!txStillOpen && janusGraphManagerIsInBadState) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this branch still isn't getting hit (see here). It's not a big deal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good, mostly nits
final JanusGraph graph = open(graphName); | ||
removeGraphFromCache(graph); | ||
} catch (Exception e) { | ||
// cannot open graph, do nothing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
log it
final JanusGraph graph = open(graphName); | ||
removeGraphFromCache(graph); | ||
} catch (Exception e) { | ||
// cannot open graph, do nothing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
log it
@@ -115,6 +116,7 @@ public StandardSerializer() { | |||
registerClassInternal(51,SchemaStatus.class, new EnumSerializer<>(SchemaStatus.class)); | |||
registerClassInternal(52,LogTxStatus.class, new EnumSerializer<>(LogTxStatus.class)); | |||
registerClassInternal(53,MgmtLogType.class, new EnumSerializer<>(MgmtLogType.class)); | |||
registerClassInternal(69,GraphCacheEvictionAction.class, new EnumSerializer<>(GraphCacheEvictionAction.class)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this probably doesn't need to be at this exact spot, so move it down to make it easier to find the highest registration number.
@@ -83,7 +83,7 @@ for which you have instantiated _and_ the references are stored inside the `Janu | |||
|
|||
IMPORTANT: This is an irreversible operation that will delete all graph and index data. | |||
|
|||
IMPORTANT: To ensure all graph representations are consistent across all JanusGraph nodes in your cluster, remove the graph from the `JanusGraphManager` graph reference tracker on all nodes in your cluster: `ConfiguredGraphFactory.close("graphName");`. | |||
IMPORTANT: To ensure all graph representations are consistent across all JanusGraph nodes in your cluster, this removes the graph from the `JanusGraphManager` graph cache on every node in the cluster, assuming each node has been properly configured to use the `JanusGraphManager`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than having important call-out boxes (this one and the one later on) "assuming each node has been properly configured to use the JanusGraphManager
", add a section in this doc that describes how to properly configure an environment with multiple Gremlin Servers. In that section, you could discuss the extra stuff that JanusGraph does to ensure consistency across the Gremlin Servers (perhaps discuss the scenario from the original issue) and also what might go wrong if there is a misconfigured Gremlin Server in the mix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this suggestion makes alot of sense; I prefer to leave the call out and link to the new section about using a multinode JG cluster. I also think the information in this PR here will be relevant, so I suggest creating a separate PR with this new doc section that includes the information relevant to both PRs; this will avoid merge-conflicts between these two PRs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
separate PR for docs update is fine, see #938
---- | ||
ConfiguredGraphFactory.close("graph2"); | ||
---- | ||
IMPORTANT: Any updates to a graph configuration results in the eviction of the relevant graph from the graph cache on every node in the JanusGraph cluster, assuming each node has been configured properly to use the `JanusGraphManager`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see previous comment
|
||
import org.slf4j.Logger; | ||
import org.slf4j.LoggerFactory; | ||
|
||
import java.time.Duration; | ||
import java.util.HashSet; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this isn't used
aeb34e5
to
09ffe28
Compare
@pluradj Pushed update with suggested fixes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
public enum GraphCacheEvictionAction { | ||
|
||
EVICT, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consider documenting each of the enum values with a Javadoc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks david
When a graph configuration is updated or removed from the ConfigurationGraphManagement, the related graph is evicted from the cache on all JanusGraph nodes in the cluster. This is done to prevent users from opening graphs with old/incorrect configurations across the cluster and finding themselves in an inconsistent state. Note that this is only done for users updating the ConfigurationManagementGraph by going through the ConfiguredGraphFactory APIs; this is done for two reasons: 1. to allow more fine-tuned control for users who wish to use the lower-levls CMG APIs and 2. to avoid a cyclical dependency between the CMG and the JanusGraphManager. Issues: JanusGraph#590 Signed-off-by: David Pitera <[email protected]>
09ffe28
to
8cbe871
Compare
Evict graph from cache upon configuration updates
Evict graph from cache upon configuration updates
When a graph configuration is updated or removed from the
ConfigurationGraphManagement, the related graph is evicted from the
cache on all JanusGraph nodes in the cluster. This is done to prevent
users from opening graphs with old/incorrect configurations across the
cluster and finding themselves in an inconsistent state. Note that this
is only done for users updating the ConfigurationManagementGraph by
going through the ConfiguredGraphFactory APIs; this is done for two
reasons: 1. to allow more fine-tuned control for users who wish to use
the lower-levls CMG APIs and 2. to avoid a cyclical dependency between
the CMG and the JanusGraphManager.
Issues: #590
Signed-off-by: David Pitera [email protected]
Thank you for contributing to JanusGraph!
In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:
For all changes:
master
)?For code changes:
For documentation related changes:
[skip ci]
tag to the first line of your commit message to avoid spending CPU cycles in
Travis CI when no code, tests, or build configuration are modified?
Note:
Please ensure that once the PR is submitted, you check Travis CI for build issues and submit an update to your PR as soon as possible.