Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Evict graph from cache upon configuration updates #897

Merged
merged 1 commit into from
Mar 19, 2018

Conversation

dpitera
Copy link
Contributor

@dpitera dpitera commented Jan 18, 2018

When a graph configuration is updated or removed from the
ConfigurationGraphManagement, the related graph is evicted from the
cache on all JanusGraph nodes in the cluster. This is done to prevent
users from opening graphs with old/incorrect configurations across the
cluster and finding themselves in an inconsistent state. Note that this
is only done for users updating the ConfigurationManagementGraph by
going through the ConfiguredGraphFactory APIs; this is done for two
reasons: 1. to allow more fine-tuned control for users who wish to use
the lower-levls CMG APIs and 2. to avoid a cyclical dependency between
the CMG and the JanusGraphManager.

Issues: #590

Signed-off-by: David Pitera [email protected]


Thank you for contributing to JanusGraph!

In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:

For all changes:

  • Is there an issue associated with this PR? Is it referenced in the commit message?
  • Does your PR body contain #xyz where xyz is the issue number you are trying to resolve?
  • Has your PR been rebased against the latest commit within the target branch (typically master)?
  • Is your initial contribution a single, squashed commit?

For code changes:

  • Have you written and/or updated unit tests to verify your changes?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE.txt file, including the main LICENSE.txt file in the root of this repository?
  • If applicable, have you updated the NOTICE.txt file, including the main NOTICE.txt file found in the root of this repository?

For documentation related changes:

  • Have you ensured that format looks appropriate for the output in which it is rendered?
  • If this PR is a documentation-only change, have you added a [skip ci]
    tag to the first line of your commit message to avoid spending CPU cycles in
    Travis CI when no code, tests, or build configuration are modified?

Note:

Please ensure that once the PR is submitted, you check Travis CI for build issues and submit an update to your PR as soon as possible.

@janusgraph-bot janusgraph-bot added the cla: yes This PR is compliant with the CLA label Jan 18, 2018
@@ -744,6 +746,30 @@ public IndexJobFuture updateIndex(Index index, SchemaAction updateAction) {
return future;
}

/**
* Upon the open managementsystem's commit, this graph will be evicted from the cache on all JanusGraph nodes in your
* cluster, once there are no open transactions on this graph on each respective JanusGraph node.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and assuming each node in the cluster is configured correctly to use the JGM

@dpitera dpitera force-pushed the graph-cache-evictions branch 3 times, most recently from f7e5a6a to 82cbadd Compare January 19, 2018 21:32
@janusgraph-bot janusgraph-bot added the cla: yes This PR is compliant with the CLA label Jan 22, 2018
@dpitera dpitera force-pushed the graph-cache-evictions branch 4 times, most recently from de34a61 to 06d519e Compare January 22, 2018 22:51
@dpitera dpitera changed the title WIP WIP WIP Evict graph from cache upon configuration updates Jan 22, 2018
Copy link
Contributor

@sjudeng sjudeng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you reference the issue number in the PR description? Also do you think we'll eventually want to move JanusGraphManager to the janusgraph-server module?


public enum GraphCacheEvictionAction {

EVICT, DO_NOT_EVICT
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Split to separate lines

@@ -0,0 +1,21 @@
// Copyright 2017 JanusGraph Authors
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"2018" or "2017-2018" depending on the advice in the other PR

@@ -88,7 +92,9 @@ public void read(Message message) {
long typeId = VariableLong.readPositive(in);
schemaCache.expireSchemaElement(typeId);
}
Thread ack = new Thread(new SendAckOnTxClose(evictionId, senderId, graph.getOpenTransactions()));
GraphCacheEvictionAction action = serializer.readObjectNotNull(in, GraphCacheEvictionAction.class);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

final

Thread ack = new Thread(new SendAckOnTxClose(evictionId, senderId, graph.getOpenTransactions()));
GraphCacheEvictionAction action = serializer.readObjectNotNull(in, GraphCacheEvictionAction.class);
Preconditions.checkNotNull(action);
Thread ack = new Thread(new SendAckOnTxClose(evictionId, senderId, graph.getOpenTransactions(), action, graph.getGraphName()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

final

@@ -127,6 +134,8 @@ public void sendCacheEviction(Set<JanusGraphSchemaVertex> updatedTypes,
assert type.hasId();
VariableLong.writePositive(out,type.longId());
}
if (evictGraphFromCache) out.writeObjectNotNull(EVICT);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Split to separate lines and add curly braces

/**
* Upon the open managementsystem's commit, this graph will be asynchronously evicted from the cache on all JanusGraph nodes in your
* cluster, once there are no open transactions on this graph on each respective JanusGraph node
* and assuming each node is correctly configured to use the {@link JanusGraphManager}.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to import JanusGraph manager or include full path in link.

final JanusGraphManager jgm = JanusGraphManager.getInstance();
final boolean janusGraphManagerIsInBadState = null == jgm && action.equals(EVICT);
if (!txStillOpen && janusGraphManagerIsInBadState) {
log.error("JanusGraphManager should be instantiated on this server, but it is not. " +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a test to cover this case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this would be hit asynchronously on a node other than the one the original action was made on, so this is a log action-- not the throwing of an error. I can add a test that ensures if the JGM is null that we don't remove it from the cache, but that is pretty obvious.

In terms of testing this functionality across multi-node deployments, I would like to avoid if we can as it would be ~non-trivial as there is no infra in place to do so, and there is also no infra in place to test the existing ManagementLogger.

Copy link
Contributor

@sjudeng sjudeng Feb 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ManagementLogger seems to be otherwise well covered in tests (see Codecov). If the test you describe hits this case I think it's worth including.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That code-coverage analysis is pretty cool!

I can hit the case with the simpler test of opening a graph without configuring the JGM and trying to evict the graph from the cache.

@@ -191,12 +208,24 @@ public void run() {
txStillOpen = true;
}
}
if (!txStillOpen) {
final JanusGraphManager jgm = JanusGraphManager.getInstance();
final boolean janusGraphManagerIsInBadState = null == jgm && action.equals(EVICT);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If jgm is null couldn't that just mean that JanusGraph Server is not being used?

Copy link
Contributor Author

@dpitera dpitera Feb 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only way action.equals(EVICT) is if the user actively called mgmt.evictGraphFromCache() on their graph's mangementSystem, or if it was called by the ConfiguredGraphFactory updateConfiguration or removeConfiguration.

The issue in this scenario is indeed that a user is using the JanusGraph Server (more specifically configured to use the JanusGraphManager) for one of their JG nodes but not the other node (the one that received this eviction request).

@dpitera dpitera force-pushed the graph-cache-evictions branch from 06d519e to aeb34e5 Compare February 7, 2018 20:13
@dpitera
Copy link
Contributor Author

dpitera commented Feb 7, 2018

@sjudeng Pushed new code with your suggested updates and I added 2 tests, both testing the graph management logger's graph cache eviction logic; one when the JGM is null (mostly an obvious test, but we will see the code get covered in the codecov analysis) and one when it is not (the test proving the functionality works, i.e. the log can successfully evict graphs from the cache).

@janusgraph-bot janusgraph-bot added the cla: yes This PR is compliant with the CLA label Feb 8, 2018
@dpitera
Copy link
Contributor Author

dpitera commented Feb 8, 2018

Also do you think we'll eventually want to move JanusGraphManager to the janusgraph-server module?

Sorry, I forgot to answer this question. Artificially, maybe the JanusGraphManager should live in the janusgraph-server module. I think it's easy to sound like this is the right place to live, especially when one considers that the notion of a GraphManager is a TinkerPop ideation. Concretely, this gets complicated because things like the JanusGraphFactory, ConfiguredGraphFactory, things that live in the janusgraph-core module... rely on the JanusGraphManager; this would cause a cyclical dependency between the two modules. Even theoretically I am more of the notion that the core project is indeed the best place for the JGM to reside. What do you think? Might not be a bad thing over which to open a discussion. Either way, if the community deems it is best to move, then it would be best done in a separate PR with its own refactoring issue.

@sjudeng
Copy link
Contributor

sjudeng commented Feb 9, 2018

Thanks for the explanation. It looks to me like ConfiguredGraphFactory could maybe be moved over as is. I don't see any integration to other classes in core. JGM would be more painful given integration in JanusGraphFactory and (with this PR) ManagementLogger. Can we create a separate issue to investigate further, maybe specifically for moving over ConfiguredGraphFactory if you agree that would be straight-forward, as well as for identifying/moving any other relevant classes if any?

Copy link
Contributor

@sjudeng sjudeng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

if (!txStillOpen) {
final JanusGraphManager jgm = JanusGraphManager.getInstance();
final boolean janusGraphManagerIsInBadState = null == jgm && action.equals(EVICT);
if (!txStillOpen && janusGraphManagerIsInBadState) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this branch still isn't getting hit (see here). It's not a big deal.

Copy link
Member

@pluradj pluradj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, mostly nits

final JanusGraph graph = open(graphName);
removeGraphFromCache(graph);
} catch (Exception e) {
// cannot open graph, do nothing
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

log it

final JanusGraph graph = open(graphName);
removeGraphFromCache(graph);
} catch (Exception e) {
// cannot open graph, do nothing
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

log it

@@ -115,6 +116,7 @@ public StandardSerializer() {
registerClassInternal(51,SchemaStatus.class, new EnumSerializer<>(SchemaStatus.class));
registerClassInternal(52,LogTxStatus.class, new EnumSerializer<>(LogTxStatus.class));
registerClassInternal(53,MgmtLogType.class, new EnumSerializer<>(MgmtLogType.class));
registerClassInternal(69,GraphCacheEvictionAction.class, new EnumSerializer<>(GraphCacheEvictionAction.class));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this probably doesn't need to be at this exact spot, so move it down to make it easier to find the highest registration number.

@@ -83,7 +83,7 @@ for which you have instantiated _and_ the references are stored inside the `Janu

IMPORTANT: This is an irreversible operation that will delete all graph and index data.

IMPORTANT: To ensure all graph representations are consistent across all JanusGraph nodes in your cluster, remove the graph from the `JanusGraphManager` graph reference tracker on all nodes in your cluster: `ConfiguredGraphFactory.close("graphName");`.
IMPORTANT: To ensure all graph representations are consistent across all JanusGraph nodes in your cluster, this removes the graph from the `JanusGraphManager` graph cache on every node in the cluster, assuming each node has been properly configured to use the `JanusGraphManager`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than having important call-out boxes (this one and the one later on) "assuming each node has been properly configured to use the JanusGraphManager", add a section in this doc that describes how to properly configure an environment with multiple Gremlin Servers. In that section, you could discuss the extra stuff that JanusGraph does to ensure consistency across the Gremlin Servers (perhaps discuss the scenario from the original issue) and also what might go wrong if there is a misconfigured Gremlin Server in the mix.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this suggestion makes alot of sense; I prefer to leave the call out and link to the new section about using a multinode JG cluster. I also think the information in this PR here will be relevant, so I suggest creating a separate PR with this new doc section that includes the information relevant to both PRs; this will avoid merge-conflicts between these two PRs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

separate PR for docs update is fine, see #938

----
ConfiguredGraphFactory.close("graph2");
----
IMPORTANT: Any updates to a graph configuration results in the eviction of the relevant graph from the graph cache on every node in the JanusGraph cluster, assuming each node has been configured properly to use the `JanusGraphManager`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see previous comment


import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.time.Duration;
import java.util.HashSet;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this isn't used

@dpitera
Copy link
Contributor Author

dpitera commented Feb 26, 2018

@pluradj Pushed update with suggested fixes.

@janusgraph-bot janusgraph-bot added the cla: yes This PR is compliant with the CLA label Mar 1, 2018
Copy link

@amcp amcp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pluradj @sjudeng seems like all your comments have been addressed.


public enum GraphCacheEvictionAction {

EVICT,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider documenting each of the enum values with a Javadoc.

Copy link
Member

@pluradj pluradj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks david

When a graph configuration is updated or removed from the
ConfigurationGraphManagement, the related graph is evicted from the
cache on all JanusGraph nodes in the cluster. This is done to prevent
users from opening graphs with old/incorrect configurations across the
cluster and finding themselves in an inconsistent state. Note that this
is only done for users updating the ConfigurationManagementGraph by
going through the ConfiguredGraphFactory APIs; this is done for two
reasons: 1. to allow more fine-tuned control for users who wish to use
the lower-levls CMG APIs and 2. to avoid a cyclical dependency between
the CMG and the JanusGraphManager.

Issues: JanusGraph#590

Signed-off-by: David Pitera <[email protected]>
@dpitera dpitera force-pushed the graph-cache-evictions branch from 09ffe28 to 8cbe871 Compare March 16, 2018 19:51
@janusgraph-bot janusgraph-bot added the cla: yes This PR is compliant with the CLA label Mar 19, 2018
@dpitera dpitera merged commit 9071f9c into JanusGraph:master Mar 19, 2018
@pluradj pluradj added this to the Release v0.3.0 milestone Mar 29, 2018
bwatson-rti-org pushed a commit to bwatson-rti-org/janusgraph that referenced this pull request Mar 9, 2019
Evict graph from cache upon configuration updates
micpod pushed a commit to micpod/janusgraph that referenced this pull request Nov 5, 2019
Evict graph from cache upon configuration updates
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes This PR is compliant with the CLA
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants