[Zen2] Implement Tombstone REST APIs #36007
original-brownbear
commented
Nov 28, 2018
- Adds REST API for withdrawing votes and clearing vote withdrawals
- Tests added to Netty4 module since we need a real network implementation for HTTP endpoints
Pinging @elastic/es-distributed
A few questions and nits.
protected Settings nodeSettings(int nodeOrdinal) {
    return Settings.builder().put(super.nodeSettings(nodeOrdinal))
        .put(TestZenDiscovery.USE_ZEN2.getKey(), true)
        .put(ElectMasterService.DISCOVERY_ZEN_MINIMUM_MASTER_NODES_SETTING.getKey(), 2)
Was this necessary?
Yea, the internal test cluster throws if this isn't set and you turn off auto manage min master nodes (probably something that could be adjusted for Zen2?).
Oh yes, so it does. Set it to something unreasonable (MAX_VALUE) to protect against this test passing without Zen2.
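To illustrate why an unreachable value works as a guard, here is a minimal sketch. It is a toy model, not Elasticsearch code: the class and method names are hypothetical, and it assumes (as a simplification) that a Zen1-style election needs at least the configured number of master-eligible nodes to be visible.

```java
// Toy model of a Zen1-style quorum check; hypothetical names, not Elasticsearch code.
final class MinMasterNodesGuard {

    // Simplifying assumption: an election can only succeed if enough
    // master-eligible nodes are visible.
    static boolean hasEnoughMasterNodes(int visibleMasterEligibleNodes, int minimumMasterNodes) {
        return visibleMasterEligibleNodes >= minimumMasterNodes;
    }

    public static void main(String[] args) {
        int nodes = 2;
        // With a value of 2, a two-node Zen1 cluster could still form,
        // so the test might pass without Zen2.
        System.out.println(hasEnoughMasterNodes(nodes, 2));                 // true
        // With Integer.MAX_VALUE, the Zen1 check can never be satisfied.
        System.out.println(hasEnoughMasterNodes(nodes, Integer.MAX_VALUE)); // false
    }
}
```

Under that assumption, the test can only go green if the cluster really forms through Zen2.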
.put(TestZenDiscovery.USE_ZEN2.getKey(), true)
.put(ElectMasterService.DISCOVERY_ZEN_MINIMUM_MASTER_NODES_SETTING.getKey(), 2)
.put(ClusterBootstrapService.INITIAL_MASTER_NODE_COUNT_SETTING.getKey(), 2)
.put(DiscoverySettings.INITIAL_STATE_TIMEOUT_SETTING.getKey(), "5s")
This shouldn't be necessary, the cluster should form straight away.
}

public void testAddAndClearVotingTombstones() throws Exception {
    final int nodeCount = 2;
Could you inline nodeCount?
    return false; // enable http
}

public void testAddAndClearVotingTombstones() throws Exception {
I think testRollingRestartOfTwoNodeCluster would be a good name.
    .setWaitForNodes(Integer.toString(nodeCount - 1))
    .setTimeout(TimeValue.timeValueSeconds(30L));

clusterHealthRequestBuilder.setWaitForYellowStatus();
This can be in the chain of .set() methods above, and the temporary clusterHealthRequestBuilder can probably be inlined.
Response deleteResponse = restClient.performRequest(new Request("DELETE", "/_cluster/withdrawn_votes"));
assertThat(deleteResponse.getStatusLine().getStatusCode(), is(200));
assertThat(deleteResponse.getEntity().getContentLength(), is(0L));
Response response =
Why the newline? :)
public void testBasicRestApi() throws Exception {
    List<String> nodes = internalCluster().startNodes(3);
    RestClient restClient = getRestClient();
    Response deleteResponse = restClient.performRequest(new Request("DELETE", "/_cluster/withdrawn_votes"));
Think it'd make more sense to put this after the POST. There will, at some point, be an assertion that there are no voting tombstones in the cluster at the end of the test.
I agree. Just tried making that change but running into this error as a result:
[2018-11-29T00:04:21,996][WARN ][r.suppressed ] [node_t0] path: /_cluster/withdrawn_votes, params: {}
org.elasticsearch.transport.RemoteTransportException: [node_t1][127.0.0.1:33009][cluster:admin/voting/clear_tombstones]
Caused by: org.elasticsearch.ElasticsearchTimeoutException: timed out waiting for removal of nodes; if nodes should not be removed, set waitForRemoval to false. [{node_t2}{ps-Qi3TfSAOMEW6D-A5uAA}]
at org.elasticsearch.action.admin.cluster.configuration.TransportClearVotingTombstonesAction$1.onTimeout(TransportClearVotingTombstonesAction.java:109) ~[main/:?]
at org.elasticsearch.cluster.ClusterStateObserver$ContextPreservingListener.onTimeout(ClusterStateObserver.java:322) ~[main/:?]
at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:249) ~[main/:?]
at org.elasticsearch.cluster.service.ClusterApplierService$NotifyTimeout.run(ClusterApplierService.java:561) ~[main/:?]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:627) ~[main/:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
at java.lang.Thread.run(Thread.java:834) [?:?]
(no matter what node I add a Tombstone for, this happens)
"if nodes should not be removed, set waitForRemoval to false"
Should I do that in the Rest API?
Yep :)
All but 2 comments addressed :) 2 questions added.
@DaveCTurner alright added the "don't wait" parameter and set the zen1 master count to |
My review was unclear - I intended exposing the parameter, not hard-coding its value.
@Override
protected RestChannelConsumer prepareRequest(final RestRequest request, final NodeClient client) throws IOException {
    ClearVotingTombstonesRequest req = new ClearVotingTombstonesRequest();
    req.setWaitForRemoval(false);
Apologies, I meant this should be a parameter ?waitForRemoval=false - both options are useful in different circumstances. The default of true is the usual case, but in this particular test we should set it to false because the node is still present.
Now that I've written that, I think it'd be good to test both cases:
- create a 3-node cluster, add a tombstone, then clear them (?waitForRemoval=false) <- today's test
- create a 3-node cluster, add a tombstone, shut the corresponding node down, then clear the tombstones (?waitForRemoval=true)
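The two cases above can be sketched with a small self-contained model. This is a toy, not the Elasticsearch implementation: the class TombstoneModel and all of its methods are hypothetical, and where the real clear action waits and eventually times out (as in the stack trace earlier in this thread), the model simply fails straight away.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model of voting tombstones; hypothetical names, not Elasticsearch code.
final class TombstoneModel {
    private final Set<String> clusterNodes = new HashSet<>();
    private final Set<String> tombstones = new HashSet<>();

    TombstoneModel(String... nodes) {
        for (String node : nodes) {
            clusterNodes.add(node);
        }
    }

    void addTombstone(String node) { tombstones.add(node); }

    void shutDown(String node) { clusterNodes.remove(node); }

    int tombstoneCount() { return tombstones.size(); }

    // With waitForRemoval=true, clearing only succeeds once every tombstoned
    // node has left the cluster. The model throws immediately instead of waiting.
    void clearTombstones(boolean waitForRemoval) {
        if (waitForRemoval) {
            for (String node : tombstones) {
                if (clusterNodes.contains(node)) {
                    throw new IllegalStateException("timed out waiting for removal of nodes [" + node + "]");
                }
            }
        }
        tombstones.clear();
    }

    public static void main(String[] args) {
        // Case 1: the node is still present, so clear without waiting.
        TombstoneModel first = new TombstoneModel("node_t0", "node_t1", "node_t2");
        first.addTombstone("node_t2");
        first.clearTombstones(false);
        System.out.println(first.tombstoneCount()); // 0

        // Case 2: shut the tombstoned node down, then clear with waiting.
        TombstoneModel second = new TombstoneModel("node_t0", "node_t1", "node_t2");
        second.addTombstone("node_t2");
        second.shutDown("node_t2");
        second.clearTombstones(true);
        System.out.println(second.tombstoneCount()); // 0
    }
}
```

Calling clearTombstones(true) in the first case, while node_t2 is still a member, is the model's equivalent of the timeout reported above.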
Looking at other APIs, it should be wait_for_removal not waitForRemoval.
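A minimal sketch of what exposing the parameter could look like, with the snake_case name and a default of true. The class below is a hypothetical stand-in that reads from a plain Map so it runs on its own; in the actual handler this would presumably be a call such as request.paramAsBoolean("wait_for_removal", true) on the RestRequest, which is an assumption about the final code rather than something shown in this thread.

```java
import java.util.Map;

// Stand-in for REST query-parameter handling; hypothetical, not the handler from this PR.
final class WaitForRemovalParam {
    static final String NAME = "wait_for_removal";

    // Defaults to true (the usual case); callers opt out with ?wait_for_removal=false.
    static boolean parse(Map<String, String> queryParams) {
        String raw = queryParams.get(NAME);
        if (raw == null) {
            return true;
        }
        if ("true".equals(raw)) {
            return true;
        }
        if ("false".equals(raw)) {
            return false;
        }
        throw new IllegalArgumentException("Failed to parse value [" + raw + "] for [" + NAME + "]");
    }

    public static void main(String[] args) {
        System.out.println(parse(Map.of()));              // true
        System.out.println(parse(Map.of(NAME, "false"))); // false
    }
}
```

The parsed value would then be passed to req.setWaitForRemoval(...) instead of the hard-coded false.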
import static org.hamcrest.core.Is.is;

// TODO: Move these tests to a more appropriate module
Suggest:
// These tests are here today so they have access to a proper REST client. They cannot be in :server:integTest since the REST client needs a
// proper transport implementation, and they cannot be REST tests today since they need to restart nodes. When #35599 and friends land we
// should be able to move these tests to run against a proper cluster instead. TODO do this.
@DaveCTurner all done :)
LGTM
Jenkins test this