-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Do not restore autoscaling policy from snapshots #66023
Do not restore autoscaling policy from snapshots #66023
Conversation
Pinging @elastic/es-security (Team:Security) |
The tricky part of this change is that snapshot/restore, autoscaling and operator privileges are alll in different packages. Other than snapshot/restore, the other two are in xpack. This creates a problem when snapshot/restore needs to know whether operator privileges are enabled and whether the custom metadata is for "autoscaling". As an initial attempt, I just took the simplest approach to duplicate the strings ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a couple of initial comments to discuss/clarify before reviewing the tests.
@@ -454,7 +459,8 @@ public ClusterState execute(ClusterState currentState) { | |||
if (metadata.customs() != null) { | |||
for (ObjectObjectCursor<String, Metadata.Custom> cursor : metadata.customs()) { | |||
if (RepositoriesMetadata.TYPE.equals(cursor.key) == false | |||
&& DataStreamMetadata.TYPE.equals(cursor.key) == false) { | |||
&& DataStreamMetadata.TYPE.equals(cursor.key) == false | |||
&& (operatorPrivilegesEnabled == false || "autoscaling".equals(cursor.key) == false)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should just skip the operatorPrivilegesEnabled
check. It is fine to not restore autoscaling policies for now and we can revisit once we have the proper framework around operator privileges and restore in general.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing the check makes both the logic and code dependency simpler. Another practical advantage is that Cloud probably will not enable opereator privileges straight away after 7.11 release. Therefore removing this check makes sure the restore behavioiur of autoscaling policy stay the same before and after operator privileges get enabled in Cloud. Having a consistent behaviour is one of the motivations for this work to be included in 7.11. So it seems to make sense. But it would be different than what we have agreed during the meeting. @colings86 Could you please verify whether removing this check is desirable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But it would be different than what we have agreed during the meeting
I believe we discussed that approach at the meeting and I came out of the meeting with the impression that this was the direction we would take (to just disable restore for autoscaling).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with changing this so we never restore autoscaling policies regardless of whether operator privileges are enabled. At least for this first version
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks both. I updated the code to drop the operator privileges check.
@@ -454,7 +459,8 @@ public ClusterState execute(ClusterState currentState) { | |||
if (metadata.customs() != null) { | |||
for (ObjectObjectCursor<String, Metadata.Custom> cursor : metadata.customs()) { | |||
if (RepositoriesMetadata.TYPE.equals(cursor.key) == false | |||
&& DataStreamMetadata.TYPE.equals(cursor.key) == false) { | |||
&& DataStreamMetadata.TYPE.equals(cursor.key) == false | |||
&& (operatorPrivilegesEnabled == false || "autoscaling".equals(cursor.key) == false)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than switch on the "autoscaling" key I would prefer to add a marker interface "NotRestorableCustomMetadata", implemented by AutoscalingMetadata
and use instanceof to omit it instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated. Thanks!
Pinging @elastic/es-distributed (Team:Distributed) |
I have updated the PR based on the discussion, including the code and its title and description. Since operator privileges are no longer relevant for this behaviour, this PR is no longer really relevant to security. Hence I dropped the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ywangd this looks good. There is a bit of excess changes from prior iterations that I think we should remove from this PR (at least they are not autoscaling related).
.setIncludeGlobalState(true) | ||
.get(); | ||
|
||
RestStatus status = createSnapshotResponse.getSnapshotInfo().status(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: why not just:
RestStatus status = createSnapshotResponse.status();
.get(); | ||
assertEquals(RestStatus.OK, restoreSnapshotResponse.status()); | ||
|
||
final GetAutoscalingPolicyAction.Request getRequest = new GetAutoscalingPolicyAction.Request(policy.name()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should extend the test to also have the case where a policy does exist when restoring to check that it survives the restore. To protect us from "restore-global-state" clearing everything during restore. Could be just randomly doing that or the delete case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I have expanded the test to:
- Randomly deleting or mutating the existing policy after snapshot is taken
- Add another policy after snapshot is taken
Then assert things stay the same after restore.
@@ -100,4 +99,86 @@ public void testOperatorPrivilegesXpackUsage() throws IOException { | |||
assertTrue((boolean) operatorPrivileges.get("available")); | |||
assertTrue((boolean) operatorPrivileges.get("enabled")); | |||
} | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should just remove this and the associated build file changes etc. from this PR? Once the real framework for avoiding to restore autoscaling policies on operator privilege enabled setups goes in, this might look different anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed and all unrelated changes removed.
A bit more context: I thought about deleting these changes in last commit. But I got a bit paranoid and was trying to assert that operator privileges do not interfere with the restore behaviour. Looking back now, this thinking was an artefact of how this PR evolved, which started with the assumption that operator privileges would impact the behaviour. If it had started purely from the autoscaling side, I don't think I would add anything related to operator privileges. In short, it's the right move to delete these unrelated code.
final String authHeader = "Basic " | ||
+ Base64.getEncoder().encodeToString("test_operator:x-pack-test-password".getBytes(StandardCharsets.UTF_8)); | ||
postVotingConfigExclusionsRequest.setOptions(RequestOptions.DEFAULT.toBuilder().addHeader("Authorization", authHeader)); | ||
postVotingConfigExclusionsRequest.setOptions(RequestOptions.DEFAULT.toBuilder().addHeader("Authorization", OPERATOR_AUTH_HEADER)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also looks unrelated to at least the PR description, perhaps move to separate PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed. See also above.
@henningandersen The PR is now updated to address your feedback. It is ready for another look. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks @ywangd
Autoscaling policies (custom metadata) are marked as non-restorable and hence they are never restored from a snapshot.
This PR excludes autoscaling policies from being restored from a snapshot.