Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not restore autoscaling policy from snapshots #66023

Merged
merged 9 commits into from
Dec 11, 2020

Conversation

ywangd
Copy link
Member

@ywangd ywangd commented Dec 8, 2020

This PR excludes autoscaling policies from being restored from a snapshot.

@ywangd ywangd added >non-issue :Security/Authorization Roles, Privileges, DLS/FLS, RBAC/ABAC v8.0.0 v7.11.0 labels Dec 8, 2020
@elasticmachine elasticmachine added the Team:Security Meta label for security team label Dec 8, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-security (Team:Security)

@ywangd ywangd marked this pull request as draft December 8, 2020 13:14
@ywangd
Copy link
Member Author

ywangd commented Dec 8, 2020

The tricky part of this change is that snapshot/restore, autoscaling and operator privileges are alll in different packages. Other than snapshot/restore, the other two are in xpack. This creates a problem when snapshot/restore needs to know whether operator privileges are enabled and whether the custom metadata is for "autoscaling".

As an initial attempt, I just took the simplest approach to duplicate the strings (autoscaling for metadata custom and xpack.security.operator_privileges.enabled for the setting) in RestoreService. This is likely non-ideal. It would be better if these information can be pulled by snapshot/restore code without having to explicity spell out autoscaling or operator privileges. I am open to suggestions.

Copy link
Contributor

@henningandersen henningandersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a couple of initial comments to discuss/clarify before reviewing the tests.

@@ -454,7 +459,8 @@ public ClusterState execute(ClusterState currentState) {
if (metadata.customs() != null) {
for (ObjectObjectCursor<String, Metadata.Custom> cursor : metadata.customs()) {
if (RepositoriesMetadata.TYPE.equals(cursor.key) == false
&& DataStreamMetadata.TYPE.equals(cursor.key) == false) {
&& DataStreamMetadata.TYPE.equals(cursor.key) == false
&& (operatorPrivilegesEnabled == false || "autoscaling".equals(cursor.key) == false)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should just skip the operatorPrivilegesEnabled check. It is fine to not restore autoscaling policies for now and we can revisit once we have the proper framework around operator privileges and restore in general.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing the check makes both the logic and code dependency simpler. Another practical advantage is that Cloud probably will not enable opereator privileges straight away after 7.11 release. Therefore removing this check makes sure the restore behavioiur of autoscaling policy stay the same before and after operator privileges get enabled in Cloud. Having a consistent behaviour is one of the motivations for this work to be included in 7.11. So it seems to make sense. But it would be different than what we have agreed during the meeting. @colings86 Could you please verify whether removing this check is desirable?

Copy link
Contributor

@henningandersen henningandersen Dec 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But it would be different than what we have agreed during the meeting

I believe we discussed that approach at the meeting and I came out of the meeting with the impression that this was the direction we would take (to just disable restore for autoscaling).

Copy link
Contributor

@colings86 colings86 Dec 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with changing this so we never restore autoscaling policies regardless of whether operator privileges are enabled. At least for this first version

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks both. I updated the code to drop the operator privileges check.

@@ -454,7 +459,8 @@ public ClusterState execute(ClusterState currentState) {
if (metadata.customs() != null) {
for (ObjectObjectCursor<String, Metadata.Custom> cursor : metadata.customs()) {
if (RepositoriesMetadata.TYPE.equals(cursor.key) == false
&& DataStreamMetadata.TYPE.equals(cursor.key) == false) {
&& DataStreamMetadata.TYPE.equals(cursor.key) == false
&& (operatorPrivilegesEnabled == false || "autoscaling".equals(cursor.key) == false)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than switch on the "autoscaling" key I would prefer to add a marker interface "NotRestorableCustomMetadata", implemented by AutoscalingMetadata and use instanceof to omit it instead.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated. Thanks!

@ywangd ywangd marked this pull request as ready for review December 9, 2020 10:53
@ywangd ywangd changed the title Do not restore autoscaling policy when operator privileges are enabled Do not restore autoscaling policy from snapshots Dec 9, 2020
@ywangd ywangd added :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs and removed :Security/Authorization Roles, Privileges, DLS/FLS, RBAC/ABAC labels Dec 9, 2020
@elasticmachine elasticmachine added Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. and removed Team:Security Meta label for security team labels Dec 9, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@ywangd
Copy link
Member Author

ywangd commented Dec 9, 2020

I have updated the PR based on the discussion, including the code and its title and description. Since operator privileges are no longer relevant for this behaviour, this PR is no longer really relevant to security. Hence I dropped the :Security/Authorization tag and added :Distributed/Snapshot/Restore.

Copy link
Contributor

@henningandersen henningandersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ywangd this looks good. There is a bit of excess changes from prior iterations that I think we should remove from this PR (at least they are not autoscaling related).

.setIncludeGlobalState(true)
.get();

RestStatus status = createSnapshotResponse.getSnapshotInfo().status();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: why not just:

RestStatus status = createSnapshotResponse.status();

.get();
assertEquals(RestStatus.OK, restoreSnapshotResponse.status());

final GetAutoscalingPolicyAction.Request getRequest = new GetAutoscalingPolicyAction.Request(policy.name());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should extend the test to also have the case where a policy does exist when restoring to check that it survives the restore. To protect us from "restore-global-state" clearing everything during restore. Could be just randomly doing that or the delete case.

Copy link
Member Author

@ywangd ywangd Dec 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I have expanded the test to:

  • Randomly deleting or mutating the existing policy after snapshot is taken
  • Add another policy after snapshot is taken

Then assert things stay the same after restore.

@@ -100,4 +99,86 @@ public void testOperatorPrivilegesXpackUsage() throws IOException {
assertTrue((boolean) operatorPrivileges.get("available"));
assertTrue((boolean) operatorPrivileges.get("enabled"));
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should just remove this and the associated build file changes etc. from this PR? Once the real framework for avoiding to restore autoscaling policies on operator privilege enabled setups goes in, this might look different anyway.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed and all unrelated changes removed.

A bit more context: I thought about deleting these changes in last commit. But I got a bit paranoid and was trying to assert that operator privileges do not interfere with the restore behaviour. Looking back now, this thinking was an artefact of how this PR evolved, which started with the assumption that operator privileges would impact the behaviour. If it had started purely from the autoscaling side, I don't think I would add anything related to operator privileges. In short, it's the right move to delete these unrelated code.

final String authHeader = "Basic "
+ Base64.getEncoder().encodeToString("test_operator:x-pack-test-password".getBytes(StandardCharsets.UTF_8));
postVotingConfigExclusionsRequest.setOptions(RequestOptions.DEFAULT.toBuilder().addHeader("Authorization", authHeader));
postVotingConfigExclusionsRequest.setOptions(RequestOptions.DEFAULT.toBuilder().addHeader("Authorization", OPERATOR_AUTH_HEADER));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also looks unrelated to at least the PR description, perhaps move to separate PR?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed. See also above.

@ywangd
Copy link
Member Author

ywangd commented Dec 11, 2020

@henningandersen The PR is now updated to address your feedback. It is ready for another look. Thanks!

Copy link
Contributor

@henningandersen henningandersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @ywangd

@ywangd ywangd merged commit 78ad68f into elastic:master Dec 11, 2020
ywangd added a commit to ywangd/elasticsearch that referenced this pull request Dec 11, 2020
Autoscaling policies (custom metadata) are marked as non-restorable and hence
they are never restored from a snapshot.
ywangd added a commit that referenced this pull request Dec 11, 2020
Autoscaling policies (custom metadata) are marked as non-restorable and hence
they are never restored from a snapshot.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >non-issue Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. v7.11.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants