Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Always set server acl tokens #832

Merged
merged 1 commit into from
Nov 4, 2021
Merged

Conversation

lkysow
Copy link
Member

@lkysow lkysow commented Nov 2, 2021

If server-acl-init is re-run, ensure that server ACL tokens have been
created for each server and call the update token API for each server
again.

This ensures that if server-acl-init fails to pass out tokens to every
server that when it is re-run it won't assume that every server has a
token.

In addition this fixes a long-standing issue where if you increase your number of servers the new servers never get ACL tokens provisioned.

Fixes #825, #677

How I've tested this PR:

  1. Brough up cluster with 3 servers:

    global:
      imageK8S: ghcr.io/lkysow/consul-k8s-dev:nov02-2
      acls:
        manageSystemACLs: true
    server:
      replicas: 3
  2. Then updated to 5 servers and ran helm upgrade.

  3. Confirmed that there are no "Coordinate update" errors in logs of new servers. This would indicate they didn't have an ACL token set.

  4. Confirmed that there are still only 5 server tokens (so it didn't re-create tokens)

How I expect reviewers to test this PR:

  • code

Checklist:

  • Tests added
  • CHANGELOG entry added

    HashiCorp engineers only, community PRs should not add a changelog entry.
    Entries should use present tense (e.g. Add support for...)

@lkysow lkysow force-pushed the lkysow/consul-server-acl-rerun branch from 36363cf to 3692c49 Compare November 2, 2021 20:51
@@ -313,7 +314,6 @@ func (c *Command) Run(args []string) int {
scheme = "https"
}

var updateServerPolicy bool
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is no longer needed. Previously this would be set when there already exists a kube bootstrap secret to ensure that we update the server policy. Now we call c.bootstrapServers, even when the kube secret exists. That function then calls setServerTokens that then calls c.setServerPolicy(consulClient). So we don't need this codepath anymore.

@@ -1650,6 +1677,81 @@ func TestRun_AlreadyBootstrapped(t *testing.T) {
}, consulAPICalls)
}

// Test if there is an old bootstrap Secret and the server token exists
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this to test that we don't re-create the server tokens if the token already exists.

@lkysow lkysow force-pushed the lkysow/consul-server-acl-rerun branch from 3692c49 to c28e0be Compare November 2, 2021 21:07
return 1
}
}
bootstrapToken, err = c.bootstrapServers(serverAddresses, bootstrapToken, bootTokenSecretName, scheme)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now we always call this.

Comment on lines +1288 to +1289
case "/v1/acl/tokens":
fmt.Fprintln(w, `[]`)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of these tests get modified because now we do a token list call to ensure we're not recreating server tokens.


if bootstrapToken == "" {
var err error
bootstrapToken, err = c.bootstrapACLs(firstServerAddr, scheme, bootTokenSecretName)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I extracted some of the code that was here into bootstrapACLs.

If server-acl-init is re-run, ensure that server ACL tokens have been
created for each server and call the update token API for each server
again.

This ensures that if server-acl-init fails to pass out tokens to every
server that when it is re-run it won't assume that every server has a
token.
@lkysow lkysow force-pushed the lkysow/consul-server-acl-rerun branch from c28e0be to bfb52ac Compare November 2, 2021 21:10
@lkysow lkysow requested review from a team, thisisnotashwin and t-eckert and removed request for a team November 2, 2021 21:11
@lkysow lkysow marked this pull request as ready for review November 2, 2021 21:11
Copy link
Contributor

@thisisnotashwin thisisnotashwin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is excellent!! Thanks @lkysow

ottaviohartman pushed a commit to ottaviohartman/consul-k8s that referenced this pull request Nov 3, 2021
Increase the memory for the client daemonset's tls-init
container. We saw an out of memory error when running on OpenShift.
@t-eckert
Copy link
Contributor

t-eckert commented Nov 3, 2021

About to review this. Should we add a test for this scenario to our acceptance tests?

@lkysow
Copy link
Member Author

lkysow commented Nov 4, 2021

Should we add a test for this scenario to our acceptance tests?

The pattern we follow with acceptance tests is to test the happy path of big features. We don't test failure paths because there are too many and so our acceptance tests would be too large and take too long. With regards to the bugfix where this is now idempotent, I don't think it fits the criteria for acceptance tests because it's not a happy path nor is it a feature. With regards to being able to upgrade to a greater number of servers, this is something that might make sense in an acceptance test but currently we don't do any upgrade testing. We've had a plan to do upgrade testing on our backlog so I think it would make sense to evaluate this when we address that ticket.

Copy link
Contributor

@t-eckert t-eckert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No comments. Very solid.

@lkysow lkysow merged commit 0b30795 into main Nov 4, 2021
@lkysow lkysow deleted the lkysow/consul-server-acl-rerun branch November 4, 2021 17:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

If server-acl-init fails to pass out server tokens to every server it will not retry when re-run
3 participants