-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Test Shamir-to-Transit and Transit-to-Shamir Seal Migration for post-1.4 Vault. #9214
Conversation
This reverts commit 39593defd0d4c6fd79aedfd37df6298391abb9db.
This reverts commit a776452.
@@ -520,17 +540,6 @@ func NewHardcodedServerAddressProvider(cluster *vault.TestCluster, baseClusterPo | |||
} | |||
} | |||
|
|||
// SetRaftAddressProviders sets a ServerAddressProvider for all the nodes in a | |||
// cluster. | |||
func SetRaftAddressProviders(t testing.T, cluster *vault.TestCluster, provider raftlib.ServerAddressProvider) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How come this is no longer used/required?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm doing it inside the ReusableRaftStorage now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Outside of the one TODO that you have out there, it is looking good to me. Nice work!
Co-authored-by: Calvin Leung Huang <[email protected]>
Co-authored-by: Calvin Leung Huang <[email protected]>
Co-authored-by: Calvin Leung Huang <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
|
||
isLeader, _, _, err := core.Leader() | ||
if err != nil { | ||
t.Fatal(err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would probably not error out here. Allow errors until the timeout elapses in case the cluster is still coming up and behaving strangely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(but still report the last error encountered upon timeout, don't just report a timeout)
time.Sleep(time.Second) | ||
} | ||
|
||
return 0, fmt.Errorf("timeout waiting leader") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
0 is a valid index, I would return -1 here.
} else { | ||
cluster.UnsealCore(t, core) | ||
} | ||
cluster.UnsealCore(t, core) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is is equivalent logic to what was previously here. We don't want to use recovery keys to unseal we want to wait for the stored key to do the job for us.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is equivalent logic to what was originally here though -- we currently don't have any tests that call this function with stored keys.
@@ -565,6 +518,35 @@ func VerifyRaftConfiguration(core *vault.TestClusterCore, numCores int) error { | |||
return nil | |||
} | |||
|
|||
// AwaitLeader waits for one of the cluster's nodes to become leader. | |||
func AwaitLeader(t testing.T, cluster *vault.TestCluster) (int, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already have a wait for leader function, do we need both?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I needed a function that returned the index of the leader core
basePort_ShamirToTransit_Pre14 = 20000 | ||
basePort_TransitToShamir_Pre14 = 21000 | ||
basePort_ShamirToTransit_Post14 = 22000 | ||
basePort_TransitToShamir_Post14 = 23000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A little while back we did a lot of work to remove static ports from our tests which removed a lot of flakiness due to port collisions. Instead of re-introducing static ports can we use the in-memory cluster listener and a random API port? Here's an example:
Lines 182 to 193 in 6e2b91f
inmemCluster, err := cluster.NewInmemLayerCluster("inmem-cluster", 3, log.New(&log.LoggerOptions{ | |
Mutex: &sync.Mutex{}, | |
Level: log.Trace, | |
Name: "inmem-cluster", | |
})) | |
if err != nil { | |
t.Fatal(err) | |
} | |
testCluster_ForwardRequestsCommon(t, &TestClusterOptions{ | |
ClusterLayers: inmemCluster, | |
}) |
@@ -117,7 +134,8 @@ func migrateFromShamirToTransit_Pre14( | |||
var transitSeal vault.Seal | |||
|
|||
var conf = vault.CoreConfig{ | |||
Logger: logger.Named("migrateFromShamirToTransit"), | |||
Logger: logger.Named("migrateFromShamirToTransit"), | |||
DisablePerformanceStandby: true, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How come we need to disable performance standbys here?
|
||
var baseClusterPort = basePort + 10 | ||
|
||
// Start the cluster | ||
var conf = vault.CoreConfig{ | ||
Logger: logger.Named("initializeShamir"), | ||
Logger: logger.Named("initializeShamir"), | ||
DisablePerformanceStandby: true, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, do we need this disabled?
// joinRaftFollowers unseals the leader, and then joins-and-unseals the | ||
// followers one at a time. We assume that the ServerAddressProvider has | ||
// already been installed on all the nodes. | ||
func joinRaftFollowers(t *testing.T, cluster *vault.TestCluster, useStoredKeys bool) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use the test helpers version of this?
* master: (31 commits) changelog++ changelog++ Ui/replication status discoverability (#8705) Update CHANGELOG.md Counter that increments on every secret engine lease creation. (#9244) Add password_policy field to Azure docs (#9249) Replaced ClusterMetricSink's cluster name with an atomic.Value. (#9252) Fix database creds rotation panic for nil resp (#9258) changelog++ changelog++ Move sdk/helper/random -> helper/random (#9226) UI: Disallow kv2 with too large 'max versions' value (#9242) Allow mTLS for mysql secrets engine (#9181) docs: add sample revocation for mongodb (#9245) Add new Telemetry config options (#9238) Add a simple sealed gauge, updated when seal status changes (#9177) Test Shamir-to-Transit and Transit-to-Shamir Seal Migration for post-1.4 Vault. (#9214) Configure metrics wrapper with the "global" object, not just the fanout. (#9099) changelog++ Add backend type to audit logs (#9167) ...
…1.4 Vault. (#9214) * move adjustForSealMigration to vault package * fix adjustForSealMigration * begin working on new seal migration test * create shamir seal migration test * refactor testhelpers * add VerifyRaftConfiguration to testhelpers * stub out TestTransit * Revert "refactor testhelpers" This reverts commit 39593defd0d4c6fd79aedfd37df6298391abb9db. * get shamir test working again * stub out transit join * work on transit join * remove debug code * initTransit now works with raft join * runTransit works with inmem * work on runTransit with raft * runTransit works with raft * cleanup tests * TestSealMigration_TransitToShamir_Pre14 * TestSealMigration_ShamirToTransit_Pre14 * split for pre-1.4 testing * add simple tests for transit and shamir * fix typo in test suite * debug wrapper type * test debug * test-debug * refactor core migration * Revert "refactor core migration" This reverts commit a776452. * begin refactor of adjustForSealMigration * fix bug in adjustForSealMigration * clean up tests * clean up core refactoring * fix bug in shamir->transit migration * stub out test that brings individual nodes up and down * refactor NewTestCluster * pass listeners into newCore() * simplify cluster address setup * simplify extra test core setup * refactor TestCluster for readability * refactor TestCluster for readability * refactor TestCluster for readability * add shutdown func to TestCore * add cleanup func to TestCore * create RestartCore * stub out TestSealMigration_ShamirToTransit_Post14 * refactor address handling in NewTestCluster * fix listener setup in newCore() * remove unnecessary lock from setSealsForMigration() * rename sealmigration test package * use ephemeral ports below 30000 * work on post-1.4 migration testing * clean up pre-1.4 test * TestSealMigration_ShamirToTransit_Post14 works for non-raft * work on raft TestSealMigration_ShamirToTransit_Post14 * clean up test code * refactor TestClusterCore * clean up TestClusterCore * stub out some temporary tests * use HardcodedServerAddressProvider in seal migration tests * work on raft for TestSealMigration_ShamirToTransit_Post14 * always use hardcoded raft address provider in seal migration tests * debug TestSealMigration_ShamirToTransit_Post14 * fix bug in RestartCore * remove debug code * TestSealMigration_ShamirToTransit_Post14 works now * clean up debug code * clean up tests * cleanup tests * refactor test code * stub out TestSealMigration_TransitToShamir_Post14 * set seals properly for transit->shamir migration * migrateFromTransitToShamir_Post14 works for inmem * migrateFromTransitToShamir_Post14 works for raft * use base ports per-test * fix seal verification test code * simplify seal migration test suite * simplify test suite * cleanup test suite * use explicit ports below 30000 * simplify use of numTestCores * Update vault/external_tests/sealmigration/seal_migration_test.go Co-authored-by: Calvin Leung Huang <[email protected]> * Update vault/external_tests/sealmigration/seal_migration_test.go Co-authored-by: Calvin Leung Huang <[email protected]> * clean up imports * rename to StartCore() * Update vault/testing.go Co-authored-by: Calvin Leung Huang <[email protected]> * simplify test suite * clean up tests Co-authored-by: Calvin Leung Huang <[email protected]>
This PR introduces unit tests that test Shamir-to-Transit and Transit-to-Shamir Seal Migration for post-1.4 Vault.
The following changes have been made to the test suite to support these changes:
vault.Cores
in avault.TestCluster
, via the new methodsStopCore()
andRestartCore()
. This allows tests to simulate bringing an individual node in a cluster up and down.raft.ServerAddressProvider
can optionally be provided at the time theTestCluster
is created, if the cluster is using integrated raft storage. In addition, there is an option calledJoinRaftFollowers
that specifies that the raft follower nodes should be joined to the raft leader at the time the cluster is started.teststorage.ReusableRaftStorage
are now created with araft.ServerAddressProvider
.