Test Shamir-to-Transit and Transit-to-Shamir Seal Migration for post-1.4 Vault. #9214

Merged 86 commits on Jun 16, 2020
Commits
7864437
move adjustForSealMigration to vault package
mjarmy Apr 16, 2020
9360ddd
fix adjustForSealMigration
mjarmy Apr 16, 2020
05a612c
begin working on new seal migration test
mjarmy Apr 17, 2020
28386e9
create shamir seal migration test
mjarmy May 12, 2020
1f8c5f1
refactor testhelpers
mjarmy May 13, 2020
4cb0394
add VerifyRaftConfiguration to testhelpers
mjarmy May 13, 2020
6e8caae
stub out TestTransit
mjarmy May 13, 2020
2abca4a
Revert "refactor testhelpers"
mjarmy May 13, 2020
f7536c2
get shamir test working again
mjarmy May 13, 2020
f3039df
stub out transit join
mjarmy May 14, 2020
3066b51
work on transit join
mjarmy May 14, 2020
594bccf
remove debug code
mjarmy May 14, 2020
b223d50
initTransit now works with raft join
mjarmy May 14, 2020
26b7a94
runTransit works with inmem
mjarmy May 14, 2020
9a523df
work on runTransit with raft
mjarmy May 14, 2020
0e68260
runTransit works with raft
mjarmy May 14, 2020
ee20dba
cleanup tests
mjarmy May 14, 2020
926a957
TestSealMigration_TransitToShamir_Pre14
mjarmy May 26, 2020
6781c7e
TestSealMigration_ShamirToTransit_Pre14
mjarmy May 26, 2020
3939bf3
split for pre-1.4 testing
mjarmy May 26, 2020
f7da813
add simple tests for transit and shamir
mjarmy May 26, 2020
ccc79c9
fix typo in test suite
mjarmy May 26, 2020
e830135
debug wrapper type
mjarmy May 26, 2020
9dd793f
test debug
mjarmy May 26, 2020
23aad62
test-debug
mjarmy May 26, 2020
784102e
refactor core migration
mjarmy May 26, 2020
4ec9dc1
Revert "refactor core migration"
mjarmy May 26, 2020
e5f0d0f
begin refactor of adjustForSealMigration
mjarmy May 26, 2020
559ddcc
fix bug in adjustForSealMigration
mjarmy May 26, 2020
1293273
clean up tests
mjarmy May 26, 2020
f7f49ee
clean up core refactoring
mjarmy May 26, 2020
e2dd3d6
fix bug in shamir->transit migration
mjarmy May 26, 2020
8c44144
remove unnecessary lock from setSealsForMigration()
mjarmy Jun 8, 2020
149bd7f
rename sealmigration test package
mjarmy Jun 11, 2020
f645ab6
use ephemeral ports below 30000
mjarmy Jun 11, 2020
0cfb125
stub out test that brings individual nodes up and down
mjarmy May 27, 2020
28e5d78
refactor NewTestCluster
mjarmy May 27, 2020
3d00a70
pass listeners into newCore()
mjarmy May 27, 2020
d40b963
simplify cluster address setup
mjarmy May 27, 2020
59a3b2f
simplify extra test core setup
mjarmy May 28, 2020
2fa7116
refactor TestCluster for readability
mjarmy May 28, 2020
9676aba
refactor TestCluster for readability
mjarmy May 28, 2020
bf755da
refactor TestCluster for readability
mjarmy May 28, 2020
6b313a7
add shutdown func to TestCore
mjarmy May 28, 2020
31eb3a7
add cleanup func to TestCore
mjarmy May 28, 2020
c211fe8
create RestartCore
mjarmy May 29, 2020
91eb9c1
stub out TestSealMigration_ShamirToTransit_Post14
mjarmy May 29, 2020
0e1d20b
refactor address handling in NewTestCluster
mjarmy May 29, 2020
3738d76
fix listener setup in newCore()
mjarmy May 29, 2020
7e3b1cf
work on post-1.4 migration testing
mjarmy May 29, 2020
4481ac1
clean up pre-1.4 test
mjarmy May 29, 2020
4f5eba6
TestSealMigration_ShamirToTransit_Post14 works for non-raft
mjarmy May 29, 2020
debbbb1
work on raft TestSealMigration_ShamirToTransit_Post14
mjarmy Jun 1, 2020
357a61b
clean up test code
mjarmy Jun 1, 2020
1a5f997
refactor TestClusterCore
mjarmy Jun 1, 2020
304e932
clean up TestClusterCore
mjarmy Jun 1, 2020
fa9a707
stub out some temporary tests
mjarmy Jun 1, 2020
106bc89
use HardcodedServerAddressProvider in seal migration tests
mjarmy Jun 1, 2020
1ea087b
work on raft for TestSealMigration_ShamirToTransit_Post14
mjarmy Jun 1, 2020
4fc1f52
always use hardcoded raft address provider in seal migration tests
mjarmy Jun 2, 2020
22db307
debug TestSealMigration_ShamirToTransit_Post14
mjarmy Jun 2, 2020
1bb9944
fix bug in RestartCore
mjarmy Jun 4, 2020
f77e990
remove debug code
mjarmy Jun 4, 2020
4ba1036
TestSealMigration_ShamirToTransit_Post14 works now
mjarmy Jun 4, 2020
d717e53
clean up debug code
mjarmy Jun 4, 2020
3dcb5ed
clean up tests
mjarmy Jun 4, 2020
1f6c138
cleanup tests
mjarmy Jun 4, 2020
4239d8a
refactor test code
mjarmy Jun 4, 2020
5e9d375
stub out TestSealMigration_TransitToShamir_Post14
mjarmy Jun 5, 2020
641c40d
set seals properly for transit->shamir migration
mjarmy Jun 5, 2020
45df411
migrateFromTransitToShamir_Post14 works for inmem
mjarmy Jun 5, 2020
5f80eee
migrateFromTransitToShamir_Post14 works for raft
mjarmy Jun 5, 2020
dcc8bcc
use base ports per-test
mjarmy Jun 5, 2020
04dc99d
fix seal verification test code
mjarmy Jun 5, 2020
00f3c23
simplify seal migration test suite
mjarmy Jun 5, 2020
2221432
simplify test suite
mjarmy Jun 8, 2020
a1f6bd7
cleanup test suite
mjarmy Jun 10, 2020
8087a07
use explicit ports below 30000
mjarmy Jun 11, 2020
8c3375a
simplify use of numTestCores
mjarmy Jun 12, 2020
a6140cb
Update vault/external_tests/sealmigration/seal_migration_test.go
mjarmy Jun 16, 2020
f85159b
Update vault/external_tests/sealmigration/seal_migration_test.go
mjarmy Jun 16, 2020
fdff2c4
clean up imports
mjarmy Jun 16, 2020
ebf9716
rename to StartCore()
mjarmy Jun 16, 2020
c1cf085
Update vault/testing.go
mjarmy Jun 16, 2020
d78f80e
simplify test suite
mjarmy Jun 16, 2020
0528d78
clean up tests
mjarmy Jun 16, 2020
86 changes: 34 additions & 52 deletions helper/testhelpers/testhelpers.go
@@ -412,16 +412,9 @@ func (p *TestRaftServerAddressProvider) ServerAddr(id raftlib.ServerID) (raftlib
}

func RaftClusterJoinNodes(t testing.T, cluster *vault.TestCluster) {
raftClusterJoinNodes(t, cluster, false)
}

func RaftClusterJoinNodesWithStoredKeys(t testing.T, cluster *vault.TestCluster) {
raftClusterJoinNodes(t, cluster, true)
}

func raftClusterJoinNodes(t testing.T, cluster *vault.TestCluster, useStoredKeys bool) {

addressProvider := &TestRaftServerAddressProvider{Cluster: cluster}

atomic.StoreUint32(&vault.UpdateClusterAddrForTests, 1)

leader := cluster.Cores[0]
@@ -430,11 +423,7 @@ func raftClusterJoinNodes(t testing.T, cluster *vault.TestCluster, useStoredKeys
{
EnsureCoreSealed(t, leader)
leader.UnderlyingRawStorage.(*raft.RaftBackend).SetServerAddressProvider(addressProvider)
if useStoredKeys {
cluster.UnsealCoreWithStoredKeys(t, leader)
} else {
cluster.UnsealCore(t, leader)
}
cluster.UnsealCore(t, leader)
vault.TestWaitActive(t, leader.Core)
}

@@ -454,37 +443,12 @@ func raftClusterJoinNodes(t testing.T, cluster *vault.TestCluster, useStoredKeys
t.Fatal(err)
}

if useStoredKeys {
// For autounseal, the raft backend is not initialized right away
// after the join. We need to wait briefly before we can unseal.
awaitUnsealWithStoredKeys(t, core)
} else {
cluster.UnsealCore(t, core)
}
cluster.UnsealCore(t, core)
[Review comment, Contributor] I don't think this is equivalent logic to what was previously here. We don't want to use recovery keys to unseal; we want to wait for the stored key to do the job for us.

[Reply, Contributor Author] It is equivalent logic to what was originally here though -- we currently don't have any tests that call this function with stored keys.

}

WaitForNCoresUnsealed(t, cluster, len(cluster.Cores))
}

func awaitUnsealWithStoredKeys(t testing.T, core *vault.TestClusterCore) {

timeout := time.Now().Add(30 * time.Second)
for {
if time.Now().After(timeout) {
t.Fatal("raft join: timeout waiting for core to unseal")
}
// Its actually ok for an error to happen here the first couple of
// times -- it means the raft join hasn't gotten around to initializing
// the backend yet.
err := core.UnsealWithStoredKeys(context.Background())
if err == nil {
return
}
core.Logger().Warn("raft join: failed to unseal core", "error", err)
time.Sleep(time.Second)
}
}

// HardcodedServerAddressProvider is a ServerAddressProvider that uses
// a hardcoded map of raft node addresses.
//
@@ -505,11 +469,11 @@ func (p *HardcodedServerAddressProvider) ServerAddr(id raftlib.ServerID) (raftli

// NewHardcodedServerAddressProvider is a convenience function that makes a
// ServerAddressProvider from a given cluster address base port.
func NewHardcodedServerAddressProvider(cluster *vault.TestCluster, baseClusterPort int) raftlib.ServerAddressProvider {
func NewHardcodedServerAddressProvider(numCores, baseClusterPort int) raftlib.ServerAddressProvider {

entries := make(map[raftlib.ServerID]raftlib.ServerAddress)

for i := 0; i < len(cluster.Cores); i++ {
for i := 0; i < numCores; i++ {
id := fmt.Sprintf("core-%d", i)
addr := fmt.Sprintf("127.0.0.1:%d", baseClusterPort+i)
entries[raftlib.ServerID(id)] = raftlib.ServerAddress(addr)
@@ -520,17 +484,6 @@ func NewHardcodedServerAddressProvider(cluster *vault.TestCluster, baseClusterPo
}
}
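
The address scheme built by NewHardcodedServerAddressProvider can be illustrated without the raft library. The sketch below is a minimal stand-in, not the helper from this PR: `buildAddressMap` is a hypothetical name, plain strings replace `raftlib.ServerID` and `raftlib.ServerAddress`, and the base port passed in `main` is an arbitrary value.

```go
package main

import "fmt"

// buildAddressMap is a hypothetical stand-in for the map construction inside
// NewHardcodedServerAddressProvider. Plain strings are used here in place of
// raftlib.ServerID and raftlib.ServerAddress.
func buildAddressMap(numCores, baseClusterPort int) map[string]string {
	entries := make(map[string]string, numCores)
	for i := 0; i < numCores; i++ {
		id := fmt.Sprintf("core-%d", i)
		entries[id] = fmt.Sprintf("127.0.0.1:%d", baseClusterPort+i)
	}
	return entries
}

func main() {
	// The base port is an arbitrary value chosen for illustration.
	addrs := buildAddressMap(3, 20000)
	for i := 0; i < 3; i++ {
		id := fmt.Sprintf("core-%d", i)
		fmt.Println(id, addrs[id])
	}
}
```

Because the new signature takes a core count rather than a *vault.TestCluster, the provider can be built before any cluster exists, which appears to be what lets it be passed into MakeReusableRaftStorage.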

// SetRaftAddressProviders sets a ServerAddressProvider for all the nodes in a
// cluster.
func SetRaftAddressProviders(t testing.T, cluster *vault.TestCluster, provider raftlib.ServerAddressProvider) {
[Review comment, Contributor] How come this is no longer used/required?

[Reply, Contributor Author] I'm doing it inside the ReusableRaftStorage now


atomic.StoreUint32(&vault.UpdateClusterAddrForTests, 1)

for _, core := range cluster.Cores {
core.UnderlyingRawStorage.(*raft.RaftBackend).SetServerAddressProvider(provider)
}
}
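
The design described in the reply, where the address provider is set when each backend is created instead of in a loop over an existing cluster, can be sketched as follows. This is an illustration under stated assumptions, not the code from this PR: `addressProvider`, `staticProvider`, `backend`, and `newBackendFactory` are hypothetical simplifications of `raftlib.ServerAddressProvider`, `*raft.RaftBackend`, and the physical factory closure.

```go
package main

import "fmt"

// addressProvider is a hypothetical, simplified version of
// raftlib.ServerAddressProvider that uses plain strings.
type addressProvider interface {
	ServerAddr(id string) (string, error)
}

// staticProvider resolves node IDs from a fixed map.
type staticProvider map[string]string

func (s staticProvider) ServerAddr(id string) (string, error) {
	addr, ok := s[id]
	if !ok {
		return "", fmt.Errorf("unknown node %q", id)
	}
	return addr, nil
}

// backend is a hypothetical stand-in for *raft.RaftBackend.
type backend struct {
	nodeID   string
	provider addressProvider
}

func (b *backend) SetServerAddressProvider(p addressProvider) { b.provider = p }

// newBackendFactory returns a per-core constructor that captures the
// provider, so every backend is configured at creation time.
func newBackendFactory(p addressProvider) func(coreIdx int) *backend {
	return func(coreIdx int) *backend {
		b := &backend{nodeID: fmt.Sprintf("core-%d", coreIdx)}
		b.SetServerAddressProvider(p)
		return b
	}
}

func main() {
	provider := staticProvider{
		"core-0": "127.0.0.1:20000",
		"core-1": "127.0.0.1:20001",
	}
	factory := newBackendFactory(provider)
	b := factory(1)
	addr, err := b.provider.ServerAddr(b.nodeID)
	fmt.Println(b.nodeID, addr, err)
}
```

With the provider captured in the factory, a backend that is recreated for a restarted core picks up the same addresses without a separate setup pass over the cluster.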

// VerifyRaftConfiguration checks that we have a valid raft configuration, i.e.
// the correct number of servers, having the correct NodeIDs, and exactly one
// leader.
@@ -565,6 +518,35 @@ func VerifyRaftConfiguration(core *vault.TestClusterCore, numCores int) error {
return nil
}

// AwaitLeader waits for one of the cluster's nodes to become leader.
func AwaitLeader(t testing.T, cluster *vault.TestCluster) (int, error) {
[Review comment, Contributor] We already have a wait for leader function, do we need both?

[Reply, Contributor Author] I needed a function that returned the index of the leader core


timeout := time.Now().Add(30 * time.Second)
for {
if time.Now().After(timeout) {
break
}

for i, core := range cluster.Cores {
if core.Core.Sealed() {
continue
}

isLeader, _, _, err := core.Leader()
if err != nil {
t.Fatal(err)
[Review comment, Collaborator] I would probably not error out here. Allow errors until the timeout elapses in case the cluster is still coming up and behaving strangely.

[Review comment, Collaborator] (but still report the last error encountered upon timeout, don't just report a timeout)

}
if isLeader {
return i, nil
}
}

time.Sleep(time.Second)
}

return 0, fmt.Errorf("timeout waiting leader")
[Review comment, Collaborator] 0 is a valid index, I would return -1 here.

}

func GenerateDebugLogs(t testing.T, client *api.Client) chan struct{} {
t.Helper()

30 changes: 22 additions & 8 deletions helper/testhelpers/teststorage/teststorage_reusable.go
@@ -8,6 +8,7 @@ import (
"github.com/mitchellh/go-testing-interface"

hclog "github.com/hashicorp/go-hclog"
raftlib "github.com/hashicorp/raft"
"github.com/hashicorp/vault/physical/raft"
"github.com/hashicorp/vault/vault"
)
@@ -73,7 +74,10 @@ func MakeReusableStorage(t testing.T, logger hclog.Logger, bundle *vault.Physica

// MakeReusableRaftStorage makes a physical raft backend that can be re-used
// across multiple test clusters in sequence.
func MakeReusableRaftStorage(t testing.T, logger hclog.Logger, numCores int) (ReusableStorage, StorageCleanup) {
func MakeReusableRaftStorage(
t testing.T, logger hclog.Logger, numCores int,
addressProvider raftlib.ServerAddressProvider,
) (ReusableStorage, StorageCleanup) {

raftDirs := make([]string, numCores)
for i := 0; i < numCores; i++ {
@@ -87,17 +91,14 @@ func MakeReusableRaftStorage(t testing.T, logger hclog.Logger, numCores int) (Re
conf.DisablePerformanceStandby = true
opts.KeepStandbysSealed = true
opts.PhysicalFactory = func(t testing.T, coreIdx int, logger hclog.Logger) *vault.PhysicalBackendBundle {
return makeReusableRaftBackend(t, coreIdx, logger, raftDirs[coreIdx])
return makeReusableRaftBackend(t, coreIdx, logger, raftDirs[coreIdx], addressProvider)
}
},

// Close open files being used by raft.
Cleanup: func(t testing.T, cluster *vault.TestCluster) {
for _, core := range cluster.Cores {
raftStorage := core.UnderlyingRawStorage.(*raft.RaftBackend)
if err := raftStorage.Close(); err != nil {
t.Fatal(err)
}
for i := 0; i < len(cluster.Cores); i++ {
CloseRaftStorage(t, cluster, i)
}
},
}
@@ -111,6 +112,14 @@ func MakeReusableRaftStorage(t testing.T, logger hclog.Logger, numCores int) (Re
return storage, cleanup
}

// CloseRaftStorage closes open files being used by raft.
func CloseRaftStorage(t testing.T, cluster *vault.TestCluster, idx int) {
raftStorage := cluster.Cores[idx].UnderlyingRawStorage.(*raft.RaftBackend)
if err := raftStorage.Close(); err != nil {
t.Fatal(err)
}
}

func makeRaftDir(t testing.T) string {
raftDir, err := ioutil.TempDir("", "vault-raft-")
if err != nil {
@@ -120,7 +129,10 @@ func makeRaftDir(t testing.T) string {
return raftDir
}

func makeReusableRaftBackend(t testing.T, coreIdx int, logger hclog.Logger, raftDir string) *vault.PhysicalBackendBundle {
func makeReusableRaftBackend(
t testing.T, coreIdx int, logger hclog.Logger, raftDir string,
addressProvider raftlib.ServerAddressProvider,
) *vault.PhysicalBackendBundle {

nodeID := fmt.Sprintf("core-%d", coreIdx)
conf := map[string]string{
@@ -134,6 +146,8 @@ func makeReusableRaftBackend(t testing.T, coreIdx int, logger hclog.Logger, raft
t.Fatal(err)
}

backend.(*raft.RaftBackend).SetServerAddressProvider(addressProvider)

return &vault.PhysicalBackendBundle{
Backend: backend,
}
@@ -25,7 +25,7 @@ import (
func TestSealMigration_TransitToShamir_Pre14(t *testing.T) {
// Note that we do not test integrated raft storage since this is
// a pre-1.4 test.
testVariousBackends(t, testSealMigrationTransitToShamir_Pre14, false)
testVariousBackends(t, testSealMigrationTransitToShamir_Pre14, basePort_TransitToShamir_Pre14, false)
}

func testSealMigrationTransitToShamir_Pre14(
@@ -42,7 +42,11 @@ func testSealMigrationTransitToShamir_Pre14(
tss.MakeKey(t, "transit-seal-key")

// Initialize the backend with transit.
rootToken, recoveryKeys, transitSeal := initializeTransit(t, logger, storage, basePort, tss)
cluster, _, transitSeal := initializeTransit(t, logger, storage, basePort, tss)
rootToken, recoveryKeys := cluster.RootToken, cluster.RecoveryKeys
cluster.EnsureCoresSealed(t)
storage.Cleanup(t, cluster)
cluster.Cleanup()

// Migrate the backend from transit to shamir
migrateFromTransitToShamir_Pre14(t, logger, storage, basePort, tss, transitSeal, rootToken, recoveryKeys)