Add support for docker testclusters #20247

Merged: 66 commits from test-docker-vault into main, Apr 24, 2023

Conversation

@ncabatoff (Collaborator) commented Apr 19, 2023

No description provided.

@ncabatoff ncabatoff marked this pull request as ready for review April 19, 2023 13:40
@ncabatoff ncabatoff requested review from a team, raskchanky and cipherboy April 19, 2023 13:40
ncabatoff and others added 5 commits April 21, 2023 13:21
When using the podman service runner (creating a socket equivalent to
Docker's), tests fail with a nil pointer exception since the bridge
network is named "podman", not "bridge".

Allow single networked containers and use whatever name the container
runner assigns to it.

Signed-off-by: Alexander Scheel <[email protected]>
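
A minimal sketch of the idea in that commit, assuming the inspect/RunOptions shapes quoted in the snippets below; the actual fix may differ in detail. When a container sits on exactly one network, take whatever name the runner assigned (podman's default bridge network is named "podman", not "bridge") instead of hard-coding it:

    // Hedged sketch, not the PR's exact code: resolve the network name from
    // the container inspection when exactly one network is attached.
    if d.RunOptions.NetworkName == "" && len(inspect.NetworkSettings.Networks) == 1 {
        for name, settings := range inspect.NetworkSettings.Networks {
            d.RunOptions.NetworkName = name // e.g. "bridge" on Docker, "podman" on podman
            realIP = settings.IPAddress     // avoid dereferencing a hard-coded, missing key
        }
    }
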
} else if d.RunOptions.LogConsumer != nil {
consumeLogs = true
logStdout = &LogConsumerWriter{d.RunOptions.LogConsumer}
logStderr = &LogConsumerWriter{d.RunOptions.LogConsumer}

Contributor:
Logging changes will be great to have for the PKI tests, to avoid having to use the tests.OnError(...) manual logging :-)
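
A hedged usage sketch of that logging hook, based on the LogConsumer field shown in the snippet above; NewServiceRunner and the ImageRepo/ImageTag fields reflect the existing sdk/helper/docker helpers, but treat any detail not quoted in this thread as an assumption:

    // import "github.com/hashicorp/vault/sdk/helper/docker"
    func TestWithContainerLogs(t *testing.T) {
        runner, err := docker.NewServiceRunner(docker.RunOptions{
            ImageRepo:   "hashicorp/vault",
            ImageTag:    "latest",
            LogConsumer: func(line string) { t.Log(line) }, // container output lands in the test log
        })
        if err != nil {
            t.Fatal(err)
        }
        _ = runner // start the service / run assertions against the container here
    }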

Review thread on sdk/helper/testcluster/docker/environment.go (outdated, resolved)
Review thread on sdk/helper/docker/testhelpers.go (resolved)
func (dc *DockerCluster) addNode(ctx context.Context, opts *DockerClusterOptions) error {
i := len(dc.ClusterNodes)
nodeID := fmt.Sprintf("core-%d", i)
node := &dockerClusterNode{

Contributor:
When adding the first node, and similar to the above comment about saving the network, I wonder if it'd be valuable to re-add the first node's network here?

In particular, we want to ensure that nodes always land on the same network so they can communicate with each other. I think a mismatch only happens if someone is actively playing around with default networks, though maybe not, if we don't allow choosing the first container's network via DockerCluster?

I don't know that it matters much anymore, but the reason we saved the network and re-added it manually to subsequent containers in the pkiext/nginx_test.go suite was that CircleCI's remote docker setup would sometimes land us on different networks, which was the oddest thing ever given that the other networks still seemed to exist.

Collaborator Author:
What do you mean by re-adding a network? I looked in nginx_test but couldn't figure out what you're referring to?

Contributor:
Ah, sorry: in RunNginxRootTest we call ..., networkName, ... := buildNginxContainer, which creates a container on an arbitrary network, and subsequent calls that create containers meant to share that network take the name in, e.g. CheckWithClients(t, networkName, ...), to ensure they're created on the same network as the nginx instance.

Here, since we're exposing this in the sdk as a generic test dependency, we can't know how the container will get created, so it'd be best to force subsequent nodes onto the same network as the first node.
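
A minimal sketch of that suggestion, with hypothetical field names (dc.networkName, node.runOptions) rather than the PR's actual code: remember the network the first node landed on and pin every later node to it:

    if i == 0 {
        // First node: record whatever network the container runner put it on.
        dc.networkName = node.runOptions.NetworkName
    } else {
        // Later nodes: force them onto the first node's network so cluster
        // members can reach each other even if the runner's default differs.
        node.runOptions.NetworkName = dc.networkName
    }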

"SKIP_SETCAP=true",
"VAULT_LOG_FORMAT=json",
},
Ports: []string{"8200/tcp", "8201/tcp"},

Contributor:
It might be cool in a future PR to add the ability to use a non-TLS-enabled port here.

OCSP and CRL fetching should (theoretically) use a non-TLS listener, and I've occasionally had issues with the cert auth method talking to localhost over TLS.

I don't think I need this immediately; perhaps I could add it in the future when I do. But I'm curious if you have thoughts on how this would look: it doesn't look like we get much influence over listeners, but I might've missed something.

Collaborator Author:
I haven't yet added any support for configuring listeners in HCL, but I don't recall any major obstacles to doing so; I just didn't need it yet.
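
For what it's worth, a hedged sketch of what such a listener addition might look like: an extra plaintext listener alongside the TLS one, expressed as additional HCL. Whether and how DockerClusterOptions would accept this is exactly the open question in this thread, so the constant below is purely illustrative:

    // Illustrative only; no such option exists in this PR yet.
    const extraListenerHCL = `
    listener "tcp" {
      address     = "0.0.0.0:8202"
      tls_disable = true
    }
    `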

@cipherboy (Contributor) left a comment:
Sorry, I didn't mean to request changes; I meant to just submit a review, lol.

@cipherboy cipherboy self-requested a review April 24, 2023 13:37
Base automatically changed from test-exec-vault-dev to main April 24, 2023 13:57
@ncabatoff ncabatoff requested a review from a team as a code owner April 24, 2023 13:57
break
}
} else {
realIP = inspect.NetworkSettings.Networks[d.RunOptions.NetworkName].IPAddress

Contributor:
Is there any chance of the network with that name not being found, and this panicking?

@cipherboy (Contributor), Apr 24, 2023:
Apparently the answer is yes:

    acme_test.go:95: [0] Cluster Node Test_ACMERSAPure-core-0 - podman / 10.88.0.2
    acme_test.go:95: [1] Cluster Node Test_ACMERSAPure-core-1 - podman / 10.88.0.3
    acme_test.go:95: [2] Cluster Node Test_ACMERSAPure-core-2 - podman / 10.88.0.4
    acme_test.go:27: creating on network: podman

2023-04-24T11:49:41.586-0400 [DEBUG] Test_ACMERSAPure.core-2.core.cluster-listener: creating rpc dialer: address=10.88.0.2:8201 alpn=req_fw_sb-act_v1 host=fw-50bb4d28-5e69-6a80-b974-6d3562e805b0 timeout=19.999984975s
--- FAIL: Test_ACMERSAPure (36.18s)
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x68 pc=0xb7fcc8]

goroutine 14 [running]:
testing.tRunner.func1.2({0xd69d00, 0x156da90})
	/usr/local/go/src/testing/testing.go:1526 +0x24e
testing.tRunner.func1()
	/usr/local/go/src/testing/testing.go:1529 +0x39f
panic({0xd69d00, 0x156da90})
	/usr/local/go/src/runtime/panic.go:884 +0x213
github.com/hashicorp/vault/sdk/helper/docker.(*Runner).Start(0xc0222c8780, {0xfe59e8, 0xc00003a0d8}, 0x6?, 0x0)
	/home/cipherboy/GitHub/cipherboy/vault/sdk/helper/docker/testhelpers.go:458 +0x10a8
github.com/hashicorp/vault/builtin/logical/pkiext.CheckCertBot(0xc000582ea0, {0xc000312770, 0x6}, {0xc02207b720, 0x1d})
	/home/cipherboy/GitHub/cipherboy/vault/builtin/logical/pkiext/acme_test.go:39 +0x265
github.com/hashicorp/vault/builtin/logical/pkiext.RunACMERootTest(0xc000582ea0, {0xe7961b, 0x3}, 0x52f785?, 0x0, {0xe7961b, 0x3}, 0xc000083f60?, 0x0)
	/home/cipherboy/GitHub/cipherboy/vault/builtin/logical/pkiext/acme_test.go:173 +0x19d3
github.com/hashicorp/vault/builtin/logical/pkiext.Test_ACMERSAPure(0x0?)
	/home/cipherboy/GitHub/cipherboy/vault/builtin/logical/pkiext/acme_test.go:177 +0x38
testing.tRunner(0xc000582ea0, 0xef4488)
	/usr/local/go/src/testing/testing.go:1576 +0x10b
created by testing.(*T).Run
	/usr/local/go/src/testing/testing.go:1629 +0x3ea
FAIL	github.com/hashicorp/vault/builtin/logical/pkiext	36.192s
FAIL

Contributor:

Ah, that's on me. I passed in networkID rather than NetworkName... Hence why it was confused. 😆

After that, no, it does indeed have that network correctly specified.
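
Even so, a small defensive guard would turn this class of mistake into a readable error instead of a segfault. A hedged sketch (variable names mirror the snippet above; the error-handling shape is an assumption, and fmt would need to be imported):

    settings, ok := inspect.NetworkSettings.Networks[d.RunOptions.NetworkName]
    if !ok {
        // Propagate an error rather than dereferencing a nil map entry.
        err = fmt.Errorf("container is not attached to network %q", d.RunOptions.NetworkName)
    } else {
        realIP = settings.IPAddress
    }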

}

func (dc *DockerCluster) GetBarrierOrRecoveryKeys() [][]byte {
return dc.GetBarrierKeys()

Contributor:
This comment isn't necessarily related to your changes since I know this implementation is the same as the current TestCluster, but should this function ever return recovery keys instead of barrier keys?

Collaborator Author:
Only once we support non-shamir seals.
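
A hedged sketch of what that could eventually look like; the sealType field and GetRecoveryKeys helper are hypothetical and not part of this PR:

    func (dc *DockerCluster) GetBarrierOrRecoveryKeys() [][]byte {
        // With an auto-unseal (non-Shamir) seal, operators hold recovery keys
        // rather than barrier unseal keys.
        if dc.sealType != "shamir" {
            return dc.GetRecoveryKeys()
        }
        return dc.GetBarrierKeys()
    }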

@cipherboy (Contributor), Apr 24, 2023:
@ncabatoff This thread reminds me, I think it'd be really cool if we could support other base layers for the Vault container, happy to open a PR for that.

Not quite sure how it interacts with AddNodes tbh, but what I was thinking is it'd be cool to drop in a container file like:

FROM hashicorp/vault:latest

RUN apk add softhsm # ... plus more configuration

and then use PKCS#11 auto-unseal/managed keys/... testing in Vault Ent with SoftHSM.

I think we might need some hooks to let us specify custom callbacks at points in time (startup, cluster join, etc.) so that we can inject data from other containers: the initial node will set up the auto-unseal keys, which we'll need to replicate into the other machines, unless we start a PKCS#11 network proxy client on each Vault node and transit all requests to a single, common PKCS#11 network proxy server on another node.

This is definitely enough work to be a completely separate PR though.
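
A speculative sketch of the hooks being floated here, just to make the idea concrete; every name below is hypothetical and nothing like this exists in the PR:

    // Hypothetical lifecycle hooks a future change could add to the cluster options.
    type ClusterHooks struct {
        // PostStartNode runs after a node's container is up, before it joins the
        // cluster, e.g. to copy SoftHSM token state or other auto-unseal material in.
        PostStartNode func(ctx context.Context, node *dockerClusterNode) error
        // PreJoinNode runs immediately before a follower joins the cluster.
        PreJoinNode func(ctx context.Context, node *dockerClusterNode) error
    }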

@ncabatoff ncabatoff requested a review from cipherboy April 24, 2023 17:11

@cipherboy (Contributor) left a comment:
This worked and was sufficient for my tests in #20320!

container, hostIPs, containerID, err := d.Start(context.Background(), addSuffix, forceLocalAddr)
if d.RunOptions.PreDelete {
name := d.RunOptions.ContainerName
matches, err := d.DockerAPI.ContainerList(ctx, types.ContainerListOptions{

@cipherboy (Contributor), Apr 24, 2023:
Am I correct in understanding that this is incompatible with t.Parallel()? If we specify addSuffix to allow tests to execute in parallel, wouldn't we kill all the uuid-suffixed variants here as well as the main one?

A docs follow-up should be sufficient if so.

Collaborator Author:
I don't think so? It seems to me we're using exact match rather than prefix match in the filter.

The main use case for the pre-delete is local testing where we're not using addSuffix, to prevent a prior failed run from interfering. We should probably do that labels TODO so that old containers also get cleaned up in local testing with addSuffix; they won't interfere, but they will take up memory.

Contributor:
Ah, I missed the exact match. I think this is good. Yeah, I'm curious what you would put in the label though... perhaps the test name?
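
A hedged sketch of that labels idea, using the standard Docker API label filter; the "vault-test" label key and the choice of label value are arbitrary, and the surrounding PreDelete wiring is only implied by the snippet above:

    // import "github.com/docker/docker/api/types"
    // import "github.com/docker/docker/api/types/filters"
    matches, err := d.DockerAPI.ContainerList(ctx, types.ContainerListOptions{
        All: true, // include stopped containers left behind by earlier runs
        Filters: filters.NewArgs(
            filters.Arg("label", "vault-test="+d.RunOptions.ContainerName),
        ),
    })
    // ... then remove each match, as the existing pre-delete path does for exact names.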

// behaviour for HUP is termination. So the PostStart that NewDockerCluster
// passes in (which does all that PKI cert stuff) waits to see output from
// Vault on stdout/stderr before it sends the signal, and we don't want to
// run the PostStart until we've hooked into the docker logs.

Contributor:
That's slick and really handy!

Collaborator Author:
Haha I'm glad you think it's slick. I agree it's handy but it feels like a messy kludge... still, it works.
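
For readers following along, a hedged sketch of the gating trick described in that code comment; names other than LogConsumer are illustrative, and sync/time would need to be imported:

    sawOutput := make(chan struct{})
    var once sync.Once
    opts.LogConsumer = func(line string) {
        // The first line proves the docker log stream is hooked up.
        once.Do(func() { close(sawOutput) })
    }
    // ... later, inside PostStart, before sending SIGHUP:
    select {
    case <-sawOutput:
    case <-time.After(30 * time.Second):
        // give up rather than signaling a container whose logs we never saw
    }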

@ncabatoff ncabatoff merged commit 2f0929f into main Apr 24, 2023
@ncabatoff ncabatoff deleted the test-docker-vault branch April 24, 2023 18:25