Avoid Vault hang when no communication established with plugin #22914

Merged on Sep 8, 2023 (3 commits)

Contributor

@tomhjp tomhjp commented Sep 8, 2023

Also fixes a function where we may call go-plugin's client.Client() without ever calling client.Kill(), which could leak plugin processes.
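
The leak described above follows a general ownership pattern: once a client may have started a plugin process, every error path has to kill it. A minimal Go sketch of that pattern is below. It assumes a hypothetical `pluginClient` interface and a fake implementation rather than go-plugin's real types, and it is not the code changed in this PR.

```go
package main

import (
	"errors"
	"fmt"
)

// pluginClient is a hypothetical stand-in for the part of a plugin client
// that starts a process (Client) and tears it down (Kill).
type pluginClient interface {
	Client() (string, error) // may start the process; returns a connection handle
	Kill()                   // stops the process
}

// fakeClient is an illustrative implementation that records whether Kill ran.
type fakeClient struct {
	failConnect bool
	killed      bool
}

func (f *fakeClient) Client() (string, error) {
	if f.failConnect {
		return "", errors.New("no communication established with plugin")
	}
	return "conn", nil
}

func (f *fakeClient) Kill() { f.killed = true }

// connect returns a connection handle on success. On any error it kills the
// client so that no plugin process is left behind.
func connect(c pluginClient) (conn string, err error) {
	defer func() {
		if err != nil {
			c.Kill()
		}
	}()
	return c.Client()
}

func main() {
	bad := &fakeClient{failConnect: true}
	_, err := connect(bad)
	fmt.Println("failed:", err != nil, "killed:", bad.killed)

	good := &fakeClient{}
	_, err = connect(good)
	fmt.Println("failed:", err != nil, "killed:", good.killed)
}
```

Using a named error result with a deferred check keeps the cleanup in one place, so error paths added later are covered as well.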

It's pretty tricky to write an automated test for this. I'm still working on one, but wanted to get the change in before code freeze. To test manually, install runsc, then break it for plugins by removing the additional runsc args we require, i.e. /etc/docker/daemon.json should look like this:

{
    "runtimes": {
        "runsc": {
            "path": "/usr/local/bin/runsc"
        }
    }
}

Then run sudo systemctl reload docker and run some of the tests in external_plugin_container_test.go. Prior to this change, the test run hangs indefinitely and leaves Docker containers running if you kill it. After this change, it should fail quickly and clean up after itself.

@tomhjp tomhjp requested a review from a team as a code owner September 8, 2023 16:22
@github-actions github-actions bot added the hashicorp-contributed-pr label Sep 8, 2023
@tomhjp tomhjp added this to the 1.15 milestone Sep 8, 2023
@tomhjp tomhjp added the core/plugin and bug labels Sep 8, 2023
@tomhjp tomhjp requested a review from tvoran September 8, 2023 16:29
github-actions bot commented Sep 8, 2023

Build Results:
All builds succeeded! ✅

github-actions bot commented Sep 8, 2023

CI Results:
All Go tests succeeded! ✅

Member

@tvoran tvoran left a comment

When testing this manually with the bad docker config, I did notice that plugin register returns success, even though it's erroring out on the vault side, not sure if that's expected behavior. But on the whole it's much more robust now 👍.

Contributor Author

tomhjp commented Sep 8, 2023

Thanks for giving it a spin!

> I did notice that plugin register returns success, even though it's erroring out on the vault side, not sure if that's expected behavior

Were the errors in log messages? I think we should probably find a way to clean those up a bit because they're often a bit too noisy, but yeah that's expected behaviour. While the plugin catalog is trying to get metadata from the plugins (during registration) the errors are just treated as warnings/debug info, but you should find that you get an error when you try to mount the plugin.

Contributor Author

tomhjp commented Sep 8, 2023

Thanks!

Member

tvoran commented Sep 8, 2023

@tomhjp Gotcha, that makes sense. And yeah, most of the logs were debug or warn, with one err:

2023-09-08T10:19:11.964-0700 [DEBUG] core: attempting to load backend plugin: name=auth-jwt-docker-6
2023-09-08T10:19:11.964-0700 [DEBUG] core: spawning a new plugin process: plugin_name=auth-jwt-docker-6 id=dOfCnXJPCS
2023-09-08T10:19:14.984-0700 [DEBUG] core: failed to dispense v5 backend plugin: name=auth-jwt-docker-6 error="rpc error: code = Unavailable desc = connection error: desc = \"transport: error while dialing: dial unix /tmp/plugin-dir201366887/plugin4132045268: connect: no such file or directory\""
2023-09-08T10:19:15.511-0700 [DEBUG] core: successfully dispensed v4 backend plugin: name=auth-jwt-docker-6
2023/09/08 10:19:15 [ERR] plugin: plugin acceptAndServe error: broker closed
2023-09-08T10:19:17.995-0700 [WARN]  core: Error determining plugin version: error="rpc error: code = Unavailable desc = connection error: desc = \"transport: error while dialing: dial unix /tmp/plugin-dir807524272/plugin3668951487: connect: no such file or directory\""

And mounting it does fail:

$ vault auth enable -plugin-name=auth-jwt-docker-6 -plugin-version=0.17.0-new auth-jwt-docker-6
Error enabling auth-jwt-docker-6 auth: Error making API request.

URL: POST http://localhost:8200/v1/sys/auth/auth-jwt-docker-6
Code: 400. Errors:

* invalid backend version: 2 errors occurred:
	* rpc error: code = Unavailable desc = connection error: desc = "transport: error while dialing: dial unix /tmp/plugin-dir4050973206/plugin395163869: connect: no such file or directory"
	* rpc error: code = Unavailable desc = connection error: desc = "transport: error while dialing: dial unix /tmp/plugin-dir2183644345/plugin735898675: connect: no such file or directory"

Labels: bug, core/plugin, hashicorp-contributed-pr