Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial addition of 9p code to Podman #20358

Merged
merged 3 commits into from
Oct 31, 2023
Merged

Initial addition of 9p code to Podman #20358

merged 3 commits into from
Oct 31, 2023

Conversation

mheon
Copy link
Member

@mheon mheon commented Oct 13, 2023

This includes two new hidden commands: a 9p server, podman system server9, and a 9p client, podman system client9 with server9 currently only configured to run on Windows and serve 9p via HyperV vsock, and client9 only configured to run on Linux. The server is run by podman machine start and has the same lifespan as gvproxy (waits for the gvproxy PID to die before shutting down). The client is run inside the VM, also by podman machine start, and mounts uses kernel 9p mount code to complete the mount. It's unfortunately not possible to use mount directly without the wrapper; we need to set up the vsock and pass it to mount as an FD.

In theory this can be generalized so that the server can run anywhere and over almost any transport, but I haven't done this here as I don't think we have a usecase other than HyperV right now.

Does this PR introduce a user-facing change?

Podman machine VMs created using HyperV can now share directories with the host

@openshift-ci openshift-ci bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note labels Oct 13, 2023
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Oct 13, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mheon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 13, 2023
@mheon
Copy link
Member Author

mheon commented Oct 13, 2023

WIP as I am still trying to actually get things running reliably on Windows.

Test situation is suboptimal, as we need a Podman with my changes inside the podman machine VM we test with. Could potentially use a custom image?

@mheon mheon force-pushed the 9p branch 2 times, most recently from fce45d6 to 8e6063e Compare October 13, 2023 20:01
Copy link
Member

@Luap99 Luap99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some small comments for the cli.
Also can we move these commands under podman machine? I feel like this makes more sense logicically and in case we ever decide to split out machine into a separate repo.
The core podman experience has no need for these commands.

var (
client9Command = &cobra.Command{
Args: cobra.ExactArgs(2),
Use: "client9 <port> <dir>",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Use: "client9 <port> <dir>",
Use: "client9 PORT DIR",

for consistency with other podman usage lines

var (
server9Command = &cobra.Command{
Args: cobra.ExactArgs(1),
Use: "server9 [options] <PID>",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Use: "server9 [options] <PID>",
Use: "server9 [options] PID",

same here

@@ -0,0 +1,84 @@
package system
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this have a windows build tag? I don't think we should include this in non windows builds since it is not used on the other OSes?

@@ -0,0 +1,43 @@
package system
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here also I think it is simpler to just set the linux tag here then you do not need to implement stubs.

@mheon
Copy link
Member Author

mheon commented Oct 16, 2023

Moving commands to machine SGTM, don't know why I didn't think of that before.

@mheon mheon force-pushed the 9p branch 5 times, most recently from 4a626c0 to 62dcb0c Compare October 16, 2023 18:06
@packit-as-a-service
Copy link

Ephemeral COPR build failed. @containers/packit-build please check.

@mheon mheon force-pushed the 9p branch 4 times, most recently from 2c6c695 to a8a484b Compare October 16, 2023 19:55
@mheon
Copy link
Member Author

mheon commented Oct 16, 2023

bin/podman grew by 81672 bytes; max allowed is 51200.
I'm going to bloat-approved given this isn't that much, but worth noting.

@mheon mheon added the bloat_approved Approve a PR in which binary file size grows by over 50k label Oct 16, 2023
@mheon
Copy link
Member Author

mheon commented Oct 16, 2023

This seems to work, but I'm going to hold off on removing WIP until I can retest tomorrow to verify.

@mheon mheon force-pushed the 9p branch 2 times, most recently from b18c8cf to edca82b Compare October 17, 2023 19:03
@mheon mheon changed the title [WIP] Initial addition of 9p code to Podman Initial addition of 9p code to Podman Oct 17, 2023
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 17, 2023
@mheon
Copy link
Member Author

mheon commented Oct 17, 2023

Dropping WIP


err := cmd.Run()
logrus.Debugf("Mount stdout: %s", stdout.String())
if err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this condition be inverted?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, done

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 22, 2023
@TomSweeneyRedHat
Copy link
Member

Man page changes?
Quick breeze review looks good, but definitely want other eyeballs.

Comment on lines 765 to 776
homedir, err := os.UserHomeDir()
if err != nil {
return "", 0, fmt.Errorf("obtain user home dir: %w", err)
}
gvproxyLog, err := os.Create(filepath.Join(homedir, "gvproxy.log"))
if err != nil {
return "", 0, fmt.Errorf("create log file: %w", err)
}
defer gvproxyLog.Close()

c.Stdout = gvproxyLog
c.Stderr = gvproxyLog
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can I just say that I hate having applications write directly into the home dir. Shouldn't this be but into the machine dir itself? At the very least so it gets removed when you remove the machine but also so it doesn't conflict if we can ever start more than one machine at once.

Second because you duplicate the same code twice can you split this into a function and please add a log line that says logging output from X to file Y. I have no idea how much these programs will write but I can imagine someone running out of disk space if they are not aware of this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What gets logged here? For CRC we have a similar log file and this can be a lot. We get a NTP (UDP) timeout a few times once in a while. This causes our log to balloon to several megabytes for a day.

@Luap99
Copy link
Member

Luap99 commented Oct 26, 2023

Can you squash the commits so that they make logically sense once merged. Bug fixes for things added in a prior commit should always just be squashed into it and not added as extra commit. Otherwise you may break an easy git bisect.

Also I see no reason for a extra commit to move the commands to the machine subcommand, that should also be just squashed.
podman system server9 should not become part of the git log as it never will exist realistically once merged.

Copy link
Member

@Luap99 Luap99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a full review just some drive by comments

Comment on lines 124 to 117
logrus.Errorf("Error mounting directory %s. Mount stderr: %s", mountPath, stderr.String())
}

errChan <- err
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is logging the error and passing it back via channel, thus it will be logged twice.
Instead of logging this just return this as fmt.Errorf()

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not logging the error, it's logging stderr from the command - actual error is the exit code of the command.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes which means the real error returned by the podman command is a useless exit status X and the useful message is just logged on stderr above. And both lines log the same error condition!

IMO the right things is take that message and append that to a actual error with fmt.Errorf() and then return the error.

Comment on lines +82 to +96
p, err := psutil.NewProcess(int32(pid))
if err != nil {
return fmt.Errorf("opening gvproxy PID: %w", err)
}
for {
running, err := p.IsRunning()
if err != nil {
return fmt.Errorf("checking is gvproxy is dead: %w", err)
}
if !running {
break
}

time.Sleep(1 * time.Second)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not quite sure why this is needed? There seems to be a WaitForExit() call in windows which unlike linux should also work if the process you are waiting for is not a child. Are you something like https://stackoverflow.com/questions/37664015/os-process-wait-after-os-findprocesspid-works-on-windows-not-on-linux does not work and we really need such a big dep?

Alternative why can't we just use the same logic as for gvproxy stop and let the machine commands kill this process and let it sleep forever here instead of this busy loop.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The WaitForExit call is not accessible via Golang's standard library without DLL loading shenanigans and then calling untyped functions (well, the function itself is, but the ability to get the handle that it requires to work is not) - so this is a lot cleaner, even if it's technically gross.

I don't have objections to using a PID file and terminating on machine stop, though. @baude You have strong feelings on this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Further, I'm strongly in favor of keeping the dep even if we drop this code. The things we were doing with Go's native os.Process code to do things like check for process exit just did not work - I solved 2 separate bugs elsewhere in the code by replacing our custom-rolled Windows/Linux process code with this library.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I you think it is needed sure, not like I tested it on windows. I just cannot believe that the go std lib wouldn't suffice for that use case but if it is like that then sure.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I can see, the standard library is fine on basically any POSIX-compliant system, but it starts to fall apart on Windows - things are sufficiently different that the abstractions can't handle it.

Copy link
Member

@gbraad gbraad Oct 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree here, the stdlib on Windows does not provide the same functionality as you have on Posix-compliant systems. I think this is acceptable for now, as we might spin this code out at a later time.

Comment on lines 114 to 109
cmd.Stdout = &stdout
cmd.Stderr = &stderr
cmd.ExtraFiles = []*os.File{vsock}

err := cmd.Run()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be simpler to just use cmd.CombinedOutput() instead? I doubt anybody needs to care about the stdout of mount

@mheon
Copy link
Member Author

mheon commented Oct 26, 2023

Squashed down, comments addressed

@mheon mheon force-pushed the 9p branch 4 times, most recently from b9ecbfa to c4ba756 Compare October 26, 2023 16:59
@mheon
Copy link
Member Author

mheon commented Oct 26, 2023

@baude PTAL

Comment on lines 109 to 110
err := cmd.Run()
output, outErr := cmd.CombinedOutput()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not right CombinedOutput() already calls Run() for you so you should not call Run() before that.

mheon and others added 3 commits October 31, 2023 10:14
This includes two new hidden commands: a 9p server,
`podman machine server9p`, and a 9p client,
`podman machine client9p` with `server9p` currently only
configured to run on Windows and serve 9p via HyperV vsock, and
`client9p` only configured to run on Linux. The server is run by
`podman machine start` and has the same lifespan as gvproxy
(waits for the gvproxy PID to die before shutting down). The
client is run inside the VM, also by `podman machine start`, and
mounts uses kernel 9p mount code to complete the mount. It's
unfortunately not possible to use mount directly without the
wrapper; we need to set up the vsock and pass it to mount as an
FD.

In theory this can be generalized so that the server can run
anywhere and over almost any transport, but I haven't done this
here as I don't think we have a usecase other than HyperV right
now.

[NO NEW TESTS NEEDED] This requires changes to Podman in the VM,
so we need to wait until a build with this lands in FCOS to test.

Signed-off-by: Matthew Heon <[email protected]>
Instead of trying to write out own code to do basic process
operations (e.g. checking if a PID is still running in a multi-OS
friendly manner), use shirou/gopsutil, a multi-platform library
that should abstract all the complexity away. Unlike our previous
approach on Windows, this one should actually work.

Signed-off-by: Matt Heon <[email protected]>
Logging to os.Stdout and os.Stderr does not seem to work in
Powershell. I am not entirely certain why.

Logfiles are the best alternative I can think of.

Signed-off-by: Matt Heon <[email protected]>
@baude
Copy link
Member

baude commented Oct 31, 2023

/hold
/lgtm

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 31, 2023
@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Oct 31, 2023
@TomSweeneyRedHat
Copy link
Member

@cevich the scan for secret leaks and changes test is failing with:

" Error: unrecognized namespace mode keep-id:uid=1000,gid=1000 passed
Error: Process completed with exit code 125."

I'm not sure where that's being thrown from, do you?

@cevich
Copy link
Member

cevich commented Oct 31, 2023

I'm not sure where that's being thrown from, do you?

Yes I'm on top of it. A fix PR was just merged. A re-run might fix it (I'll try pushing the button), otherwise a force-push should.

@cevich
Copy link
Member

cevich commented Oct 31, 2023

☹️ no go. You can ignore the result, or force-push and it will clear.

@mheon
Copy link
Member Author

mheon commented Oct 31, 2023

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 31, 2023
@mheon
Copy link
Member Author

mheon commented Oct 31, 2023

Should merge without it

@openshift-ci openshift-ci bot merged commit 55b9ea3 into containers:main Oct 31, 2023
99 of 100 checks passed
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Jan 30, 2024
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jan 30, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. bloat_approved Approve a PR in which binary file size grows by over 50k lgtm Indicates that a PR is ready to be merged. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. release-note
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants