Support executing cells in different gRPC Executor Services - Ephemeral Containers #593

jlewi opened this issue May 26, 2024 · 10 comments

@jlewi
Contributor

jlewi commented May 26, 2024

Feature Request

  • Let the user deploy the gRPC executor service in a container
  • Let the user select in vscode which executor different cells should run in
    • So different cells can run in different executors

Motivation

Frequently when working with Kubernetes and containers you need to kubectl exec into a container and run some commands. This is even more common now with ephemeral containers.

An example is [verifying GKE Workload Identity](https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity#verify). Typically you would start a pod, kubectl exec into it, and run gcloud commands to test access.
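For reference, the manual flow looks roughly like this (pod name and namespace are illustrative; the authoritative steps are in the linked doc):

```sh
# Start a test pod that uses the Workload-Identity-enabled Kubernetes service account.
kubectl apply -f pod.yaml

# Exec into it and check which Google identity the pod resolves to.
kubectl exec -it workload-identity-test --namespace=default -- gcloud auth list
```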

What I'd like is to be able to write a playbook that has a mix of steps that run in different executors e.g.

  1. kubectl apply -f pod.yaml (runs locally)
  2. gcloud auth list (runs inside the container)

What you can do today

kubectl exec

You can run kubectl exec in a code cell (a minimal example follows the list below). The output window is interactive and you can enter commands into it. This is pretty nice, especially as it doesn't block other cells from executing. However, there are a couple of disadvantages:

  • The user has to copy/paste commands into the exec shell to execute them, rather than just executing a code cell
  • vscode has weird UX issues that make this less than ideal
    • It doesn't seem possible to resize the output cell to show more lines of the terminal
    • When you scroll with the scroll wheel, vscode switches between scrolling through the doc and through the output window, which is very annoying
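For example, a cell like the one below opens an interactive shell in the output terminal (pod name illustrative); everything after that has to be typed or pasted into that terminal rather than run as its own cell:

```sh
# Opens an interactive shell in the pod; subsequent commands go into the
# output terminal, not into new notebook cells.
kubectl exec -it workload-identity-test -- /bin/bash
```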

Set gRPC custom address

Using RunMe settings you can set a custom address for the executor and so could point it at a gRPC server running in a container.

I think it's a bit cumbersome to constantly switch back and forth to the settings page to change where a cell should execute.

Desired UX

I'd like to be able to easily configure different parts of the document to run on different executors. Ideally I'd like to be able to do this without having to dig through the menus, as that disrupts the flow. I think one option would be to have code blocks that contain RunMe configuration and are identified by a suitable language ID, e.g. runmeconfig. In the block you could then have YAML to configure RunMe, e.g.

grpcExecutor: 1234.1233.123.123

This would configure all subsequent cells to use that executor. It could then be switched back to the local executor:

grpcExecutor: ""

Notably, I don't think you should have to execute these cells in order to apply the configuration. The semantics should be that the configuration automatically applies to all cells that come after it.
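To make this concrete, here is a sketch of what such a playbook might look like (runmeconfig and grpcExecutor are the hypothetical names proposed above, not an existing RunMe feature):

````markdown
```sh
kubectl apply -f pod.yaml   # runs on the local executor
```

```runmeconfig
# Hypothetical config cell: all cells below run on this executor.
grpcExecutor: 1234.1233.123.123
```

```sh
gcloud auth list            # runs inside the container
```

```runmeconfig
# Switch subsequent cells back to the local executor.
grpcExecutor: ""
```
````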

@sourishkrout
Member

I'd love to learn more about the specific use cases.

For what it's worth, running Runme commands in containers is coming in runnerv2: https://github.com/stateful/runme/blob/main/experimental/runme.yaml#L47-L54. It likely won't satisfy all requirements yet, but we should be able to expand the container/docker support accordingly.

Moreover, I'd like to integrate with the devcontainer.json spec and its CLI.

@jlewi
Contributor Author

jlewi commented Jul 12, 2024

I tried this and it works with Runner V2 but not V1; see #625.

There are a couple rough spots right now.

  1. If the gRPC service is unavailable you can't serialize/deserialize notebooks.

    • That's not a great experience. In my case, I'm running in a Google Cloud Workstation which automatically gets garbage collected after some amount of idle time, so I could easily lose data if I haven't saved when the workstation goes down.
  2. It looks like the default behavior of the runner is to change to the working directory of the notebook before executing commands. This seems like desirable behavior.

    • However, when using a remote runner the directory of the markdown file might not exist on the remote machine
    • In this case you get an error like
      Internal failure executing runner: chdir /Users/jlewi/git_foyle/docs/content/en/docs/integrations: no such file or directory
    • You can work around this by explicitly setting the cwd of the cell to a directory that exists on the remote machine (sketched below).
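For illustration, the workaround looks roughly like this in the markdown source; I'm not certain the fence-annotation syntax below is exactly right, so treat it as a sketch of the idea rather than the documented format:

```sh { cwd=/tmp }
# Pin the cell to a directory that exists on the remote runner, instead of
# the notebook's local directory (which may not exist there).
gcloud auth list
```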

So the good news is that it more or less works out of the box, but there are a couple of issues that need to be fixed to make this a well-supported path.

Can you explain the UX for the forthcoming "container" support? Does each code block end up starting a new container? How is the lifecycle of the container managed? Does RunMe manage the container or can I manage it manually?

@adambabik
Collaborator

Can you explain the UX for the forthcoming "container" support? Does each code block end up starting a new container? How is the lifecycle of the container managed? Does RunMe manage the container or can I manage it manually?

This is very limited at the moment and implemented as a proof-of-concept. It is also only available via runme.yaml AFAIK, which is experimental on its own. Overall, runme builds a Docker image and then executes a container using the cell as a spec. Many features like env sharing are not supported. I described a new proposal in #631.

@sourishkrout
Member

It is also only available via runme.yaml AFAIK which is experimental on its own.

aka runner v2

@jlewi
Contributor Author

jlewi commented Jul 16, 2024

Use Case: Melange

I'm currently working with melange to build apks (apks are basically tarballs and are used to build docker images with Chainguard's toolchain).

melange is containerized. The input is a YAML file and then you use docker to run a container that has melange in it.

I need to use a Cloud Workstation because my local machine is underpowered. I have to ssh into the machine.
So my setup looks like the following.

[attached diagram of the setup]

I'd like to run vscode locally and execute commands on both my local machine and my cloud workstation. For example, the basic workflow is (roughly sketched in shell after the list):

  1. Start the cloud workstation and create a tunnel (this runs locally)
  2. Run melange (via docker run) (this runs on the workstation)
  3. Make changes to the melange YAML file and push them to git (this runs locally)
  4. Run git pull to pull latest changes into the workstation (this runs on the workstation)
  5. Run melange (this runs on the workstation)
  6. If the cloud workstation is GC'd because it is idle, I need to rerun the commands to set up the workstation
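Roughly, in shell (the gcloud and melange invocations here are simplified from memory; required flags like --cluster/--config/--region are omitted, so treat this as a sketch):

```sh
# 1. Start the cloud workstation and create a tunnel (runs locally).
gcloud workstations start my-workstation
gcloud workstations start-tcp-tunnel my-workstation 22 --local-host-port=localhost:2222

# 2. Run melange via docker (runs on the workstation).
docker run --privileged --rm -v "$PWD":/work \
  cgr.dev/chainguard/melange build package.yaml

# 3. Edit the melange YAML locally and push it (runs locally).
git commit -am "update package.yaml" && git push

# 4. Pull the latest changes into the workstation (runs on the workstation).
git pull

# 5. Re-run the melange build (runs on the workstation, same docker run as above).
```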

Melange also has an interactive mode. If it encounters an error in a build process it drops you into a shell so you can inspect the build environment and run commands interactively to try to fix things. In this case I'd like to be able to start a runme executor so I could directly execute commands inside the container.

Exploratory/Dev Mode

I'm using RunMe in an "exploratory/dev" mode. Concretely this means each cell will be authored and executed once. I think this is different from using RunMe to author repeatable playbooks where a cell will be authored once but executed multiple times in different sessions.

I think this distinction is important because it means adding a new cell needs to be fast; comparable to entering a new command in a shell. This is why I don't want to have to configure a cell by using a context menu. That seems ok if you're authoring a cell once and expect it to be executed multiple times, because you can amortize the cost. I'd also like to be able to have different configurations for different sections of a notebook so I don't have to constantly repeat it for each cell if I have a sequence of cells that all need a particular configuration.

@sourishkrout
Member

Picking up on this side-issue first:

Exploratory/Dev Mode

I'm using RunMe in an "exploratory/dev" mode. Concretely this means each cell will be authored and executed once. I think this is different from using RunMe to author repeatable playbooks where a cell will be authored once but executed multiple times in different sessions.

I think this distinction is important because it means adding a new cell needs to be fast; comparable to entering a new command in a shell. This is why I don't want to have to configure a cell by using a context menu. That seems ok if you're authoring a cell once and expect it to be executed multiple times, because you can amortize the cost. I'd also like to be able to have different configurations for different sections of a notebook so I don't have to constantly repeat it for each cell if I have a sequence of cells that all need a particular configuration.

The Notebook UX allows you to run a cell plus immediately add a new one with the OPTION+RETURN shortcut (I don't know the non-Mac shortcut offhand). I'm fairly certain we could replicate the previous cell's settings/annotations on this newly inserted cell. I'd imagine that would reduce the explore/dev overhead quite a bit. Wdyt?

Beyond that, I believe the "Runme Terminal" could deliver on this dev/exp mode, where a terminal session could add any previously run command as cell input+output to a notebook side-by-side. Granted, we'd have to reliably discriminate input from output in what will just be a character stream (unaware of the terminal session).

@sourishkrout
Member

Use Case: Melange

I'm currently working with melange to build apks (apks are basically tarballs and are used to build docker images with Chainguard's toolchain).

melange is containerized. The input is a YAML file and then you use docker to run a container that has melange in it.

I need to use a Cloud Workstation because my local machine is underpowered. I have to ssh into the machine. So my setup looks like the following.

Outset question: Have you already tried using VS Code's Remote SSH Dev support to attach to the remote cloud workstation? I understand that it won't deliver on the desired hybrid setup; however, I'd be curious to hear how far it gets you before going deep into the proposed hybrid solution and execution specifics.

@jlewi
Contributor Author

jlewi commented Jul 24, 2024

Re: VSCode Remote.

I've been using that and it works really well. I think this is a good solution when ssh is already set up.

The other situation I've been exploring is when setting up ssh to a machine isn't easy. Concretely, I'm running a prebuilt container on GKE and need to execute commands inside the container. In order to use ssh I'd need to:

  1. Set up networking to allow ssh (e.g. Tailscale)
  2. Install the ssh daemon inside the container

Rather than doing that I've been using kubectl cp && kubectl exec to upload files to the container and then execute them. I find that with the notebook UX I'm more willing to write long, multi-line commands than I am inside the terminal. I also think AI (Foyle) could help with some of that verbosity.
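Concretely, the pattern is something like this (pod, container, and script names are placeholders):

```sh
# Upload a locally authored script into the running container...
kubectl cp ./setup.sh my-pod:/tmp/setup.sh -c my-container

# ...then execute it in place.
kubectl exec my-pod -c my-container -- bash /tmp/setup.sh
```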

So I think the takeaway is that I'd be hard-pressed to make a strong case that support for different gRPC executor services would be a big unlock. There are probably sufficient ways to work around it right now.

Feel free to close this issue.

@sourishkrout
Member

sourishkrout commented Jul 25, 2024

Re: VSCode Remote.

I've been using that and it works really well. I think this is a good solution when ssh is already set up.

Agreed. SSH is in a lot of places; that's why I usually start here.

The other situation I've been exploring is when setting up ssh to a machine isn't easy. Concretely, I'm running a prebuilt container on GKE and need to execute commands inside the container. In order to use ssh I'd need to:

  1. Set up networking to allow ssh (e.g. Tailscale)
  2. Install the ssh daemon inside the container

Rather than doing that I've been using kubectl cp && kubectl exec to upload files to the container and then execute them. I find that with the notebook UX I'm more willing to write long, multi-line commands than I am inside the terminal. I also think AI (Foyle) could help with some of that verbosity.

Gotcha. That helps a lot to understand what's in the way. My stance here is actually that I'd rather build on top of the kubectl & docker CLIs, aka the Kube & dockerd/containerd APIs, than go down the stack to a direct integration via gRPC sockets, i.e. network-level. My main driver is that kubectl has solutions for authn/authz and "remote connectivity" & "remote exec" (e.g. attaching a sidecar), in a less low-level way than SSH does, and likely something that could be built on top of. Also, Runme's TLS is a poor person's PKI because it mimics the security model of a UDS, where file permissions on a single system protect the socket. We largely did this to support Runme's Parser API on Windows, and going "distributed" will expose us to the paper cuts that come with it.
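To sketch the level I'm talking about (an ephemeral debug container attached to a running pod; names are placeholders), this is the kind of primitive Runme could drive instead of dialing a gRPC socket inside the pod:

```sh
# Attach an ephemeral container to a running pod and open an interactive shell,
# targeting the existing container's process namespace.
kubectl debug -it my-pod --image=busybox:1.36 --target=my-container -- sh
```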

So I think the takeaway is that I'd be hard-pressed to make a strong case that support for different gRPC executor services would be a big unlock. There are probably sufficient ways to work around it right now.

I do agree that, in the short term, we lack the resources to pursue what I'm describing above. However, we have started to make inroads on Docker/container support and will likely continue.

Feel free to close this issue.

Let's keep it open for a bit longer until we've had a chance to harvest some of the "nuggets" in here into narrower follow-up issues.

@jlewi
Contributor Author

jlewi commented Jul 26, 2024

My stance here is actually that I'd rather build on top of the kubectl & docker CLIs, aka the Kube & dockerd/containerd APIs, than go down the stack to a direct integration via gRPC sockets, i.e. network-level.

That's interesting. So your thinking is that if I want to execute a command in a container on K8s (e.g. an ephemeral container), rather than starting RunMe in that container and using gRPC to send the command to it, RunMe would run locally and use kubectl exec (or the underlying API) to execute commands in that container.

kubectl has solutions for authn/authz and "remote connectivity" & "remote exec" (e.g. attaching a sidecar), in a less low-level way than SSH does, and likely something that could be built on top of. Also, Runme's TLS is a poor person's PKI

Do you need RunMe to directly support network authn/authz? My assumption was that if RunMe just exposes an HTTP endpoint then customers likely already have ways to do Authz regarding access to this endpoint at the network layer. For example, I use Tailscale.

Long Running Commands over Flaky Connections

One problem I've been hitting with vscode over ssh is with long-running commands. Concretely, I'm doing a make build and that build can take hours. A lot of the time the vscode connection seems to go down and I have to reconnect vscode over ssh. As far as I can tell right now, if I were running make build in a RunMe cell and vscode got disconnected, the command would be terminated and not able to reconnect. I generally use screen in this case. I haven't tried running screen with RunMe so I don't know if that would work, or if there is some other way to start it as a background process.
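The kind of workaround I have in mind, though I haven't verified it behaves well inside a RunMe cell:

```sh
# Run the long build in a detached screen session so it survives the ssh/vscode
# connection dropping.
screen -dmS build bash -c 'make build 2>&1 | tee build.log'

# After reconnecting, check progress or reattach to the session.
tail -f build.log
screen -r build
```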

More generally, if I use vscode over ssh, what does the UX end up looking like if I want to:

  1. Launch long running command(s) in a remote machine
  2. Close my laptop
  3. Reconnect later to get the results
