Service port forwarding recovery on restarted pods #686

Closed · balopat opened this issue Jul 22, 2019 · 64 comments
Assignees
Labels:
  • kind/bug: Categorizes issue or PR as related to a bug.
  • lifecycle/rotten: Denotes an issue or PR that has aged beyond stale and will be auto-closed.
  • priority/backlog: Higher priority than priority/awaiting-more-evidence.
  • triage/accepted: Indicates an issue or PR is ready to be actively worked on.

Comments

@balopat

balopat commented Jul 22, 2019

When I start kubectl port-forward svc/leeroy-app 50053:50051, it works the first time.
If I kill the pod behind the service, Kubernetes restarts the pod, and the port forwarding then starts failing:

Handling connection for 50053
Handling connection for 50053
E0722 16:21:00.929687  155541 portforward.go:340] error creating error stream for port 50053 -> 50051: Timeout occured
E0722 16:21:00.969972  155541 portforward.go:362] error creating forwarding stream for port 50053 -> 50051: Timeout occured
E0722 16:21:02.989783  155541 portforward.go:362] error creating forwarding stream for port 50053 -> 50051: Timeout occured
E0722 16:21:03.998054  155541 portforward.go:362] error creating forwarding stream for port 50053 -> 50051: Timeout occured
E0722 16:21:04.598329  155541 portforward.go:340] error creating error stream for port 50053 -> 50051: Timeout occured
E0722 16:21:05.577799  155541 portforward.go:362] error creating forwarding stream for port 50053 -> 50051: Timeout occured
Handling connection for 50053
E0722 16:21:06.166770  155541 portforward.go:362] error creating forwarding stream for port 50053 -> 50051: Timeout occured
E0722 16:21:35.578937  155541 portforward.go:340] error creating error stream for port 50053 -> 50051: Timeout occured
Handling connection for 50053
Handling connection for 50053
E0722 16:21:40.688533  155541 portforward.go:400] an error occurred forwarding 50053 -> 50051: error forwarding port 50051 to pod 6b8250b5be8d3e65ed5d9c900cb87966bed006b57cc81617d27b6ba271742815, uid : Error: No such container: 6b8250b5be8d3e65ed5d9c900cb87966bed006b57cc81617d27b6ba271742815
E0722 16:22:10.606373  155541 portforward.go:340] error creating error stream for port 50053 -> 50051: Timeout occured
Handling connection for 50053
Handling connection for 50053
E0722 16:22:40.712581  155541 portforward.go:340] error creating error stream for port 50053 -> 50051: Timeout occured
E0722 16:22:40.712668  155541 portforward.go:340] error creating error stream for port 50053 -> 50051: Timeout occured

If I manually kill the kubectl port forwarding and restart it, it works.

I would love to see this recover automatically instead of having to parse the output and restart manually.

We are building port forwarding into our application through kubectl, and this would help a lot with the integration.
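
For reference, here is a minimal sketch of the service-level reproduction (the leeroy-app name comes from the report above; the image is a placeholder, so treat this purely as an illustration):

# Placeholder image; any server listening on 50051 will do.
kubectl create deployment leeroy-app --image=registry.example.com/leeroy-app
kubectl expose deployment leeroy-app --port=50051
kubectl port-forward svc/leeroy-app 50053:50051 &

# Delete the backing pod. The Deployment controller creates a replacement,
# but the running port-forward stays bound to the old pod, and every new
# connection hits the "error creating error stream / forwarding stream"
# timeouts shown above until the port-forward is restarted by hand.
kubectl delete pod -l app=leeroy-app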

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 21, 2019
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Nov 20, 2019
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.


@jjfmarket

/reopen

@k8s-ci-robot
Contributor

@jjfmarket: You can't reopen an issue/PR unless you authored it or you are a collaborator.


@jjfmarket

I also see this behavior: my port forwards start failing after I restart the pod that was being forwarded to.

@jjfmarket

/reopen

@k8s-ci-robot
Contributor

@jjfmarket: You can't reopen an issue/PR unless you authored it or you are a collaborator.


@pvsousalima

Isn't there any suggested approach for implementing this automatic recovery?

@brianpursley
Member

/reopen

@k8s-ci-robot
Contributor

@brianpursley: Reopened this issue.


@k8s-ci-robot k8s-ci-robot reopened this Oct 2, 2020
@brianpursley
Member

brianpursley commented Oct 2, 2020

I was taking a look at this a little bit today and I think this is a legitimate issue.

The problem seems to be that port forwarding enters some sort of unrecoverable state after it is no longer able to communicate with the pod it was connected to, and yet it does not fail with an exit code either.

Here are my steps to reproduce (use two terminals)

terminal 1

kubectl run sysinfo --image=brianpursley/system-info

terminal 2

kubectl port-forward sysinfo 8080:80

Open a browser or curl to make some requests to http://localhost:8080 and verify that port forwarding is working

terminal 1

kubectl delete pod sysinfo

Open a browser or curl to make some requests to http://localhost:8080 and verify that port forwarding is no longer working

terminal 2
You will see some errors like these:

Forwarding from 127.0.0.1:8080 -> 80
Forwarding from [::1]:8080 -> 80
Handling connection for 8080
Handling connection for 8080
Handling connection for 8080
E1002 15:12:34.808176  125749 portforward.go:400] an error occurred forwarding 8080 -> 80: error forwarding port 80 to pod e2cb7d04631d95df43a87ad38952a027074a146da9ff85c43866c4e2b2806009, uid : exit status 1: 2020/10/02 15:12:34 socat[2905824] E connect(5, AF=2 127.0.0.1:80, 16): Connection refused
Handling connection for 8080
E1002 15:12:34.822191  125749 portforward.go:400] an error occurred forwarding 8080 -> 80: error forwarding port 80 to pod e2cb7d04631d95df43a87ad38952a027074a146da9ff85c43866c4e2b2806009, uid : exit status 1: 2020/10/02 15:12:34 socat[2905825] E connect(5, AF=2 127.0.0.1:80, 16): Connection refused
Handling connection for 8080
E1002 15:12:34.835750  125749 portforward.go:400] an error occurred forwarding 8080 -> 80: error forwarding port 80 to pod e2cb7d04631d95df43a87ad38952a027074a146da9ff85c43866c4e2b2806009, uid : exit status 1: 2020/10/02 15:12:34 socat[2905826] E connect(5, AF=2 127.0.0.1:80, 16): Connection refused

The problem is that kubectl port-forward never exits, and even if I run kubectl run sysinfo --image=brianpursley/system-info again, it is not able to reestablish a connection, so it is stuck in some invalid state.

NOTE: My example above is for a single pod, but you can port-forward to a service or deployment, in which case it will select a single pod within the deployment and forward to that pod only. You can follow similar steps to reproduce the issue with a deployment, but you have to find the pod it is connected to and delete that pod to see the effect.

Ideas on possible solutions

  1. Detect connection errors and exit with a nonzero exit code
  2. Detect connection errors and automatically attempt to re-establish a new port forwarding connection (a userland approximation is sketched below)
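
Until something like that lands in kubectl itself, a rough userland approximation of idea 2 is to wrap the command in a retry loop keyed on its exit status. A minimal sketch, assuming a kubectl build that actually exits once it reports "lost connection to pod" (as noted above, the build tested here just hangs):

# Restart the forward whenever kubectl exits with a non-zero status;
# stop once it exits cleanly (for example after an intentional shutdown).
until kubectl port-forward sysinfo 8080:80; do
    echo "port-forward exited with status $?; retrying in 1s..." >&2
    sleep 1
done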

@brianpursley
Member

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Oct 2, 2020
@dougsland
Member

Let me try to reproduce this report and work on it.

@dougsland
Member

/assign

@dougsland
Member

Hey @soltysh, I am wondering if we can discuss this one in the SIG meeting. Should os.Exit(1) be enough for this one? I just tested a local patch and it works.

@eddiezane
Member

/priority backlog
/kind bug

@soltysh
Contributor

soltysh commented Oct 15, 2020

Hey @soltysh, I am wondering if we can discuss this one in the SIG meeting. Should os.Exit(1) be enough for this one? I just tested a local patch and it works.

@dougsland just open a PR and please ping me on Slack with it; I'll review.

@soltysh
Contributor

soltysh commented Oct 15, 2020

/priority backlog
/kind bug

@rthamrin

I still have not found the best solution for this problem.

@saerdnaer

@rthamrin upgrade kubectl to v1.23 or higher; cf. kubernetes/kubernetes#103526
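
To confirm which client you are running before and after the upgrade, a quick check (exact output format varies by version):

# The exit-on-lost-connection fix referenced above ships with kubectl v1.23+.
kubectl version --client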

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 10, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jun 9, 2022
@michalschott

michalschott commented Jul 4, 2022

Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.8", GitCommit:"a12b886b1da059e0190c54d09c5eab5219dd7acf", GitTreeState:"clean", BuildDate:"2022-06-16T05:57:43Z", GoVersion:"go1.17.11", Compiler:"gc", Platform:"darwin/arm64"}
Server Version: version.Info{Major:"1", Minor:"22+", GitVersion:"v1.22.9-eks-a64ea69", GitCommit:"540410f9a2e24b7a2a870ebfacb3212744b5f878", GitTreeState:"clean", BuildDate:"2022-05-12T19:15:31Z", GoVersion:"go1.16.15", Compiler:"gc", Platform:"linux/amd64"}

still does not work properly:

E0704 11:58:28.913863   80574 portforward.go:406] an error occurred forwarding 8080 -> 8080: error forwarding port 8080 to pod 0273cdedf82900269c30f08db7936e813987f19487d1ccdb67533c51fd1466e7, uid : failed to find sandbox "0273cdedf82900269c30f08db7936e813987f19487d1ccdb67533c51fd1466e7" in store: not found
E0704 11:58:28.915076   80574 portforward.go:234] lost connection to pod

It never recovers. I'd expect it not to kill the current port-forward process, but to restart it and reestablish the connection if possible.

@justinmchase

Those error messages are ok, assuming it eventually recovers. Otherwise returning a consistent error code would be helpful.

@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue.


@grumpyoldman-io

/reopen

@k8s-ci-robot
Contributor

@grumpyoldman-io: You can't reopen an issue/PR unless you authored it or you are a collaborator.


@aojea
Member

aojea commented Aug 30, 2022

/reopen

@k8s-ci-robot
Contributor

@aojea: Reopened this issue.


@k8s-ci-robot k8s-ci-robot reopened this Aug 30, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

@k8s-ci-robot k8s-ci-robot closed this as not planned (won't fix, can't repro, duplicate, stale) Sep 29, 2022
@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".


@gustaff-weldon

@aojea @dougsland @soltysh

This is still a problem: kubectl port-forward is unable to recover from a lost connection, but it does not exit either.
E.g., at this point it is defunct:

[screenshot from 2022-11-18 15:04:13 showing the hung port-forward]

If adding resilient port-forwarding to kubectl is hard, can we at least make sure it exits with a non-zero code, so a userland wrapper can be added?

@mbigras

mbigras commented Feb 8, 2023

I ran into this issue and I would also like a retry option for the kubectl port-forward command; however, to work around the issue, I put kubectl port-forward in a Bash loop, as the following procedure illustrates. Gotcha: this workaround isn't perfect, since the first request after a redeploy fails. Edit: alternatively, skip the gnarly Bash loop and install and run knight42's excellent krelay kubectl plugin, as the second working session below illustrates.

  1. Deploy an example app.

    kubectl apply --kustomize [email protected]:9e12a2027374569073eb979b17994f69.git?ref=v1-1
    

    Your output should look like the following

    $ kubectl apply --kustomize [email protected]:9e12a2027374569073eb979b17994f69.git?ref=v1-1
    configmap/blueapp-28b5f8ch92 created
    deployment.apps/blueapp created
    
  2. Run kubectl port-forward command in a loop.

    bash <<Port-forward-with-retry
    trap exit SIGINT
    while true
    do
    	kubectl port-forward deployment/blueapp 8080:8080
    done
    Port-forward-with-retry
    

    Gotcha: Run with trap handler so SIGINT with control-c keystroke works.

  3. In a different terminal, send a request to your app.

    curl blueapp.localdev.me:8080
    

    Your output should look like the following.

    $ curl blueapp.localdev.me:8080
    {"color":"blue","corners":"sharp","widgets":"w1,w2,w3"}
    

    Note: Consider using the excellent localdev.me domain, which I learned about in the ingress-nginx Local testing documentation.

  4. Redeploy your app with a configuration change.

    kubectl apply --kustomize [email protected]:9e12a2027374569073eb979b17994f69.git?ref=v1-2
    

    Your output should look like the following.

    $ kubectl apply --kustomize [email protected]:9e12a2027374569073eb979b17994f69.git?ref=v1-2
    configmap/blueapp-mg8ft62k9f created
    deployment.apps/blueapp configured
    
  5. Send another request to your app—expected failure.

    curl blueapp.localdev.me:8080
    

    Your output should look like the following.

    $ curl blueapp.localdev.me:8080
    curl: (52) Empty reply from server
    

    Gotcha: This workaround isn't perfect since the first request after redeploy fails.

  6. Try again—send another request to your app.

    curl blueapp.localdev.me:8080
    

    Your output should look like the following.

    $ curl blueapp.localdev.me:8080
    {"color":"cornflowerblue","corners":"rounded","widgets":"w2,w1,w3"}
    

    Notice the corners changed and the UI widgets moved around—usual cyclic change.

  7. In your Bash loop terminal, shut down your loop.

    Press control-c.

    Your output should look like the following.

    $ bash <<Port-forward-with-retry
    > trap exit SIGINT
    > while true
    > do
    >     kubectl port-forward deployment/blueapp 8080:8080
    > done
    > Port-forward-with-retry
    Forwarding from 127.0.0.1:8080 -> 8080
    Forwarding from [::1]:8080 -> 8080
    Handling connection for 8080
    Handling connection for 8080
    E0207 22:27:21.406207   27015 portforward.go:406] an error occurred forwarding 8080 -> 8080: error forwarding port 8080 to pod bc74af06707bec5945969e9ae4b68a2165bc106a065ed121b0e8478a0b57bbe1, uid : container not running (bc74af06707bec5945969e9ae4b68a2165bc106a065ed121b0e8478a0b57bbe1)
    E0207 22:27:21.406625   27015 portforward.go:234] lost connection to pod
    Forwarding from 127.0.0.1:8080 -> 8080
    Forwarding from [::1]:8080 -> 8080
    Handling connection for 8080
    ^C
    

Edit: Alternatively, skip the gnarly Bash loop and install and run knight42's excellent krelay kubectl plugin, as the following working session illustrates.

  1. Run an example app.

    kubectl apply --kustomize [email protected]:9e12a2027374569073eb979b17994f69.git?ref=v1-1
    curl blueapp.localdev.me:8080
    

    Your output should look like the following.

    # ...
    $ curl blueapp.localdev.me:8080
    {"color":"blue","corners":"sharp","widgets":"w1,w2,w3"}
    
  2. Install and run krelay kubectl plugin.

    cd
    git clone https://github.com/knight42/krelay
    cd krelay
    make krelay
    cp krelay "$(go env GOPATH)/bin/kubectl-relay"
    kubectl relay -V
    kubectl relay deployment/blueapp 8080:8080
    

    Your output should look like the following.

    $ kubectl relay -V
    Client version: v0.0.4-7-g340a752
    $ kubectl relay deployment/blueapp 8080:8080
    I0208 10:25:58.534485    4797 main.go:135] "Creating krelay-server" namespace="default"
    I0208 10:26:00.812670    4797 main.go:141] "krelay-server is running" pod="krelay-server-lpwrp" namespace="default"
    I0208 10:26:00.845471    4797 forwarder.go:62] "Forwarding" protocol="tcp" localAddr="127.0.0.1:8080" remotePort=8080
    
  3. Configure and redeploy your example app.

    kubectl apply --kustomize [email protected]:9e12a2027374569073eb979b17994f69.git?ref=v1-2
    curl blueapp.localdev.me:8080
    

    Your output should look like the following.

    $ curl blueapp.localdev.me:8080
    {"color":"cornflowerblue","corners":"rounded","widgets":"w2,w1,w3"}
    

    Notice that your first request succeeds—excellent, thanks @knight42!

  4. In your knight42/krelay terminal, shut down kubectl relay.

    Press control-c.

@knight42
Member

knight42 commented Feb 8, 2023

@mbigras Hi, you might be interested in krelay, which behaves similarly to kubectl port-forward and also survives rolling updates. The usage is simple: after installation, you can simply run

kubectl relay deployment/blueapp 8080:8080

The port forwarding still works even after you update the deployment.
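
The session above builds krelay from source; if the plugin is also published in the krew index under the name relay (an assumption worth verifying against the project's README), installation can be shorter:

# Assumes krelay is distributed via krew as "relay"; check the krelay README.
kubectl krew install relay
kubectl relay -V                             # verify the plugin is picked up
kubectl relay deployment/blueapp 8080:8080   # same usage as shown above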

@justinmchase

It would be nice if the native port-forward adopted the same solution as krelay in this case :)

@c3-clement

c3-clement commented May 27, 2024

I ran into the same issue.
I wrote the script below to work around the issue, with inspiration from @mbigras's workaround.
It works even in non-interactive mode, so it can be used in CI/CD or other automation, and it doesn't require installing a third-party tool such as krelay.

#!/usr/bin/env bash

PID=""

exit_handler() {
    echo "Received SIGTERM or SIGINT. Shutting down..."
    kill -TERM "$PID"
    wait "$PID"
    exit 0
}

trap exit_handler SIGTERM SIGINT
echo "Starting port-forwarding for $SVC_NAME in namespace $NAMESPACE"

while true
do
    kubectl port-forward svc/"$SVC_NAME" "$HOST_PORT":"$REMOTE_PORT" -n "$NAMESPACE" &
    PID=$!
    wait $PID
done
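
A usage sketch for the script above (it reads SVC_NAME, NAMESPACE, HOST_PORT, and REMOTE_PORT from the environment; the values and file name here are illustrative only):

# Hypothetical values matching the original report; adjust to your service.
export SVC_NAME=leeroy-app NAMESPACE=default HOST_PORT=50053 REMOTE_PORT=50051
./port-forward-retry.sh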
