Agent instance fails to connect to master despite port being open #669
Comments
I believe this is happening because the envoy proxy is taking some time to set things up and the One solution would be for |
I can also confirm that this occurs on a GKE cluster using istio |
Following up on @timmyers' comment: this is exactly what I was observing, so I built a custom jnlp image that uses wait-for-it to make sure the pod can connect to Jenkins before launching jenkins-agent. This solved the connectivity issue, and from my testing it's about a 3s delay on our cluster before the connection becomes available. |
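For readers who want to try the same idea without pulling in wait-for-it, here is a minimal sketch of a wait-then-launch helper in POSIX sh. It is not the custom image described above; the function name `wait_for_port` is made up for illustration, and it assumes a netcat binary that supports `-z` is present in the agent image.

```shell
#!/bin/sh
# Sketch only: this is not the custom image described above.

# wait_for_port HOST PORT MAX_ATTEMPTS DELAY_SECONDS
# Returns 0 once a TCP connection to HOST:PORT succeeds, 1 if all attempts fail.
wait_for_port() {
    host=$1
    port=$2
    attempts=$3
    delay=$4
    n=1
    while true; do
        # nc -z opens a connection without sending data (assumes netcat is in the image).
        if nc -z "$host" "$port" 2>/dev/null; then
            return 0
        fi
        if [ "$n" -ge "$attempts" ]; then
            return 1
        fi
        n=$((n + 1))
        sleep "$delay"
    done
}
```

A wrapper entrypoint could then run something like `wait_for_port "$JENKINS_HOST" "$JENKINS_PORT" 30 1 && exec jenkins-agent "$@"`, where `JENKINS_HOST` and `JENKINS_PORT` are hypothetical variables you would set yourself. Compared with a fixed sleep, this only waits as long as the sidecar actually needs.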
guys , |
@aspring could you please share the details of how you built the custom image and how you added the wait? |
If your slave is outside the cluster, then you have to use a NodePort to expose the master's service. |
I'm facing this issue! Is there any solution other than modifying the jnlp images? I tried to modify the configMap for jenkins-agent by adding "sleep 10; jenkins-agent" to the command, but it did not work. Logs: SEVERE: Failed to connect to https://xxxx-jenkins.xxxx.svc:8080/jenkins/tcpSlaveAgentListener/: Connection refused (Connection refused) |
Facing the same issue on istio 1.2.0. If you run Jenkins and the Jenkins slave on pure Kubernetes, everything works fine.
|
I am getting the same issue: Oct 27, 2021 4:23:18 PM hudson.remoting.jnlp.Main createEngine |
I am getting it on istio 1.7.3 and GKE version 1.20.10-gke.301 |
the recommendation appears to be to add a bit of a sleep / Happy for a fix in either this repo, or in say https://github.com/jenkinsci/remoting cc @jeffret-b |
Thanks @timja for the workaround. Indeed, it worked for us by modifying the agent's entrypoint in the k8s pod template:
command:
- "powershell.exe"
args:
- "Start-Sleep"
- "-s"
- "5"
- ";"
- "powershell.exe"
- "-f"
- "C:/ProgramData/Jenkins/jenkins-agent.ps1"
And it all works fine again (well... 5s slower).
|
How to do this if I'm not using kubernetes? How to add sleep? |
Modifying one of the startup scripts is easiest: https://github.com/jenkinsci/docker-inbound-agent/blob/master/jenkins-agent |
Updating pod template might help as well
|
We are seeing similar issues - only for Windows nodes as well |
Hi @psimms-r7 , as per https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/ , readiness probes are not usable there:
=> the inbound agents connect to the Jenkins controller, not the other way around. Unless you meant a readiness probe for the Jenkins controller itself in Kubernetes? (If yes, then look at the helm chart values: https://github.com/jenkinsci/helm-charts/blob/48f2acfaeec059de23d5b1065757ba8bb4621e0a/charts/jenkins/VALUES_SUMMARY.md#kubernetes-health-probes). => You could use a startup probe though (with a Kubernetes version supporting it): https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#define-startup-probes. |
Apologies, you're right, something like a startup probe - could we just do a curl on the agent listener? |
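As a rough illustration of the "curl on the agent listener" idea, the check itself could look like the sketch below. The function name `listener_ready` is hypothetical, the URL is whatever your controller exposes (the `tcpSlaveAgentListener/` path is taken from the log line earlier in this thread), and the sketch assumes curl is available in the agent image. Whether this runs as a probe command or inside an entrypoint wrapper is left open here.

```shell
#!/bin/sh
# Sketch only: a single reachability check for the Jenkins agent listener.

# listener_ready URL [TIMEOUT_SECONDS]
# Exits 0 if the URL answers with a non-error HTTP status, non-zero otherwise.
listener_ready() {
    # -f: treat HTTP errors as failure, -s: no progress output,
    # -o /dev/null: discard the body, --max-time: cap the whole request.
    curl -fs -o /dev/null --max-time "${2:-5}" "$1"
}
```

For example, `listener_ready "https://xxxx-jenkins.xxxx.svc:8080/jenkins/tcpSlaveAgentListener/"` returns non-zero while the connection is still refused, so the caller can decide whether to wait and try again.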
The error we are seeing is slightly different actually - UnknownHostException Error
I am experimenting with our custom inbound agent image and tweaking the jenkins-agent.ps1 script with the below wrapped around the start-process. This appears to have improved things. Note that I rarely use PowerShell, so I am sure this can be much improved, but would something like this make sense to merge up to master?
|
I have never played around with startup probes, but it looks like the right way to achieve this. Your idea looks really good: a startup probe that curls the Jenkins controller listener. |
The error comes from DNS resolution in your case. The
=> It reminds me of microsoft/Windows-Containers#61 (if it helps) |
Hi @psimms-r7, were you able to solve this issue? I'm facing the exact same issue in my cluster as well. |
I put that snippet of code into the jenkins-agent.ps1 script and bundled that into our custom jnlp image overwriting the original, which seems to make it more reliable, I haven't seen that issue since |
Thank you so much!! Let me give it a try.. |
@jawadqur The issue is mostly seen on Windows nodes. The init container option works for Linux. |
This issue occurs on an OKE cluster (v1.25.12) using istio 1.15.1. To get it to work, I had to disable istio for the agent namespace. |
Should a retry mechanism be implemented on top of the agent.jar? In the entrypoint? I have found references that the agent should be retried in case of failures: So, I wonder why docker-agent doesn't do it. |
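To make the question concrete, a retry layer in the entrypoint could be as small as the sketch below. This is an illustration of the idea, not something docker-agent ships; the function name `run_with_retry` and the doubling backoff are assumptions chosen for the example.

```shell
#!/bin/sh
# Sketch only: re-run a command until it succeeds or attempts run out.

# run_with_retry MAX_ATTEMPTS INITIAL_DELAY_SECONDS COMMAND [ARGS...]
# Returns 0 as soon as COMMAND exits 0, 1 after MAX_ATTEMPTS failures.
run_with_retry() {
    max=$1
    delay=$2
    shift 2
    attempt=1
    while true; do
        if "$@"; then
            return 0
        fi
        if [ "$attempt" -ge "$max" ]; then
            return 1
        fi
        echo "attempt $attempt failed; retrying in ${delay}s" >&2
        sleep "$delay"
        attempt=$((attempt + 1))
        # Double the wait each time so a slow sidecar gets more room.
        delay=$((delay * 2))
    done
}
```

An entrypoint could then end with `run_with_retry 5 2 jenkins-agent "$@"`. One trade-off to keep in mind: retrying the whole agent process also retries failures that have nothing to do with startup connectivity, so a small attempt limit is probably wise.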
Answering to myself:
|
Installing jenkins on GKE using the official helm chart. I have used jnlp images with both tags 3.27-1 and 3.40-1.
When starting a simple (shell execution) job, the agent pod starts running but then gets terminated with an error.
Its error logs are the following:
I have created a test pod within the same master/agent namespace and no connectivity issue seems to exist:
Environment:
lts
3.27-1 and 3.40-1
1.4.0